6 Steps to Optimize Your Site for Search Engine Bots & Spiders

Many website owners are concerned with one thing and one thing only – getting to the top of Google for their keywords. After all, who doesn’t want boatloads of free organic search traffic?

Unfortunately, business owners are often so focused on keyword optimization, content quality, and the other ranking signals in the Google search algorithm that they forget to start with step one: understanding how the search engines see and crawl their site.

And of course, understanding how the system works will give you a distinct advantage over your competition.

The Crawling Process

Google’s servers use software that essentially scans the Internet and feeds website data through a complex algorithm to determine how useful each website is to its visitors. The bots literally copy your pages (called caching) and drill down into your content by sorting through and removing code, comparing data among your pages, and employing other analysis strategies. The Google algorithm then indexes these pages and ranks them accordingly.

The very first step in this process is the Google bots (sometimes referred to as spiders) crawling your website and its related links to build an accurate digital picture of your website and its quality.

When the Google bots come a crawlin’, you had better make darn sure that you don’t inhibit their ability to crawl your site. If your website isn’t optimized for the crawlers, you’re introducing unneeded roadblocks that will severely diminish the effectiveness of your SEO efforts.

The following are some techniques and considerations that will help you roll out the red carpet for the Google bots.

1. Create a Sitemap

The very first and easiest thing you can do to improve the ‘crawlability’ of your website is to create a sitemap. The sitemap helps the bots crawl your pages and ensures that portions of your website aren’t left out of the indexing process. WordPress offers many tools to help you build a sitemap; in fact, there are many easy-to-use plugins to assist you with this task.

It should be stated here that there are two "known" types of sitemaps. One is a page on a website that simply lists all the links on that site. Done correctly, these links can be grouped by section of the site or listed in alphabetical order. I've never really seen anyone actually use one of these pages, but they are out there.

If your navigation is done correctly, you shouldn't need that type of sitemap.

The second type of sitemap is an XML sitemap. This is a file that is typically created dynamically by a script or plugin, structured in a way that the search engines can easily digest. In a matter of seconds, they can have a map to every page on your site.

This is big, because it means they don't have to stumble through your site to learn about it.  You've handed over the map.  This is the type of sitemap you really should create for your site.
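For reference, a minimal XML sitemap follows the sitemaps.org format. The URLs and dates below are placeholders; a plugin will generate the real entries for you:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2016-04-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/about/</loc>
    <lastmod>2016-03-15</lastmod>
  </url>
</urlset>
```

Each url entry tells the bots one page exists and when it last changed, so they don't have to discover it by following links.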

We recommend using Yoast SEO, a free plugin that does a lot of "SEO stuff" for your site. It can also create dynamic sitemaps that the search engines will love to crawl.

We also recommend getting a little bit of an advantage by telling Google how to find your sitemap easily. This is done in two places: your robots.txt file and Google Search Console.

Add your sitemap to your robots file

Using the Yoast plugin, here's how to find your sitemap link:

In the left column, click on SEO -> XML Sitemaps

If the XML sitemap feature isn't enabled, enable it and save. Once enabled, right-click the sitemap link and copy the link address.

Then in the left column, click on SEO -> Tools

Then click on 'File editor'

Find the robots.txt file and add a Sitemap line pointing to the link you just copied.
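The line to add is a standard robots.txt directive. The URL below is a placeholder; use the sitemap link you copied (Yoast typically names its index sitemap_index.xml):

```text
Sitemap: https://example.com/sitemap_index.xml
```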

Add your sitemap to Google Search Console

Submit that same link in the Sitemaps section of Google Search Console.

2. Avoid Duplicate Content

There is a lot of conflicting information online about duplicate content penalties.  Some say it's real, others say it doesn't really hurt you.  We believe you should always make an effort to avoid duplicate content, especially regarding your informational content.

Some websites seek to fill their pages and load their site with keywords by essentially rewriting the same article with different phrasing (called spinning). Not only is this a big no-no because it does little to improve your visitors’ experience on your website, but it also decreases the Google bot’s ability to crawl your site. The days of tricking the search bots with slightly different words are long gone.

However, in some cases, duplicate content is unintentional and not malicious in nature. For example, you may post a downloadable version of information that you have published elsewhere on your site, or reformat information for your users. This isn’t an intentionally malicious attempt to boost ranking signals, but it does still create problems for crawlers. Instead of blocking the crawler from accessing these types of duplicate content, Google recommends marking them as duplicates with a rel="canonical" element or using 301 redirects to avoid potential crawler problems.
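A canonical tag is just a link element in the duplicate page's head that points back to the original. The URL below is a placeholder:

```html
<link rel="canonical" href="https://example.com/original-article/" />
```

This tells Google which version of the content to index and credit, instead of leaving the bots to guess.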

You can also set a canonical tag in Yoast, under the Advanced tab of the page's Yoast SEO meta box.

We do this with our clients from time to time. If one of their vendors has a great article that we think would be of value to our audience, we'll republish it (with permission) on the client's site, and then add the canonical tag to keep the client from being penalized.

This is also a good time to bring up a big mistake that business owners make - copying their vendor's product content. Those in ecommerce especially need to be careful. If you're adding products and using the manufacturer's description, then you're probably shooting yourself in the foot. You're doing the same thing tens, if not hundreds or thousands, of other vendors are doing as well! You'll never rank a product page like that. Take the time to write an original description that uses the keywords you want to rank for.

3. Structuring: Some Pages Simply Shouldn’t Be Crawled

Some pages are absolutely useless to boost ranking signals and aren’t intended to be viewed and accessed by your visitors. Pages that are only accessible by the website admin and other behind-the-scenes folders and directories shouldn’t be crawled, and you can block the crawler from these directories by editing the robots.txt file in the root directory. To find out more about creating this file, simply follow Google’s tutorial.
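For a typical WordPress site, the relevant robots.txt rules might look like this (a common convention rather than an official requirement; admin-ajax.php is left open because some plugins call it from the front end):

```text
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```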

In addition to these admin-related pages, we usually recommend that some pages on your site not be indexed because they don't boost or help the site in any way - they're just there for the humans (especially the attorney-type humans!). For instance, your privacy policy, terms of service, shipping policies, etc. should be blocked with a noindex tag.

If you're using Yoast, you can set those pages to noindex under the Advanced tab of each page's Yoast SEO meta box.
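Under the hood, a noindex setting simply outputs a robots meta tag in the page's head:

```html
<meta name="robots" content="noindex, follow" />
```

The 'follow' value tells the bots they may still follow the links on the page, even though the page itself should stay out of the search results.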

4. Monitor Your Crawl Rate

Most people are unaware that they have direct control over the frequency that their website is crawled by Google. If Google is crawling your site too frequently and eating up bandwidth or you feel that they are not crawling your site as much as you would like, you can edit the crawl rate in Google Search Console (formerly called Google Webmaster Tools).

Note: Google is pretty smart about how/when to crawl your site, and they do have a recommended setting for the crawl rate. This should only be changed if you're concerned that they aren't getting your information in a timely manner.

5. Shoot a Flare for the Attention of the Bots

Unbeknownst to many website admins, there are numerous tools – called pinging tools – that ping the crawlers to attract their attention.

Let’s say, for example, you just made a large update to your site and posted a lot of great, high quality content. Instead of needing to wait around for the next crawling interval, you can instead proactively cause your pages to get crawled with a pinging service.

You should know, though, that WordPress sends out pings by default whenever you update your site. Many webmasters are of the opinion that the more pinging services you employ, the faster you will become indexed. You can find a list of ping services on the WordPress website.
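If you're curious, a ping is just a tiny XML-RPC call named weblogUpdates.ping. As a sketch of what WordPress and the pinging tools actually send, here's a short Python snippet that builds (but does not send) that request body; the site name and URL are placeholders:

```python
import xmlrpc.client

# Build the XML-RPC request body for a weblogUpdates.ping call --
# the kind of payload WordPress sends to ping services whenever
# you publish or update a post.
body = xmlrpc.client.dumps(
    ("My Example Site", "https://example.com/"),
    methodname="weblogUpdates.ping",
)

print(body)
```

A pinging service receives this payload via HTTP POST and queues your site for the crawlers to revisit.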

Google also has a little-known pinging service of their own. Inside Google Search Console you can submit a page to be reindexed. This typically works very well.

There are three steps to using Google's "fetch" service:

Step 1: Ask them to crawl the page

Step 2: Request indexing

Step 3: Choose method

This will typically get your page indexed within 24 hours, and often sooner.

Note: As you'll see when clicking one of the two radio buttons at the bottom of that box, there are limits on the number of fetches you can do each month: 500 per month for single URLs, and 10 per month for multiple URLs. We recommend using the 'Crawl only this URL' option for each new piece of content you add to your site.

6. On-site/Internal Links are Crucial

You should already be creating on-site links for the benefit of your visitors, but don’t forget that on-site links also help the Google bots find all of your content.

For example, think of each link in a blog post that links to another on-site post as a sort of bridge that helps the crawler navigate your site. If some of your pages are inaccessible due to a broken link or are ‘island pages’ that don’t link to other on-site resources, Google bots are going to have a much harder time crawling that information.

Want Rankings? This is Mission Critical!

Optimizing your website for the Google bots is mission critical. After spending so much time creating valuable content and building your site, it seems a waste if you haven’t done everything in your power to create visibility and ‘crawlability’ for the Google bots.

If you want to learn more about optimizing all portions of your WordPress website for search, check out Red Canoe Elite. With over 100 videos on SEO and Digital Marketing, this members-only area will help you get your website to the top of the rankings!

One last thing - a lot of the steps above referenced Google Search Console. We recommend doing these same steps in Bing's Webmaster Tools service as well.

About 

Will Hanke owns Saint Louis' top independent Internet Marketing firm, Red Canoe Media. In addition to helping some of St Louis' most recognizable brands with their online marketing strategy, Will also is an Amazon bestselling author, speaker and teacher.
