SEO Tips: When to worry about crawl budget?
Crawl budget. What is it? What does it mean to your site’s SEO performance? Should you be paying attention to it? How do you optimise crawl budget? How many questions can we ask about it?
In this article we’ll discuss what ‘crawl budget’ is, whether you need to act on it, and our recommendations for making sure your site has full visibility on search engines.
What is crawl budget?
First, let’s see exactly what Google has to say about Crawl Budget:
“Crawl budget is not something most publishers have to worry about. If new pages tend to be crawled the same day they’re published, crawl budget is not something webmasters need to focus on. Likewise, if a site has fewer than a few thousand URLs, most of the time it will be crawled efficiently.
Prioritising what to crawl, when, and how much resource the server hosting the site can allocate to crawling is more important for bigger sites, or those that auto-generate pages based on URL parameters, for example.”
They go on to describe how crawl budget is the term for an amalgamation of crawl rate limit and crawl demand, both of which we’ll discuss below. However, if you are a skim reader, we can sum up crawl budget like this:
Crawl budget refers to the number of pages a search engine bot (such as Googlebot) can crawl on your website in a given time frame without affecting the site experience for other users. How much of the site gets crawled depends on how easily a bot can crawl it, which may be influenced by site speed and server errors.
A lot of SEO experts won’t ever mention crawl budget as an area to concentrate on. According to Google, crawl budget isn’t a ranking factor, so why does it matter?! In short, if your pages aren’t crawled and indexed, you won’t show in search results for those pages.
Making sure your pages can be crawled and indexed is a basic first step in SEO, but one that could easily be missed because of crawl budget issues.
Building on this, how quickly your pages are crawled and indexed is governed by the ‘crawl rate limit’, which may be reached faster on sites that are slow or return server errors.
What is the crawl rate limit?
The main job of Googlebot is to crawl the URLs on any given site while making sure it doesn’t degrade the experience of other users. A crawl rate limit is in place to cap the number of pages fetched simultaneously, so that crawling doesn’t impact the site’s performance for real-world visitors.
What is the crawl demand?
To answer this, where better to turn than the horse’s mouth… or Google’s own post, in this case:
“Even if the crawl rate limit isn’t reached, if there’s no demand from indexing, there will be low activity from Googlebot. The two factors that play a significant role in determining crawl demand are:
- Popularity: URLs that are more popular on the Internet tend to be crawled more often to keep them fresher in our index.
- Staleness: our systems attempt to prevent URLs from becoming stale in the index.”
Additionally, site-wide events like changes to site structure may trigger an increase in crawl demand to re-index the content under the new URLs.
Put crawl rate limit and crawl demand together and you get Google’s definition. Basically, crawl budget is the number of URLs on your site that Googlebot can and wants to crawl.
So, when should you worry about crawl budget?
If your site is relatively fast, has fewer than around 1,000 pages to index and those pages are already indexed, crawl budget shouldn’t be a worry.
It may be worth investigating your crawl budget if:
- Your website is slow to load. PageSpeed Insights is a free tool that shows how quickly your URLs load and, by extension, how fast they can be crawled (see the sketch after this list for checking this programmatically).
- You’ve got thousands of URLs to crawl on your website.
- You bulk add hundreds of pages that you want to get indexed quickly.
- You’ve got a lot of redirects.
- You use URL parameters throughout your website. For example, you are using site search parameters, product filters or blog tags.
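If you’d rather check page speed for several URLs than test them one at a time in the browser, the PageSpeed Insights API exposes the same data. Here’s a minimal sketch, assuming the Python requests library is installed and using a hypothetical site URL; for regular use Google recommends adding an API key:

```python
# A minimal sketch, assuming the `requests` library is installed (pip install requests).
# Calls the public PageSpeed Insights v5 API; add an API key for regular use.
import requests

response = requests.get(
    "https://www.googleapis.com/pagespeedonline/v5/runPagespeed",
    params={"url": "https://example.com", "strategy": "mobile"},  # hypothetical site
    timeout=60,
)
data = response.json()
score = data["lighthouseResult"]["categories"]["performance"]["score"]
print(f"Mobile performance score: {score * 100:.0f}/100")
```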
How can you make sure Google can crawl and index your site?
To make the most of your crawl budget and increase your crawl rate limit, we’d recommend following these best practices:
Use your robots.txt file
Has your website got multiple pages that are for internal use, or resources from your CMS that don’t need to be accessible to the public? Adding the URLs you want to block to your robots.txt file will reduce the number of pages that need to be crawled. If there are important pages you want indexed that currently aren’t, you can also explicitly allow them in the same file.
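Before you deploy robots.txt changes, it’s worth checking they block (and allow) what you intend. Here’s a minimal sketch using Python’s built-in robotparser; the rules and URLs are hypothetical examples rather than directives to copy verbatim:

```python
# A minimal sketch using Python's standard library to sanity-check robots.txt rules.
# The rules and URLs below are hypothetical examples -- swap in your own site's values.
from urllib import robotparser

EXAMPLE_RULES = """
User-agent: *
Disallow: /internal/
Disallow: /cms-assets/
Allow: /blog/
"""

parser = robotparser.RobotFileParser()
parser.parse(EXAMPLE_RULES.splitlines())

for url in [
    "https://example.com/internal/reporting-dashboard",
    "https://example.com/blog/crawl-budget",
]:
    verdict = "crawlable" if parser.can_fetch("Googlebot", url) else "blocked"
    print(f"{url} -> {verdict}")
```

A quick check like this helps confirm you’re not accidentally blocking pages you want indexed.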
Use internal links
To Googlebot, it can be a bit of a minefield to work out which pages to index and which are the priority pages. Internal links, and where those links are positioned, are vital for a crawler to understand which pages matter most. Most websites have a flat architecture, meaning the home page links out to a group of categories and the sub-categories below them, so Google can work down from the most important pages to the sub-categories.
We’d recommend avoiding orphan pages on your site. These are pages that have been added to your website but have no internal links pointing to them. An example would be adding a new category but forgetting to include it in your site navigation, meaning a user or bot would struggle to reach the page without actively searching it out.
Reduce the number of errors on site
4xx and 5xx errors aren’t great for your user experience, and they eat up crawl budget. Redirecting any pages that should be redirected, and disallowing any pages you don’t want indexed (using your robots.txt), will reduce the number of irrelevant pages bots crawl. Server errors can occur for several reasons, but raising them with your webmaster or hosting company can improve your site health and customer experience along with your crawl budget.
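If you want a quick way to spot 4xx and 5xx responses yourself, a sketch along these lines can help. It assumes the Python requests library is installed and that you have a list of URLs to check (the ones below are hypothetical):

```python
# A minimal sketch, assuming the `requests` library is installed (pip install requests).
# The URL list is hypothetical -- in practice you'd pull it from your sitemap or a crawl export.
import requests

urls_to_check = [
    "https://example.com/",
    "https://example.com/old-category/",
]

for url in urls_to_check:
    try:
        response = requests.head(url, allow_redirects=True, timeout=10)
        if response.status_code >= 400:
            print(f"{url} -> {response.status_code} (fix, redirect or disallow this URL)")
    except requests.RequestException as exc:
        print(f"{url} -> request failed: {exc}")
```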
Avoid redirect chains
A common occurrence on websites with changing products and services is that pages need to be redirected from time to time. As a website’s structure can go through multiple iterations over its lifetime, it’s easy to fall into the trap of redirecting from a page that has itself been redirected to in the past, creating a chain. These chains slow things down for both users and bots, which have to follow every redirect before reaching the right page.
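One way to spot a chain is to request an old URL and count how many hops it takes to reach the final page. Here’s a minimal sketch, again assuming the Python requests library and a hypothetical URL:

```python
# A minimal sketch, assuming the `requests` library is installed.
# response.history lists every redirect hop a bot would also have to follow.
import requests

response = requests.get("https://example.com/old-page", timeout=10)  # hypothetical URL

for hop in response.history:
    print(f"{hop.status_code}: {hop.url}")
print(f"Final destination: {response.url} ({response.status_code})")

if len(response.history) > 1:
    print("Redirect chain detected -- point the first URL straight at the final destination.")
```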
How can you check if you are going over your crawl budget regularly?
Crawl Stats is one of the legacy tools available in Google Search Console; it can help you understand how frequently Google is crawling your pages and how much data and time it spends on your site each day. If you’re a little more technical (or ask your webmaster nicely!), you can also cross-reference this with your server logs to see how often Google’s crawlers are hitting your site.
Based on your average page crawls per day, you can estimate a realistic monthly figure. So, if you average 100 crawls per day, your potential budget for that month would be around 3,000 pages. However, this isn’t a static figure: your average will change over time depending on the factors above, such as server response and site speed. Using this report, you can also spot dips and spikes in crawl activity.
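If you do go down the server-log route, the cross-check can be as simple as counting Googlebot requests per day. Here’s a minimal Python sketch; the log path is hypothetical, and a thorough check would also verify requests against Google’s published IP ranges, since the user-agent string alone can be spoofed:

```python
# A minimal sketch: count Googlebot hits per day in a standard combined-format access log.
# The log path is hypothetical; verify suspicious hits against Google's IP ranges too.
from collections import Counter

hits_per_day = Counter()

with open("/var/log/nginx/access.log") as log:  # hypothetical path
    for line in log:
        if "Googlebot" not in line or "[" not in line:
            continue
        # In the combined log format the timestamp looks like [10/Apr/2024:13:55:36 +0000]
        day = line.split("[", 1)[1].split(":", 1)[0]
        hits_per_day[day] += 1

for day, hits in hits_per_day.items():
    print(day, hits)

if hits_per_day:
    average = sum(hits_per_day.values()) / len(hits_per_day)
    print(f"Average crawls per day: {average:.0f} (roughly {average * 30:.0f} per month)")
```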
So now you know…
The above tips are just a few recommendations to improve your crawl budget. If you feel you’re not getting full visibility of your website on search engines, or that you’re not getting enough from your SEO and organic performance, make sure to use a reputable agency and speak to Converted for a completely free audit.