The question of how Google handles 404 and 410 status codes comes up from time to time, so it’s good to understand the differences. Google’s John Mueller answered a question about web pages that no longer exist and how a web publisher should handle them.
How Google Handles 404/410 Status Codes
In a recent Webmaster Hangout, Google’s John Mueller received the following question:
“If a 404 error goes to a page that doesn’t exist, should I make them a 410?”
John Mueller answered:
“From our point of view, in the mid term/long term, a 404 is the same as a 410 for us. So in both of these cases, we drop those URLs from our index.
We generally reduce crawling a little bit of those URLs so that we don’t spend too much time crawling things that we know don’t exist.
The subtle difference here is that a 410 will sometimes fall out a little bit faster than a 404. But usually, we’re talking on the order of a couple days or so.
So if you’re just removing content naturally, then that’s perfectly fine to use either one. If you’ve already removed this content long ago, then it’s already not indexed so it doesn’t matter for us if you use a 404 or 410.”
That is a very helpful answer about how Google handles the 404 and 410 status codes. Of particular interest is that using the 410 status code can sometimes speed up the removal of a web page from Google’s index, though usually only by a couple of days or so.
Speeding up page removal can be especially useful after a hacking event where a hacker posted thousands of spam pages. Those are the kinds of pages a web publisher would not want associated with their site. John Mueller confirmed that a 410 status code can speed up the process of removing those unwanted pages from Google’s index.
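To make the hacked-site scenario concrete, here is a minimal sketch of a server that answers 410 for deliberately removed spam URLs and 404 for anything else it doesn’t recognize. The path prefixes, the list of live pages, and the function names are all hypothetical and chosen only for illustration. A real site would more likely configure this in its web server or CMS.

```python
from http import HTTPStatus

# Hypothetical URL prefixes an attacker created and the site owner has deleted.
GONE_PREFIXES = ("/cheap-pills/", "/spam/")

# Hypothetical set of pages that still exist on the site.
LIVE_PATHS = {"/", "/about"}


def status_for_path(path: str) -> HTTPStatus:
    """Pick a status code for a requested path."""
    if path in LIVE_PATHS:
        return HTTPStatus.OK
    if path.startswith(GONE_PREFIXES):
        # Deliberately removed content: signal that it is gone on purpose.
        return HTTPStatus.GONE
    # Anything else we don't know about: plain "not found".
    return HTTPStatus.NOT_FOUND


def app(environ, start_response):
    """Minimal WSGI app that answers with the chosen status code."""
    status = status_for_path(environ.get("PATH_INFO", "/"))
    status_line = f"{status.value} {status.phrase}"
    body = (status_line + "\n").encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status_line, headers)
    return [body]
```

Under these assumptions, a request for /spam/page-1 would receive 410 Gone, while a mistyped URL such as /abuot would receive 404 Not Found. The app can be served locally with the standard library’s wsgiref.simple_server for experimentation.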
Official 410 Status Code Specifications
Here is the official specification for the 410 status code:
“…Clients with link editing capabilities SHOULD delete references to the Request-URI after user approval… The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed.
It is not necessary to mark all permanently unavailable resources as “gone” or to keep the mark for any length of time — that is left to the discretion of the server owner.”
The part about “should” delete is particularly important. It doesn’t specify that clients “must” delete the references, only that they should. Google has historically been good about dropping pages that return a 410 status code, which conforms to the official specification.
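As a rough illustration of what the specification asks of a client with link editing capabilities, the sketch below picks out links that last returned a 410 so they can be offered to a user for removal. The function name and the example URLs are hypothetical. The sketch leaves 404 links alone, since a 404 makes no claim that the removal was intentional.

```python
from typing import List, Mapping


def removal_candidates(link_statuses: Mapping[str, int]) -> List[str]:
    """Return links whose last response was 410 Gone, in sorted order.

    These are only candidates: the specification says references should be
    deleted after user approval, so nothing is deleted here.
    """
    return sorted(url for url, status in link_statuses.items() if status == 410)
```

For example, given one link that returned 200, one that returned 404, and one that returned 410, only the 410 link would be suggested for removal.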
Previous Google Guidance on 404 and 410
John Mueller’s response is similar to the guidance offered by Matt Cutts when he worked for Google. Matt explained that Google responds to a 404 status code by waiting 24 hours before beginning the process of removing a page from its index.
This protects Google’s index against a mistake or an accidental event at the website. For example, web servers can go down or a site migration may take longer than expected. Google’s John Mueller did not comment on whether this is still the case. Here’s what Matt Cutts’ guidance was:
“It turns out webmasters shoot themselves in the foot pretty often. Pages go missing. People misconfigure sites. Sites go down. People block GoogleBot by accident.
So if you look at the entire web, the crawl team has to design to be robust against that. So with 404s, along with I think 401s and maybe 403s, if we see a page and we get a 404, we are going to protect that page for 24 hours in the crawling system.
So we sort of wait and we say, well maybe that was a transient 404. Maybe it wasn’t really intended to be a page not found.
And so in the crawling system it’ll be protected for 24 hours.
If we see a 410, then the crawling system says, Ok, we assume the webmaster knows what they’re doing… so they immediately convert that 410 into an error rather than protecting it for 24 hours.
We’ll still go back and recheck and make sure those pages are really gone or maybe the pages have come back alive again.
And I wouldn’t rely on the assumption that that behavior will always be exactly the same.
And so if a page is gone it’s fine to serve a 404. If you know it’s gone for real, it’s fine to serve a 410.
But we’ll design our crawling system to be robust. But if your site goes down or if you get hacked or whatever, we try to make sure that we can still find the good content whenever it’s available.”
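The behavior Matt Cutts describes can be summarized as a toy model: a 404 is protected for 24 hours before it counts as an error, while a 410 counts immediately. The sketch below is only an illustration of that description, not Google’s actual crawling logic, and the function and variable names are hypothetical. As the quote itself cautions, this behavior should not be assumed to stay exactly the same.

```python
from datetime import datetime, timedelta

# The 24-hour protection window mentioned in the quote.
GRACE_PERIOD = timedelta(hours=24)


def treated_as_error(status: int, first_error_at: datetime, now: datetime) -> bool:
    """Toy model: has the crawler started treating this URL as an error?"""
    if status == 410:
        # A 410 is assumed to be intentional, so there is no protection window.
        return True
    if status == 404:
        # A 404 might be transient, so it is protected for 24 hours first.
        return now - first_error_at >= GRACE_PERIOD
    # Other status codes are outside this simplified model.
    return False
```

In this model, a URL that started returning 404 an hour ago is still protected, while the same URL returning 410 is already treated as an error. Either way, the quote notes that the pages are rechecked later in case they come back.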
Watch Google’s John Mueller discuss 404 & 410 Status Codes here.
Watch Matt Cutts discuss 404 and 410 Status Codes here.