Removing Junk from the SERPs

The site I bought is a true cluster fuck of indexed tag pages, super thin content, and PDFs. In short, it is a mess.

Several weeks ago I set all the tag and archive pages to noindex, follow. More recently I set them all to noarchive as well.

Thousands of empty pages with an amz affiliate disclaimer just littering the SERPs.

So I checked today and they are ALL STILL THERE.... watttttt.

In your experience, how long does it take G to drop the junk after noindex has been applied? A new sitemap was also submitted and crawled weeks ago.
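
For anyone curious, a quick sanity check like this rough sketch (the URLs are placeholders) at least confirms the noindex directive is actually being served before blaming crawl speed:

```python
# Rough sanity check: confirm the noindex directive is actually being served
# on a few of the junk URLs before blaming crawl speed. URLs are placeholders.
import requests

sample_urls = [
    "https://example.com/tag/widgets/",   # hypothetical tag page
    "https://example.com/2015/03/",       # hypothetical archive page
]

for url in sample_urls:
    resp = requests.get(url, timeout=10)
    robots_header = resp.headers.get("X-Robots-Tag", "none")
    meta_noindex = "noindex" in resp.text.lower()   # crude check for the robots meta tag
    print(f"{url} -> HTTP {resp.status_code}, X-Robots-Tag: {robots_header}, "
          f"'noindex' found in HTML: {meta_noindex}")
```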
 
Actually I wrote a post about this a while back. Here: sunsetted SERPWoo forum on July 31, 2016

TL;DR: it took 18+ months. I gave up checking, and finally, 3 years later, all the pages from serpwoo.com/forum were no longer indexed.
 
I had a bunch of crap (20,000+ pages) showing in the SERPs due to a Yoast bug a couple of years back. It was fucked, hah. Once fixed, it took at least 6 months for that stuff to begin getting removed, and around a year or two for it all to completely fall off.
 
You can speed it up by uploading a temporary sitemap featuring all the pages. You'll just want to remember to remove it later. In fact, once enough pages were removed from the index, I'd take that sitemap down and upload another version with the already-deindexed pages stripped out. Each time you upload the sitemap, it spurs them to crawl it.
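
Something like this rough sketch is all it takes to generate that throwaway sitemap (the file names and the URL list are just placeholders, not from any particular setup):

```python
# Rough sketch: build a throwaway sitemap from a plain list of junk URLs.
# File names and paths are placeholders -- adjust for your own site.
from xml.sax.saxutils import escape

with open("urls_to_deindex.txt") as f:       # one URL per line (hypothetical file)
    urls = [line.strip() for line in f if line.strip()]

entries = "\n".join(
    f"  <url><loc>{escape(url)}</loc></url>" for url in urls
)

sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    f"{entries}\n"
    "</urlset>\n"
)

with open("TEMPsitemap1.xml", "w") as f:     # name is arbitrary; upload at the site root
    f.write(sitemap)
```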

My experience is that it's logarithmic: it takes about the same amount of time to remove the first 50% of the pages as it does the next 25%, then the next 12.5%, and so forth. So while it can take a long time to get it all completely gone, the bulk of it will be removed sooner rather than later, which is good for your Panda score.
 
Serving a 410 should help remove the pages quicker. You could try force indexing/crawling the pages to see if that hurries things up. I've heard the imaginatively named "thebestindexer.com" works at the moment.
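
The site in question sounds like WordPress, where you'd normally handle this with a plugin or server rules, but purely to illustrate the 410 response itself, here's a minimal Python/Flask sketch (the path prefixes are hypothetical):

```python
# Toy illustration only -- a WordPress site would normally do this with a plugin
# or server config. The path prefixes below are hypothetical.
from flask import Flask, abort

app = Flask(__name__)

GONE_PREFIXES = ("/tag/", "/old-archive/")   # sections we consider permanently gone

@app.route("/<path:subpath>")
def catch_all(subpath):
    full_path = "/" + subpath
    if full_path.startswith(GONE_PREFIXES):
        abort(410)   # "gone on purpose, don't come back" -- one crawl should do it
    abort(404)       # everything else in this toy app is just "not found"

if __name__ == "__main__":
    app.run()
```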
 
Anyone else tried a 410 and seen it work faster than 301s/404s?
 
I have. I've also 301'd several pages to a single page giving the 410 response and seen pages drop out of the index reasonably quickly. 6 weeks(ish), but I wasn't checking regularly.
 
Anyone else tried a 410 and seen it work faster than 301s/404s?
In my case, I feel like a 410 takes one single crawl since it means "purposefully and permanently gone," whereas a 404 means "whoops, it's not found, not sure why, could come back, who knows" and tends to take multiple crawls before Google trusts that error code.

I don't know what Google has to say on the topic. I feel like on the one hand they say both codes work the same on their end, but in actual experience I've seen significantly faster results with a 410.
 
I don't know what Google has to say on the topic. I feel like on the one hand they say both codes work the same on their end, but in actual experience I've seen significantly faster results with a 410.
Faster than a 301 also? A 301 says "permanently moved," which in theory should mean the same thing to Google as a 410, right?
 
Faster than a 301 also? A 301 says "permanently moved," which in theory should mean the same thing to Google as a 410, right?
No. @Stones is talking specifically about using a 301 redirect to a 410'd page. It works in this context to deindex a page because it points to a 410 response code.

A 301 itself is only that: a pointer. If you 301 a URL to another 200-status URL (meaning it's fine, loads fine, etc.), then you'll pass all of the positive ranking signals from the 301'd page to the 200 page.

In the same vein, if you 301 a URL to a 410 or 404 page, you've basically "broken" the original page by pointing it to a broken page.

I may be misunderstanding what you're saying, in which case I apologize. 301-ing a URL to a 410 will be no faster than a regular 410 because either way what you're waiting for is Google to recrawl the URL at least once.

So the real question of speed is getting recrawled. If it's just a handful of pages you can request them to be crawled in Search Console. If it's a bunch you can upload them in a temporary sitemap, which will help push things along.
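
If it helps, a quick script like this rough sketch (the URLs are placeholders) follows redirects the way a crawler would and reports the whole chain, so you can see whether a URL ultimately lands on a 200, 404, or 410:

```python
# Rough sketch: follow redirects like a crawler would and report the chain,
# so you can see whether a URL ultimately resolves to 200, 404, or 410.
# The URL list is a placeholder.
import requests

urls = [
    "https://example.com/old-tag-page/",
    "https://example.com/some-redirected-url/",
]

for url in urls:
    resp = requests.get(url, allow_redirects=True, timeout=10)
    chain = [f"{r.status_code} {r.url}" for r in resp.history]  # intermediate hops
    chain.append(f"{resp.status_code} {resp.url}")               # final response
    print(url, "->", "  ->  ".join(chain))
```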
 
I may be misunderstanding what you're saying, in which case I apologize. 301-ing a URL to a 410 will be no faster than a regular 410 because either way what you're waiting for is Google to recrawl the URL at least once.
Got it.
I was actually wondering, if I'm trying to get a page removed from the index, whether a regular 410 gets Google to remove it faster than a 301 redirect does.
 
You can speed it up by uploading a temporary sitemap featuring all the pages. You'll just want to remember to remove it later. In fact, once enough pages were removed from the index, I'd take that sitemap down and upload another version with the already-deindexed pages stripped out. Each time you upload the sitemap, it spurs them to crawl it.

By taking it down and putting it back up, do you mean under a different URL or within Search Console?

My experience is that it's logarithmic: it takes about the same amount of time to remove the first 50% of the pages as it does the next 25%, then the next 12.5%, and so forth. So while it can take a long time to get it all completely gone, the bulk of it will be removed sooner rather than later, which is good for your Panda score.

This is an interesting observation. I wonder what the reason behind this is.
 
By taking it down and putting it back up, do you mean under a different URL or within Search Console?
I do both. I'll create a manual XML sitemap and upload it at the root, like /TEMPsitemap1.xml. If I later take it down and re-upload it with only the remaining URLs that need to be crawled, I'll call it /TEMPsitemap2.xml, and so forth. I want Google to understand it's "new" and to re-initiate crawling.

The easiest way to get this done, by the way, is to export the URL list from the Coverage Report in Search Console for the ones you want deindexed and filter from there. They provide thorough lists.

This is an interesting observation. I wonder what the reason behind this is.
If I had to guess, it's an increasing distrust in the URLs (especially if they're all in a sitemap together), so less and less crawl budget gets afforded to them. Can't say for sure, but that makes sense.

That's why I'm stressing that the sitemaps should be temporary, and why I'll even take one down, strip out the deindexed URLs, and reupload with just the ones still left to be processed. Google CAN lose faith in your sitemaps (and they've said as much). The sitemap trick is used to speed up the process but shouldn't go on for too long. A few months has been fine for me.
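
For what it's worth, the rotation step is just a diff between the old sitemap and whatever has already dropped out, something like this rough sketch (the file names are placeholders following the TEMPsitemap naming above):

```python
# Rough sketch of the rotation step: parse the old temp sitemap, drop the URLs
# Google has already deindexed (exported from Search Console into a text file),
# and write the survivors into the next numbered sitemap. File names are placeholders.
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with open("already_deindexed.txt") as f:      # hypothetical export, one URL per line
    done = {line.strip() for line in f if line.strip()}

tree = ET.parse("TEMPsitemap1.xml")
urlset = tree.getroot()

for url_el in list(urlset.findall("sm:url", NS)):
    loc = url_el.find("sm:loc", NS).text.strip()
    if loc in done:
        urlset.remove(url_el)                 # already gone from the index, drop it

ET.register_namespace("", NS["sm"])           # keep the default sitemap namespace clean
tree.write("TEMPsitemap2.xml", encoding="UTF-8", xml_declaration=True)
```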
 
I do both. I'll create a manual XML sitemap and upload it at the root, like /TEMPsitemap1.xml. If I later take it down and re-upload it with only the remaining URLs that need to be crawled, I'll call it /TEMPsitemap2.xml, and so forth. I want Google to understand it's "new" and to re-initiate crawling.

The easiest way to get this done, by the way, is to export the URL list from the Coverage Report in Search Console for the ones you want deindexed and filter from there. They provide thorough lists.


If I had to guess, it's an increasing distrust in the URLs (especially if they're all in a sitemap together), so less and less crawl budget gets afforded to them. Can't say for sure, but that makes sense.

That's why I'm stressing that the sitemaps should be temporary, and why I'll even take one down, strip out the deindexed URLs, and reupload with just the ones still left to be processed. Google CAN lose faith in your sitemaps (and they've said as much). The sitemap trick is used to speed up the process but shouldn't go on for too long. A few months has been fine for me.

Awesome, that's really good to know. Thank you for sharing! My gf's blog has hundreds of tag and image pages with little value in the index. I need to take care of this soon to get it ranking better again.
 