Page Clean Ups & External 404s - Partial Kitchen Sink

I have some questions that are likely applicable to the Kitchen Sink Method, but the site isn't declining, and I wanted to focus on a couple of specific items. I know some masterminds like @Ryuzaki and @CCarter can probably offer some feedback, so I figured I'd post.

I have a client site (WordPress) that's ranking well, but I believe there's an opportunity to really boost site performance and rankings by cleaning up non-ranking pages/posts and fixing external 404 links in news posts.

Non-Ranking Page Clean Up To Improve Performance

I'm debating whether or not I should clean up URLs for the greater good of the site. Many of the old URLs are very short posts, and I'm considering reducing the size of the site to improve crawl budget. I'd appreciate feedback on best practices here.

Here are the facts:

- 5,774 URLs pulled from a Screaming Frog crawl
- 2,098 indexable URLs
- 3,676 non-indexable URLs, of which 3,050 are /tag/ pages. FYI, these were previously indexed.
- 3,120 URLs currently indexed in Google, some of which are /tag/ pages previously set to index but since changed to noindex
- 2,105 pages/posts/events currently in the sitemap
- 1,270 pages showing rankings in Ahrefs
- 2,334 pages showing up in GSC over the last 3 months
- 230 of the Ahrefs URLs are /tag/ pages that have since been set to noindex and removed from sitemap.xml
- 706 of the GSC URLs are /tag/ pages that have since been set to noindex and removed from sitemap.xml
- 887 /tag/ pages indexed in Google

I'm considering cleaning things up to reduce site size in hopes that it boosts the number of keywords and organic traffic in general. In order to do this I'm considering:

1. Deleting all tags to kill the /tag/ pages (though I'm not sure if I should do this, and if so, how best to handle it, i.e., 301 them somewhere or let them 404). Or is setting them to noindex enough?

2. Removing thin content pages that aren't ranking - there are 1,420 pages with fewer than 500 words.

3. Lastly, cleaning up external links: these news posts have TONS of dead external links, mostly dofollow. I'm trying to find a fast way to delete the ones that 404 and nofollow the rest (rough sketch of finding them below).
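
For #3, here's a rough sketch of the kind of script I have in mind for finding the dead external links. It's a sketch only: it assumes the default WP REST API is reachable, example.com is a placeholder, and a real run should probably use a proper HTML parser instead of a regex.

```python
# Sketch: pull posts from the WordPress REST API and flag external links
# that 404. Assumes the default /wp-json/wp/v2/posts endpoint is enabled;
# example.com is a placeholder for the client site.
import re
import requests

SITE = "https://example.com"  # placeholder

def iter_posts():
    page = 1
    while True:
        resp = requests.get(
            f"{SITE}/wp-json/wp/v2/posts",
            params={"per_page": 100, "page": page},
            timeout=30,
        )
        if resp.status_code != 200:
            break  # WP returns an error once we page past the end
        batch = resp.json()
        if not batch:
            break
        yield from batch
        page += 1

href_re = re.compile(r'href="(https?://[^"]+)"')

for post in iter_posts():
    for url in href_re.findall(post["content"]["rendered"]):
        if url.startswith(SITE):
            continue  # only auditing external links here
        try:
            status = requests.head(url, allow_redirects=True, timeout=10).status_code
        except requests.RequestException:
            status = "error"
        if status in (404, 410, "error"):
            print(post["link"], url, status)
```

From that list I'd decide what to delete outright versus nofollow.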

This is as far as I've gotten, but I feel like this might be a good first stab at improving site experience and rankings. I'd love to hear a few ideas/thoughts.

FYI, I'm also considering this because the Kitchen Sink Method helped me recover my affiliate site, and I figured this cleanup would only improve rankings overall as a first step, even if it turns out not to be strictly needed.

Looking forward to hearing from others.
 
1. Definitely kill the /tag/ pages. I'd delete them entirely and let them 404. Don't block them through robots.txt, so the 404s get discovered and the pages eventually deindexed (it may take some time). There's a scripted sketch at the end of this post.
2. Check rankings and traffic first. If they are thin but ranking, maybe you should optimize them further? If they get internal/referral/other traffic, maybe they serve a purpose? If neither, 404 them.
3. If you're using WP, use Broken Link Checker by WPMU DEV. Quick and easy solution.

Expect no immediate results. May take 1-2 core updates to make a difference, if that.
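
If you want to script the deletion rather than click through wp-admin, here's a minimal sketch, assuming the REST API plus an application password (site URL and credentials are placeholders):

```python
# Sketch: delete every tag via the WordPress REST API. Removing a tag
# kills its /tag/ archive, which then 404s as described above.
import requests

SITE = "https://example.com"           # placeholder
AUTH = ("admin", "app-password-here")  # placeholder application password

while True:
    # Always grab the first 100 remaining tags; deleting them advances us
    tags = requests.get(
        f"{SITE}/wp-json/wp/v2/tags",
        params={"per_page": 100},
        auth=AUTH,
        timeout=30,
    ).json()
    if not tags:
        break
    for tag in tags:
        # force=true is required because terms can't be trashed, only deleted
        requests.delete(
            f"{SITE}/wp-json/wp/v2/tags/{tag['id']}",
            params={"force": "true"},
            auth=AUTH,
            timeout=30,
        )
        print("deleted", tag["slug"])
```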
 
My current site came with a ton of extremely thin pages. I was planning to delete them all and redirect to relevant pages, but on closer inspection it turned out that many had incredible incoming links pointing at them: Oprah.com, MSN.com, etc.

I kept them, added a little more content to freshen them up, and passed the link juice on to other, newer pages on my site.

Worth checking for incoming links before going on a mass delete.
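
If you want to script that check, here's a rough sketch that intersects a kill list with an Ahrefs backlinks export; the file names and the "Target URL" column header are assumptions, so adjust them to whatever your export actually uses:

```python
# Sketch: flag kill-list URLs that have external backlinks pointing at them.
import csv

# One URL per line; the pages you're planning to delete (assumed format)
with open("kill_list.txt") as f:
    kill_list = {line.strip() for line in f if line.strip()}

linked = set()
with open("ahrefs_backlinks.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        target = row.get("Target URL", "")
        if target in kill_list:
            linked.add(target)

for url in sorted(linked):
    print("has incoming links, keep or 301:", url)
print(len(kill_list - linked), "pages look safe to delete")
```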
 
1. Deleting all tags to kill the /tag/ pages (though I'm not sure if I should do this, and if so, how best to handle it, i.e., 301 them somewhere or let them 404). Or is setting them to noindex enough?
I would eradicate the entire tag system. They may be set to noindex but you still have double the amount of crap to crawl due to these tags. And page rank is flowing into them. Complete waste of everyone's time and resources.

2. Removing thin content pages that aren't ranking - there are 1,420 pages with fewer than 500 words.
If they're not ranking or bringing in traffic, you can probably glance at them and determine if they have any chance at ranking (as in, are they optimized for search terms that get traffic). If not, I wouldn't bother trying to optimize them. I'd get rid of them. It doesn't mean they were low quality, but they can be dragging down your site. If they aren't contributing to the bottom line, I see no reason to keep them around, when there is something to be gained by getting rid of them.

It sounds like you could go from almost 6,000 URLs to roughly 700 worthwhile pages (2,098 indexable minus the 1,420 thin ones)? That would remove a lot of endless, worthless, bottomless pits for Googlebot to get trapped in, and would quit diluting your page rank for no good reason.
 
Woah, so you guys all recommend deleting all tags? Any plugin that can completely disable them?

I currently have them set to noindex, and I just checked: they make up 0.2% of my pageviews.
 
Here's an alternative option to deleting your tags:

Settings -> Reading -> set "Blog pages show at most" to 200 posts

You'll have far fewer tag pages this way, since each tag archive page will list 200 posts. I think the WordPress default is something like 10 posts per page. Unless your site is an absolute unit, this will greatly help with your situation while still allowing you to maintain having tags.

That's what I do, at least. Tags help me structure the site, so I like having them.
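
If you'd rather script it than click through wp-admin, the same option is exposed as posts_per_page on the core REST settings endpoint, assuming you can authenticate (site URL and credentials below are placeholders):

```python
# Sketch: bump "Blog pages show at most" to 200 via the REST settings endpoint.
import requests

SITE = "https://example.com"           # placeholder
AUTH = ("admin", "app-password-here")  # placeholder application password

resp = requests.post(
    f"{SITE}/wp-json/wp/v2/settings",
    json={"posts_per_page": 200},
    auth=AUTH,
    timeout=30,
)
print(resp.json().get("posts_per_page"))  # should echo back 200
```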
 
Here's an alternative option to deleting your tags:

Settings -> Reading -> set "Blog pages show at most" to 200 posts

You'll have far fewer tag pages this way, since each tag archive page will list 200 posts. I think the WordPress default is something like 10 posts per page. Unless your site is an absolute unit, this will greatly help with your situation while still allowing you to maintain having tags.

That's what I do, at least. Tags help me structure the site, so I like having them.
Thanks for that. @Ryuzaki, I'd be quite interested in your thoughts.
 
The problem, as I've seen it a million times, is that people tend to use categories correctly. They take their time, think about what categories should exist, and then place posts in them sensibly.

Tags, on the other hand, are usually thought up on the fly. "Hmmm, this post is about flashlights. Let's tag it with: flashlights, light, survival, emergency, tools, accessories, nightstand, camping"

And of those 8 tags for that post, 6 didn't exist before. And it doesn't end up mattering if your tag pages show 200 posts or 20 posts. It doesn't deflate the number of tag pages in your indexation.

I realize I'm talking to guys on BuSo who aren't complete morons, but 99.9% of tag uses I've ever seen happen as I've described above. Completely willy nilly.

OP is telling us this is the case. Of the 5,774 crawlable pages, 3,050 of them are tags. 2,105 are "legit" pages in the sitemap. That's how quickly tags get out of hand for most people.

And not a single one of those tag pages is going to have original content on them. They'll be full of post titles and post descriptions that exist elsewhere on the site. That's not a big deal but I'm a huge believer in indexation quality, and I don't just mean quantity but the quality of what's on each and every indexed page.

I can think of ways I could use tags that wouldn't become a mess. If you can do it and you think it helps your users (you can actually check this) then by all means tag away. I just think the safest blanket advice we could give is to delete all tags and tag pages and never look back. I think the data would support that if you looked at user metrics and Google metrics on sites that tag correctly.
 
I think the data would support that if you looked at user metrics and Google metrics on sites that tag correctly.

I agree, you almost never see tag pages showing up in the SERPs unless it's a relatively weak SERP overall.
 
Cool, gonna delete. Will post back here if I see any noticeable changes.
 
OP is telling us this is the case. Of the 5,774 crawlable pages, 3,050 of them are tags. 2,105 are "legit" pages in the sitemap. That's how quickly tags get out of hand for most people.
It feels like the content was published by someone who had no idea what they were doing, and it's so old it's unreal. Legacy tags from 2013, each used once, just one time. Unbelievable. What were they thinking?

They weren't.

Time to burn those URLs to the ground. One other massive problem I have with industry-update posts that link to news sources is that those outlets often kill the old articles. You're then left with a massive number of posts full of external 404s, because someone thought regurgitating news from sources and slapping 15 single-use tags on a post about blue widgets was a great idea. SMH.
 
2. Check rankings and traffic first. If they are thin but ranking, maybe you should optimize them further? If they get internal/referral/other traffic, maybe they serve a purpose? If neither, 404 them.
Not just noindex them, @illmasterj?

If they're not ranking or bringing in traffic, you can probably glance at them and determine if they have any chance at ranking (as in, are they optimized for search terms that get traffic). If not, I wouldn't bother trying to optimize them. I'd get rid of them. It doesn't mean they were low quality, but they can be dragging down your site. If they aren't contributing to the bottom line, I see no reason to keep them around, when there is something to be gained by getting rid of them.
Get rid of them with a 404, or noindex? What if you were a news site, for example, and there were news stories that linked internally to these poor-performing pages? Still nuke them (possibly a bad user experience for a minuscule percentage of overall site visitors) or noindex? And then would you bother to clean up the potentially hundreds of remaining posts that happen to internally link to the nuked page? @Ryuzaki
 
Not just noindex them, @illmasterj?
I like simplicity. noindex may be fine, but you'll still end up with internal links leading to that page, and you run the risk of somehow removing that noindex tag in the future.

If the page serves a purpose, keep it, but if it doesn't and it's not performing, be ruthless.
 
Get rid of them with a 404, or noindex? What if you were a news site, for example, and there were news stories that linked internally to these poor-performing pages? Still nuke them (possibly a bad user experience for a minuscule percentage of overall site visitors) or noindex? And then would you bother to clean up the potentially hundreds of remaining posts that happen to internally link to the nuked page? @Ryuzaki
Yeah, you pretty much nailed my thinking. I'd 404 them (or, if I were in a hurry, I'd 410 them, but that's extra work).

I would either completely nuke them (404/410), then crawl the site, find the dead internal links, and clean them up; or I'd find similar content I intended to keep, 301 redirect them to it, and even then I'd go point the internal links at the new posts. There's no escaping that part. I'm doing all of that strictly for Google, to save them resources. Less technical debt = better rankings.

I like simplicity. noindex may be fine, but you'll still end up with internal links leading to that page, and you run the risk of somehow removing that noindex tag in the future.
All of this plus you're leaking page rank out to a bunch of pages you don't even want anyways, when you could clean all that up and preserve the "average page rank levels" by flowing the juice back into the pages that matter.
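
If you go the 301 route, here's a minimal sketch of turning a hand-built old-to-new mapping into Apache redirect rules; redirects.csv and its old_url/new_url headers are an assumed format, and an nginx map would work the same way:

```python
# Sketch: generate Redirect 301 directives from a CSV redirect map.
import csv
from urllib.parse import urlparse

with open("redirects.csv", newline="") as f:
    for row in csv.DictReader(f):
        old_path = urlparse(row["old_url"]).path  # Apache wants the path only
        print(f"Redirect 301 {old_path} {row['new_url']}")
```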
 
Exactly as @Ryuzaki says. If you're concerned, @Darth, remove (404) a single page that you know has a lot of internal links pointing at it.

Then go and install the Broken Link Checker plugin, or run Screaming Frog or a similar tool on your site. You'll see how quickly you can identify those links and remove them.

It may seem like a big time cost but it really isn't.
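
As an example, here's a quick sketch of working through a Screaming Frog inlinks export; the file name and the Source/Destination/Status Code headers are assumptions based on a typical export, so match them to yours:

```python
# Sketch: group every source page that links to a 404 destination.
import csv
from collections import defaultdict

links_to_fix = defaultdict(list)
with open("all_inlinks.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        if row.get("Status Code") == "404":
            links_to_fix[row["Destination"]].append(row["Source"])

for dest, sources in links_to_fix.items():
    print(f"{dest} is linked from {len(sources)} pages:")
    for src in sources:
        print("   ", src)
```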
 
Thanks both @illmasterj and @Ryuzaki. So basically: 404 what I've found, then use a tool like Screaming Frog to find the other pages I'm keeping that happen to link to those 404 pages, and remove the links (or point them elsewhere). Does Screaming Frog do that?

Also, I understand I can find other pages to 301 these nuked URLs to. Either way, it seems noindex isn't worthwhile.

Not worried about nuking pages with incoming links, thanks; I have that covered.
 
Does Screaming Frog do that?
Pretty much any "auditor" tool will pick up 404s. It's one of the most basic "things to fix for SEO" on just about any SEO audit.

Screaming Frog, Website Auditor, Ahrefs Site Audit, Semrush, Serpwoo, etc... They'll all do it.
 