Internal link audit?

Potatoe

Hey there, I'm going over an old site and touching up the on-page.

Does anyone know of a site or app where I can pop in an article's URL, one at a time as I go through and fix the on-page, and have it tell me which other pages on my site already link to it contextually?

So I could enter domain.com/red-widgets in, and it would spit out a list of pages on the same domain that link to the /red-widgets page?

Whether it can do this in bulk or one at a time, either is fine since I'm going over every page manually already.

Thanks!
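For anyone who'd rather script the lookup than use a tool, here's a minimal Python sketch of the idea: crawl the domain, record which pages link to which, then query the page you're working on. The domain and target URL are just placeholders from the question, and it assumes the requests and BeautifulSoup libraries are installed.

```python
# Hypothetical sketch: crawl a site and build an inlink map, so you can look up
# which pages link to /red-widgets. The domain and target are example values.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse
from collections import defaultdict, deque

START = "https://domain.com/"
TARGET_HOST = urlparse(START).netloc

def crawl_inlinks(start, max_pages=500):
    inlinks = defaultdict(set)          # destination URL -> set of pages linking to it
    seen, queue = {start}, deque([start])
    while queue and len(seen) <= max_pages:
        page = queue.popleft()
        try:
            resp = requests.get(page, timeout=10)
        except requests.RequestException:
            continue
        if "text/html" not in resp.headers.get("Content-Type", ""):
            continue
        soup = BeautifulSoup(resp.text, "html.parser")
        for a in soup.find_all("a", href=True):
            link = urljoin(page, a["href"]).split("#")[0]
            if urlparse(link).netloc != TARGET_HOST:
                continue                # keep internal links only
            inlinks[link].add(page)
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return inlinks

if __name__ == "__main__":
    links_to = crawl_inlinks(START)
    for page in sorted(links_to.get("https://domain.com/red-widgets", [])):
        print(page)
```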
 
Have you tried screamingfrog.co.uk (go to SEO Spider > download)? There is an 'Inlinks' tab at the bottom that shows the internal links pointing to the URL.
 
Xenu Link Sleuth and Integrity are two more great spidering options in addition to Screaming Frog. Integrity is Mac only.
 
As far as Xenu goes, there are a couple of often-missed options that can be particularly useful. The standard HTML report, IMO, isn't all that useful much of the time. Presentation of data, especially at scale, is just as important as the data itself (if not more so). On the File menu, after running a crawl, there's:
  • Export Page Map to TAB Separated File
  • Export to GraphViz File
With the first, you get a simple list you can throw into Excel and convert with text-to-columns, to begin getting a sense of your link structure.

With the second option, .gv files, you can either use GraphViz itself, or Gephi, which is possibly an easier and a bit more user-friendly option. Load that up, select a type of visualization (like Force Directed), and now you can actually visualize your link structure! This can be incredibly useful for pinpointing problem areas and deficiencies. If a picture is worth a thousand words, visualizing the link structure of a complex site is worth a thousand Excel hours.
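If you want to slice that TAB export outside of Excel, here's a rough Python sketch that just counts inlinks per URL. The filename and the assumption that each row is source page then linked page, tab-separated, are mine; check the column order and headers in your actual export.

```python
# Hypothetical sketch: summarise a Xenu "Page Map" TAB export as an inlink count
# per URL. Assumes two columns, source page then linked page; adjust the column
# indexes to whatever your export actually contains.
import csv
from collections import Counter

inlinks = Counter()
with open("xenu_page_map.txt", newline="", encoding="utf-8", errors="ignore") as f:
    reader = csv.reader(f, delimiter="\t")
    next(reader, None)                      # drop the first row (remove if there is no header)
    for row in reader:
        if len(row) >= 2:
            source, destination = row[0], row[1]
            inlinks[destination] += 1

# Pages with the fewest internal links pointing at them come first
for url, count in sorted(inlinks.items(), key=lambda kv: kv[1]):
    print(f"{count:5d}  {url}")
```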
 

Is there a way to exclude a directory in the middle of a path in Xenu? I want to exclude "/product-filter/", e.g. example.com/categories/product-filter/attribute/

It's getting tripped up on an insane number of product filter variants. Should I change the maximum depth instead?
 
I'd try restricting based on depth, though that may not be enough or ideal for what you want. Also, when you first select "Check URL" there are include/exclude filters, so you might experiment with that as well. Worst case, if it's a very complex site, try segmenting things and using includes/excludes to only crawl chunks of the site. Maybe it'll take several iterations to get enough of the areas you want, but at least you'd have it.
 

I'm trying to restrict on depth now, but I definitely like the idea of breaking it up into chunks. Would you still be able to visualize it in Gephi if it's broken up?
 
The GraphViz output (.gv) is a simple plain-text format. I don't know if there are any weird quirks to it, but if you open the output file in Sublime or some other decent text editor, you'll see what I mean.

It would take a little copy-pasting, but it shouldn't be that difficult. Just be sure to maintain the double-quote encapsulation of each link element, the arrows between them, and the semicolon line endings, and you should be fine.
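As a rough illustration of that stitching, here's a small Python sketch that pulls the quoted "source" -> "target"; lines out of several exports and writes one combined file. The filenames are placeholders, and it assumes the edge lines really do follow that quoted-arrow-semicolon pattern.

```python
# Hypothetical sketch: stitch several GraphViz exports into one .gv file for Gephi.
# Assumes each export's body is made of "source" -> "target"; edge lines as
# described above; the filenames are examples.
import re

EDGE_RE = re.compile(r'".+?"\s*->\s*".+?"\s*;')
chunks = ["crawl_blog.gv", "crawl_shop.gv", "crawl_misc.gv"]

edges = set()                              # dedupe edges that appear in more than one crawl
for path in chunks:
    with open(path, encoding="utf-8", errors="ignore") as f:
        edges.update(EDGE_RE.findall(f.read()))

with open("combined.gv", "w", encoding="utf-8") as out:
    out.write("digraph combined {\n")
    for edge in sorted(edges):
        out.write(f"  {edge}\n")
    out.write("}\n")
```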
 

I ended up doing a crawl at a depth of one to grab all of the category pages that would have filters, then put it in Excel and added "/product-filter/" to the end of each URL to make a giant exclusion list. It took a few minutes because you have to add the exclusions one at a time in Xenu (unless I missed a faster way). Then I crawled the entire site.

I'm currently running a Force Atlas visualization. Looks like it will take a while. It's currently rendering and untangling itself. Looks cool lol, but to be honest, I have no idea if I set it up right or whether I know how to read it.
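For what it's worth, that Excel step could also be done with a few lines of Python, along the lines of the sketch below. The filenames are made up and "/product-filter/" is just the directory from the example above.

```python
# Hypothetical sketch of the exclusion-list step: take the category URLs from a
# depth-1 crawl and append the filter directory to each one.
with open("category_urls.txt", encoding="utf-8") as f:
    categories = [line.strip().rstrip("/") for line in f if line.strip()]

exclusions = [f"{url}/product-filter/" for url in categories]

with open("xenu_exclusions.txt", "w", encoding="utf-8") as out:
    out.write("\n".join(exclusions) + "\n")
```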

Do you just throw it in and hit run?
 
I hit run, and after a few seconds I believe I hit stop. I think it just continues "running", whatever that means. At least, every time I've run it, it seems that it just keeps running with no end in sight.

An easier option might be PowerMapper. They have a free trial for 30 days, and it's really not that much if you find it useful. With that one, it's much easier to exclude URL types in a simple manner, especially for certain subdirectories in the middle of a path like you mentioned.
 
Sitebulb's visuals are what you are looking for:

[three screenshots of Sitebulb crawl visualizations]
 
Sitebulb is from the same guys who made URL Profiler, so that's got my attention. I'm curious what the price point will be.

The Gephi visuals start to make more sense when you compare different types of sites. For fun I ran a few examples. For time's sake I didn't crawl every site in its entirety.

Ecommerce site with a mega menu:
[Gephi visualization screenshot]


Ecommerce site with a mega menu and large blog:
[Gephi visualization screenshot]


The Bruce Clay site:
[Gephi visualization screenshot]


The Wirecutter:
[Gephi visualization screenshot]


Payday loan site with many location-based pages:
[Gephi visualization screenshot]
 