How To Run a Brand Enterprise SEO Audit

written by Janet L Bartoli | CORPORATE SEO, In-House SEO, Ultimate Guide

March 29, 2019

We all conduct technical audits - BUT do you know the difference between what you should include for a large enterprise SEO audit versus what you may have done for a smaller site with less than 20 pages?

Then read on. 

If you produce enterprise SEO audits, you’ll need something a bit more advanced than what you see anytime you Google “seo technical audit”.

After conducting hundreds of enterprise technical audits and training other SEOs on how to perform audits for larger sites, I wanted to provide that level of detail here for you.

If you’re looking for an SEO audit how-to, or a simpler version for a site with fewer than a couple hundred pages - there are loads of other resources out there.

In fact, I’ll help you out by sharing a few of those resources here for you 

Some of the methods mentioned in those resources can also be applied here.

Now that that's out of the way - read on and let's dig into the fine deets on this thing.

Before you begin to dig into this further, let’s establish some initial ground rules ....


You should have a start and end date in mind - this can’t be an audit that goes on endlessly.

Even if you’ve never conducted an audit for the new site you’re working on, develop your start and end milestones.

This is a process, and you have to be disciplined and organized about what you need to accomplish and the amount of time necessary to complete it.


Just like baking a cake, you have to make sure you have all the ingredients and tools to get the job done. 

If you set out to conduct an audit but you don’t have access to any tools, you’ll find your timeline gets pushed out because you haven’t set that up.


Assuming you are conducting the audit, make sure you dedicate time to actually accomplishing the audit.

Alongside your other day-to-day job tasks, try to dedicate an uninterrupted block of at least 3 hours each week.

Establish your own SEO audit template. 

Once you develop it, you'll have something ready for the next audit.

Make sure to save each report (I know this is a no-brainer, but many of us have misplaced older audits).

These can and should be used for historical purposes to show growth and improvements over time.

The Necessary Tools 

These are the basic tools recommended for conducting your audits.

There are of course many others, including SEMrush, Conductor Searchlight (which has DeepCrawl integration), Ahrefs, and so on.

I'm not going to assume you have any budget right away - so the tools outlined are for in-house SEOs on a budget.

*For anyone who might be interested in a server-based tool - I would HIGHLY recommend DeepCrawl. If you manage a large-scale site with thousands and thousands of pages, this is a great choice. That way you can avoid relying on a desktop tool like Screaming Frog.

Screaming Frog is a great choice, but if you're reading this, I'm assuming you're managing a very large enterprise site with thousands of pages - and a server-based audit tool would be best.

FYI - memory limitations: desktop crawlers are limited by the amount of memory on the user’s machine. When you run audits on large sites with 10k+ pages, the crawler can die very quickly due to memory limits.


It’s basic, but you must confirm that all those pages are accessible, particularly any deeper level product pages. Don’t assume they have all the basics in place like a Robots.txt file or an XML sitemap.

This audit can also provide you with an opportunity to show your IT team how the site architecture is set up.

Even if you don’t have the budget to make big site changes, having those areas documented with your recommendations could be the business justification for getting it corrected next year.

What is search accessibility? 

Digital marketers should be familiar with W3C web accessibility for those with disabilities.

There are some core areas that should be included within your audit:

  1. Mobile accessibility - Text should be resizable up to 200% without the need for third-party assistive technology. If a user is visually impaired and unable to read your website’s copy, you can bet they’ll likely leave without converting. Here's a link to the W3C mobile accessibility page.
  2. Images, video - non-text content like images and infographics should contain descriptions of the content being displayed, usually in an alt attribute. 
  3. Page titles should be unique along with HTML code that identifies the language used. User agents, like internet browsers, and other applications like screen readers need this information to accurately understand and communicate the page content. 

There are many other areas that require accessibility optimization... With over 1 billion people living with disabilities, you should make sure you include an audit of your site's accessibility.

This does not need to be a full-on, intensive audit - but make sure to include the following in your audit workbook, in a tab titled Accessibility.

  1. YouTube captioning - link in the image below
  2. Mobile - includes touchscreens, tablets, smartphones, and even wearables
  3. Web Accessibility - Audio, Visual, imagery - here's a great YouTube video to help cover off on all you should be reviewing. 
Google Accessibility

Search Engine Accessibility Rookie SEO Mistake

Don't be overly reliant on W3C as a guide to making your site 100% accessible. It's next to impossible to ever achieve a perfect score.

More important is to make sure you cover off on all the legal accessibility requirements, including being Section 508 compliant. Your legal team may also be interested in learning how your site measures up against the 508 requirements.



This is a manual review of the robots.txt for compliance. Avoid using your robots.txt file as a catch-all for anything on your site you do not want included in a search engine’s index. That’s not the purpose of a robots.txt file.

According to Google, "A robots.txt file tells search engine crawlers which pages or files the crawler can or can't request from your site."

A robots.txt file cannot force crawler behavior on your site; rather, its lines are instructions that act as directives to the crawlers that access your site.

The Robots.txt File

The reason it should not be used as a catch-all for eliminating pages from the index is that anytime Google finds a link to your page on another site, with descriptive text, it might generate a search result from that link.

The simplest and fastest way to see your robots.txt file is to append /robots.txt to your domain in your browser (e.g. example.com/robots.txt).

Once there, you'll immediately be able to check:

  1. Is there a robots.txt file in the appropriate location?
  2. Is it structured correctly?
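If you want to go beyond eyeballing the file, Python's standard library can parse robots.txt rules for you. Here's a minimal sketch - the rules and URLs below are made-up placeholders, not a real site's file:

```python
# Parse a robots.txt and spot-check a few URLs against its rules,
# using only the standard library. The file contents are a stand-in.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /checkout/
Disallow: /search/

User-agent: Googlebot-Image
Disallow: /images/private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A product page should be crawlable; checkout and private images should not be.
print(parser.can_fetch("*", "https://example.com/products/widget"))
print(parser.can_fetch("*", "https://example.com/checkout/cart"))
print(parser.can_fetch("Googlebot-Image", "https://example.com/images/private/a.jpg"))
```

Running a handful of your most important URLs through a check like this is a quick way to confirm the file behaves the way you intended before you hand recommendations to the dev team.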

There's a cleaner way to fix this. 

Disallow: The command used to tell a user-agent not to crawl a particular URL. Only one "Disallow:" line is allowed for each URL.

While the robots.txt file's directives give bots suggestions for how to crawl a website's pages, robots meta directives provide firmer instructions on how to crawl and index a page's content.

Noindex: is not used correctly in this instance. What the webmaster or SEO should do is add a noindex meta tag to the two pages they don't want indexed. Here, they are essentially trying to tell search bots to acknowledge the URL but not index the page - and this is not the way to go about doing that.

Meta Directives should be used in many situations. Here's a great current post from Moz on that.

BTW - There were at least 24 other lines in this very well known North American brand's robots file I excluded to shield them from being called out.

Next check the following areas:

  1. Are there any noindex references to pages meant to be excluded from the index? Make sure this remains a clean file. Any page that you do not want in the search engine’s index should be removed from the robots.txt file; add a meta noindex tag to that page instead.
  2. Use robots.txt to manage crawl traffic, and also to prevent image, video, and audio files from appearing in Google search results.
  3. If you want to prevent Google from indexing a specific image, you would use the following directive:

User-agent: Googlebot-Image

Disallow: /images/paris.jpg

  4. In the rare case you need all your images removed from Google's index, use this directive instead:

User-agent: Googlebot-Image

Disallow: /

  5. Make sure the XML sitemap location is listed in the robots.txt file.

Now log in to the site’s Google Search Console and go to the old version (as of this writing), or visit Google’s robots.txt tester page here.

Since we're talking about your company's website - there may be an occasion you need to eliminate any confidential or private information. 

If you don’t want any of that content to appear in Google’s search results, the easiest and most effective method to block private URLs is to store them in a password-protected directory on the site's server.

This would be a recommendation you would make to your development team. They would be the ones to actually create that directory. 

If you make any changes or adjustments to your robots.txt, or you need to establish a new file, you'll need to upload the file and submit it to Google.

Work with your dev team to have them upload the new robots.txt file to the root of your domain as a text file named robots.txt (the URL for your robots.txt file should be /robots.txt).

Inside your Google Search Console, click verify live version to see that your live robots.txt is the version that you want Google to crawl.

Once you’ve had a chance to review and make recommendations for the revised robots.txt file, those recommendations go into your overall recommendations list, which you'd share AFTER you finish your full audit.

Avoid the one-off haphazard recommendation of sending your dev team a single recommendation like this, then running back to your audit to finish - then finding something else for them to fix. 

Finish the entire audit first, then you'll prioritize ALL your recommendations, and share a nice clean XLS workbook with them afterwards. 

Lastly, as John Mueller of Google put it: "Robots.txt is not a suggestion"


Robots.Txt SEO Rookie Mistake

Avoid This: Disallowing the entire site from the site's index.

Believe it or not, I’ve seen this done a few times. The development team had no idea why the site was no longer in Google’s index.

I checked the robots.txt file, which looked like this:
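I've omitted the domain, but the critical lines were the classic "block everything" pattern - a wildcard user-agent with a root disallow:

```
User-agent: *
Disallow: /
```

Those two lines tell every crawler to stay away from every URL on the site.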


This is another manual review, but you should also check this in Google Search Console and use Screaming Frog to crawl it.

Creating your XML sitemaps is important, and they should be updated on a regular basis - at least once a quarter. Just because you create and submit an XML sitemap does not mean Google or other search engines will crawl all the pages the instant you upload the file.

XML sitemaps are just about the BEST technical method of helping Google discover your new content.

The Sitemap.XML

What is an XML sitemap? A file in XML format that lists the most important pages in your site - the ones you want to make sure the search engines are able to access. This file gets uploaded and lives off the root domain. You will also submit the file in the Google and Bing webmaster areas.

The best way to create a new sitemap, if you don't already have one established, is to use Screaming Frog.

This will generate a static sitemap. Meaning, if you need to update the XML, it will have to be re-created.  

There are other solutions if you want these automated - Dynamic XML sitemaps - where you can have your developer code a custom script. Or try Pro Sitemap Generator, which updates automatically.
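If you'd rather script it than buy a tool, a static sitemap is straightforward to generate yourself. A minimal sketch using Python's standard library - the URLs and dates are placeholders:

```python
# Build a minimal XML sitemap (urlset with loc and lastmod entries)
# using the standard library. Page URLs and dates are examples only.
import xml.etree.ElementTree as ET

pages = [
    ("https://example.com/", "2019-03-01"),
    ("https://example.com/products/", "2019-02-15"),
]

urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for loc, lastmod in pages:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = loc
    ET.SubElement(url, "lastmod").text = lastmod

sitemap_xml = ET.tostring(urlset, encoding="unicode")
print(sitemap_xml)
```

Because this output is static, you'd re-run the script whenever pages change - which is exactly the limitation the dynamic options above solve.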

If you use WordPress, this is easy - use SEOPressor, Yoast, or any number of SEO plugins.

I've actually used this for a telecom client when we needed to create a few XML sitemaps and needed a quick turnaround without expending any dev team resources. 

First, you want to manually check this really fast - go to yourdomain.com/sitemap.xml. Is your XML sitemap there?

OK, next jump into Google Search Console - on the left side you'll see Sitemaps.

If you have a new sitemap you need to let Google know about - make sure you upload your XML generated sitemap to the root domain. 

If you work with a development team, they'll take your XML sitemap and upload it to the server.

Once that's done - just type in the location so Google can have a chance to see it.

You will see whether Google notices any errors within your sitemap - indicated in the status in the Coverage section, which will flag any pages included in your XML sitemap that were submitted in error - maybe a page you wanted to keep out of the search engine's index.

I don't normally give Bing much notice - but some 20+% of search traffic comes through Bing/Yahoo, so we should cover where and how the sitemap should be uploaded there.

In your Bing Webmaster account - on the left side of the nav - under "Configure My Site" (this is NOT easy to find, Bing makes it really difficult) --> Sitemaps  

If you already have an XML sitemap in place on your site, you'll see it listed, along with Last Crawl and Status.

Create one for your videos and images, not just your site's products and pages.

Avoid The Image Sitemap - here's what you should do instead:

Use JSON-LD markup to call out image properties to search engines, as it provides more attributes than an image XML sitemap. Doing this makes a separate XML sitemap for images unnecessary.
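As a rough illustration - the URL and text here are placeholders - image markup in JSON-LD looks something like this, using schema.org's ImageObject type:

```json
{
  "@context": "https://schema.org",
  "@type": "ImageObject",
  "contentUrl": "https://example.com/images/white-long-board.jpg",
  "name": "White long board",
  "description": "A white surfing long board photographed on the beach"
}
```

This block would be embedded in a script tag of type application/ld+json on the page that displays the image.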

The Video XML Sitemap

Similar to your regular XML sitemap for your site's pages - but having a separate one just for your video content is ideal.

There are rules around how to create the sitemap, and what to include. See the resources area at the end of this chapter. 

Here's a good visual to help you understand the difference between a search engine moving through your site's navigation and moving through your XML sitemap - fewer internal links to hop through, easy movement from one page to the next, and much more efficient.

Reference: Search Engine Journal, "How to Use XML Sitemaps to Boost SEO"

The Tags:

The lastmod tag - The last modified time is especially critical for content sites, as it helps Google understand that you are the original publisher.

It’s also powerful for communicating freshness, but be sure to update the modification date only when you have made meaningful changes. I include a link to the full tag list in the resources below.

Regarding the change frequency tag, John Mueller has stated that “change frequency doesn’t really play that much of a role with sitemaps” and that “it is much better to just specify the time stamp directly”.

The priority tag is and has been largely ignored by Google.

Whether you include priority 1 or priority 3 does not matter.

XML sitemaps Guidelines:

  • A maximum of 50,000 URLs.
  • An uncompressed file size limit of 50MB.
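If your enterprise site exceeds those limits, you'll need multiple sitemap files referenced by a sitemap index. A quick sketch of the chunking logic - the file-naming convention here is my own, not a standard:

```python
# Split a large URL list into sitemap-sized chunks that respect the
# 50,000-URL-per-file protocol limit, yielding one file name per chunk.
MAX_URLS = 50_000

def chunk_sitemaps(urls, max_urls=MAX_URLS):
    """Yield (filename, url_slice) pairs, each slice within the URL limit."""
    for i in range(0, len(urls), max_urls):
        yield f"sitemap-{i // max_urls + 1}.xml", urls[i:i + max_urls]

# With 120,000 URLs you'd get three files: two full, one with the remainder.
urls = [f"https://example.com/p/{n}" for n in range(120_000)]
files = list(chunk_sitemaps(urls))
print([name for name, _ in files])
```

Each chunk would then be written out as its own urlset file, with a sitemap index file listing all of them.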

Need to audit the XML sitemap? If you have a Screaming Frog license, or if you decided to use DeepCrawl, use either of them to create an audit of your XML sitemaps.

XML Sitemaps SEO Rookie Mistake

There aren't too many rookie mistakes here - except perhaps not auditing it at all. Make sure you ONLY include URLs you want the search engines to acknowledge. All your product pages; NO privacy pages.

Separate videos out into their own XML Sitemaps. 

A search spider arrives at your website with an “allowance” for how many pages it will crawl - don't make it work harder by including all 1 billion of your pages in there. 

Exclude paginated, duplicated, and non-canonicalized pages, as well as in-site search results, archive pages, and login pages.


As you audit your site, you need to know how many pages you have in your site and how many are actually indexed. 

Checking to confirm that the pages that should be indexed actually are is a big part of this audit. There are a few manual ways you can easily check indexation.

Site Indexation

Start by having a good idea of how many pages the site actually has. If it has 1,500 and you find 4,500, that could mean there's some mad duplication going on.

Ask your development team - they should know exactly. 

Next, check using the site: command - a bare site: query on the root domain would have given me all subdomain results; here I was just interested in the main domain only.

So this is quite a lot of pages to sort through, agree? 

So let's try another site: command, this time narrowing to only the subfolder we're interested in - here I'm only looking at REI's /rei-garage subfolder.

What if a page or pages are missing?

First, check to see if that page has a meta robots tag on it - or whether there's improper use of disallow in the robots.txt file.
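Checking a handful of missing pages for a stray noindex is easy to script. A sketch using only Python's standard library - the HTML below is a stand-in for a page you'd actually fetch:

```python
# Scan a page's HTML for a <meta name="robots"> tag containing
# "noindex", using the stdlib HTML parser. The sample HTML is made up.
from html.parser import HTMLParser

class MetaRobotsParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            if "noindex" in attrs.get("content", "").lower():
                self.noindex = True

html = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
parser = MetaRobotsParser()
parser.feed(html)
print(parser.noindex)
```

If the tag check comes back clean, the robots.txt disallow check below is your next stop.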

If you don't see the actual page in the robots file - check to see if that entire sub folder is being disallowed. 

Next head over to your Search Console account and check the Coverage section 

Up until now, we've only been working this audit very manually. 

Next - Let's dive into DeepCrawl 

(or if you're using Screaming Frog - you can use that too) 

OK back to Deepcrawl - when I click on the "indexable pages" section of the dashboard I can easily see how many pages are indexed in the search engines.

I typically map that up against what I have in Google Search Console. Since the site I'm working on right now is relatively small, less than 100 pages, I can quickly see that they match up with what I have in Google Search Console. 

I cross-confirm that the pages I wanted to noindex - a registration page, a privacy policy, and the WP login - are all accounted for in both DeepCrawl and Search Console; all other URLs are indexable and, according to Google, indexed.

If you use the template I provided, the notes to include would be your observations about what you saw in the tool you used, compared against what you saw in Search Console and in your manual site: command check.

Site Indexation SEO Rookie Mistake

Avoid just using one method. Make sure you cross check your results. Confirm the number of pages with your development team. 

Always confirm: if you find pages most important to your business missing from the search engine's index, make sure you've checked the page's meta robots tag and the site's robots.txt file to confirm that subdirectory isn't being disallowed.



Checking to find your site's crawl errors is the next step in the enterprise SEO audit process.  

How does your site encounter a crawl error? When a search engine crawls your site, it goes through a process of trying to reach every page in your site, finding those pages through public-facing links.

Your goal, as the SEO, is to make sure that the pages that should be accessible to the search engines actually are.

Crawl Errors

There are two types of crawl errors you should be familiar with. 

  1. Site Errors - aka your entire site is unable to be crawled
  2. URL Errors - These are more specific to a URL in your site i.e. 404

There you'll see if you have any errors at all. 

I have one that I should fix - it's actually a registration pop-up that found its way into my XML sitemap.

Within many large enterprise sites, you may encounter site errors that are server related, where you may experience load-balancing issues.

You might encounter 502 errors - I would see those from time to time. Those are server errors you should inform your IT team about. It could be a load balancer issue - depending on the amount and frequency of those errors, your IT team should be aware of them.

Here's a list of the crawl errors you could encounter and what they mean:

  • Mobile Specific URL Errors - If you have a responsive site, you are not likely to see these. However, if you still have a separate m. site you may see these. You might also have blocked the mobile site through your robots.txt.
  • Malware Errors - Either Google or Bing found some malicious software on the URL - investigate the page and remove the malware
  • Server errors - These could relate to server connectivity through things like excessive page-load times for dynamic page requests - in other words, a site that delivers the same content for multiple URLs. A perfect example might be a large ecommerce site where every size, color, and type of a sweater (/color=blue&size=small and /size=small&color=blue) resolves to the same product page. Another thing to check is that your site's hosting server isn't down, overloaded, or misconfigured.
  • Soft 404 - This happens when your server returns an actual page for a URL that does not exist, so search engines spend valuable time crawling and indexing non-existent or duplicative content. If you have a non-existent page, make sure to configure your server to return a "hard 404" (404 Not Found) or 410 (Gone) response code. You should also have a custom 404 page - in your audit, make sure to identify whether the site has one.

How To Correct 404 Errors

  1. Download any 404 or Soft 404 errors you encounter in Search Console. Search Console will report 404 pages as Google’s crawler goes through all the pages it can find.
  2. Identify whether those pages were getting any traffic through your analytics.
  3. For any that were getting traffic, now that the page is no longer live, find a more relevant page to redirect that traffic to.
  4. Always set up a 301 Permanent Redirect from the non-existent page to a more relevant page - i.e. if it was a product that is no longer active, redirect that page to the main product-level page. NOT the Home Page! For example - Men's Diver Watch returns a 404 and is still getting a decent amount of traffic, but the Men's Diver Watch is no longer a product you sell. In that case, redirect that page to the Men's Sports Watches page instead - more than likely that visitor was looking for a sporty, water-resistant watch. It's a pretty good assumption.
  5. For Soft 404s - these could also be the result of thin content, which a crawl tool won't be able to detect. If you find good, active pages with thin content, sort those and get the content updated. This is a fairly simple suggestion - of course there may be other considerations, like a whole category that looks like this, in which case you might need to redesign the template to include more content.
  6. A soft 404 is not a real 404 error, but Google will deindex those pages if they aren’t fixed fast.
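The redirect steps above boil down to maintaining a mapping from each dead URL to its most relevant live replacement. A sketch - all the paths here are hypothetical:

```python
# Turn a 404 export into a 301 redirect map: each dead path points at
# the closest relevant live category page, never the home page.
dead_urls = {
    "/watches/mens-diver-watch": "/watches/mens-sports-watches",
    "/boards/white-long-board": "/boards/long-boards",
}

def redirect_target(path):
    """Return the 301 target for a dead path, or None if no mapping exists."""
    return dead_urls.get(path)

print(redirect_target("/watches/mens-diver-watch"))
print(redirect_target("/unknown-page"))
```

A table like this is also a handy artifact to hand the dev team - they can translate it directly into server redirect rules.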

Crawl Errors Rookie SEO Mistake

The most common mistake is one I mentioned above - taking a 404 and redirecting it to the home page. This is not only lazy but also a poor user experience, and UX is part of SEO. Make sure you always review your 404s regularly through your audits using either Screaming Frog or DeepCrawl, and check your Search Console to keep things maintained.



Internal linking is something you should also include in your audit. I include it so I can understand which pages aren't getting any visibility - often the page is so deep in the site that there aren't any internal links pointing to it.

This is not a backlink audit, but more specific to those links within the site. Those that exist, and those that are broken. 

Analyze Internal Linking

The fundamentals of site structure are largely factored into the design and strategy of internal links. Start by reviewing the main navigation - it shows you which pages the site treats as most critical.

The number of clicks from the Homepage (page depth) and internal links pointing to a page, determine that page’s importance within the site. 

When conducting your audit you want to make sure the use of internal links is done according to industry standards and best practices. 

Here's a few of those best practices:

  1. Think about user experience - Above all else, look at the way internal links are being used. Are you seeing random links, every third word a link? Does each one point to a relevant, useful internal page, or are random links jammed in everywhere? Make sure it's beneficial to your reader; it should look natural.
  2. In your crawl report, either with Screaming Frog or DeepCrawl, look for and remove redirects/redirect chains from internal linking that can dilute link equity.
  3. Create more inlinks for important pages by including them in the main navigation or by adding more internal links within the body content.
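Page depth is easy to compute from a crawl export: treat internal links as a graph and breadth-first search from the home page. A sketch with a toy link graph (the paths are invented):

```python
# Compute each page's depth (clicks from the home page) over an
# internal-link graph via breadth-first search.
from collections import deque

links = {
    "/": ["/womens", "/mens"],
    "/womens": ["/womens/water-sports"],
    "/womens/water-sports": ["/womens/water-sports/surfing"],
    "/mens": [],
    "/womens/water-sports/surfing": [],
}

def page_depths(graph, home="/"):
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

depths = page_depths(links)
print(depths)
```

Pages that never appear in the result are orphans - no internal path reaches them at all, which is exactly the visibility problem this part of the audit is meant to surface.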

In Search Console, check for 404s - are those the result of a broken link to a page that may no longer be active?

Next - using DeepCrawl (or Screaming Frog - they also have this report available) 

Check both Unique internal links & All Broken Links. 

Check the anchor text, which should be fairly varied and relevant to the page you're linking to.

When you check for your broken links, make sure you download the report so you can add that to your audit for when you make your recommendations to share with the developer for remediation. 

Check this against your Search Console and confirm that there is nothing missed or overlooked. 

You want to make sure there's good, natural use of internal linking. If there's a product or service page that isn't getting any search visibility or traffic, check to see if there are any internal links pointing to it. And confirming you don't have any broken internal links is, of course, the point of this audit as well.

Internal Linking SEO Rookie Mistake

If I had to pick a rookie mistake for this one, it would be that this step is overlooked and just not done. Make sure you don't skip this step. It's as easy as that!


Much has been written on this topic. Google has indicated that site speed (and, as a result, page speed) is one of the signals used by its algorithm to rank pages.

There are those that measure "page weight," and there are enterprise corporate teams that measure page speed with their own proprietary (aka home-grown) tool, using whatever they (mostly the IT team) deem page speed to be.

I've yet to meet a development team that has built their own tool that can accurately measure time to first byte, or the user experience - in the way Google measures it. 

Mobile and Desktop Page Speed

Measuring page speed isn't nearly as easy as opening up GTmetrix, plugging your URL in, and getting an easy answer.

Before I go on to describe what and how you should measure, I'm going to give a shout-out to New Relic - if you aren't aware of this web-based platform, you should totally give it a look. I worked with a client recently who already had a license for APM, and we added the Browser feature to get a very detailed page speed report.

You can also troubleshoot minified production JavaScript code, with source map support giving you full visibility to where in your code the front-end error is - which helps you and the dev team track it down and resolve it. 

While I do recommend using New Relic, for this audit, I'll assume you couldn't get your SOW approved through your legal team in time to conduct a proper audit. 

Therefore, I will use GTmetrix. There is a Pro version of GTmetrix that is relatively inexpensive.

There are a few ways to measure your page speed:

  1. Fully Loaded Page: How long it takes for 100% of the page to load
  2. Time to First Byte (TTFB): How long it takes for the page to start the loading process
  3. First Meaningful Paint/First Contentful Paint: Really a measurement of the user experience, because it measures how quickly a page loads enough resources for the visitor to use the page - and get to what they intended faster
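To make the three measurements concrete, here's how they relate on a single load timeline - all numbers are invented milliseconds, in the spirit of a waterfall report:

```python
# Derive the three page speed measurements from navigation-timing
# style timestamps (all values in milliseconds; the numbers are made up).
timings = {
    "request_start": 0,
    "first_byte": 350,              # when the first response byte arrives
    "first_contentful_paint": 1200, # when the visitor first sees content
    "fully_loaded": 4100,           # when 100% of the page has loaded
}

ttfb = timings["first_byte"] - timings["request_start"]
print(f"TTFB: {ttfb} ms")
print(f"First Contentful Paint: {timings['first_contentful_paint']} ms")
print(f"Fully loaded: {timings['fully_loaded']} ms")
```

The ordering is the point: a page can have a fast TTFB and still feel slow if the paint and fully-loaded marks lag far behind it.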

There are loads of resources for best practices - and I'll mention them below in the resources area. 

Make sure you use one tool - using two or three will only confuse things, and they will most likely not even share the same results.

If you use GTmetrix: the primary difference between the free standard version and the Pro version is the developer toolkit in Pro. It allows you to simulate a device and monitor from remote locations - which could also come in handy.

When you're working with your very large enterprise site you might think they are using a CDN or content delivery network. Don't assume that. 

Here's a good image I wanted to share from Backlinko that shows how a CDN works - and the obvious reason to get this on your IT team's radar for next year's budget.

This is why I highly recommend using GTmetrix - it shows you the areas you should be measuring, and provides a great visual that's easy to include in any presentation or as a separate tab in your XLS audit workbook.

The image below showing First paint time is on the "Timings" tab of your report. 

In your audit, you should record the measurements I listed above. If this is your first audit with this site, these will be your baseline numbers. After your recommendations are implemented, go back and re-run the test to show the improvements.

You'll also see improvement in your visitors' engagement from page to page, and you can decrease your bounce rate!

Mobile page speed analysis is what the Pro version offers; in the standard version you'll be testing desktop only.

You could also use Google PageSpeed Insights - but for what you need, I'd use either GTmetrix or New Relic.

Page Speed Rookie SEO Mistake

Avoid using your developer's home grown tool to measure page speed. As I mentioned above, they aren't looking at this in the same way you are - being from a search engine and visitor's perspective. 

Don't use page weight as an indicator to what your page speed measurement is. I've heard this song and dance since before the dawn of search. It's old, tired and BS. 

Stick to those measurements I mentioned and you'll have an awesome and, more importantly, accurate page speed report to share. 



In light of Mobile First, and all the major mobile focus over the past few years, it's important to have a particular focus on your mobile user experience, which does play a role in how your site is accessed, indexed and performs from an SEO standpoint. 

Your mobile traffic may perform a great deal better than desktop in countries outside North America.

There are a few factors the search engines - and Google in particular - look at for the user: is the viewport set? Is the content wider than the screen?

Note: If your site is NOT responsive - that should be your priority 1 recommendation this year. 

Mobile Usability & The Mobile User Experience

First step in auditing your mobile usability is to go to Search Console. 

Select the Mobile Usability section of Enhancements

This is very straightforward - look for any errors. If you do have errors, Google will always describe what they are, and you can reference how to resolve and validate them in the developers section here.

How many valid URLs do you have? Make sure to report all this in your mobile usability tab of your audit workbook.  

Next head over to Google's mobile friendly tool found here

Run the test on your site - and include a screen shot if you want, or just report out in text that your site is mobile friendly. I like to actually include the screen shot - just makes it more legit somehow .. 😉 

If you encounter any mobile issues at all - be sure to document those in your audit. Back it up with the total error count along with type and a reference to how to resolve them. In fact this is sound advice for any type of error you identify in your audit.

Understand what the error is, and how to resolve it - it's important you're able to speak to this during your audit presentation to your employer, developer or client. 

Mobile Usability SEO Rookie Mistake

This may sound obvious, but it's assuming your site is mobile responsive when it actually isn't. 

Or just putting the whole usability thing on the back burner.


Take care of your mobile site, tend to the page speed, the user experience and ability for others to access your site no matter what device they use. 

Final tip: Don't block JavaScript! 



Basic knowledge for URL formation is this: shorter URLs, hyphens rather than underscores, a logical folder structure in place, and a site that's flat, not deep. 

You're looking for the URL structure to logically follow your categories.

Here’s an example of a URL structure that a lot of sites use: https://www.example.com/category/product-name/

Your URLs don’t have to look exactly like that. It's more important that all of your URLs follow the same structure site-wide. Avoid using camel case (i.e. /Category/PRODUCT-name/).

URL Formation is key

One of the first things to review in your site is the actual site architecture. 

You might see a completely disastrous user interface across the site. That's OK - just make sure you present the best practices. 

Here's a pretty poor site architecture: /products/$%_00+skuID=?111322 - this tells you nothing. 

Now here's a much cleaner site architecture - it has a logical folder structure and makes it easy for the visitor to see and know what the page is.

You might also have a very large ecommerce site - there's even more reason to make sure it's flat and easily accessible to your visitors, rather than very deep. 

When I say "deep" I'm referring to the number of levels of page depth from the home page.

For example, say I start on the home page of a large ecommerce site and the product I'm looking for is five clicks deep - meaning I have to click through category 1, into sub category 2, then sub category 3, and so on: Womens, then Water Sports, then Surfing, then Long Boards, then White Long Boards. That's five levels deep - way too much! 
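Counting those levels by hand gets old fast on a large crawl export. Here's a minimal Python sketch (the function name and example URL are mine) that computes folder depth for any URL:

```python
from urllib.parse import urlsplit

def url_depth(url: str) -> int:
    """Number of folder levels below the site root."""
    path = urlsplit(url).path.strip("/")
    return len(path.split("/")) if path else 0
```

Run it over your full URL list and sort descending: anything sitting four, five or more levels down is a candidate for flattening.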

That is part of your audit - take a manual look at your folder structure. 

Check to see how many product categories you have. Can they be shortened? Or organized in a more accessible way? Make that recommendation. 

Next, look at how the URL is formed. Does it have parameters all over it? Parameters are syntax that isn't easily read by either a visitor or a search crawler, because they only mean something to your content management system. 

Here's an example of poor URL formation: ;jsessionid=8F796698A6B922262FBD3EA06A8DE93C.aap-prd-dal-app-02-p-app2 appended to the end of the URL. What is after "kids"? You'll notice it has a jsessionid - and not to call them out or anything, but I found this page when I searched for "boys denim shorts" and "boys" is not even in the URL! The page was also on page 8 of Google's indexed results.
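If you've exported a URL list from your crawler, a quick script can flag the offenders. This is a rough sketch - the function name and sample checks are my own, not a standard tool:

```python
from urllib.parse import urlsplit

def flag_messy_urls(urls):
    """Return URLs carrying query parameters or jsessionid-style
    path parameters - strings meaningful only to the CMS."""
    messy = []
    for url in urls:
        parts = urlsplit(url)
        if parts.query or ";jsessionid=" in parts.path.lower():
            messy.append(url)
    return messy
```

Append the flagged list to your audit workbook as the evidence behind your folder-structure recommendation.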

If this were my client - I'd recommend a revision of that folder structure. 

I'd also make sure to have separate boys and girls sections of the site. Without them, it becomes a poor user experience. Suppose your "toddler" is a girl - do you want your visitors to click through and sort through all the clothes?

What happens when you have a cleaner URL structure? You rank #1 for boys jean shorts. 

I would not have opted for the "+" between keywords, but it looks like it does work. If you want to try that, go for it; otherwise, I prefer hyphens, NEVER underscores (Google treats underscores as word joiners rather than word separators). 
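If you're recommending a rewrite of slugs like these, a small helper can show the target form. A sketch, with my own naming, assuming simple path segments:

```python
import re

def normalize_slug(segment: str) -> str:
    """Rewrite a URL segment to the preferred form:
    underscores, '+' and whitespace become hyphens, and
    camel case is lowered."""
    return re.sub(r"[_+\s]+", "-", segment).lower()
```

Pair the before/after output with a redirect plan so the old URLs 301 to the cleaned versions.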

By also including breadcrumb navigation on the page, you help both the search engine bots and your visitors. And that is a good thing! 

You can report this out in two ways. First, do a manual review like I showed you here: how do the product category pages look? Are the URLs clean, and how deep is the site? 

Next, run a crawl using DeepCrawl or Screaming Frog to pull all your URLs and see which might need to be optimized, like the children's apparel URL above. Also look at your competition - what does their structure look like? That's a good place to start. 

Export those results and have them appended to your audit workbook to share the current view. 

URL Formation Rookie SEO Mistake

Neglecting to include links from category pages down to the sub category pages. Not using HTML for your navigation: JS isn't SEO friendly and can inhibit the crawl. 

Even though Google can partially crawl and index some JavaScript, you definitely want your navigation links to be HTML.


Duplicate content can reveal itself through a non-optimized content management system, or through not using canonicalization the right way. When I say duplication, I'm not referring to duplicate content spread from page to page by marketers scraping content everywhere and looking to game the system. 

The other way you might see duplicate content is if your site offers similar product pages through multiple categories, but you haven't really differentiated the content. 
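One rough way to quantify "haven't really differentiated the content" is to score pairs of product descriptions against each other. A sketch using Python's standard difflib - the threshold you act on is a judgment call, not an official metric:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]; values near 1.0 suggest duplicate copy
    that needs differentiating or canonicalizing."""
    return SequenceMatcher(None, a, b).ratio()
```

Pages scoring above roughly 0.9 against each other are strong candidates for rewritten copy or a canonical tag pointing at the preferred version.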

Dealing with Duplication

It is NOT a penalty if Google discovers your content is not unique and doesn’t rank your page above a competitor’s page.

Now here's a list of points on duplicate content according to John Mueller:

  • Let's kill this myth once and for all - There is no duplicate content penalty
  • Google rewards UNIQUENESS and the signals associated with ADDED VALUE
  • Duplicate content can slow Google down in finding new content 


"Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin".

Examples of non-malicious duplicate content could include:

  • Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices
  • Store items shown or linked via multiple distinct URLs
  • Printer-only versions of web pages



At its core, schema markup is code that tells the search engines exactly what a certain piece of data is. 

Google and other search engines created a structured data standard called schema.org. 

Way back when schema.org was originally released, the primary method of including Schema in your pages was to use microdata within your HTML elements. 

That was then. This is now. 

Now there's a simpler, improved solution for integrating structured data, called JavaScript Object Notation for Linked Data (JSON-LD), pronounced "Jay-son". You may have seen JSON-LD mentioned in numerous search industry articles. 

Here's a code sample of schema.org JSON-LD:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "brand": "Strategy Rework",
  "name": "Technical SEO for SEO Professionals",
  "image": "",
  "description": "Education for the in-house SEO who needs a solid understanding around the technical aspects of SEO"
}
</script>

and here's the JSON-LD Schema Generator Tool

NOTE: You should avoid using mixed types i.e. both microdata and JSON-LD types on a single page. Use either microdata or JSON-LD. 
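If you're generating JSON-LD at scale rather than hand-writing it, building the object in code and serializing it avoids typos like a missing brace. A minimal Python sketch - the field values are placeholders, not a complete Product schema:

```python
import json

def product_jsonld(name: str, brand: str, description: str) -> str:
    """Serialize a minimal schema.org Product as the body of a
    <script type="application/ld+json"> block."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "brand": brand,
        "description": description,
    }
    return json.dumps(data, indent=2)
```

Whatever generates the markup, still run the output through Google's Structured Data Testing Tool before shipping it.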

Time to audit!

You can check for and extract several types of Schema tags – including microdata, RDFa and JSON-LD – using DeepCrawl’s Custom Extraction Tool.  

You can use DeepCrawl to check for the types of Schema tags within the site, but you can't test for validity - for that, use Google's Structured Data Testing Tool. 

There's just about something for everything - here's a full list of schema.org's collection of types. 

In the event you haven't driven down the Schema road before - here's a step by step on how to get that working for your site. 

Step 1: Head over to the structured data markup helper


Step 2: Choose your data type. There are lots here, but Articles is the most widely used. However, do use Products or Events if relevant. 

Step 3: Paste the URL - in this example I'm using this article. Then you'll see the next screen, where the tagging actually begins. 

Step 4: Choose and highlight any of the items you'd like to have markup for. Here I selected the image on this article page.

Step 5: Create HTML!  Then once you've added that to your site, use Google's tester to validate the code. 

Schema SEO Rookie Mistake

Failing to review the structured data guide, and not knowing which markup to use for which content type. 

Becoming familiar with the guidelines for what to include, and with which type, is your job. 




Google defines crawl budget as follows: “Prioritizing what to crawl, when, and how much resource the server hosting the website can allocate to crawling is more important for bigger websites, or those that auto-generate pages based on URL parameters, for example.”

In Google Search Console (as of this writing, the old version) you'll find Crawl Stats. The crawl rate shown there is not the same as crawl budget: crawl rate defines how many simultaneous connections Googlebot uses to crawl a site, and how long it waits before fetching another page.

So what is Crawl Budget exactly? 

Basically, it's the number of requests Googlebot makes to your website in a given period of time. Very simply, it's the number of opportunities you have to present Google the fresh content on your website.

The crawl budget is directly connected to the scheduling of requests spiders make to check content on the site. That's why it's important to refresh and add content to the site regularly. 

The bots typically crawl from the top of your site to the bottom, ideally getting a full list of all your pages. 

What can occur on large enterprise sites with hundreds of thousands of pages is that the bots can't possibly crawl the whole site without slowing the server down. That's when they just stop the crawl. 

Google determines the crawl based on a few factors: it could be the authority or PageRank of a page, or in many cases the XML sitemaps, links, etc. 

In order to check your crawl stats - open up your Google Search Console

Head to the section where you see Crawl, then Crawl Stats. 

Take a look at the average number of pages being crawled per day. 

Over time, you should see the number of pages crawled per day rise on a regular basis. This site has had loads of new content published on a fairly regular schedule. 

Now here's where crawl budget comes into play. 

How To Optimize The Crawl Budget

1. Keep your site well maintained and cleaned up - here's why doing the audit on a quarterly basis makes sense. 

2. Make it faster! 

3. Review your server logs to keep tabs on what's what 

Clean Up Your Site

  • That's what this entire ultimate guide is all about! Check for duplication, malformed URLs, areas within the site that do have crawl errors or issues that are inhibiting search bots from crawling your pages. 
  • Make sure you check your redirect chains - I've seen chains of 6 and 7 redirects - literal madness - just clean that up! 
  • As mentioned in the XML sitemap section, make sure you ONLY include the pages you want search engines to crawl; don't waste the budget on pages you don't care about having indexed. 
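The redirect-chain check above is easy to script once you've exported your redirects. A sketch - the function and the old-URL-to-new-URL map format are my own assumption, not a crawler export format:

```python
def redirect_chain(start: str, redirects: dict) -> list:
    """Follow a URL through a redirect map (old -> new) and return
    the full chain; anything longer than a couple of hops is worth
    collapsing into a single redirect."""
    chain = [start]
    seen = {start}
    while chain[-1] in redirects:
        nxt = redirects[chain[-1]]
        if nxt in seen:  # guard against redirect loops
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain
```

Report any chain of three or more hops with a recommendation to point the first URL straight at the final destination.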

Get Those Pages to Load Faster!

  • See my page speed section above - the faster your pages load, the further your crawl budget goes. 
  • You can GZIP HTML, CSS, and JavaScript files, but do not use it to compress images.

Check Your Server Logs

  • A server log file contains data about all requests made to the server. You'll see crawl-related errors in your GSC (Google Search Console) if this is an issue on your site. 
  • This is something you'll likely be alerted to if it becomes a real problem - just make it a regular part of your weekly maintenance and you'll be good to go. 

Crawl Budget Rookie SEO Mistakes

Ignoring basic, general maintenance of your site. Having outdated content, and never giving your visitors fresh new content, leads to wasted crawling and poor indexation - and therefore doesn't maximize the search engine's crawl budget.


  • Changing the crawl rate in Google Search Console (Old view) 
  • Great log file analysis video from Stone Temple's Eric Enge


The hreflang tag was introduced by Google in December 2011. The hreflang attribute allows you to show search engines what the relationship is between web pages in alternate languages. 

These are among THE most difficult and challenging of all technical SEO elements to get right. 

There's much that can and does go wrong with these - I'll show some of those in the Rookie section below.

But first, let's audit them. 

You need to implement hreflang tags anytime you have a page that references an alternate language. You can use HTML and nest them in the <head> section of the pages where a language variant is required. 

The method looks like this: add <link rel="alternate" hreflang="lang_code"... > elements to your page header to tell Google all of the language and region variants of a page. This is useful if you don't have a sitemap or the ability to specify HTTP response headers for your site.

Google uses the rel="alternate" hreflang="x" attributes to serve the correct language or regional URL in search results.
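Hand-writing these elements for every variant invites typos, so many teams template them. A minimal Python sketch that renders the <head> elements - the function name and example URLs are mine:

```python
def hreflang_links(variants: dict) -> str:
    """Render <link rel="alternate" hreflang=...> elements for the
    <head>, one per language/region code (including x-default)."""
    lines = [
        f'<link rel="alternate" hreflang="{code}" href="{url}" />'
        for code, url in variants.items()
    ]
    return "\n".join(lines)
```

Remember that hreflang must be reciprocal: every variant page needs the full set of annotations, including one pointing back to itself.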

More on how, and the guidelines around implementation of this in the following resources section below. 

Now, let's get to the auditing of the Href lang tag.

Launch DeepCrawl and dive into the Hreflang section. 

In your XLS audit template, take note of any pages that should have Hreflang but don't. If you see any Not Supported or Broken Hreflang links - make sure to point those out as well. 

Here's a really great resource by Aleyda Solis (and if you don't know Aleyda and you do any international SEO, get to know her!). 

"Establish hreflang best practices guidelines to be followed whenever a new page is published"


If you and your dev team are digging in and creating Hreflang tags - here's a great tool to help create and check your Hreflang tags.

Hreflang Tag SEO Rookie Mistakes

There are many, and these are NOT exclusive to anyone just starting out in SEO, either. 

Believing that hreflang annotations will consolidate link authority. Short answer: they do not. 

There's loads to learn and make sure you get right about this topic. I'll list a bunch of really helpful resources below. Make sure you test, validate and nail this. 


That's it! The whole kit and caboodle! I'll be back from time to time to keep this updated and current. If you think of anything that should be in this list but isn't - tell me!

Post in the comments below and I'll add it!