There is an interesting response from John Mueller of Google on what to do with URLs that may appear duplicated because of URL parameters, such as UTM tags, appended to the end of the URL. John said there is no SEO benefit to converting your 404s into 410s, which I think no one would argue with. He also said you can use rel=canonical on the UTM URLs, because that is essentially what it was made for. The kicker is he said it probably doesn't matter either way for SEO.
Now, I had to read John's Reddit response a couple of times, and maybe I am interpreting the last part incorrectly, so help me out here.
Here is the question:
Hello! New to the community but have been in SEO for ~5 years. Started a new job as the sole SEO manager and am thinking about crawl budget. There are ~20k crawled not indexed URLs compared to the 2k that are crawled and indexed – this is not due to error, but due to the high number of UTM/campaign specific URLs and (intentionally) 404’d pages.
I was hoping to balance out this crawl budget a bit by removing the UTM/campaign URLs from being crawled via robots.txt and by turning some of the 404s into 410s (this would also help with overall site health).
Can someone help me figure out if this could be a good idea/could potentially cause harm?
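For reference, the robots.txt side of that proposal would look something like this – a sketch only, since the exact URL patterns are my assumption and not from the thread (Google does support the `*` wildcard in robots.txt):

```
User-agent: *
# Block crawling of any URL whose query string starts with a utm_ parameter
Disallow: /*?utm_
# Also catch utm_ parameters that appear after other query parameters
Disallow: /*&utm_
```

As John notes below, blocking via robots.txt is not the approach he would recommend here.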
John’s 404 response:
Pages that don’t exist should return 404. You don’t gain anything SEO-wise for making them 410. The only reason I’ve heard that I can follow is that it makes it easier to recognize accidental 404s vs known removed pages as 410s. (IMO if your important pages accidentally become 404s, you’d probably notice that quickly regardless of the result code)
John’s canonical response:
For UTM parameters I’d just set the rel-canonical and leave them alone. The rel canonical won’t make them all disappear (nor would robots.txt), but it’s the cleaner approach than blocking (it’s what the rel canonical was made for, essentially).
Okay, so far: leave the 404s alone (no need to turn them into 410s) and do use rel=canonical on the UTM URLs – got it.
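To make the rel=canonical approach concrete, here is a minimal Python sketch of computing the canonical URL for a UTM-tagged page. The helper name and example URL are hypothetical; on a real site, the resulting URL would go into a `<link rel="canonical" href="...">` tag in the page head of every UTM variant.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical helper: strip tracking parameters so every campaign
# variant of a page maps back to one canonical URL.
TRACKING_PREFIXES = ("utm_",)

def canonical_url(url):
    """Return the URL with utm_* query parameters removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.lower().startswith(TRACKING_PREFIXES)]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))

print(canonical_url("https://example.com/page?utm_source=news&utm_medium=email"))
# → https://example.com/page
```

Non-tracking parameters that actually change the page content are kept, so only the campaign variants collapse to the canonical.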
John then explained that, SEO-wise, it probably doesn't matter:
For both of these, I suspect you wouldn’t see any visible change on your site in search (sorry, tech-SEO aficionados). The rel-canonical on UTM URLs is certainly a cleaner solution than letting them accumulate & bubble out on their own. Fixing that early means you won’t get 10 generations of SEOs who inform you of the “duplicate content problem” (which isn’t an issue there anyway if they’re not getting indexed; and when they do get indexed, they get dropped as duplicates anyway), so I guess it’s a good investment in your future use of time 🙂
So Google will likely handle the duplicate UTM-parameter URLs anyway, even if it does index them. But to make SEO consultants happy, use rel=canonical? Is that what he is saying here? I do like that response, if that is his message – but maybe I got it wrong?
Forum discussion at Reddit.