
While analyzing some of our ecommerce clients’ Google Search Console (GSC) accounts recently, we noticed odd URLs that the clients didn’t create, some containing non-English characters and others stuffed with long strings of junk English text. These showed up as large numbers of non-indexed pages in GSC, specifically as server errors, soft 404s, and/or 404 warnings. After doing some research, we found that the URLs depended on the client’s platform and typically stemmed from website exploits targeting unused or often-forgotten pages.

Graph showing an increase in the number of non-indexed pages in Google Search Console.

I’ll break down the list of common website exploits and vulnerabilities per platform and what to do about them. If you’re in a hurry, jump straight to your platform here: Shopify | Magento | WordPress | BigCommerce

If Your Website Platform Is Shopify

Issue: Almost all Shopify sites have the page /collections/vendors – but not all sites use this page. Hackers know this and find ways to inject junk code (and sometimes junk content) into these pages.

Example URLs:

Solution: If you are not using these pages, keep them out of SERPs by ensuring all /collections/vendors?q= URLs return a 404 status code and by adding a meta robots “noindex” tag to the <head> section. Doing this will prevent the pages from being indexed and from wasting your site’s crawl budget.

While crawl budget isn’t often an issue anymore, it can be if Googlebot has to crawl through thousands or hundreds of thousands of unnecessary URLs that you didn’t create and don’t consider important.

How to do it:

  1. Go to Online Store (in Shopify Admin) > Navigation > click the View URL Redirects link at the top of the page.
  2. Redirect /collections/vendors to /404
    • Note: if you are using this page path, check for the issue at /collections/vendors?q= instead and redirect that to /404, if necessary. (You can verify the result with the quick check after these steps.)
  3. Edit your theme.liquid file by adding the following in the <head> section:
    • {%- if request.path == '/collections/vendors' -%}
      <meta name="robots" content="noindex">
      {%- endif -%}
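
To confirm the redirect and status code are working, here’s a quick command-line check (a hedged sketch; yourstore.com is a placeholder for your own domain):

curl -s -o /dev/null -w "%{http_code}\n" -L "https://yourstore.com/collections/vendors?q=test"

The -L flag follows the redirect, so if everything is configured correctly this prints 404 as the final status code.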

For more information about how to fix this security loophole on your Shopify site, visit this Shopify Community thread.

Shopify sites should also check their internal site search results pages. We recently found these to be a source of indexed, non-English-character URLs on a client’s site. This would look something like: site.com/search?q=홍콩클라우드서버⌒텐… To block these pages from being indexed (or to force them out of SERPs), add a meta robots tag to the <head> section of these pages’ template on your site, as sketched below. See the Magento recommendations below for more information.
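
As a minimal sketch (assuming a standard Shopify theme where search results render through the search template), you could extend the theme.liquid snippet from above to cover both the vendors page and internal search results:

{%- comment -%} Noindex the unused vendors collection and internal search results {%- endcomment -%}
{%- if request.path == '/collections/vendors' or template.name == 'search' -%}
<meta name="robots" content="noindex">
{%- endif -%}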

If Your Website Platform Is Magento

Issue: Search results pages on Magento sites are indexable by default. If you’re a current or past ROI Revolution SEO client, you know we always recommend you “noindex” your search results pages (because anything that’s visible via your site search results should also be reachable through your site’s navigation in another way). In this exploit, hackers inject junk code into indexable search results pages to make your site appear to be full of spammy URLs.

Example URLs:

Solution: Googlebot doesn’t like crawling infinite spaces that lead to low-quality or empty pages/soft 404s. Keep these search results pages out of SERPs by adding a snippet of code to the page template.

How to do it: Add the following to the <head> section of your /catalogsearch/result/ pages:

<html>
<head>
<meta name="robots" content="noindex">
(...)
</head>
<body>(...)</body>
</html>
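
If you’re on Magento 2, one way to inject that tag without editing page templates directly is a layout XML override in your theme. This is a hedged sketch; the vendor and theme names in the file path are hypothetical placeholders:

<!-- app/design/frontend/YourVendor/your-theme/Magento_CatalogSearch/layout/catalogsearch_result_index.xml -->
<page xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="urn:magento:framework:View/Layout/etc/page_configuration.xsd">
    <head>
        <!-- Renders a robots noindex meta tag on search results pages -->
        <meta name="robots" content="NOINDEX,FOLLOW"/>
    </head>
</page>

Flush the cache (bin/magento cache:flush) after deploying the change.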

If Your Website Platform Is WordPress

Issue: WordPress sites have a search results page (/search/) that may be indexable by default. This could allow hackers to inject junk code and create hundreds of useless pages for Googlebot to waste time spidering through.

Example URLs:

Solution: Make sure these pages return a 404 status code and apply a “noindex” meta robots tag to keep them out of SERPs (or to remove them if they’re already indexed).

How to do it: If you’re using Yoast, this setting has likely already been applied for you. If you’re not using Yoast, consider adding it for an easy (read: hands-off!) way to manage this setting for your internal search results pages. If you’d rather not add a plugin, see the sketch below.
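
Here’s a minimal plugin-free sketch using WordPress’s built-in wp_robots filter (available since WordPress 5.7); it would go in your theme’s functions.php:

// functions.php: mark internal search results pages as noindex.
add_filter( 'wp_robots', function ( $robots ) {
    if ( is_search() ) {
        $robots['noindex'] = true;  // adds a noindex directive to the robots meta tag
        $robots['follow']  = true;  // still let crawlers follow links on the page
    }
    return $robots;
} );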

If Your Website Platform Is BigCommerce

BigCommerce does not allow you to edit individual pages’ meta robots tags, but its default robots.txt includes a disallow statement for /search.php. Unfortunately, I have seen evidence of Google indexing the /search.php page for some clients, though I have not seen any instances with the excessive character usage mentioned above. This may be a non-issue for BigCommerce users, but you’ll want to keep an eye on Google Search Console to make sure it stays that way.
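
For reference, the relevant default robots.txt rule looks like this (exact contents vary by store, so confirm at yourstore.com/robots.txt):

User-agent: *
Disallow: /search.php

Keep in mind that a robots.txt disallow only blocks crawling, not indexing: URLs Google has discovered elsewhere can still appear in SERPs, which is likely why the /search.php page shows up for some sites.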


Tying It All Together: Website Exploits & Securing Your Internal Site Search Results Pages

Taking proactive steps now to secure any potential loopholes in your internal site search results pages can save major headaches down the line. Use the guidelines above to protect your site from hackers looking for easy website exploit opportunities.

Noticing other concerns in Google Search Console and not sure what to do? Check out our post about finding and fixing GSC errors.
