
I have a website that I took over as webmaster. It was running WordPress, was hacked, and had thousands of spam pages injected into it. These pages were indexed by Google, and eventually the search results carried the message "This site may be hacked".

I have migrated the site to a different CMS, made sure it is clean, added it to Webmaster Tools, and the new pages have been indexed. The problem is that Google has simply added the new pages alongside the old spam pages. The website is small, not more than 100 pages, but a search for site:example.org gives "About 368,000 results".

Google Webmaster Tools sends this message: "Googlebot identified a significant increase in the number of URLs on http://example.org/ that return a 404 (not found) error. This can be a sign of an outage or misconfiguration, which would be a bad user experience. This will result in Google dropping those URLs from the search results. If these URLs don't exist at all, no action is necessary."

It has been over a month, but these thousands of 404 errors are still being reported by Google Webmaster Tools.

I have searched the forums, and so far the only option I have found is to remove the site completely from Google's index and then add it afresh. I don't want that blackout, because we rely heavily on search traffic to the site.

Any ideas on how to remove these 404 (not found) pages from Google's index, all 368,000 of them?

2 Answers


Did you try to submit a sitemap to Google?

Ask Google to recrawl your URLs: If you’ve recently added or made changes to a page on your site, you can ask Google to (re)index it using the Fetch as Google tool.

The "Request indexing" feature on Fetch as Google is a convenience method for easily requesting indexing for a few URLs; if you have a large number of URLs to submit, it is easier to submit a sitemap. instead. Both methods are about the same in terms of response times.

From: https://support.google.com/webmasters/answer/6065812?hl=en
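If you go the sitemap route, a minimal sitemap for the cleaned-up site could look like the sketch below. The URLs are placeholders; list only the (roughly 100) legitimate pages that exist in the new CMS, then submit the file under the Sitemaps section of Webmaster Tools.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Placeholder entries: replace with the real pages of the new site -->
  <url>
    <loc>http://example.org/</loc>
  </url>
  <url>
    <loc>http://example.org/about/</loc>
  </url>
</urlset>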

If that does not work and those URLs share a similar path, try adding them to robots.txt with a Disallow rule:

# Placeholder path: replace /common_path_indexed/ with the actual common path of the injected spam URLs
User-agent: *
Disallow: /common_path_indexed/
yagmoth555

You can try adding 301 redirects for those pages so that they point to your front page. This might make it faster for Google to expire the hacked pages.
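A minimal sketch of such a redirect, assuming Apache with mod_rewrite and that the spam URLs share a common prefix (the /common_path_indexed/ path is just a placeholder; adjust the pattern to whatever the injected URLs actually look like), in .htaccess:

RewriteEngine On
# Send anything under the old spam path to the front page with a permanent redirect
RewriteRule ^common_path_indexed/ / [R=301,L]

If the injected URLs do not share a single prefix, the pattern would need to match whatever structure the spam pages used.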

Tero Kilkanen