Should I block duplicate pages using robots.txt?

Uploaded on Mar 10, 2010

Halfdeck from Davis, CA asks: "If Google crawls 1,000 pages/day, Googlebot crawling many dupe content pages may slow down indexing of a large site. In that scenario, do you recommend blocking dupes using robots.txt or is using META ROBOTS NOINDEX,NOFOLLOW a better alternative?"

Short answer: No, don't block them using robots.txt. Learn more about duplicate content here: http://www.google.com/support/webmasters/bin/answer.py?answer=66359
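One reason the two options in the question are not interchangeable: a crawler that obeys robots.txt never fetches a disallowed URL, so it never gets to read a META ROBOTS tag on that page. The sketch below illustrates this with Python's standard `urllib.robotparser`. The robots.txt rules, the `/print/` path and the example.com URLs are hypothetical, and this models a generic well-behaved crawler, not Googlebot's actual implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a duplicate "print" version of each page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def can_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The duplicate is never fetched, so any META ROBOTS tag on it goes unread.
blocked = can_crawl(ROBOTS_TXT, "http://example.com/print/article-1")
# The primary version stays crawlable.
allowed = can_crawl(ROBOTS_TXT, "http://example.com/article-1")
```

With these rules, `blocked` is `False` and `allowed` is `True`.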

Category:

Science & Technology

License:

Standard YouTube License

Top Comments

  • Surprised canonical isn't mentioned as a solution here

  • Matt,

    At the beginning of the video, it sounds like your answer is we SHOULD NOT block the URLs, because Google needs to crawl everything and figure out the duplicates for itself. But then at about 0:57 you seem to reverse your stance by saying we SHOULD block them.

    Can you please clarify?

    Thanks,

    SEOmofo
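The canonical approach raised in the first comment above can be sketched as follows: each duplicate URL declares one preferred URL in a `<link rel="canonical">` element in its `<head>`. This is a minimal illustration, not something taken from the video. The parameter names in `IGNORED_PARAMS` and the example.com URL are hypothetical; which parameters really produce duplicates depends on the site.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters that change the URL but not the page content.
IGNORED_PARAMS = {"sessionid", "sort", "utm_source"}


def canonical_url(url: str) -> str:
    """Drop content-neutral parameters (and any fragment) from the URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link> element to place in the <head> of the duplicate page."""
    return '<link rel="canonical" href="%s">' % escape(canonical_url(url), quote=True)


tag = canonical_link_tag("http://example.com/shoes?sort=price&color=red&sessionid=abc")
```

Here `tag` is `<link rel="canonical" href="http://example.com/shoes?color=red">`. Unlike a robots.txt block, the duplicate stays crawlable, so the hint can actually be read.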

All Comments (22)

  • great clip keep it up =)

  • I think this has already been answered on previous videos. But anyway, like other videos, it helps a lot.

  • Using 304 responses to If-Modified-Since in combination with the meta robots directives "noindex,nosnippet,noarchive,follow" would be the best way to go. Everything else is simply BS.

  • It sounded like the answer at the end was that we should not block. I also think that, depending on the case, you have to use a combination of these techniques (meta robots, robots.txt and canonical), especially if re-architecting the site is not an easy option (in some CMSes it's never an option). I tried the parameter filter that Google provided, and it doesn't work as fast as I wanted it to. Leaving Google to identify the dupes increased the cached dupe count 10x and resulted in a rank fall.

  • agreed. He should mention canonical as the best choice here.

  • "We can figure out the dups on our own".

    Looks like Google would prefer to crawl your whole site and do the filtering job on its own!

  • Actually, web users have made Google popular and the most used search engine, so if you want to point fingers, blame the collective world using the internet. I don't know about you, but I don't want to go back to 1998, when search query results were filled with pages with ridiculous keyword spamming, hidden text, and the like. I'm not saying that people aren't gaming the Google algorithms, as they are and forever will. But search has improved, thanks to Google.

  • @infiltrator7777 You only have to jump through the hoops if you want Google to index your site and if you want to rank highly. If you aren't concerned about search engines or ranking of your site, then you can completely ignore the "hoops".
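One commenter above suggests answering If-Modified-Since requests with 304 and marking duplicate pages with a meta robots tag. The sketch below shows roughly what that combination could look like on the server side. It is an illustration of the commenter's idea, not a recommendation from the video. The `respond` function and the page body are hypothetical, and `last_modified` is assumed to be a timezone-aware UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Directives quoted from the comment; the page stays crawlable but unindexed.
ROBOTS_META = '<meta name="robots" content="noindex,nosnippet,noarchive,follow">'


def respond(last_modified, if_modified_since=None):
    """Return (status, headers, body) for a duplicate page.

    `last_modified` must be a timezone-aware UTC datetime.
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall through to a full response
        # HTTP dates have one-second resolution, so ignore microseconds.
        if (
            since is not None
            and since.tzinfo is not None
            and last_modified.replace(microsecond=0) <= since
        ):
            return 304, headers, ""  # unchanged: no body is sent again
    body = "<html><head>%s</head><body>Duplicate view</body></html>" % ROBOTS_META
    return 200, headers, body


modified = datetime(2010, 3, 10, 12, 0, tzinfo=timezone.utc)
first_status, first_headers, first_body = respond(modified)
repeat_status, _, repeat_body = respond(modified, first_headers["Last-Modified"])
```

The first request gets a 200 with the meta tag in the body. The repeat request, which echoes the `Last-Modified` value back, gets an empty 304, so the crawler spends less bandwidth on unchanged duplicates.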
