Search engine robots index the pages on your website automatically. Well-behaved robots check a file named robots.txt before they start indexing. This file should sit at the root of your domain, and you can view it by entering your domain name followed by /robots.txt (http://www.mydomain.com/robots.txt).
The robots.txt file tells well-behaved search engine robots which files and directories they may and may not access. Keep in mind that it only governs robots crawling your own site. If, for instance, another site links to your admin home page, that URL may unfortunately still be indexed through that link if the link was not given a rel="nofollow" attribute (noindex is a value for a page's robots meta tag, not for a link).
Below are a few lines to add to your robots.txt file, not only to prevent the indexing of proprietary files and directories but also to reduce the amount of duplicate content a dynamic website can generate.
For WordPress I use the following:
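As a quick illustration of where robots look for the file, here is a minimal Python sketch that derives the robots.txt address from any page URL. The helper name robots_txt_url and the sample page URL are made up for this example; only the standard library is used.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host: robots.txt lives at the root of the domain,
    # so the page's own path, query string and fragment are dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URL on the example domain used in this post.
print(robots_txt_url("http://www.mydomain.com/blog/some-post/?page=2"))
```

Whatever page you start from, the result points at the single file in the domain root.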
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-
Disallow: /category/
Disallow: /comments/
Disallow: /tag/
Disallow: /author/
Disallow: /trackback/
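If you want to sanity-check rules like these before uploading them, Python's standard urllib.robotparser module can evaluate them locally. The sketch below is only an approximation: the helper is_allowed and the sample paths are invented for this example, and Python's parser uses plain prefix matching, which may differ in detail from how a particular search engine interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The WordPress rules listed above, copied verbatim.
WORDPRESS_RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-
Disallow: /category/
Disallow: /comments/
Disallow: /tag/
Disallow: /author/
Disallow: /trackback/
"""


def is_allowed(rules: str, path: str, agent: str = "*") -> bool:
    """Return True if the given rules let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    # The domain is the placeholder used in this post; only the path matters here.
    return parser.can_fetch(agent, "http://www.mydomain.com" + path)


# Hypothetical paths, chosen to exercise different rules.
for path in ("/2010/05/hello-world/", "/wp-admin/", "/tag/seo/",
             "/wp-content/uploads/photo.jpg"):
    print(path, "allowed" if is_allowed(WORDPRESS_RULES, path) else "blocked")
```

Note that the short prefix rule /wp- also matches paths such as /wp-content/uploads/, so uploaded images are blocked as well. Remove that line if you want them crawled.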
When using WordPress, if for any reason you can't access the server root to download, edit, and re-upload the file, use a plugin for easy editing of your robots.txt file: Robots Meta by Joost de Valk, who has a wealth of WordPress optimization tips at yoast.com.
For Joomla I use a different approach. Joomla 1.5.x ships with the following text in the robots.txt file:
User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /xmlrpc/
If you use the SEF component sh404, you'll also want to add the search page to this list with the following line:
Disallow: /search/
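Adding a line like this by hand is easy, but if you maintain several sites you may prefer to script it. The following Python sketch appends a Disallow rule only when it is not already present and then checks the result with the standard urllib.robotparser module. The function add_disallow and the abbreviated rule set are assumptions made for this example, and the approach assumes the file has a single User-agent group, so a line appended at the end belongs to that group.

```python
from urllib.robotparser import RobotFileParser


def add_disallow(rules: str, path: str) -> str:
    """Append 'Disallow: <path>' to the rules unless it is already there."""
    line = f"Disallow: {path}"
    lines = rules.rstrip("\n").splitlines()
    if line not in lines:
        lines.append(line)
    return "\n".join(lines) + "\n"


# Abbreviated version of the Joomla 1.5.x defaults, for illustration only.
JOOMLA_RULES = "User-agent: *\nDisallow: /administrator/\nDisallow: /cache/\n"

updated = add_disallow(JOOMLA_RULES, "/search/")

parser = RobotFileParser()
parser.parse(updated.splitlines())
# The search results path is now blocked; ordinary pages are still allowed.
print(parser.can_fetch("*", "http://www.mydomain.com/search/results"))
print(parser.can_fetch("*", "http://www.mydomain.com/about-us"))
```

Running add_disallow a second time with the same path leaves the file unchanged, so the script is safe to repeat.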
The approach differs for Joomla in that I go through the individual third-party components, modules, and plugins and add rel="nofollow" attributes to the problematic links. I'll save the sh404 and custom permalink approach to duplicate content for another post.
Happy robot dot texting!
When you use a CMS such as WordPress or Joomla to publish your content, there are inevitably certain files and directories that you will not want indexed by the search engines. For instance, Joomla's administrator directory and WordPress's wp-admin directory store files the CMS needs to function correctly.







