I guess I have seen an advantage of installing this mod, as today I was crawled by "ia_archiver".
On investigating, this is what I found.
The crawler is the Alexa crawler (robot), which identifies itself as ia_archiver.
Whenever ia_archiver lands on the top level of a Web site, it looks for a file called "robots.txt". Robots.txt is a file website administrators can place at the top level of a site to direct the behavior of web crawling robots.
A crawler will always pick up a copy of the robots.txt file prior to its crawl of the site.
To exclude all robots, the robots.txt file should look like this:
User-agent: *
Disallow: /
To exclude just one directory (and its subdirectories), say, the /images/ directory, the file should look like this:
User-agent: *
Disallow: /images/
Web site administrators can allow or disallow specific robots from visiting part or all of their site. Alexa's crawler identifies itself as ia_archiver, and so to allow ia_archiver to visit (while preventing all others), your robots.txt file should look like this:
User-agent: ia_archiver
Disallow:
To prevent ia_archiver from visiting (while allowing all others), your robots.txt file should look like this:
User-agent: ia_archiver
Disallow: /
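As a quick sketch of how these rules play out, Python's standard urllib.robotparser module can evaluate a robots.txt against different user agents. The rules and URLs below are hypothetical examples, not Alexa's actual configuration; the file is parsed from a string so the snippet is self-contained.

```python
# Sketch: evaluating robots.txt rules with Python's stdlib urllib.robotparser.
# The rules and example.com URLs are hypothetical, for illustration only.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# ia_archiver matches its own group, which blocks the whole site.
print(parser.can_fetch("ia_archiver", "http://example.com/page.html"))   # False
# Every other crawler falls through to the * group, which only
# blocks /images/ (and its subdirectories).
print(parser.can_fetch("Googlebot", "http://example.com/page.html"))     # True
print(parser.can_fetch("Googlebot", "http://example.com/images/a.png"))  # False
```

Note the blank line between groups: each User-agent block is a separate record, and a crawler obeys the most specific record that names it, falling back to the * record otherwise.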
For more information regarding robots, crawling, and robots.txt visit the Web Robots Pages at
http://www.robotstxt.org, an excellent source for the latest information on the Standard for Robots Exclusion.
In any event, simply by visiting your site with the Alexa Toolbar open, Alexa will learn of your site and add it to our list of sites to visit, thus ensuring your inclusion in the Alexa service and in the Alexa archive.
If you are the type of person who won't be satisfied until you get to click a button that says "Crawl My Site," then Alexa has just the form for you:
http://pages.alexa.com/help/webmasters/index.html#crawl_site
I had not been crawled by Alexa before installing this mod.