SMFHacks.com
Author Topic: Archiver Mod info and robot.txt  (Read 2645 times)
GameSocket
« on: June 22, 2006, 04:06:56 am »

I think I have found an advantage of installing this mod: today my site was crawled by "ia_archiver".
After investigating, this is what I found.

The crawler is Alexa's crawler (robot), which identifies itself as ia_archiver.
Whenever ia_archiver lands on the top level of a Web site, it looks for a file called "robots.txt". Robots.txt is a file website administrators can place at the top level of a site to direct the behavior of web crawling robots.

A well-behaved crawler will pick up a copy of the robots.txt file before it crawls the site.
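
As a rough illustration of where a crawler looks, the sketch below derives the robots.txt address from any page URL on a site. The function name robots_url and the example.com address are placeholders of mine, not anything taken from Alexa's documentation.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the top-level robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep only scheme and host; robots.txt always sits at the site root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical forum address, used only for illustration.
print(robots_url("http://www.example.com/forum/index.php?topic=42.0"))
```

Whatever page the crawler starts from, the file it asks for is the one at the root of the host, which is why robots.txt has to be placed at the top level of the site.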

To exclude all robots, the robots.txt file should look like this:

User-agent: *
Disallow: /

To exclude just one directory (and its subdirectories), say, the /images/ directory, the file should look like this:

User-agent: *
Disallow: /images/
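
If you want to check a rule like this before uploading it, Python's standard urllib.robotparser module can evaluate it offline. This is only a sketch: the helper name is_allowed and the sample paths are my own assumptions, and a real crawler may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The directory rule from the post above.
RULES = """\
User-agent: *
Disallow: /images/
"""


def is_allowed(rules: str, agent: str, path: str) -> bool:
    """Parse robots.txt text and report whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # no network access needed
    return parser.can_fetch(agent, path)


print(is_allowed(RULES, "ia_archiver", "/images/logo.png"))  # False
print(is_allowed(RULES, "ia_archiver", "/index.php"))        # True
```

Anything under /images/ is refused, including subdirectories, while the rest of the site stays open.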

Web site administrators can allow or disallow specific robots from visiting part or all of their site. Alexa's crawler identifies itself as ia_archiver, so to allow ia_archiver to visit (while preventing all others), your robots.txt file needs a second block that shuts out every other robot:

User-agent: ia_archiver
Disallow:

User-agent: *
Disallow: /

To prevent ia_archiver from visiting (while allowing all others), your robots.txt file should look like this:

User-agent: ia_archiver
Disallow: /
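
To see how a rule set that names one robot treats different visitors, the sketch below runs the two lines above through Python's urllib.robotparser for a couple of user-agent names. The helper who_may_crawl is a name I made up, and "Googlebot" is just an example of some other robot.

```python
from urllib.robotparser import RobotFileParser

# Rules that block only Alexa's crawler.
BLOCK_ALEXA = """\
User-agent: ia_archiver
Disallow: /
"""


def who_may_crawl(rules, agents, path="/"):
    """Map each user-agent name to True/False for the given path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in agents}


print(who_may_crawl(BLOCK_ALEXA, ["ia_archiver", "Googlebot"]))
# {'ia_archiver': False, 'Googlebot': True}
```

A robot that is not named in any block, and has no "User-agent: *" block to fall back on, is treated as allowed everywhere.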

For more information regarding robots, crawling, and robots.txt visit the Web Robots Pages at http://www.robotstxt.org, an excellent source for the latest information on the Standard for Robots Exclusion.

In any event, simply by visiting your site with the Alexa Toolbar open, you let Alexa learn of your site and add it to its list of sites to visit, thus ensuring your inclusion in the Alexa service and in the Alexa archive.
If you are the type of person who won't be satisfied until you get to click a button that says "Crawl My Site," then Alexa has just the form for you.

http://pages.alexa.com/help/webmasters/index.html#crawl_site


I had not been crawled by Alexa before installing this mod.







(\__/)
(O.o )   *If You need help, best not to ask me*
(> < )

SMFHacks
« Reply #1 on: June 22, 2006, 06:42:26 am »

Good news. I think it is better, since it lets the search engines find the boards and threads more easily without going through all the other links they find.

SMFHacks.com