I was looking at the stats for this site, and noticed that since I started, 895 people have tried to open a page called robots.txt … a page which doesn’t exist.
Looking into this, I found out why…
I won’t bore you with anything more than a very basic summary:
Search engines send out robots, or spiders, to crawl all around the net and map what sites are out there. If you don’t want them doing that on your site … mebbe you’re a terrorist, or have naked pictures of yourself on there … then you can tell them to stop.
You do this by putting a robots.txt file at the root of your site, with instructions in it. Robots / spiders will check this file first, to see what instructions you’ve given them about access to the site.
The proper format is something like this:
# robots.txt for http://www.monkeysandpirates.com/
User-agent: *
Disallow: /nakedphotos/mcphee/
Disallow: /terrorism/
Disallow: /ladyboys.html
That’ll stop (well-behaved) robots from visiting and cataloguing the listed folders and pages — it’s only a polite request, not a lock on the door.
(note those are just an example, and not real pages … and yes my site stats WILL tell me which of you try to access /nakedphotos/mcphee/).
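For the curious, here’s a quick sketch of how a polite robot actually reads those rules, using Python’s standard `urllib.robotparser` module (the rules and URLs below are just the made-up examples from above):

```python
from urllib import robotparser

# The example rules from above (not real pages!)
rules = """
User-agent: *
Disallow: /nakedphotos/mcphee/
Disallow: /terrorism/
Disallow: /ladyboys.html
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# A well-behaved robot checks before fetching a page:
print(rp.can_fetch("*", "http://www.monkeysandpirates.com/about.html"))           # allowed
print(rp.can_fetch("*", "http://www.monkeysandpirates.com/terrorism/plans.html")) # blocked
```

Normally a crawler would call `rp.set_url(".../robots.txt")` and `rp.read()` to fetch the file over the network; parsing the lines directly just makes the example self-contained.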
Anyway, I thought this was a cool idea, but the suggested robot commands didn’t quite cover it, so I stuck up my own robots.txt page 🙂
# robots.txt for http://www.monkeysandpirates.com/
User-agent: *
Begin World Domination:
Learn Emotion:
Morph Into a Car:
Assimilate The Pickled Pixie: