Search engine spiders that crawl your site leave distinctive traces in your access log. These traces tell you whether a spider has visited, which pages it requested, and how often and for how long it visited.
The most reliable way to identify spider visits is to find out which visitors requested the file robots.txt from your site. Ordinary browsers almost never request this file; it tells spiders which pages they should not crawl, so a well-behaved crawler checks for it before fetching anything else. If you analyze the access log with a log analysis tool, you can pick out all the visits that began with this request. You can then note the host name and relate it to the major search engines. Host names usually contain the search engine company’s name (it is the name of the machine that hosts the spider). Spider visits can also be identified by the agent (browser) name that each search engine uses. Get a list of host names and agent names from available resources (these names tend to change often), and build your own list by searching your access logs for all occurrences of known engine, host or agent names. Concentrate on the top engines, though you may find several smaller, lesser-known search engines visiting your site as well.
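As a rough illustration of the robots.txt check, here is a minimal Python sketch. It assumes the access log is in the common Apache-style "combined" format; your server may log differently, in which case the pattern needs adjusting. The function name robots_requesters and the two sample log lines (including the host names and agent strings in them) are invented for the example.

```python
import re

# Assumed log layout (Apache-style combined format):
# host ident user [time] "METHOD path protocol" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def robots_requesters(lines):
    """Return {host: agent} for every visitor that requested /robots.txt."""
    found = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed format
        # Ignore any query string when comparing the requested path.
        if m.group("path").split("?")[0] == "/robots.txt":
            found[m.group("host")] = m.group("agent") or ""
    return found

# Invented sample lines, for illustration only.
sample = [
    'crawler1.altavista.com - - [10/Oct/2000:13:55:36 -0700] '
    '"GET /robots.txt HTTP/1.0" 200 68 "-" "Scooter"',
    'dialup-12.example.net - - [10/Oct/2000:13:56:01 -0700] '
    '"GET /index.html HTTP/1.0" 200 2326 "-" "Mozilla/4.0"',
]

if __name__ == "__main__":
    for host, agent in robots_requesters(sample).items():
        print(host, agent)
```

To run it against a real log, pass an open file object instead of the sample list; the hosts it returns are the candidates to compare against your list of known engines.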
Pay attention not only to the total number of visits but also to the activity pattern of each recent visit, so that you can judge how many pages the spider actually covered. This is a good way to check whether your submissions, or other inducements such as links from other sites, have worked. It also lets you evaluate separately how effective the submission, indexing and page ranking of your site have been.
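One simple way to look at the activity pattern is to count, for each host, how many requests it made and how many distinct pages it fetched. The sketch below assumes you have already reduced the log to (host, path) pairs; the function name summarize_activity and the sample data are hypothetical.

```python
from collections import defaultdict

def summarize_activity(requests):
    """Given (host, path) pairs, return {host: (hits, distinct_pages)}.

    Requests for /robots.txt count as hits but not as pages covered.
    """
    hits = defaultdict(int)
    pages = defaultdict(set)
    for host, path in requests:
        hits[host] += 1
        if path != "/robots.txt":
            pages[host].add(path)
    return {host: (hits[host], len(pages[host])) for host in hits}

# Invented sample data, for illustration only.
sample_requests = [
    ("spider.example.com", "/robots.txt"),
    ("spider.example.com", "/index.html"),
    ("spider.example.com", "/index.html"),
    ("spider.example.com", "/about.html"),
    ("visitor.example.net", "/index.html"),
]

if __name__ == "__main__":
    for host, (hits, pages) in summarize_activity(sample_requests).items():
        print(host, hits, pages)
```

A spider that makes many requests but covers only one or two distinct pages has probably not indexed much of the site, which is the distinction the total visit count alone would hide.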
Some examples of host names and agent names are listed below:
• AltaVista: the host name may contain altavista.com; the agent is often called Scooter.
• Excite: the host name may contain atex or excite.com; the agent name is Architextspider.
• Inktomi: the host and agent names may contain inktomi.com; Slurp is often used as the agent name.
• Lycos: the host name contains lycos.com; Lycos Spider is often part of the agent name.
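The list above can be turned into a small lookup table. The following Python sketch matches a visitor's host name and agent name against the name fragments given in the list; the table layout and the function name identify_engine are assumptions made for the example, and since these names tend to change often, the table would need regular updating. Plain substring matching is crude (a short fragment such as atex can match unrelated hosts), so treat a match as a hint rather than proof.

```python
# Name fragments taken from the examples above, lower-cased for matching.
SIGNATURES = {
    "AltaVista": {"hosts": ["altavista.com"], "agents": ["scooter"]},
    "Excite": {"hosts": ["atex", "excite.com"], "agents": ["architextspider"]},
    "Inktomi": {"hosts": ["inktomi.com"], "agents": ["slurp", "inktomi.com"]},
    "Lycos": {"hosts": ["lycos.com"], "agents": ["lycos"]},
}

def identify_engine(host, agent):
    """Return the engine whose host or agent fragment matches, else None."""
    host = host.lower()
    agent = agent.lower()
    for engine, sig in SIGNATURES.items():
        if any(fragment in host for fragment in sig["hosts"]):
            return engine
        if any(fragment in agent for fragment in sig["agents"]):
            return engine
    return None

if __name__ == "__main__":
    # Invented host and agent values, for illustration only.
    print(identify_engine("crawler1.altavista.com", ""))
    print(identify_engine("dialup-12.example.net", "Mozilla/4.0"))
```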