Search Engines & cookies

    don't know any links offhand cuz i'd just wing it myself...

    but the basic idea goes like this:

    when a page request is received, code in the page determines whether, to the best of its knowledge, the request comes from a spider or human user (using IPs/hostnames of known spiders and/or their user agent[1])

    [1] example: this is what a log hit from Googlebot looks like:
    ---
    crawl-66-249-65-202.googlebot.co... - - [23/Feb/2005:20:57:21 -0500] "GET / HTTP/1.1" 200 19399 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html...
    ---
    - the first field is the hostname (if reverse lookups are on - otherwise, or if the host doesn't have a name, it's the IP). the field at the end there is the user agent - you can see it identifies itself as Googlebot...
    so say i find that someone requests a page from ip X with UA Y... i can have a little logic in the page that decides whether it's a legitimate user or a spider:
    if(ereg("Googlebot",$_SERVER['HT... // totally bogus regex there, but whatever
    $fake_cookie=1;
    }
    // theen, in your cookie checking code
    if($cookie||$fake_cookie){ // note we are checking for either a real or fake cookie, you will have to roll your own cookie check
    // show page
    }

    now i have a page showing if the user has a legit cookie OR if they happen to have the word "Googlebot" in their UA...
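
    you could do the same kind of check off the hostname instead of (or on top of) the UA. just a rough untested sketch here - gethostbyaddr() is the real PHP function, the googlebot.com match is only an example:

    // sketch: key off the reverse-lookup hostname instead of the UA
    $host = gethostbyaddr($_SERVER['REMOTE_ADDR']); // gives you back the IP if there's no name
    if(ereg("googlebot\.com$", $host)){ // example match only, not serious spider detection
        $fake_cookie = 1;
    }

    (if you're paranoid, do the reverse lookup and then a forward lookup on the name you get back and make sure it resolves to the same IP - anyone can stick "Googlebot" in their UA)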

    then if neither is the case, i would GIVE them a new cookie or create a session or something, store the requested URL, take them to the state selector, then have the state selector processing page redirect them back to the page they asked for...
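
    the redirect part would look roughly like this - again just a sketch, the cookie name, state_select.php and the session key are all made up for the example:

    // on the requested page, when there's no cookie and it's not a spider:
    session_start();
    $_SESSION['return_url'] = $_SERVER['REQUEST_URI']; // remember where they were headed
    header("Location: /state_select.php"); // send them off to the state selector
    exit;

    // then in the page that processes the state selector form:
    session_start();
    setcookie("state", $_POST['state'], time()+60*60*24*30); // the real cookie, good for 30 days
    $back = isset($_SESSION['return_url']) ? $_SESSION['return_url'] : "/";
    header("Location: ".$back); // and bounce them back to the page they asked for
    exit;

    (stashing the URL in the session keeps the state selector URL clean, but passing it in the query string works too)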

    see (or post to)
    http://www.webmasterworld.com/ca…
    and
    http://www.webmasterworld.com/ca…
    for much better discussion of this than you will find on NT
