Search Engines & cookies

    don't know any links offhand cuz i'd just wing it myself...

    but the basic idea goes like this:

    when a page request is received, code in the page determines whether, to the best of its knowledge, the request comes from a spider or human user (using IPs/hostnames of known spiders and/or their user agent[1])

    [1] example: this is what a log hit from Googlebot looks like:
    ---
    crawl-66-249-65-202.googlebot.co... - - [23/Feb/2005:20:57:21 -0500] "GET / HTTP/1.1" 200 19399 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html...
    ---
    - the first field is the hostname (if reverse lookups are on - otherwise, or if the host doesn't have a name, it's the IP). the field at the end there is the user agent - you can see it identifies itself as Googlebot...
    so say i find that someone requests a page from ip X with UA Y... i can have a little logic in the page that decides whether it's a legitimate user or a spider:
    if(ereg("Googlebot",$_SERVER['HT... // totally bogus regex there, but whatever
    $fake_cookie=1;
    }
    // theen, in your cookie checking code
    if($cookie||$fake_cookie){ // note we are checking for either a real or fake cookie, you will have to roll your own cookie check
    // show page
    }

    now i have a page showing if the user has a legit cookie OR if they happen to have the word "Googlebot" in their UA...
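
    you could do the same kind of check off the hostname instead of (or on top of) the UA. just a rough untested sketch here - gethostbyaddr() is the real PHP function, the googlebot.com match is only an example:

    // sketch: key off the reverse-lookup hostname instead of the UA
    $host = gethostbyaddr($_SERVER['REMOTE_ADDR']); // gives you back the IP if there's no name
    if(ereg("googlebot\.com$", $host)){ // example match only, not serious spider detection
        $fake_cookie = 1;
    }

    (if you're paranoid, do the reverse lookup and then a forward lookup on the name you get back and make sure it resolves to the same IP - anyone can stick "Googlebot" in their UA)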

    then if neither is the case, i would GIVE them a new cookie or create a session or something, store the requested URL, take them to the state selector, then have the state selector processing page redirect them back to the page they asked for...
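
    the redirect part would look roughly like this - again just a sketch, the cookie name, state_select.php and the session key are all made up for the example:

    // on the requested page, when there's no cookie and it's not a spider:
    session_start();
    $_SESSION['return_url'] = $_SERVER['REQUEST_URI']; // remember where they were headed
    header("Location: /state_select.php"); // send them off to the state selector
    exit;

    // then in the page that processes the state selector form:
    session_start();
    setcookie("state", $_POST['state'], time()+60*60*24*30); // the real cookie, good for 30 days
    $back = isset($_SESSION['return_url']) ? $_SESSION['return_url'] : "/";
    header("Location: ".$back); // and bounce them back to the page they asked for
    exit;

    (stashing the URL in the session keeps the state selector URL clean, but passing it in the query string works too)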

    see (or post to)
    http://www.webmasterworld.com/ca…
    and
    http://www.webmasterworld.com/ca…
    for much better discussion of this than you will find on NT
