Search Engines & cookies

  • IRNlun6

    I'm researching now but hoped to see if anyone has dealt with this in the past.

    I'm having crawling issues because of compulsory cookies (i.e., the website won't work unless a specific cookie is set). Due to SEC requirements, a client must select a state before entering the site. The cookie was added so that returning visitors could bypass the state selection page.

    Now the problem is that spiders will not crawl any further because of the cookie. We tried JavaScript, with the same spider-halting results. The sites are not ranked highly on search engines even though they receive several thousand hits a week and link to several sites.

    If anyone could provide any ideas or links that deal with this I will be your friend for life.

    ..smart ass comments are acceptable and encouraged as well...

    thanks.

  • myobie

    Put all the text you can on the entry page. There's no way to fix the problem on the search engine's end.

    I suggest placing some text that might be hidden with CSS or something similar...

  • BZZZP

    put code in the site to work without cookies for certain user agents or remote addresses... then when a user hit comes through to a url that requires a state selection, redirect to state selection, then redirect back to the page in question.

  • BZZZP

    oh, and by code, not js, but whatever you got on the server side....

  • IRNlun6

    *bump

    thanks all...

    I'm not following you, BZZZP, but I'll research your suggestion. Would you have any links about this?
    Thanks again for the help...

  • heavyt

    it doesn't seem that you are getting any bots to crawl from the homepage, so i assume that any direct links in must redirect to set that cookie first, eh?
    if so, why not check for bots and redirect them in? is that possible using a 301 or something?
    TR1

  • BZZZP

    don't know any links offhand cuz i'd just wing it myself...

    but the basic idea goes like this:

    when a page request is received, code in the page determines whether, to the best of its knowledge, the request comes from a spider or human user (using IPs/hostnames of known spiders and/or their user agent[1])

    [1] example, this is what a log hit looks like from Googlebot:
    ---
    crawl-66-249-65-202.googlebot.co... - - [23/Feb/2005:20:57:21 -0500] "GET / HTTP/1.1" 200 19399 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html...
    ---
    the first field is the hostname (if reverse lookups are on - otherwise, or if the host doesn't have a name, it's the IP). the field at the end there is the user agent. you can see it identifies itself as Googlebot...
    so say i find that someone requests a page from ip X with UA Y... i can have a little logic in the page that decides whether it's a legitimate user or a spider:
    if(ereg("Googlebot",$_SERVER['HT... // totally bogus regex there, but whatever
    $fake_cookie=1;
    }
    // theen, in your cookie checking code
    if($cookie||$fake_cookie){ // note we are checking for either a real or fake cookie, you will have to roll your own cookie check
    // show page
    }

    now i have a page showing if the user has a legit cookie OR if they happen to have the word "Googlebot" in their UA...
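
    and if you want to lean on the hostname side of it too (like in that log line above), a reverse lookup on the requesting IP would do it - just a sketch, assuming PHP since that's what the snippet above is:

    // reverse-lookup the requesting IP and see if it resolves to a googlebot.com host
    // (sketch only - you'd want to cache this, a DNS lookup on every hit is slow)
    $host = gethostbyaddr($_SERVER['REMOTE_ADDR']);
    if (substr($host, -14) == '.googlebot.com') {
        $fake_cookie = 1;
    }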

    then if neither is the case, i would GIVE them a new cookie or create a session or something, store the requested URL, take them to the state selector, then have the state selector processing page redirect them back to the page they asked for...
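
    rough shape of that, again assuming PHP - and the names here (state_select.php, a cookie called 'state') are just made up for the example:

    // on any page that needs a state, before any output is sent:
    session_start();
    if (!isset($_COOKIE['state']) && empty($fake_cookie)) {
        $_SESSION['return_to'] = $_SERVER['REQUEST_URI']; // remember where they were headed
        header('Location: /state_select.php');            // off to the state selector
        exit;
    }

    // then in state_select.php, once they've picked a state:
    setcookie('state', $_POST['state'], time() + 60*60*24*365, '/'); // remember it for a year
    $back = isset($_SESSION['return_to']) ? $_SESSION['return_to'] : '/';
    header('Location: ' . $back); // and send them back where they came from
    exit;

    that way the spider (with its fake cookie) sails straight through, and a human only gets bounced through the selector once.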

    see (or post to)
    http://www.webmasterworld.com/ca…
    and
    http://www.webmasterworld.com/ca…
    for much better discussion of this than you will find on NT

  • IRNlun6

    Thanks again BZZZP. This is really helpful.

    I'm testing that out now...