Hacker News | sigmadelta's comments

craigslist didn't change robots.txt:

$ wget -q -O- --save-headers http://www.craigslist.org/robots.txt | fgrep Last-Modified

Last-Modified: Fri, 04 Nov 2011 18:13:24 GMT
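The same check can be scripted. This is only a sketch: it parses the header line shown above instead of making a live request, and the cutoff date (`CLAIM_WINDOW_START`) is my own assumption for "before the week of the 3taps claim", not something taken from any article.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def last_modified(raw_headers):
    """Return the Last-Modified header as an aware datetime, or None if absent."""
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "last-modified":
            return parsedate_to_datetime(value.strip())
    return None


# Header line copied from the wget output above.
headers = "Last-Modified: Fri, 04 Nov 2011 18:13:24 GMT\n"

# Assumed cutoff: any date shortly before the claim was made would do.
CLAIM_WINDOW_START = datetime(2012, 8, 1, tzinfo=timezone.utc)

modified = last_modified(headers)
unchanged = modified is not None and modified < CLAIM_WINDOW_START
print(modified.isoformat(), "unchanged before the claim:", unchanged)
```

If the file had been edited to block crawlers that week, the timestamp would fall after the cutoff and `unchanged` would be false.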


http://blog.sfgate.com/techchron/2012/08/10/craigslist-backs...

"One data harvester, 3taps, said earlier this week that Craigslist had blocked search engines such as Google from including Craigslist pages in search results. But that report was inaccurate.

3taps’ product and quality assurance leader, Meg Nakamura, acknowledged Wednesday in a chat with The Chronicle that something fishy was taking place, but developers there haven’t fully figured out what’s going on."


Saying that "newyork.craigslist.org does not have any robots.txt and can be crawled as you like" is false. Search engines follow redirects until valid robots.txt files are found. From that same document you linked:

3xx (redirection)

Redirects will generally be followed until a valid result can be found (or a loop is recognized). We will follow a limited number of redirect hops (RFC 1945 for HTTP/1.0 allows up to 5 hops) and then stop and treat it as a 404. Handling of robots.txt redirects to disallowed URLs is undefined and discouraged. Handling of logical redirects for the robots.txt file based on HTML content that returns 2xx (frames, JavaScript, or meta refresh-type redirects) is undefined and discouraged.
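To make the quoted rule concrete, here is a rough sketch of how a crawler might resolve robots.txt through redirects. It is not Google's implementation. `resolve_robots`, the `fetch` callback, and the `FAKE_RESPONSES` table are all hypothetical, and the responses in the table are placeholders, not captured from craigslist.

```python
from urllib.parse import urljoin

MAX_HOPS = 5  # the quoted text cites up to 5 hops for HTTP/1.0


def resolve_robots(url, fetch, max_hops=MAX_HOPS):
    """Follow redirects for a robots.txt URL.

    fetch(url) is a hypothetical callback returning (status, location, body).
    Returns (status, final_url, body); a redirect loop or too many hops
    is reported as a 404, as the quoted text describes.
    """
    seen = set()
    hops = 0
    while True:
        if url in seen:
            return 404, url, ""  # loop recognized
        seen.add(url)
        status, location, body = fetch(url)
        if 300 <= status < 400 and location:
            if hops >= max_hops:
                return 404, url, ""  # hop limit reached
            hops += 1
            url = urljoin(url, location)
            continue
        return status, url, body


# Placeholder responses for illustration only.
FAKE_RESPONSES = {
    "http://newyork.craigslist.org/robots.txt":
        (301, "http://www.craigslist.org/robots.txt", ""),
    "http://www.craigslist.org/robots.txt":
        (200, None, "User-agent: *\nDisallow:\n"),
}

status, final_url, body = resolve_robots(
    "http://newyork.craigslist.org/robots.txt", lambda u: FAKE_RESPONSES[u]
)
print(status, final_url)
```

Under these assumptions the subdomain ends up governed by whatever file the redirect chain lands on, so "no robots.txt on the subdomain" does not mean "no rules".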


http://www.seroundtable.com/archives/019533.html

That article is from 2009, noting that CL had added the meta tag to posting pages. So I think this is old news.

I think storborg has it right in http://news.ycombinator.com/item?id=4353120 when saying that "3taps seems to be claiming that Craigslist has cut off Google, but I think it's just that Craigslist has cut off 3taps."


https://twitter.com/markmilian/statuses/233015694432813057

Mark Milian @markmilian 7 Aug

Contradicting earlier statement, 3Taps spokeswoman emails to say, "Craigslist is still allowing indexing of pages." Still nothing from CL PR


Just tried again, and I got a bunch of results:

http://www.google.com/search?q=site%3Asfbay.craigslist.org+b...

Of the 10 results on the first page, 7 are within the last hour.

(1) Boat sfbay.craigslist.org/sby/boa/3191071012.html 2 hours ago ... 408-726-8722 i don't know about the motor or about the boat but if you want to see it call.... that is the reason that i can't wrote about the ...

SF bay area boats - by owner classifieds - craigslist sfbay.craigslist.org/boa/ - Cached - Similar SF bay area boats - by owner classifieds - craigslist.

(2) 1964 Fabuglas 16ft fishing boat sfbay.craigslist.org/nby/boa/3191162483.html 1 hour ago ... This was my dads fishing boat. It runs good. But could use a little TLC. Its a 1964 16 foot Fabuglas out of Nashville Tennessee. Most of the ...

SF bay area marine services classifieds - craigslist sfbay.craigslist.org/mas/ - Cached Sat Aug 04. Shipwright/Boat Work - (berkeley) ... Boat & Marine Related Service - (hayward / castro valley) ... SF Charter Boat, Book Now - (San Francisco Bay) ...

(3) SIDEWINDER 16' SPEED BOAT trade for services or...??? - Craigslist sfbay.craigslist.org/eby/bar/3191164088.html 1 hour ago ... 1980 BLUE sidewinder motor boat. Seats 4. 35+mph. 70 hp 2 stroke VRO Evinrude motor, no need to premix fuel. Runs strong. Starts right up.

Fishing / Hunting Boat sfbay.craigslist.org/sby/boa/3176240765.html 6 days ago ... 2004 War Eagle Boat, semi v front flat bottom(17ft.), with a 2003 40hp Mercury(4 stroke) motor with 20- 25 hours on it. The boat is on an EZ ...

(4) Sailboat rudder off a 22' boat sfbay.craigslist.org/sby/boa/3191237551.html 14 minutes ago ... 4'10'' tall rudder from a 22' boat. It is in great condition and is a solid Bay rudder. Can be brought up in case of a grounding by pulling on a rope ...

(5) MB Sports V Drive Ski & Wakeboard Boat sfbay.craigslist.org/eby/boa/3187986181.html 1 day ago ... 2002 MB SPORTS 220 V-Drive. This boat is an excellent for both skiing and wakeboards. We have used it and enjoyed it for slalom skiing and ...

(6) * BAYLINER* NICE BOAT, NICE PRICE... BEST OFFER MOVING ... sfbay.craigslist.org/eby/boa/3191163056.html 1 hour ago ... VERY CLEAN BOAT New wheel bearings on trailer. 3.0 Mercruiser 135 H.P. Great on gas. 40-45 mph top speed. Just registered! Garaged for 7 ...

(7) wanted boat polisher sfbay.craigslist.org/eby/boa/3191196564.html 50 minutes ago ... wanted boat polisher (pittsburg / antioch) ... I am looking for someone to polish and wax my 28 ft boat. topside only dont have to do the hull.


http://www.sfgate.com/technology/businessinsider/article/Cra...

Not sure I agree with most of the conclusions drawn in that article.

The article does say that "sure enough, Google displays recent listings from Craigslist right now," which does seem to be true for me, too, when I try.



Did these people do any fact checking? I just ran another test:

visit http://sfbay.craigslist.org/bia/

spot ad for a kids Giant MTX 250 bike http://sfbay.craigslist.org/nby/bik/3189980073.html

perform a google search for the same thing http://www.google.com/search?q=site%3Acraigslist.org+giant+m...

result number two in the set is that posting, with the following summary:

Giant MTX 250 Kid's Mountain Bike sfbay.craigslist.org/nby/bik/3189980073.html 4 minutes ago ... Giant kid sized hardtail mountain bike for sale. It is a 21 speed, and it has front suspension. The bike was always taken care of, and is in perfect ...

"4 minutes ago" tells me Google is having no problems hitting craigslist.


...Do you do any fact checking?

I just checked the source:

<meta name="robots" content="NOARCHIVE,NOFOLLOW">


Nothing new. That tag has always been in place. It's there to prevent people from using an ad for SEO purposes.


The interpretation of those tags isn't as obvious as it first appears. A full write-up is at http://noarchive.net/meta/ for completeness, but in short:

NOARCHIVE: ok to index this page, but don't keep or show a cached copy of it

NOFOLLOW: don't follow any links on this page

Note that these two things don't prevent a search engine from visiting a page, indexing it, and providing search results for it.
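A small sketch of that reading, using only the Python standard library. `RobotsMetaParser`, `robots_directives`, and `may_index` are names I made up for illustration, and the rule in `may_index` (only `noindex` or `none` blocks indexing) is my simplification of how the directives are commonly interpreted, not any particular search engine's logic.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                token.strip().lower()
                for token in content.split(",")
                if token.strip()
            )


def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def may_index(directives):
    # Assumption: only "noindex" (or "none") asks engines not to index.
    return "noindex" not in directives and "none" not in directives


# The tag quoted from the posting page source, wrapped in a minimal page.
page = '<html><head><meta name="robots" content="NOARCHIVE,NOFOLLOW"></head></html>'
found = robots_directives(page)
print(sorted(found), "indexable:", may_index(found))
```

With the tag from the posting page, the directives are `noarchive` and `nofollow`, and neither one stops the page from being indexed.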


I just tried the following:

visited http://sfbay.craigslist.org/bia/

saw ad for "Burning Man Beach Cruiser Bicycle" http://sfbay.craigslist.org/eby/bik/3189652975.html

performed the following search on google http://www.google.com/search?q=site%3Acraigslist.org+neon+gr...

first result returned is that ad, with the following snippet: Burning Man Beach Cruiser Bicycle sfbay.craigslist.org/eby/bik/3189652975.html 22 minutes ago ... Neon Green cruiser bicycle, perfect for the playa, or cruising around town. Single speed, great shape. Available nights and weekends.

--

"22 minutes ago" sounds like Google is still actively crawling content from craigslist to me.

