So you replace the hard-to-read CAPTCHAs with easy-to-read ads. Of which presuma...

mseebach · on Sept 20, 2010

A long-standing solution to hard-to-read captchas is easy-to-read (but hard-to-process) captchas. E.g. show a picture of an animal or a shape and ask what it is, or even just ask a simple math problem or riddle in writing.

The problem with those is that they need to be constantly updated from a reliable source; otherwise, once the solution becomes popular, the spammer can brute-force it in linear time (no matter how large N is, there are only N possible patterns).
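To make the linear-time point concrete, here is a minimal Python sketch. It assumes a fixed pool of `pool_size` challenges served uniformly at random, and a spammer who pays to solve each challenge the first time it appears and replays the stored answer afterwards. The function name and parameters are hypothetical.

```python
import random


def attack_fixed_pool(pool_size: int, attempts: int, seed: int = 0) -> tuple[int, int]:
    """Simulate a spammer against a fixed, never-updated challenge pool.

    Returns (paid_solves, free_replays): paid_solves counts challenges the
    spammer had to solve (by hand, OCR or outsourcing); free_replays counts
    attempts answered from the stored solutions.
    """
    rng = random.Random(seed)
    solved: set[int] = set()  # challenge ids whose answer the spammer has stored
    paid = 0
    for _ in range(attempts):
        challenge = rng.randrange(pool_size)
        if challenge not in solved:
            solved.add(challenge)  # solve once, remember the answer forever
            paid += 1
    return paid, attempts - paid


paid, free = attack_fixed_pool(pool_size=100, attempts=10_000)
print(paid, free)
```

However many attempts are made, the number of paid solves can never exceed `pool_size`; with 10,000 attempts against 100 challenges the spammer almost certainly pays for all 100 and answers everything else for free.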

This seems to be an attempt at fixing that. Ads are often relatively short-lived, so by the time the spammer has brute-forced them, they might be out of circulation, and, more importantly, new ads are in. It's an arms race, and this is a way to pay our troops. It's also trivial for advertisers to make many different variations (e.g. one for each sales bullet point), so many variations are in circulation at once. Since an ad often contains more text than the password, ads aren't prone to simple OCR, while still being easy for the user to comprehend.
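A rough way to check the rotation argument is a small simulation. This sketch assumes ads are served uniformly from a live pool, that a fixed number of the oldest ads retire at the end of each day, and that the spammer stores every answer it has seen; all names and numbers are hypothetical.

```python
import random


def cache_hit_rate(pool_size: int, days: int, attempts_per_day: int,
                   retired_per_day: int, seed: int = 0) -> float:
    """Fraction of attempts a caching spammer can answer from stored solutions.

    Each day the spammer makes `attempts_per_day` attempts against ads drawn
    uniformly from the live pool; at the end of each day the
    `retired_per_day` oldest ads are replaced by ads it has never seen.
    """
    if not 0 <= retired_per_day <= pool_size:
        raise ValueError("retired_per_day must be between 0 and pool_size")
    rng = random.Random(seed)
    live = list(range(pool_size))  # ad ids, oldest first
    next_id = pool_size
    solved: set[int] = set()
    hits = 0
    for _ in range(days):
        for _ in range(attempts_per_day):
            ad = rng.choice(live)
            if ad in solved:
                hits += 1       # replay a stored answer for free
            else:
                solved.add(ad)  # first sighting: pay to solve it
        # retire the oldest ads and bring in fresh ones with unseen ids
        live = live[retired_per_day:] + list(range(next_id, next_id + retired_per_day))
        next_id += retired_per_day
    return hits / (days * attempts_per_day)


static = cache_hit_rate(pool_size=1000, days=30, attempts_per_day=100, retired_per_day=0)
rotating = cache_hit_rate(pool_size=1000, days=30, attempts_per_day=100, retired_per_day=500)
print(f"static pool: {static:.0%}, rotating pool: {rotating:.0%}")
```

Under these assumptions rotation helps a lot when the spammer sees each ad only a few times before it retires. A spammer whose daily volume is large relative to the pool would still hit stored answers most of the time, so the benefit depends on how short-lived the ads really are.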

The obvious shortcomings arise if ads are not so short-lived, or if it's easy to identify and break whole classes of ads (e.g. if it's yellow and has the IE logo in position X, OCR area Y, done). Also, it's a bit of a dealbreaker if I'm forced to open and visit a website to get the password.

shalmanese · on Sept 20, 2010

This was the approach used by the Microsoft Research project Asirra (http://research.microsoft.com/en-us/um/redmond/projects/asir...), which used images of kittens and puppies put up for adoption as a constantly refreshed source of input data.

This was later awesomely riffed on by HotCaptcha (http://valleywag.gawker.com/246656/a-face-only-a-bot-could-l...), which pulled HotOrNot data and asked you to select the 3 hot women out of 9. Sadly, the site is down now, but I remember trying it, and it was remarkably useful and a hell of a lot more fun than word captchas.
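Asking for a set of images rather than a single one shrinks a random guesser's odds. A short Python check, assuming exactly 3 of the 9 images are correct and the user must select exactly those three in any order (the original site's precise rules are an assumption here):

```python
from fractions import Fraction
from math import comb


def random_selection_success(total: int, correct: int) -> Fraction:
    """Chance that a bot clicking `correct` distinct images uniformly at random
    out of `total` picks exactly the right set (order does not matter)."""
    return Fraction(1, comb(total, correct))


print(random_selection_success(9, 3))  # 1/84
print(random_selection_success(9, 1))  # 1/9
```

So "pick 3 of 9" gives a blind guesser about a 1.2% chance, versus roughly 11% for "pick 1 of 9".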

pedalpete · on Sept 20, 2010

Your argument is somewhat flawed at a technical level, as olegk mentioned, but also from a business and marketing standpoint.

Specific ads may be in circulation for a short time, or many may run concurrently, but what the advertiser is trying to get across is the message, the tagline, and it benefits the advertiser most to get the user to associate their brand with a single concept. Using the example in the article, Subaru may want to be associated with "outback", not "four wheel drive" or "comfy" or "sporty". Businesses that try to target too many things or too wide an audience end up not getting their message across.

I'm not saying that what Solve Media is trying to do isn't a great idea. I think it has lots of potential, but they clearly still have more to work on.

olegk · on Sept 20, 2010

You have no idea what you're talking about.

> show a picture of an animal or a shape and ask what it is,

Ok, so let's say you come up with pictures of 20 different animals. A spam script that picks the same answer every time will have a 5% success rate.

> or even just ask a simple math problem or riddle in writing.

Computers are way better at solving most math problems than people.
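As an illustration of how little work a written math captcha takes, here is a minimal Python solver. It assumes the question contains a single "a op b" expression written with digits and one of `+`, `-`, `*` or `x`; riddles or spelled-out numbers would need more handling. The function and pattern are hypothetical.

```python
import operator
import re

# Operators a simple "what is a op b?" captcha might use (an assumption).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "x": operator.mul}
PATTERN = re.compile(r"(-?\d+)\s*([+\-*x])\s*(-?\d+)")


def solve_math_captcha(question: str) -> int:
    """Extract the first 'a op b' expression in the question and evaluate it."""
    match = PATTERN.search(question)
    if match is None:
        raise ValueError(f"no arithmetic found in {question!r}")
    left, op, right = match.groups()
    return OPS[op](int(left), int(right))


print(solve_math_captcha("What is 7 + 5?"))  # 12
print(solve_math_captcha("what is 6 x 7"))   # 42
```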

raganwald · on Sept 20, 2010

> You have no idea what you're talking about.

Please be careful. It could be that his thoughts are excellent but his communication is flawed. It could also be that he has a great deal of general expertise in the subject but is mistaken in this specific statement.

Thus, a general statement about him could be false, and it is also aggressively ad hominem. You might want to consider focusing on the statement itself rather than the speaker, such as:

"Your suggestion is entirely wrong."

JM2C of course, and it is possible that I don't know what I'm talking about. I am not a psychologist or a logician.

mseebach · on Sept 20, 2010

Please read my post. Specifically the second paragraph where exactly this issue is addressed.

olegk · on Sept 20, 2010

Still doesn't make sense.

If you display 10 random animals (or shapes), and ask a user to pick the right one, a dumb spam script would have a 10% success rate.

If you display one image of an animal and give 10 possible answers, a dumb spam script would have a 10% success rate.
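The 1/k figures in this exchange (5% for 20 options, 10% for 10, 12.5% for 8) can be computed exactly. This sketch assumes the correct answer is uniformly distributed over distinct multiple-choice options; the function name is hypothetical.

```python
from fractions import Fraction


def constant_guess_success(options: list[str], guess: str) -> Fraction:
    """Exact success rate of a script that always submits `guess`,
    assuming the correct answer is uniformly distributed over `options`."""
    if len(set(options)) != len(options):
        raise ValueError("options must be distinct")
    wins = sum(1 for correct in options if correct == guess)
    return Fraction(wins, len(options))


options = ["cat", "dog", "house", "drum", "road", "tree", "book", "horse"]
print(constant_guess_success(options, "cat"))  # 1/8
```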

mseebach · on Sept 20, 2010

So if I have a library of tens of thousands of images of animals, shapes, things etc. that are all easily recognisable to English speakers (e.g. cat, dog, house, drum, road, tree, book, horse), and I ask users to write in what it is, AND I constantly update that library and retire pictures that have been used many times, what is your dumb script's success rate?

What if I combine three pictures in each challenge, e.g. "cat house triangle"?
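For free-text answers the guessing space is the whole vocabulary, raised to the number of pictures per challenge. A minimal sketch, assuming each picture's label is drawn independently and uniformly from the library; a real library with skewed labels (lots of cats) would give a constant guess better odds, and none of this addresses humans solving the challenge.

```python
from fractions import Fraction


def free_text_guess_success(vocabulary_size: int, words_per_challenge: int) -> Fraction:
    """Chance that a fixed guess matches a challenge built from
    `words_per_challenge` pictures, each drawn independently and uniformly
    from a vocabulary of `vocabulary_size` labels."""
    return Fraction(1, vocabulary_size ** words_per_challenge)


print(free_text_guess_success(10_000, 1))         # 1/10000
print(float(free_text_guess_success(10_000, 3)))  # 1e-12
```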

olegk · on Sept 20, 2010

My spam script would always answer "cat", so among 8 options (cat, dog, house, drum, road, tree, book, horse), I'd get a 12.5% success rate.

Plus you constantly have to update your image library, which is a huge pain.

Also, labeling 10,000 images would take me around one day and less than $1000 with Amazon Mechanical Turk, giving me a perfect 100% success rate. After that, you would have to completely renew your image database.

mseebach · on Sept 20, 2010

Sigh... etc. There would be more than 8 options, many more.

YES, it would be a pain to update the library, which is why I'm commending this particular concept for solving that problem ...

stavros · on Sept 20, 2010

You aren't getting it. The probability is 1/<number of options you present to the user>. If you show the user 100 images and ask them to select one, a bot will have a 1% probability of finding the right one, but the user will tell you to get lost.

If you present 10 images (still a stretch), bots will have a 10% success rate just answering randomly.

EDIT: Wait, from what I see you mean that the user will have to write "cat" or "dog" or whatever? That's better, yes. Communication, however, is hard, which is why the GP and I didn't understand what you meant.

Ilovepee · on Sept 21, 2010

Not to mention that the bot will get spotted for entering the same phrase more than a few times, get put on a list, and get served the squiggly crap.

olegk · on Sept 21, 2010

You have no idea how spam works. A bot isn't just one user trying to enter "cat" repeatedly. Botnets send requests from thousands of different IPs. You wouldn't know which ones are real users, and which ones are bots.
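The disagreement about spotting repeated answers can be sketched with a hypothetical per-IP rule: flag any source that submits the same answer more than a threshold number of times. It catches a single bot but never fires when the same answers are spread across a botnet. The rule and the IP addresses (documentation and private ranges) are illustrative only.

```python
from collections import Counter


def flag_repeat_answers(requests: list[tuple[str, str]], threshold: int) -> set[str]:
    """Per-IP rule: flag any source IP that submitted the same answer
    more than `threshold` times."""
    counts = Counter(requests)  # keyed by (ip, answer)
    return {ip for (ip, _answer), n in counts.items() if n > threshold}


single_bot = [("203.0.113.7", "cat")] * 1000
botnet = [(f"10.0.{i // 256}.{i % 256}", "cat") for i in range(1000)]

print(flag_repeat_answers(single_bot, threshold=5))  # {'203.0.113.7'}
print(flag_repeat_answers(botnet, threshold=5))      # set()
```

A global rule such as "'cat' was answered too often" would notice the botnet's traffic, but, as the comment says, it cannot tell which of those senders are real users.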

il · on Sept 20, 2010

You know, most modern captchas are not solved by bots but by people in third-world countries, for pennies. The current going rate is about $1 per 1,000, and there are easy-to-use captcha-solving APIs for any platform. Captchas have long provided an illusion of security, nothing more. Any and all captchas will be broken.

jfager · on Sept 20, 2010

Before adding captchas to my stupid personal blog, Akismet was catching ~300 attempted spams a day, and I was manually flagging ~2 a day.

Then I installed reCaptcha, and now Akismet catches ~10 a month, and I've had to manually flag zero.
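Rough arithmetic on these figures, assuming a 30-day month and taking "~300 a day" and "~10 a month" at face value:

```python
def reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after` (same units)."""
    return 1 - after / before


before_per_month = 300 * 30  # ~300 caught per day, assuming a 30-day month
after_per_month = 10
print(f"{reduction(before_per_month, after_per_month):.2%}")  # 99.89%
```

That is roughly a 99.9% drop in spam reaching the filter, before even counting the manual flags going from ~2 a day to zero.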

I'll take that illusion, thanks.

there · on Sept 20, 2010

OK, but how many comments are you losing because people don't want to fill out a captcha?

I've used defensio.com for filtering comments on my site with no captcha and rarely get a false positive or negative. False negatives are easy to spot, and users can manually override false positives by supplying an email address to get a confirmation link (which gets fed back to Defensio as a false positive once clicked).
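The filter-then-confirm flow described here can be sketched as a small queue. This does not reproduce Defensio's actual API: `is_spam` is a hypothetical stand-in for the hosted filter, and `false_positives` is where a real integration would report overridden verdicts back to the service.

```python
import secrets
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class CommentQueue:
    """Sketch of filter-then-confirm moderation with no captcha."""
    is_spam: Callable[[str], bool]                             # hypothetical filter hook
    published: list[str] = field(default_factory=list)
    held: dict[str, str] = field(default_factory=dict)        # token -> comment
    false_positives: list[str] = field(default_factory=list)  # to report back to the filter

    def submit(self, comment: str, email: Optional[str] = None) -> Optional[str]:
        """Publish clean comments; hold flagged ones if the author gave an email.

        Returns a confirmation token to be emailed as a link, or None.
        """
        if not self.is_spam(comment):
            self.published.append(comment)
            return None
        if email is None:
            return None  # flagged and no way to confirm: dropped
        token = secrets.token_urlsafe(16)
        self.held[token] = comment
        return token

    def confirm(self, token: str) -> None:
        """Called when the author clicks the emailed confirmation link."""
        comment = self.held.pop(token)
        self.published.append(comment)
        self.false_positives.append(comment)  # the filter got this one wrong


queue = CommentQueue(is_spam=lambda text: "cheap pills" in text.lower())
queue.submit("Nice post!")
token = queue.submit("Are cheap pills ever worth it?", email="reader@example.com")
queue.confirm(token)
print(queue.published)  # ['Nice post!', 'Are cheap pills ever worth it?']
```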

jfager · on Sept 20, 2010

It's a tradeoff, sure. In my case, I'm perfectly happy to trade a few comments from people who don't want to deal with a captcha for never having to manually deal with spam. YMMV.

mike-cardwell · on Sept 20, 2010

There are better ways of reducing comment spam to near zero without resorting to captchas. Here's how I did it:

https://secure.grepular.com/Blocking_Comment_Spam_Using_ModS...

It's still working now.

mseebach · on Sept 20, 2010

That's great. Pretty much any home-rolled CAPTCHA will perform well, simply because it's not worthwhile for spammers to design an attack against it. If, however, that were the default anti-spam mechanism on, say, WordPress installs, it'd be automated against in minutes.

mike-cardwell · on Sept 20, 2010

Definitely. If more people rolled their own, the spammers would be screwed.

il · on Sept 20, 2010

That's one data point. The counterpoint is the thousands of forum owners who mistakenly relied on captchas to prevent spam at the expense of other antispam measures, like GeoIP.

jfager · on Sept 20, 2010

I'm not saying captchas are a magic one-step solution to end all spam (hence the mention of Akismet). I'm just disputing the idea that they are "nothing more" than illusory security. Sure, they can be turked, and sure, if someone relies only on captchas they'll probably eventually get screwed. But they do provide some benefit as part of a complete approach against spam, in that they raise the cost of a spam attempt up from zero. Unless you have meaningful levels of traffic, the vast majority of spammers aren't going to bother with you if it costs anything at all to hit you.
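The "raise the cost above zero" argument reduces to a simple inequality: a profit-seeking spammer keeps hitting a target only while each attempt is expected to earn more than it costs. A deliberately simplified sketch; the per-attempt revenue figures are hypothetical, and the solving price is the roughly $1 per 1,000 quoted earlier in the thread.

```python
from decimal import Decimal


def worth_spamming(revenue_per_attempt: Decimal, cost_per_attempt: Decimal) -> bool:
    """Keep attacking a target only while an attempt earns more than it costs."""
    return revenue_per_attempt > cost_per_attempt


captcha_cost = Decimal("1") / 1000    # outsourced solve at ~$1 per 1,000
tiny_blog = Decimal("0.0001")         # hypothetical expected revenue per attempt
valuable_target = Decimal("0.05")     # hypothetical higher-value target

print(worth_spamming(tiny_blog, Decimal("0")))       # True: free attempts always pay
print(worth_spamming(tiny_blog, captcha_cost))       # False once each attempt costs money
print(worth_spamming(valuable_target, captcha_cost)) # True: still worth paying humans
```

This is also why outsourced solving persists on high-value targets: when a post earns far more than a solve costs, the captcha is just a small tax.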

jchonphoenix · on Sept 20, 2010

I know the founders of reCaptcha. This is easier to prevent than you think. It's also quite rare.

il · on Sept 20, 2010

Heh, I know people who are working on breaking reCaptcha. Maybe they should get together and learn from each other.

If you're saying that it's possible to prevent manual, outsourced captcha solving, I would have to strongly disagree. It's similar to the futility of DRM as an antipiracy measure. As long as I can see the captcha, I can pay someone in another country to solve it for me.

I will agree that it's rare, but the most sophisticated and high-volume spammers do it, and they're the ones you have to worry about.

In case people start giving me sideways glances I want to make it clear that I haven't done any blackhat stuff in a very long time, but I'm still interested in the blackhat community from a security researcher's perspective.

jchonphoenix · on Sept 20, 2010

I'm not saying that it's possible to do it in an automated way. I'm just saying that tools exist to raise red flags that make things easy for a human to verify. They have ways of dealing with these things. All I'm saying is that so far, it's been working for them.

il · on Sept 20, 2010

You're probably not going to tell me more about these tools, although I would be very curious to learn how they work.

All I have is anecdotal evidence: I know several people who are making their living spamming Craigslist and outsourcing their ReCaptcha solving. Since they can make $XX per post, paying pennies per captcha is a tiny expense. I don't condone this behavior, but be aware: it happens more than you think.

Anyway, probably best to take the discussion to email if you want more...perspective from the other side.

almost · on Sept 20, 2010

£1/1000 is still more expensive than if they could be solved without human input. And it has to be paid every time a captcha is needed; with this scheme you could presumably pay humans to solve the ads currently in rotation (tens? hundreds? thousands? not more than that) and then just remember the results.
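The cost difference between paying per captcha and paying once per ad can be put in numbers. This sketch assumes the spammer can recognise a repeated ad (for example by hashing the image) and uses a hypothetical 1,000 ads in rotation; the price is the roughly 1 per 1,000 solves quoted earlier in the thread, in whatever currency.

```python
from decimal import Decimal


def outsourcing_cost(challenges_answered: int, distinct_challenges: int,
                     price_per_solve: Decimal, cache_answers: bool) -> Decimal:
    """Total paid to a human-solving service.

    Without caching every challenge is sent out; with caching only the first
    sighting of each distinct challenge (e.g. each ad in rotation) is paid for.
    """
    if cache_answers:
        solves = min(distinct_challenges, challenges_answered)
    else:
        solves = challenges_answered
    return solves * price_per_solve


price = Decimal("0.001")  # ~1 per 1,000 solves
print(outsourcing_cost(1_000_000, 1_000, price, cache_answers=False))  # 1000.000
print(outsourcing_cost(1_000_000, 1_000, price, cache_answers=True))   # 1.000
```

Under these assumptions a million answered challenges cost a thousand times less once answers are remembered, which is why per-challenge randomisation matters.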

TorKlingberg · on Sept 20, 2010

I don't think having people solve captchas for money is all that common. Do you have a source or something? Even if it is common, captchas still stop the spamming strategies that rely on each spam being free for the spammer.

il · on Sept 20, 2010

I don't know how prevalent this practice is exactly, but I have it on good authority that sites like decaptcher.com are making a mint selling this service.

Ilovepee · on Sept 21, 2010

Yeah, but you've got to look at it from a business standpoint: who profits from the outsourcing? Why would a website owner pay to have the ads solved in India, just to get blacklisted and then lose revenue that would have come in anyway for less?

aristidb · on Sept 20, 2010

Indeed. Without some form of randomisation (which advertisers probably do not want), this would drastically lower the price of solving a captcha, because you need to submit each ad to a human solver network only once and can then store the solution.