Hacker News

I'm guessing the answer's 'no' because you didn't elaborate to begin with, but on the off-chance - can you share any more about what the problem/project was?


Sure, it has been a while. I worked at Amazon in Brand Protection, specifically on Trademark and Logo infringement.

There were many different strategies at play (eg neural nets), but false positives can be very expensive. The one I described above is an expert system of manual overrides that made the final decisions.

Amazon doesn’t have much structured content (eg “title”, “description”), and bad actors are constantly trying to obfuscate using language-based grammar tricks or N1ke-type attacks.
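
To make the N1ke-type trick concrete, here is a minimal sketch of text normalization before matching. The substitution table and the function names `normalize` and `mentions_brand` are assumptions for illustration, not anything described in the thread.

```python
import re

# Hypothetical look-alike table: characters commonly swapped in for letters.
LEET_MAP = str.maketrans({
    "1": "i", "0": "o", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})

def normalize(text: str) -> str:
    """Lowercase, undo look-alike substitutions, and drop in-word separators."""
    text = text.lower().translate(LEET_MAP)
    # Remove punctuation inserted between word characters (N.i.k.e, N-i-k-e).
    text = re.sub(r"(?<=\w)[.\-_*](?=\w)", "", text)
    return re.sub(r"\s+", " ", text).strip()

def mentions_brand(text: str, brand: str) -> bool:
    """True if the brand appears as a whole word in the normalized text."""
    pattern = rf"\b{re.escape(brand.lower())}\b"
    return re.search(pattern, normalize(text)) is not None
```

Note that this normalization is lossy (every "1" becomes "i", even in "size 10"), so on its own it would add false positives, which is the expensive failure mode mentioned above.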

Also just the law is very nuanced. You can say “a case compatible with iPhone, that is ..” but it is infringement to say “iPhone case that is ..”
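
A rough sketch of how that distinction might be encoded as a rule, using only the two phrasings quoted above. The function name `classify`, the labels, and the single "compatible with" phrase are assumptions; a real rule set would need far more phrasings and per-country variants.

```python
import re

BRAND = "iphone"  # example brand taken from the comment above

# Referential phrasing: the brand follows "compatible with".
REFERENTIAL = re.compile(rf"\bcompatible with\s+(?:the\s+)?{BRAND}\b", re.IGNORECASE)
# Any whole-word mention of the brand.
MENTION = re.compile(rf"\b{BRAND}\b", re.IGNORECASE)

def classify(title: str) -> str:
    """Label a title as no_mention, referential_use, or direct_use."""
    if not MENTION.search(title):
        return "no_mention"
    # Strip referential uses; any brand mention left over is a direct use.
    remainder = REFERENTIAL.sub("", title)
    return "direct_use" if MENTION.search(remainder) else "referential_use"
```

For example, `classify("A case compatible with iPhone, that is slim")` gives `"referential_use"`, while `classify("iPhone case that is slim")` gives `"direct_use"`.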

Also there are many nuances around licensing. For example, “BurntSushi T-shirts” as a company cannot have the Disney logo on its shirts; however, a “Lego tshirt” might legally have the Disney logo on it. So a lot of overrides.
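
One way to picture a licensing override sitting on top of another detector's verdict is sketched below. The `LICENSED` table, the `Listing` type, and `final_decision` are hypothetical names, and the single table entry only mirrors the Lego/Disney example above; it is not a statement about actual licences.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Listing:
    seller: str
    text: str

# Hypothetical override table: (seller, brand) pairs treated as licensed.
LICENSED = {("lego", "disney")}

def final_decision(listing: Listing, brand: str, flagged: bool) -> str:
    """Apply manual overrides first, then fall back to the detector's verdict."""
    if (listing.seller.lower(), brand.lower()) in LICENSED:
        return "allow"  # the override wins over the detector
    return "block" if flagged else "allow"
```

Here `flagged` stands in for whatever upstream detector (eg a neural net) produced the initial verdict; the override table has the last word.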

Even if you compress all the nuances into a single regex, which you can’t: there are an estimated 10 MM brands worldwide (Amazon at the time hosted 1 MM) with 10 trademarks each on average, so Amazon’s 1 MM brands alone come to 10 MM regexes.

Then you also need to multiply by the 25 countries Amazon operates in for legal nuances, and by multiple languages (eg even in the US Amazon store someone might use Spanish or Italian to sell their fake product). Additionally there is more fanout for reasons I’ve long forgotten.
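
The arithmetic can be spelled out as below. This is a naive product that assumes one regex per trademark and, for the per-country figure, that every pattern is duplicated in each of the 25 countries; the language multiplier is left at 1 because the thread gives no number for it. The function name `regex_count` is made up for the example.

```python
def regex_count(brands: int, trademarks_per_brand: int,
                countries: int = 1, languages: int = 1) -> int:
    """Naive fanout: one pattern per trademark, duplicated per country and language."""
    return brands * trademarks_per_brand * countries * languages

amazon_base = regex_count(1_000_000, 10)        # brands hosted by Amazon at the time
worldwide_base = regex_count(10_000_000, 10)    # estimated brands worldwide
amazon_by_country = regex_count(1_000_000, 10, countries=25)

print(f"{amazon_base:,} {worldwide_base:,} {amazon_by_country:,}")
# prints: 10,000,000 100,000,000 250,000,000
```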

Point of the story: 10 MM was a conservative upper bound on the RegexSet, and it had to be fast and cheap. I built it on top of Lucene with some hacks here and there, but it did (does) the job - fast and cheap :)
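
The thread does not say how the Lucene-based version worked, so the following is only a toy sketch of one general way to keep a huge regex set cheap: index each pattern under a literal token it requires, and run only the patterns whose token appears in the text. The class `PrefilteredRegexSet` and its methods are hypothetical names, and this is plain Python rather than Lucene.

```python
import re
from collections import defaultdict

class PrefilteredRegexSet:
    """Toy regex set: each pattern is filed under a token any match must contain."""

    def __init__(self) -> None:
        self._by_token = defaultdict(list)  # token -> [(rule_id, compiled regex)]

    def add(self, rule_id: str, required_token: str, pattern: str) -> None:
        compiled = re.compile(pattern, re.IGNORECASE)
        self._by_token[required_token.lower()].append((rule_id, compiled))

    def match(self, text: str) -> list[str]:
        """Return the ids of matching rules, running only candidate patterns."""
        tokens = set(re.findall(r"\w+", text.lower()))
        hits = []
        for token in tokens:
            # Inverted-index lookup: patterns without this token are never run.
            for rule_id, regex in self._by_token.get(token, ()):
                if regex.search(text):
                    hits.append(rule_id)
        return sorted(hits)
```

Usage, with two made-up rules:

```python
rules = PrefilteredRegexSet()
rules.add("iphone-direct", "iphone", r"\biphone\s+case\b")
rules.add("nike-shoes", "nike", r"\bnike\s+shoes\b")
rules.match("Slim iPhone case in black")  # only the "iphone" bucket is consulted
```

The cost per listing then scales with the number of tokens in the text and the size of their buckets, not with the total number of patterns.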

I hope now generative AI can help them more, but I’m not holding my breath.


> Even if you compress all the nuances into a single regex, which you can’t

You can’t; standalone hardware isn’t that capable, but distributed/cluster computing is.

> I hope now generative AI can help them more, but I’m not holding my breath.

I'm not either, from what I have seen.

> Also just the law is very nuanced. You can say “a case compatible with iPhone, that is ..” but it is infringement to say “iPhone case that is ..”

And Amazon would be up against users of LexisNexis, who have a head start in all of this regex malarkey by virtue of what they sell, without giving too much away about the inner workings of their code.



