
The proxy is altering the JavaScript, CSS, and HTML to get rid of various countermeasures.

Maybe one version dynamically pulls in crucial JS with subresource integrity, plus obfuscated text that isn't as simple to strip out. Another version has the main obfuscated JS bomb out if the subresource integrity attribute has been stripped from the HTML, etc.
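For anyone unfamiliar with subresource integrity: the integrity value is just a base64-encoded hash of the script bytes, prefixed with the algorithm name. A minimal sketch of generating one on the server side (the function names and the URL are made up for illustration, not from the article):

```python
import base64
import hashlib


def sri_hash(script_bytes: bytes) -> str:
    """Return an SRI value of the form 'sha384-<base64 digest>'."""
    digest = hashlib.sha384(script_bytes).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


def script_tag(url: str, script_bytes: bytes) -> str:
    """Build a script tag carrying the integrity attribute (hypothetical helper)."""
    return (
        f'<script src="{url}" integrity="{sri_hash(script_bytes)}" '
        'crossorigin="anonymous"></script>'
    )


if __name__ == "__main__":
    # Hypothetical URL and script body, for illustration only.
    print(script_tag("/static/main.js", b"console.log('hi');"))
```

If the proxy rewrites the script, the hash no longer matches and the browser refuses to run it, so the proxy also has to strip or recompute the attribute. That's the extra step the second variation is meant to detect.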

The approach I described means they need to solve for X different variations of different types of countermeasures. Think of things like obfuscated JS, buried in the main JS, that checks location.host or the Bitcoin address and responds in different ways. Each variation of the site would use slightly different countermeasures, div ids, ways to encode the text of the Bitcoin address, etc.
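A rough sketch of the "different variation per request" idea, assuming the server picks a random div id and a random encoding for the address each time. The scheme names and helpers here are invented for illustration, and real countermeasures would be less trivial to reverse than these:

```python
import base64
import secrets

# Hypothetical encoding schemes; the matching client-side JS decoder is not shown.
SCHEMES = ("base64", "reversed", "charcodes")


def encode_address(address: str, scheme: str) -> str:
    """Encode the address text using one of the illustrative schemes."""
    if scheme == "base64":
        return base64.b64encode(address.encode("ascii")).decode("ascii")
    if scheme == "reversed":
        return address[::-1]
    if scheme == "charcodes":
        return ",".join(str(ord(c)) for c in address)
    raise ValueError(f"unknown scheme: {scheme}")


def make_variant(address: str) -> dict:
    """Pick a random div id and encoding for a single response."""
    scheme = secrets.choice(SCHEMES)
    return {
        "div_id": "d" + secrets.token_hex(4),  # fresh id on every request
        "scheme": scheme,
        "payload": encode_address(address, scheme),
    }
```

A scraper that hardcodes one div id or one decoding step breaks as soon as a different variant is served.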

Like copy protection. Yes, you can break it. But time and effort are expended.

The article already suggests some laziness on the scraper's part: he only substituted the hostname if it was lowercase. The general idea here is lots of countermeasures, and ones that change with every request, to wear them down.
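To make the lowercase point concrete, here is a small sketch. Hostnames are case-insensitive, so the site can emit its own name in random mixed case and a plain case-sensitive replace will miss it. `example.onion` and `evil.onion` are placeholders, and `naive_rewrite` is only my guess at what a lazy proxy does:

```python
import secrets


def random_case(host: str) -> str:
    """Return the hostname with each letter randomly upper- or lowercased."""
    return "".join(
        c.upper() if secrets.randbelow(2) else c.lower() for c in host
    )


def naive_rewrite(page: str, real_host: str, proxy_host: str) -> str:
    """Guess at a lazy proxy: case-sensitive replace of the lowercase hostname."""
    return page.replace(real_host.lower(), proxy_host)


if __name__ == "__main__":
    page = f"if (location.host !== '{random_case('example.onion')}') ..."
    print(naive_rewrite(page, "example.onion", "evil.onion"))
```

By chance `random_case` can return an all-lowercase string, so this is one layer among many rather than a fix on its own, and the check in the page JS would need to compare case-insensitively.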

>You can get alerted to it by checking your web server access logs

Maybe. That's hard on Tor, where there may be nothing unique about the requests.


