
I've caught Huawei and Tencent IPs scraping the same image over and over again, with different query parameters. Sure, the image was only 260KiB and I don't use Amazon or GCP or Azure so it didn't cost me anything, but it still spammed my logs and caused a constant drain on my servers' resources.

The bots keep coming back too, ignoring HTTP status codes, permanent redirects, and whatever else I can think of to tell them to fuck off. Robots.txt obviously doesn't help. Filtering traffic from data centers didn't help either, because soon after I did that, residential IPs started doing the same thing. I don't know if this is a Chinese ISP abusing its IP ranges or if China just has a massive botnet problem, but either way the traditional ways to get rid of these bots haven't helped.

In the end, I'm now blocking all of China and Singapore. That stops the endless flow of bullshit requests for now, though I see some familiar user agents appearing in other East Asian countries as well.



So make sure the image is only available at one canonical URL with proper caching headers? No, obviously the only solution is to install crapware that worsens the experience for regular users.
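For what it's worth, here is a rough sketch of what "one canonical URL with proper caching headers" could look like. The path, the `respond` helper, and the max-age values are all made up for illustration, not anyone's actual setup. It only helps with caches and clients that honor redirects, which the bots described above apparently don't.

```python
from urllib.parse import urlsplit

# Hypothetical canonical location of the image (an assumption for this sketch).
CANONICAL_PATH = "/static/photo.jpg"


def respond(request_target: str) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a request target such as '/static/photo.jpg?x=1'."""
    parts = urlsplit(request_target)

    if parts.path != CANONICAL_PATH:
        return 404, {}

    if parts.query:
        # Any query-string variant is permanently redirected to the one
        # canonical URL, so caches store a single copy of the image.
        return 301, {
            "Location": CANONICAL_PATH,
            "Cache-Control": "public, max-age=86400",
        }

    # The canonical URL itself is served with a long-lived cache policy.
    return 200, {
        "Content-Type": "image/jpeg",
        "Cache-Control": "public, max-age=31536000, immutable",
    }
```

With this, `respond("/static/photo.jpg?cachebust=123")` yields a 301 pointing at `/static/photo.jpg`, and the bare path yields a 200 with a one-year cache lifetime.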



