LinuxBender 6 hours ago

Nginx has an assortment of options to rate limit people/bots. [1] For the 10% not sending a user-agent, generic rate limits could be applied based on volume with limit_rate_after; one would have to get creative with nginx maps (rough sketch below). Another option would be to look for patterns in your access logs and just blackhole or ipset-reject the abusive IP addresses or networks, accepting that some of them may be abusive humans rather than abusive bots.

[1] - https://serverfault.com/questions/639671/nginx-how-to-limit-...
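
Roughly something like this, untested; the zone name, rate, and size thresholds are placeholders, and the map keys on an empty User-Agent so everyone else is left alone:

    # requests with no User-Agent get a per-IP limit key; everyone else
    # gets an empty key, which limit_req_zone ignores
    map $http_user_agent $no_ua_key {
        default  "";
        ""       $binary_remote_addr;
    }

    limit_req_zone $no_ua_key zone=noua:10m rate=30r/m;

    server {
        location / {
            limit_req zone=noua burst=10 nodelay;
            # volume-based throttle: full speed for the first 1 MB of a
            # response, then cap the bandwidth
            limit_rate_after 1m;
            limit_rate       100k;
        }
    }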

osdotsystem 10 hours ago

I feel for you. Same for OpenAI (https://www.linkedin.com/posts/appinv_openai-is-a-felon-comp...). A friend told me AI bots even ignored the robots.txt ...

  • Arnt 3 hours ago

    I've heard that too. Your posting inspired me to check. I see daily accesses from claudebot and gptbot to /robots.txt on two sites. Also from something called amazonbot, which may be another AI bot?

fragmede 9 hours ago

> What else do I need to do?

I mean, not sending 418 and instead sending 429 or 403 or something more useful would be on my list of things to try. Might double-check what 418 actually is while you're trying to figure out why sending it doesn't seem to do anything.
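
If the limiting is done in nginx, the status it returns for throttled requests is configurable with limit_req_status; a minimal untested sketch, where the zone name, rate, and the 60-second Retry-After are placeholders:

    limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

    server {
        # answer throttled requests with 429 instead of nginx's default 503
        limit_req_status 429;
        error_page 429 = @throttled;

        location / {
            limit_req zone=perip burst=20 nodelay;
        }

        location @throttled {
            # "always" is needed so the header is sent on an error status
            add_header Retry-After 60 always;
            return 429;
        }
    }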

  • xena 4 hours ago

    Changing the status code didn't help. I eventually just withdrew the server from the internet.