*** ANSWER 1 of 2 ***
Very belated answer, but since I spent most of today studying this problem I thought I'd show what worked for me. The short of it: use map, and restart the service instead of simply signaling a reload.
Here's what I have near the top of my sites-available/default file:
# User agent strings get pretty long
map_hash_bucket_size 256;
# Detect the basic agent type (bot or not)
map $http_user_agent $agent_type {
# Not a bot
default 0;
# All of these below are bots that should be rate-limited
"~*(Amazonbot|ClaudeBot|DataForSeoBot|GPTBot|SemrushBot)" 1;
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0 Safari/537.36" 1;
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36" 1;
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36 Edg/114.0.1823.43" 1;
}
# Define our rate-limiting zones for use later
map $agent_type $default_zone_key {
default '';
0 $binary_remote_addr;
}
limit_req_zone $default_zone_key zone=default_zone:20m rate=5r/s;
map $agent_type $bot_zone_key {
default '';
1 $binary_remote_addr;
}
limit_req_zone $bot_zone_key zone=bot_zone:10m rate=6r/m;
Note that in addition to some bots that announce themselves (e.g. ClaudeBot and GPTBot), I added a few "exactly this string" cases for some botnets that don't announce themselves. These exact strings look like ordinary browser user agents, so I may be accidentally rate-limiting some real human users too. But I'm just happy to have something that works for now.
Then in the server section I've got this:
server {
server_name somedomain.com;
...
limit_req zone=default_zone burst=50 nodelay;
limit_req zone=bot_zone burst=10 nodelay;
limit_req_status 429; # Too many requests
...
# Want to see whether you're seen as a bot? Uncomment the following:
#add_header X-Routing-Agent-Type "$agent_type";
}
If it's not obvious what's going on, here's an explanation. As a relative novice with NGINX, I can relate. limit_req_zone takes a key. If the key is empty, the request is not counted against that zone, so the rule is effectively ignored for that request. In our case the key is the client's IP address ($binary_remote_addr). More specifically, we have two zones. One is keyed on the IP address only when the user agent was classified as a bot; the other is keyed on the IP address only when it wasn't. The two map $agent_type <key> blocks populate those IP-address-or-blank keys.
It's worth noting that you could add more $agent_type numbers if you wanted a finer gradation than simply "bot" and "everything else". You'd just need to add more of the various definition blocks and lines.
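As a rough sketch of that idea, here is what a third tier might look like. Everything new here is an assumption made up for illustration: the value 2 for "suspected botnet", the names $botnet_zone_key and botnet_zone, and the rate and burst numbers. The first map would replace the existing $agent_type map rather than sit alongside it, and the existing $default_zone_key and $bot_zone_key maps stay as they are (type 2 falls through to their blank default, so only the new zone applies to it).

```nginx
# Hypothetical three-tier variant:
#   0 = regular client, 1 = self-announced bot, 2 = suspected botnet
map $http_user_agent $agent_type {
    default 0;
    # Bots that announce themselves
    "~*(Amazonbot|ClaudeBot|DataForSeoBot|GPTBot|SemrushBot)" 1;
    # Exact user agent strings seen from unannounced botnets
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36 Edg/114.0.1823.43" 2;
}

# Key is the client IP for type 2, blank (not counted) for everything else
map $agent_type $botnet_zone_key {
    default '';
    2 $binary_remote_addr;
}
# Illustrative rate only; tune to taste
limit_req_zone $botnet_zone_key zone=botnet_zone:10m rate=1r/m;

# Then, inside the server block, next to the other limit_req lines:
# limit_req zone=botnet_zone burst=5 nodelay;
```

The pattern is the same each time: one more value in the $agent_type map, one more key map, one more limit_req_zone, and one more limit_req line in the server block.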
I was so frustrated today because so much of what I was trying out was not working. Only near the end did it dawn on me that NGINX was not loading my configuration as expected. I was using nginx -s reload. Everything started working when I instead used service nginx restart.