10

Is there a way to make Nginx notify me if hits from a referrer go beyond a threshold?

E.g., if my website is featured on Slashdot and all of a sudden I have 2K hits coming in within an hour, I want to be notified when traffic goes beyond 1K hits an hour.

Will it be possible to do this in Nginx? Possibly without Lua? (My prod build is not compiled with Lua.)

Quintin Par
  • 4,493

4 Answers

13

I think this would be far better done with logtail and grep. Even if it's possible to do with lua inline, you don't want that overhead for every request and you especially don't want it when you have been Slashdotted.

Here's a 5-second version. Stick it in a script and put some more readable text around it and you're golden.

5 * * * * logtail -f /var/log/nginx/access_log -o /tmp/nginx-logtail.offset | grep -c "http://[^ ]*slashdot\.org"

Of course, that completely ignores reddit.com and facebook.com and all of the million other sites that could send you lots of traffic. Not to mention 100 different sites sending you 20 visitors each. You should probably just have a plain old traffic threshold that causes an email to be sent to you, regardless of referrer.
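
A minimal sketch of such a plain threshold check, meant to be fed by the same logtail invocation as above. The function name check_hits and the limit of 1000 are illustrative assumptions, not part of any tool. It counts the lines it receives and prints a warning only when the limit is exceeded:

```shell
#!/bin/sh
# check_hits LIMIT
# Reads new access-log lines on stdin and prints a warning if there are
# more than LIMIT of them; prints nothing otherwise.
check_hits() {
    limit="$1"
    # wc -l pads its output on some systems, so strip the spaces.
    hits=$(wc -l | tr -d ' ')
    if [ "$hits" -gt "$limit" ]; then
        echo "ALERT: $hits hits since last check (limit $limit)"
    fi
}
```

Run hourly from cron, something like `logtail -f /var/log/nginx/access_log -o /tmp/nginx-logtail.offset | check_hits 1000` should do. Cron mails any output of a job to the crontab owner (or MAILTO), so a silent run means no email.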

Ladadadada
  • 27,207
4

The nginx limit_req_zone directive can base its zones on any variable, including $http_referer (nginx keeps the single-r spelling of the HTTP header).

http {
    limit_req_zone  $http_referer  zone=one:10m   rate=1r/s;

    ...

    server {

        ...

        location /search/ {
            limit_req   zone=one  burst=5;
        }
    }
}

You will also want to limit the amount of state required on the web server, though, as Referer headers can be quite long and varied and you may see an effectively infinite variety of them. You can use the nginx split_clients feature to set a variable for all requests that is based on the hash of the Referer header. The example below uses only 10 buckets, but you could do it with 1000 just as easily. So if you got slashdotted, people whose referrer happened to hash into the same bucket as the Slashdot URL would get blocked too, but you could limit that to 0.1% of visitors by using 1000 buckets in split_clients.

It would look something like this (totally untested, but directionally correct):

http {

    split_clients $http_referer $refhash {
        10%               x01;
        10%               x02;
        10%               x03;
        10%               x04;
        10%               x05;
        10%               x06;
        10%               x07;
        10%               x08;
        10%               x09;
        *                 x10;
    }

    limit_req_zone  $refhash  zone=one:10m   rate=1r/s;

    ...

    server {

        ...

        location /search/ {
            limit_req   zone=one  burst=5;
        }
    }
}

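
If you do want 1000 buckets, typing the block by hand is tedious. Here is a rough shell sketch that prints one. The function name gen_split_clients and the x0001-style bucket names are made up for illustration, and it assumes split_clients accepts percentages written with two decimals (such as 0.10%), so check the output with nginx -t before relying on it.

```shell
#!/bin/sh
# gen_split_clients N
# Prints a split_clients block with N roughly equal buckets.
gen_split_clients() {
    n="$1"
    # Share of each bucket in percent, rounded to two decimals.
    pct=$(awk -v n="$n" 'BEGIN { printf "%.2f", 100 / n }')
    # nginx's variable is $http_referer (single r).
    echo 'split_clients $http_referer $refhash {'
    i=1
    while [ "$i" -lt "$n" ]; do
        printf '    %s%%  x%04d;\n' "$pct" "$i"
        i=$((i + 1))
    done
    # The last bucket takes whatever is left over.
    printf '    *  x%04d;\n' "$n"
    echo '}'
}
```

For example, `gen_split_clients 1000 > /tmp/refhash.conf` writes a block you could pull into the http section with include; the path is only a placeholder.
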
rmalayter
  • 3,832
3

The most efficient solution might be to write a daemon that would tail -f the access.log, and keep track of the $http_referer field.
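
As a rough illustration of that idea (a filter for the end of a tail -f pipe rather than a real daemon), the shell function below counts hits per referer and prints one alert when a referer exceeds a limit. It assumes nginx's default combined log format, where the referer is the fourth field when a line is split on double quotes. The name watch_referers and the per-clock-hour window are simplifications introduced here.

```shell
#!/bin/sh
# watch_referers LIMIT
# Reads combined-format access-log lines on stdin and prints one alert per
# referer per clock hour once that referer has more than LIMIT hits.
watch_referers() {
    awk -F'"' -v limit="$1" '
        # $1 holds everything before the request line, including [time_local].
        $4 != "-" && match($1, /\[[^]]+/) {
            hour = substr($1, RSTART + 1, 14)   # e.g. 26/Aug/2017:02
            if (hour != current) {              # new clock hour: start over
                split("", hits)
                current = hour
            }
            if (++hits[$4] == limit + 1) {      # fires once per referer per hour
                print "ALERT: " $4 " exceeded " limit " hits in hour " hour
                fflush()
            }
        }'
}
```

It would be used along the lines of `tail -f /var/log/nginx/access.log | watch_referers 1000`, with the output piped on to whatever sends your notifications.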

However, a quick and dirty solution would be to add an extra access_log file, to log only the $http_referer variable with a custom log_format, and to automatically rotate the log every X minutes.

  • This can be accomplished with the help of standard logrotate scripts, which might need to do graceful restarts of nginx in order to have the files reopened (i.e., the standard procedure; take a look at /a/15183322 on SO for a simple time-based script)…

  • Or, by using variables within access_log, possibly by getting the minute specification out of $time_iso8601 with the help of the map or an if directive (depending on where you'd like to put your access_log).

So, with the above, you may have 6 log files, each covering a period of 10 minutes, http_referer.Txx{0,1,2,3,4,5}x.log, e.g., by getting the first digit of the minute to differentiate each file.

Now, all you have to do is have a simple shell script that runs every 10 minutes, cats all of the above files together, and pipes the result to sort, to uniq -c, to sort -rn, and to head -16. That gives you a list of the 16 most common Referer variations. You are then free to decide whether any combination of counts and fields exceeds your criteria, and to send a notification.

Subsequently, after a single successful notification, you could remove all of these 6 files, and, in subsequent runs, not issue any notification UNLESS all six of the files are present (and/or a certain other number as you see fit).
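
The pipeline described above could look roughly like this. It assumes each log file contains one referer per line (that is, a log_format consisting of just $http_referer); top_referers and its LIMIT argument are illustrative names, not anything nginx provides.

```shell
#!/bin/sh
# top_referers LIMIT FILE...
# Prints "count referer" for every referer that occurs more than LIMIT
# times across the given files, most frequent first, among the top 16.
top_referers() {
    limit="$1"
    shift
    cat "$@" | sort | uniq -c | sort -rn | head -16 |
        awk -v limit="$limit" '$1 > limit {
            count = $1
            sub(/^ *[0-9]+ /, "")   # drop the count that uniq -c prepended
            print count, $0
        }'
}
```

A cron job could then run something like `top_referers 1000 http_referer.T*.log` and send a notification whenever the output is non-empty.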

cnst
  • 14,646
2

Yes, of course it is possible in NGINX!

What you could do is implement the following DFA:

  1. Implement rate limiting, based on $http_referer, possibly using some regex through a map to normalise the values. When the limit is exceeded, an internal error page is raised, which you can catch through an error_page handler as per a related question, going to a new internal location as an internal redirect (not visible to the client).

  2. In the above location for exceeded limits, you perform an alert request, letting external logic perform the notification; this request is subsequently cached, ensuring you will only get 1 unique request per a given time window.

  3. Catch the HTTP Status code of the prior request (by returning a status code ≥ 300 and using proxy_intercept_errors on, or, alternatively, use the not-built-by-default auth_request or add_after_body to make a "free" subrequest), and complete the original request as if the prior step wasn't involved. Note that we need to enable recursive error_page handling for this to work.

Here's my PoC and an MVP, also at https://github.com/cnst/StackOverflow.cnst.nginx.conf/blob/master/sf.432636.detecting-slashdot-effect-in-nginx.conf:

limit_req_zone $http_referer zone=slash:10m rate=1r/m;  # XXX: how many req/minute?
server {
    listen 2636;
    location / {
        limit_req zone=slash nodelay;
        #limit_req_status 429;  #nginx 1.3.15
        #error_page 429 = @dot;
        error_page 503 = @dot;
        proxy_pass http://localhost:2635;
        # an outright `return 200` has a higher precedence over the limit
    }
    recursive_error_pages on;
    location @dot {
        proxy_pass http://127.0.0.1:2637/?ref=$http_referer;
        # if you don't have `resolver`, no URI modification is allowed:
        #proxy_pass http://localhost:2637;
        proxy_intercept_errors on;
        error_page 429 = @slash;
    }
    location @slash {
        # XXX: placeholder for your content:
        return 200 "$uri: we're too fast!\n";
    }
}
server {
    listen 2635;
    # XXX: placeholder for your content:
    return 200 "$uri: going steady\n";
}
proxy_cache_path /tmp/nginx/slashdotted inactive=1h
        max_size=64m keys_zone=slashdotted:10m;
server {
    # we need to flip the 200 status into the one >=300, so that
    # we can then catch it through proxy_intercept_errors above
    listen 2637;
    error_page 429 @/.;
    return 429;
    location @/. {
        proxy_cache slashdotted;
        proxy_cache_valid 200 60s;  # XXX: how often to get notifications?
        proxy_pass http://localhost:2638;
    }
}
server {
    # IRL this would be an actual script, or
    # a proxy_pass redirect to an HTTP to SMS or SMTP gateway
    listen 2638;
    return 200 authorities_alerted\n;
}

Note that this works as expected:

% sh -c 'rm /tmp/slashdotted.nginx/*; mkdir /tmp/slashdotted.nginx; nginx -s reload; for i in 1 2 3; do curl -H "Referer: test" localhost:2636; sleep 2; done; tail /var/log/nginx/access.log'
/: going steady
/: we're too fast!
/: we're too fast!

127.0.0.1 - - [26/Aug/2017:02:05:49 +0200] "GET / HTTP/1.1" 200 16 "test" "curl/7.26.0"
127.0.0.1 - - [26/Aug/2017:02:05:49 +0200] "GET / HTTP/1.0" 200 16 "test" "curl/7.26.0"

127.0.0.1 - - [26/Aug/2017:02:05:51 +0200] "GET / HTTP/1.1" 200 19 "test" "curl/7.26.0"
127.0.0.1 - - [26/Aug/2017:02:05:51 +0200] "GET /?ref=test HTTP/1.0" 200 20 "test" "curl/7.26.0"
127.0.0.1 - - [26/Aug/2017:02:05:51 +0200] "GET /?ref=test HTTP/1.0" 429 20 "test" "curl/7.26.0"

127.0.0.1 - - [26/Aug/2017:02:05:53 +0200] "GET / HTTP/1.1" 200 19 "test" "curl/7.26.0"
127.0.0.1 - - [26/Aug/2017:02:05:53 +0200] "GET /?ref=test HTTP/1.0" 429 20 "test" "curl/7.26.0"
%

You can see that the first request results in one front-end and one backend hit, as expected. (I had to add a dummy backend to the location that has limit_req, because a return 200 would take precedence over the limits; a real backend isn't necessary for the rest of the handling.)

The second request is above the limit, so we send the alert (getting 200) and cache it, returning 429 (this is necessary due to the aforementioned limitation that status codes below 300 cannot be caught). The 429 is subsequently caught by the front-end, which is now free to do whatever it wants.

The third request is still exceeding the limit, but we've already sent the alert, so, no new alert gets sent.

Done! Don't forget to fork it on GitHub!

cnst
  • 14,646