
I have a producer that generates a batch job consisting of multiple operations (approx. 100–10,000). These operations can be processed in any order, ideally as fast as possible.

Processing an operation involves making several API calls to a 3rd-party service that is quite slow and has aggressive rate limits. The limits are tied to a particular API key, which is provided with the job and shared across the operations of the same batch. The same API key might be used in multiple different batch jobs.

Once the rate limit is hit, all the operations using the same API key need to wait for approx. 1 hour.

I'm trying to wrap my head around this and find an elegant solution that would fulfill the following criteria:

  • Multiple operations can be executed in parallel, but they should not exceed a MAX_PARALLEL_OPERATIONS constant; otherwise the 3rd-party API will start failing with 500 errors
  • Rate limiting an API key should not prevent operations using other API keys from being executed
  • The number of different API keys can be dynamic

I'm struggling to design an elegant solution for this. I've considered using a single SQS queue and putting each operation in a separate message. However, after receiving the first rate-limit error, I would have to mark the API key as rate limited, skip all the messages that use the same API key, and persist them somewhere else (queue / db / scheduler). This would increase the costs on the queue.

The second thing I've considered was putting multiple operations in a single message, but I don't like that idea much. It limits the ability to spread the operations across multiple machines, and it isn't even viable, since I would hit SQS's maximum message size limit (256 KB).

The best solution I could think of would involve putting the operations in a DB instead of a queue. Each row would represent one operation and contain a timestamp indicating when it can be processed. Hitting a rate limit would mean updating all unprocessed items belonging to the same API key, pushing their timestamps into the future.

Then there would have to be some polling mechanism that checks for unprocessed items with a due timestamp.
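Roughly, what I have in mind (Python with SQLite for illustration; the operations table and its columns id, api_key, payload, status, visible_at are made up):

```python
import sqlite3
import time

RATE_LIMIT_BACKOFF_SECONDS = 3600  # the ~1 hour wait described above

def claim_due_operation(conn: sqlite3.Connection):
    """Pick one pending operation whose timestamp is due and mark it in progress."""
    row = conn.execute(
        "SELECT id, api_key, payload FROM operations"
        " WHERE status = 'pending' AND visible_at <= ?"
        " ORDER BY visible_at LIMIT 1",
        (time.time(),),
    ).fetchone()
    if row is not None:
        conn.execute(
            "UPDATE operations SET status = 'in_progress' WHERE id = ?", (row[0],)
        )
        conn.commit()
    return row

def push_back_api_key(conn: sqlite3.Connection, api_key: str) -> None:
    """On a rate-limit error, delay every pending operation for that key by ~1h."""
    conn.execute(
        "UPDATE operations SET visible_at = ?"
        " WHERE api_key = ? AND status = 'pending'",
        (time.time() + RATE_LIMIT_BACKOFF_SECONDS, api_key),
    )
    conn.commit()
```

With multiple workers, the select-then-update claim shown here is racy; in something like PostgreSQL I'd use SELECT ... FOR UPDATE SKIP LOCKED (or equivalent) to claim rows atomically.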

Is there any better approach for this? I have a feeling that I could be overengineering / reinventing the wheel.

3 Answers


I assume there is no penalty for a rejected request other than the failure itself; in particular, you'd hope that rejected requests don't count towards the rate limit.

Create one queue for each key. Then execute requests and remove them from their queue unless you get an error. When an error occurs, figure out at what point you can make further requests successfully and store that time with the queue.
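A minimal in-memory sketch of this idea (Python; the class and method names are just illustrative):

```python
import time
from collections import defaultdict, deque

class KeyedQueues:
    def __init__(self):
        self.queues = defaultdict(deque)       # api_key -> pending requests
        self.not_before = defaultdict(float)   # api_key -> earliest retry time

    def enqueue(self, api_key, request):
        self.queues[api_key].append(request)

    def next_runnable(self):
        """Return (api_key, request) from any queue that isn't rate limited."""
        now = time.time()
        for api_key, queue in self.queues.items():
            if queue and self.not_before[api_key] <= now:
                return api_key, queue.popleft()
        return None

    def record_rate_limit(self, api_key, request, retry_at):
        """Put the failed request back and pause that whole queue until retry_at."""
        self.queues[api_key].appendleft(request)
        self.not_before[api_key] = retry_at
```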

So if you have ten users with ten keys making lots of requests, and rate limiting kicks in after 100 requests per key, you can perform 1,000 requests per hour.

If you are free to use these keys for different users, you could use any keys that haven't been rate-limited, but the server you are using might not be happy with that.

On macOS/iOS, you wouldn't do any polling; instead, determine the earliest time further requests are allowed and set a timer that fires at that time, at essentially zero runtime cost.
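Outside of Apple's timer APIs, the same idea can be sketched with a standard-library timer, e.g. in Python:

```python
import threading

def schedule_retry(delay_seconds: float, drain_queue) -> threading.Timer:
    """Fire drain_queue once the rate-limit window has passed, without polling."""
    timer = threading.Timer(delay_seconds, drain_queue)
    timer.daemon = True  # don't keep the process alive just for the timer
    timer.start()
    return timer
```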

gnasher729

I think you can improve on the queue-per-API-key answer. The key observation is that the API key a message must use is determined before the message is sent:

The limits are bounded to a particular API key that is provided with the job and shared across the operations from the same batch

Given that you know the rate limit and your rate of generating messages, you can make the batch-split decision earlier on. I.e., when creating a batch, don't make it larger than rate limit × estimated time to process the batch.

This should mean you can guarantee never hitting the rate limit, as each batch gets a new API key with enough headroom to process its requests.
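As a back-of-the-envelope check (all numbers here are placeholders for the real limits):

```python
RATE_LIMIT_PER_HOUR = 100         # requests allowed per API key per hour (assumed)
API_CALLS_PER_OPERATION = 3       # "several API calls" per operation (assumed)
ESTIMATED_HOURS_TO_PROCESS = 2.0  # expected wall-clock time to work through a batch

max_operations_per_batch = int(
    RATE_LIMIT_PER_HOUR * ESTIMATED_HOURS_TO_PROCESS / API_CALLS_PER_OPERATION
)
# With these assumptions: 100 * 2.0 / 3 -> 66 operations per batch per key.
```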

Ewan

Distinguish between throttling errors and other errors. Other errors would go to a dead-letter queue.

For throttling errors, one could set the message's visibility timeout to a value larger than the time it takes for the throttle to be removed. If throttling is lifted after an hour or so, set the visibility timeout to something longer than that. This effectively gates messages on the consumer side until the throttle is removed.

Bear in mind that there are usually quotas on the number of "in flight" messages, so if you have a lot of volume this approach may not be feasible: if many keys are throttled, the messages waiting out their visibility timeouts could exhaust those quotas. Take that into account.

Processing could be smart in that the first thing checked is whether the API key is throttled. If so, do not proceed any further; simply leave the message to be processed again after the visibility timeout has elapsed.

This works if each message uses a single API key and there isn't a mix of API keys within a message. It could work with one queue, or with one queue per API key.
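A consumer-side sketch of this with boto3; is_key_throttled and process_operation are placeholders for your application logic, and the queue URL is made up:

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/operations"  # placeholder
THROTTLE_VISIBILITY_SECONDS = 2 * 3600  # comfortably longer than the ~1h throttle

def is_key_throttled(api_key: str) -> bool:
    """Placeholder: check wherever throttle state is recorded (DB, cache, ...)."""
    ...

def process_operation(operation: dict) -> None:
    """Placeholder: the slow 3rd-party API calls; may record a throttle on 429."""
    ...

def consume_once() -> None:
    response = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    for message in response.get("Messages", []):
        operation = json.loads(message["Body"])
        if operation["api_key"] and is_key_throttled(operation["api_key"]):
            # Hide the message until the throttle should be lifted,
            # instead of processing or deleting it.
            sqs.change_message_visibility(
                QueueUrl=QUEUE_URL,
                ReceiptHandle=message["ReceiptHandle"],
                VisibilityTimeout=THROTTLE_VISIBILITY_SECONDS,
            )
            continue
        process_operation(operation)
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])
```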

Jon Raynor