notes-computer-programming-apis

Rate limiting

" Start by building a Request Rate Limiter ... We use the token bucket algorithm to do rate limiting....We implement our rate limiters using Redis. You can either operate the Redis instance yourself, or, if you use Amazon Web Services, you can use a managed service like ElastiCache?. ...

...

However, when you’re running a production API, not only do you have to make it robust with techniques like idempotency, you also need to build for scale and ensure that one bad actor can’t accidentally or deliberately affect its availability. ...

Rate limiters and load shedders

A rate limiter is used to control the rate of traffic sent or received on the network. ...A load shedder makes its decisions based on the whole state of the system rather than the user who is making the request. Load shedders help you deal with emergencies, since they keep the core part of your business working while the rest is on fire.

...

At Stripe, we operate 4 different types of limiters in production. The first one, the Request Rate Limiter, is by far the most important one. ...

Request rate limiter

...restricts each user to N requests per second...Our rate limit for requests is constantly triggered. It has rejected millions of requests this month alone, especially for test mode requests where a user inadvertently runs a script that’s gotten out of hand. ...
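
A sketch of that kind of request rate limiter, not Stripe's code (their actual Lua is in the gist mentioned below): a per-user token bucket kept in Redis, refilled lazily inside a Lua script so the check-and-decrement is atomic. The key names, rate and burst values are invented; assumes redis-py.

    import time
    import redis

    r = redis.Redis()

    # Per-user bucket stored as a Redis hash with just two fields: a token count and
    # the time of the last refill. Refill happens lazily on every check.
    TOKEN_BUCKET = r.register_script("""
    local key   = KEYS[1]
    local rate  = tonumber(ARGV[1])   -- tokens added per second
    local burst = tonumber(ARGV[2])   -- bucket capacity
    local now   = tonumber(ARGV[3])

    local bucket = redis.call('HMGET', key, 'tokens', 'ts')
    local tokens = tonumber(bucket[1]) or burst
    local ts     = tonumber(bucket[2]) or now

    -- replenish: elapsed time multiplied by the fill rate, capped at the burst size
    tokens = math.min(burst, tokens + (now - ts) * rate)

    local allowed = 0
    if tokens >= 1 then
      allowed = 1
      tokens = tokens - 1
    end

    redis.call('HSET', key, 'tokens', tokens, 'ts', now)
    redis.call('EXPIRE', key, math.ceil(burst / rate) * 2)
    return allowed
    """)

    def allow_request(user_id, rate=100, burst=200):
        """True if the request should be served, False if it should get an HTTP 429."""
        key = f"rate-limit:request:{user_id}"
        return bool(TOKEN_BUCKET(keys=[key], args=[rate, burst, time.time()]))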

Concurrent requests limiter

Instead of “You can use our API 1000 times a second”, this rate limiter says “You can only have 20 API requests in progress at the same time”. Some endpoints are much more resource-intensive than others, and users often get frustrated waiting for the endpoint to return and then retry. These retries add more demand to the already overloaded resource, slowing things down even more. The concurrent rate limiter helps address this nicely....Our concurrent request limiter is triggered much less often (12,000 requests this month), and helps us keep control of our CPU-intensive API endpoints. Before we started using a concurrent requests limiter, we regularly dealt with resource contention on our most expensive endpoints caused by users making too many requests at one time. ...
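
A sketch of the concurrent-requests idea, again not Stripe's implementation: a Redis sorted set per user holds one member per in-flight request, scored by its start time so that entries abandoned by crashed clients can be aged out; ptarjan's comment further down mentions the real limiter also keeps timestamps in a Redis set. The limit, key names and TTLs here are assumptions.

    import time
    import uuid
    import redis

    r = redis.Redis()

    MAX_IN_FLIGHT = 20          # "only have 20 API requests in progress at the same time"
    STALE_AFTER_SECONDS = 60    # assume anything older than this was abandoned mid-request

    def acquire(user_id):
        """Try to start a request; returns a token for release(), or None if over the limit."""
        key = f"concurrent:{user_id}"
        token = str(uuid.uuid4())
        now = time.time()
        with r.pipeline() as pipe:
            pipe.zremrangebyscore(key, 0, now - STALE_AFTER_SECONDS)  # prune abandoned entries
            pipe.zadd(key, {token: now})
            pipe.zcard(key)
            pipe.expire(key, STALE_AFTER_SECONDS * 2)
            _, _, in_flight, _ = pipe.execute()
        if in_flight > MAX_IN_FLIGHT:
            r.zrem(key, token)  # over the limit: undo our entry and answer with HTTP 429
            return None
        return token

    def release(user_id, token):
        """Mark the request as finished so its slot frees up."""
        r.zrem(f"concurrent:{user_id}", token)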

Fleet usage load shedder

Using this type of load shedder ensures that a certain percentage of your fleet will always be available for your most important API requests. We divide up our traffic into two types: critical API methods (e.g. creating charges) and non-critical methods (e.g. listing charges.) We have a Redis cluster that counts how many requests we currently have of each type. We always reserve a fraction of our infrastructure for critical requests. If our reservation number is 20%, then any non-critical request over their 80% allocation would be rejected with status code 503. We triggered this load shedder for a very small fraction of requests this month....we definitely had the ability to handle those extra requests. But we’ve had other months where this has prevented outages. ...
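
Sketch of the fleet usage load shedder: counters of in-flight requests per traffic type, with non-critical work rejected once it would use more than its 80% share. The fleet capacity number is invented; only the 20% reservation comes from the description above.

    import redis

    r = redis.Redis()

    WORKER_CAPACITY = 1000        # total requests the fleet can serve at once (assumed)
    CRITICAL_RESERVATION = 0.20   # always keep 20% of capacity free for critical methods

    def admit(request_type):
        """request_type is 'critical' (e.g. creating charges) or 'non_critical' (e.g. listing them)."""
        key = f"in-flight:{request_type}"
        count = r.incr(key)
        if request_type == "non_critical" and count > WORKER_CAPACITY * (1 - CRITICAL_RESERVATION):
            r.decr(key)           # over the 80% allocation: shed this request with a 503
            return False
        return True

    def finish(request_type):
        r.decr(f"in-flight:{request_type}")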

Worker utilization load shedder

Most API services use a set of workers to independently respond to incoming requests in a parallel fashion. This load shedder is the final line of defense. If your workers start getting backed up with requests, then this will shed lower-priority traffic.

This one gets triggered very rarely, only during major incidents...Only 100 requests were rejected this month from this rate limiter, but in the past it’s done a lot to help us recover more quickly when we have had load problems.... We divide our traffic into 4 categories: Critical methods, POSTs, GETs, Test mode traffic "

-- https://stripe.com/blog/rate-limiters
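
The worker utilization load shedder above boils down to: measure how busy the workers are and drop the lowest-priority category first. A toy sketch; the thresholds and the way utilization is measured are invented, only the priority order comes from the post.

    # Priority order from the post: critical methods, then POSTs, then GETs, then test mode.
    SHED_THRESHOLDS = {
        "test_mode": 0.70,   # start shedding test-mode traffic once workers are 70% busy
        "get":       0.80,
        "post":      0.90,
        "critical":  1.01,   # never shed critical methods
    }

    def admit(category, busy_workers, total_workers):
        """False means reject with HTTP 503 because this category is currently being shed."""
        utilization = busy_workers / total_workers
        return utilization < SHED_THRESHOLDS[category]

    # At 85% utilization, test-mode and GET traffic is shed while POSTs still pass:
    assert admit("post", 85, 100) is True
    assert admit("get", 85, 100) is False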

That blog post also links to this Gist with Redis Lua code for rate limiters:

from the discussion at [1]:

" dansingerman: ... Does this mean every rate limited request is hitting redis?

I had this problem years ago, and I used varnish to offload the rate limited traffic, which scaled very well then. Here is a blog post I wrote about it: http://blog.dansingerman.com/post/4604532761/how-to-block-rate-limited-traffic-with-varnish ... The rate limiting was done by the application server, but was completely separate from the varnish layer. (I think it was something fairly naive to do with memcached, but that's not terribly important). What was important was that the application server could serve the 400* with an expiry, and varnish would deal with that load until the expiry.

...

b-ryan 1 day ago [-]

I'm curious what your thoughts are on implementing this via a reverse proxy in front of your services. In terms of implementation it appears to me much simpler. But it obviously comes at the cost of maintaining the proxy if you are, say, already using ELBs.

reply

ptarjan 22 hours ago [-]

This can be done either way, and is really infrastructure dependent. There are pros and cons on both sides. We opted to bake it into the web stack instead of in front so that we get all the benefits of existing infrastructure like log aggregation, exception tracing, HTTP error response formatting and tracking, user-specific gating, auto service scaling, etc.

reply

...

bigdubs 23 hours ago [-]

is token bucket the best algorithm for this? why not ring buffered timestamps?

ex: https://github.com/blendlabs/go-util/blob/master/collections/rate_limiter.go

reply

ptarjan 22 hours ago [-]

Both should work. Token buckets only have to store two things (count and timestamp) whereas a timestamp set has to store all N timestamps. I like the simpler approach since I don't actually need access to all the timestamps.

If you look at the concurrent request limiter I do indeed keep all the timestamps there in a Redis set. That one was more error prone to write in practice as I often would accidentally hit Redis storage limits.

reply

...

mritun 21 hours ago [-]

There is no replenishing process, it's just a conceptual thing. In code one just computes the time difference and then multiplies that by the fill rate to determine how many tokens to add before deducting any. Token buckets are popular because they work well, require fixed-size storage and are extremely simple to implement and test.

reply
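
A bare-bones, single-process version of what mritun describes: no background refill process, just the elapsed time multiplied by the fill rate on each call, with fixed-size state (one count, one timestamp).

    import time

    class TokenBucket:
        def __init__(self, rate, burst):
            self.rate = rate              # tokens added per second
            self.burst = burst            # maximum bucket size
            self.tokens = burst
            self.last = time.monotonic()

        def allow(self):
            now = time.monotonic()
            # lazy replenishment: time difference multiplied by the fill rate
            self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False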

Animats 21 hours ago [-]

I've put fair queuing into a web API. Requests queue by IP address; if you have a request in the mill, any further requests are processed behind those from other IP addresses. This handles single-source overloads very well. It doesn't require tuning; there are no fixed time constants or rate limits. For about a month, someone was pounding on the system with thousands of requests per minute. This caused no problems for other users. I didn't notice for several days.

One would think that this would be a standard feature in web servers.

reply
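
A toy version of the fair-queuing idea Animats describes: one FIFO per client IP, served round-robin, so a flood from one address only ever waits behind itself. Just a sketch of the scheduling part, not wired into any web server.

    from collections import deque, OrderedDict

    class FairQueue:
        def __init__(self):
            self.queues = OrderedDict()   # ip -> deque of pending requests

        def enqueue(self, ip, request):
            self.queues.setdefault(ip, deque()).append(request)

        def dequeue(self):
            """Take the next request, cycling across IPs instead of draining one IP first."""
            if not self.queues:
                return None
            ip, q = next(iter(self.queues.items()))
            request = q.popleft()
            del self.queues[ip]
            if q:                          # send this IP to the back of the rotation
                self.queues[ip] = q
            return request

    # A burst of 1000 requests from one IP does not delay the single request from another:
    fq = FairQueue()
    for i in range(1000):
        fq.enqueue("1.2.3.4", f"bulk-{i}")
    fq.enqueue("5.6.7.8", "interactive")
    assert fq.dequeue() == "bulk-0"
    assert fq.dequeue() == "interactive"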

sinzone 23 hours ago [-]

Another way to add fast, distributed rate limiting on top of your APIs is by using an API Gateway [1] like the open-source Kong [2]. This pattern is becoming a standard since it saves time from re-implementing and duplicating the same rate limiting logic in each backend microservice.

[1] http://microservices.io/patterns/apigateway.html

[2] https://getkong.org/plugins/rate-limiting/

reply

LeonidBugaev 22 hours ago [-]

Yeah, worth mentioning https://tyk.io too.

Also, do not forget about quotas, which usually come along with rate limits. Modern API gateways can handle so much stuff for you and help with API scaling.

reply

susi22 13 hours ago [-]

FYI: Redis now has a module that does rate limiting for you: https://github.com/brandur/redis-cell

reply

jdwyah 23 hours ago [-]

Can't help but put a link to https://www.ratelim.it in here. A rock solid distributed rate-limit was something I desperately missed when moving to a smaller company.

Primarily I use this for idempotency and throttling things like usage events. But you can also use it for locking and concurrency control.

reply

coolg54321 10 hours ago [-]

Very interesting, we have throttling for our customer facing API built with PHP and found this library very useful: https://github.com/sunspikes/php-ratelimiter#throttler-types

reply

"

Rate limiting links

API links