Why Rate Limiting Matters in Istio and How to Implement It
In this post, we’ll explore why rate limiting is important in Istio and show you how to set it up effectively.
In today's microservices-driven world, managing traffic smartly is just as crucial as deploying the services themselves. As your system grows, so do the risks, like overuse, misuse, and cascading failures. And if you're running multi-tenant services, it's essential to enforce request limits for each customer. That's where rate limiting in a service mesh like Istio can make a big difference. In this post, we'll explore why rate limiting is important in Istio and show you how to set it up effectively.
Why Was It Important for Us?
This post continues the story of an incident we faced, detailed in How I Made My Liberty Microservices Load-Resilient. One of the findings from that incident was the missing rate limiting in the Istio ingress private gateway. Here are the challenges that we faced:
- Challenge 1: Backend Overload During High-Traffic Events: Our services, especially user-facing APIs, would get hit hard during high traffic. Without any rate controls in place, load on back-end services and databases would spike, leading to degraded performance or even downtime.
- Challenge 2: Need for Tenant-Based Throttling: We support multiple customers (tenants) and needed a way to enforce per-tenant quotas, ensuring one tenant's overuse wouldn't impact others.
Architecture
Here is a simplified version of the application architecture, highlighting the key components involved in Istio rate limiting.
Why Rate Limiting Matters in Istio
Rate limiting is a mechanism to control how many requests a service or user can make over a given time interval. In the context of an Istio service mesh, it serves several important purposes:
- Protecting Backend Services: Without rate limiting, a sudden spike in requests — whether from legitimate users or abusive clients — can overload back-end services and databases. Rate limiting helps prevent service degradation or outages.
- Fair Usage and Quotas: Ensures fair access among tenants in a multi-tenant system by enforcing per-tenant limits.
- Cost Control: In cloud-native environments, uncontrolled requests can lead to excessive resource usage and unexpected costs. Rate limiting helps contain those costs.
- Security and Abuse Prevention: Blocking excessive requests from abusive users can mitigate denial-of-service (DoS) attacks or brute-force attempts.
- Improved Reliability: By limiting how fast clients can make requests, services stay more responsive and predictable, even under heavy load.
In our setup, rate limiting is applied in two places: Cloudflare handles it for public endpoint customers, while Istio handles it for private endpoint customers.
Local Rate Limiting vs Global Rate Limiting
When implementing rate limiting in an Istio mesh, it’s important to understand the difference between local and global rate limiting, as they serve different use cases and come with distinct trade-offs.
Local rate limiting is enforced individually on each Envoy sidecar proxy. The rate limit is applied at the instance level, meaning each pod or replica gets its own independent limit. Global rate limiting is centrally managed by an external rate limit service (RLS). Envoy sidecars query this central service to decide whether to allow or deny a request, ensuring cluster-wide enforcement.
Note: Local rate limiting in Kubernetes is enforced per pod. This means the token or request count value you configure applies to each individual pod independently. For example, if you have 5 pods and want to enforce an overall limit of 100 requests per 60 seconds, you should configure the local rate limit token value to 20 per pod (100 / 5 = 20). This calculation assumes that the load balancer distributes traffic evenly across all pods.
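As a quick sanity check, here is the same calculation as a trivial Python sketch; the numbers are from the example above and assume even load balancing:

```python
# Per-pod token calculation for local rate limiting.
# Assumes the load balancer spreads traffic evenly across replicas.
global_limit = 100    # desired total requests per fill interval
replicas = 5          # number of gateway/service pods

per_pod_tokens = global_limit // replicas  # value for max_tokens / tokens_per_fill
print(per_pod_tokens)  # -> 20
```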
| Feature | Local Rate Limiting | Global Rate Limiting |
| --- | --- | --- |
| Enforced At | Individual Envoy proxy | Central Rate Limit Service |
| Scope | Per pod/instance | Across all instances |
| Infrastructure Required | None | Redis + Rate Limit Service |
| Granularity | Basic (fixed per pod) | Fine-grained (per-user, per-route) |
| Use Case | Simple protection | Multi-tenant, API Gateway |
| Scalability Awareness | No | Yes |
Let's explore how to configure Local Rate Limiting in a Kubernetes environment.
Steps to Set Up Local Rate Limiting in Istio
Step 1: Create the EnvoyFilter for Local Rate Limiting
Descriptors are keys used to match requests and apply limits. Example `config.yaml` for Envoy:
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: ingress-gateway-ratelimit-envoyfilter
  namespace: <namespace>
spec:
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: GATEWAY
        listener:
          filterChain:
            filter:
              name: envoy.filters.network.http_connection_manager
      patch:
        operation: INSERT_FIRST
        value:
          name: envoy.filters.http.local_ratelimit
          typed_config:
            '@type': type.googleapis.com/udpa.type.v1.TypedStruct
            type_url: >-
              type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
            value:
              stat_prefix: http_local_rate_limiter
    - applyTo: VIRTUAL_HOST
      match:
        context: GATEWAY
        routeConfiguration:
          gateway: <namespace>/<gateway-name>
          portName: https-custom-regional
          portNumber: 443
      patch:
        operation: MERGE
        value:
          rate_limits:
            - actions:
                - request_headers:
                    descriptor_key: method
                    header_name: ":method"
          typed_per_filter_config:
            envoy.filters.http.local_ratelimit:
              '@type': type.googleapis.com/udpa.type.v1.TypedStruct
              type_url: >-
                type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
              value:
                always_consume_default_token_bucket: false
                descriptors:
                  - entries:
                      - key: method
                        value: "GET"
                    token_bucket:
                      fill_interval: 60s
                      max_tokens: 10
                      tokens_per_fill: 10
                  - entries:
                      - key: method
                        value: "POST"
                    token_bucket:
                      fill_interval: 60s
                      max_tokens: 2
                      tokens_per_fill: 2
                  - entries:
                      - key: method
                        value: "PATCH"
                    token_bucket:
                      fill_interval: 60s
                      max_tokens: 4
                      tokens_per_fill: 4
                filter_enabled:
                  default_value:
                    denominator: HUNDRED
                    numerator: 100
                  runtime_key: local_rate_limit_enabled
                filter_enforced:
                  default_value:
                    denominator: HUNDRED
                    numerator: 100
                  runtime_key: local_rate_limit_enforced
                response_headers_to_add:
                  - append: false
                    header:
                      key: x-local-rate-limit
                      value: 'true'
                  - append: false
                    header:
                      key: retry-after
                      value: '60'
                stat_prefix: http_local_rate_limiter
                token_bucket:
                  fill_interval: 60s
                  max_tokens: 40
                  tokens_per_fill: 40
  workloadSelector:
    labels:
      <gateway-label>: <gateway-label>
```
Provide the gateway name and namespace in the `routeConfiguration` section, since we are applying rate limiting to the gateway pods. The `workloadSelector` holds the labels of the pods where you want to apply rate limiting.
The token bucket algorithm defines a bucket that holds a limited number of "tokens" representing allowed requests. The bucket is continuously refilled with tokens at a specific rate (tokens per second or minute). When a request arrives, it consumes a token from the bucket. If the bucket is empty, subsequent requests are rejected (rate-limited) until the bucket refills.
Imagine a bucket with a capacity of 10 tokens, refilled with 1 token every second. If 10 requests arrive within 1 second, they will all be allowed, but the 11th request will be rejected until the bucket refills.
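To make the mechanics concrete, here is a minimal Python sketch of a token bucket. It illustrates the algorithm only, not Envoy's actual implementation; the parameter names mirror the `max_tokens`, `tokens_per_fill`, and `fill_interval` settings in the EnvoyFilter above:

```python
import time

# A minimal token bucket; parameter names mirror Envoy's
# max_tokens / tokens_per_fill / fill_interval settings.
class TokenBucket:
    def __init__(self, max_tokens, tokens_per_fill, fill_interval_s):
        self.max_tokens = max_tokens
        self.tokens_per_fill = tokens_per_fill
        self.fill_interval_s = fill_interval_s
        self.tokens = max_tokens          # the bucket starts full
        self.last_fill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill in whole intervals, capped at the bucket capacity.
        intervals = int((now - self.last_fill) / self.fill_interval_s)
        if intervals > 0:
            self.tokens = min(self.max_tokens,
                              self.tokens + intervals * self.tokens_per_fill)
            self.last_fill += intervals * self.fill_interval_s
        if self.tokens > 0:
            self.tokens -= 1              # each request consumes one token
            return True
        return False                      # bucket empty: reject (HTTP 429)

# A 10-token bucket refilled with 1 token per second: the 11th
# back-to-back request is rejected, matching the example above.
bucket = TokenBucket(max_tokens=10, tokens_per_fill=1, fill_interval_s=1)
print([bucket.allow() for _ in range(11)])  # ten True, then False
```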
Istio supports both local and global rate limiting. Local rate limiting applies to individual service instances, while global rate limiting applies across all instances of a service. In local rate limiting, each Envoy proxy (representing a service instance) has its own token bucket, and the token bucket quota is not shared among the replicas. In global rate limiting, the token bucket quota is shared among all Envoy proxies, meaning that the service will only accept a certain number of requests in total, regardless of the number of replicas.
In the above configuration, the `token_bucket` is defined in the `VIRTUAL_HOST` patch, which includes a `typed_per_filter_config` for the `envoy.filters.http.local_ratelimit` filter on routes to the virtual host, inbound for HTTPS on port 443.
`max_tokens` specifies the maximum number of tokens in the bucket, representing the maximum allowed requests within a certain time frame. `tokens_per_fill` indicates the number of tokens added to the bucket with each fill, essentially the rate at which tokens are replenished. `fill_interval` defines the time interval at which the bucket is replenished with tokens.
In the example above, we configured rate limits for specific HTTP methods: GET, POST, and PATCH. You can add or remove methods in this section based on your requirements. The token values assigned under each method define the rate limit for requests using that method; for instance, the GET bucket (`max_tokens: 10`, `fill_interval: 60s`) allows each gateway pod at most 10 GET requests per minute. If a request uses a method that isn't explicitly listed, the default token bucket (configured below `stat_prefix`) applies instead.
Step 1.1: Define Rate Limit per Instance (Optional)
If you need tenant-based rate limiting, key it on the tenant's instance ID (or any other applicable value) instead of the HTTP method used in Step 1. Below is the updated configuration for rate limiting based on the instance ID, which in our case is part of the request URL. This configuration has two noticeable updates:
- Rate-limiting descriptors now carry entries keyed on the instance ID value. These are the instance IDs that typically carry heavy load, identified from the history of requests.
- Source code addition: a Lua function `envoy_on_request` is added to retrieve the instance ID from the request URL and match it against the instance IDs identified above.
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: ingress-gateway-ratelimit-envoyfilter
  namespace: <namespace>
spec:
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: GATEWAY
        listener:
          filterChain:
            filter:
              name: envoy.filters.network.http_connection_manager
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.local_ratelimit
          typed_config:
            '@type': type.googleapis.com/udpa.type.v1.TypedStruct
            type_url: >-
              type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
            value:
              stat_prefix: http_local_rate_limiter
    - applyTo: VIRTUAL_HOST
      match:
        context: GATEWAY
        routeConfiguration:
          gateway: <namespace>/<gateway-name>
          portName: https-custom-regional
          portNumber: 443
      patch:
        operation: MERGE
        value:
          rate_limits:
            - actions:
                - request_headers:
                    descriptor_key: rate-limit
                    header_name: x-instance-id
          typed_per_filter_config:
            envoy.filters.http.local_ratelimit:
              '@type': type.googleapis.com/udpa.type.v1.TypedStruct
              type_url: >-
                type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
              value:
                always_consume_default_token_bucket: false
                descriptors:
                  - entries:
                      - key: rate-limit
                        value: "<instance-id>"
                    token_bucket:
                      fill_interval: 60s
                      max_tokens: 2
                      tokens_per_fill: 2
                  - entries:
                      - key: rate-limit
                        value: "<instance-id-2>"
                    token_bucket:
                      fill_interval: 60s
                      max_tokens: 5
                      tokens_per_fill: 5
                filter_enabled:
                  default_value:
                    denominator: HUNDRED
                    numerator: 100
                  runtime_key: local_rate_limit_enabled
                filter_enforced:
                  default_value:
                    denominator: HUNDRED
                    numerator: 100
                  runtime_key: local_rate_limit_enforced
                response_headers_to_add:
                  - append: false
                    header:
                      key: x-local-rate-limit
                      value: 'true'
                  - append: false
                    header:
                      key: retry-after
                      value: '60'
                stat_prefix: http_local_rate_limiter
                token_bucket:
                  fill_interval: 60s
                  max_tokens: 40
                  tokens_per_fill: 40
    - applyTo: HTTP_FILTER
      match:
        context: GATEWAY
        listener:
          filterChain:
            filter:
              name: envoy.filters.network.http_connection_manager
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.lua
          typed_config:
            '@type': type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
            default_source_code:
              inline_string: |
                -- Extract the instance ID from the request path and, for GET
                -- requests to the rate-limited instances, add the x-instance-id
                -- header that the rate-limit descriptors above match on.
                function envoy_on_request(request_handle)
                  local path = request_handle:headers():get(":path") or "nil"
                  request_handle:logInfo("Request received: " .. path)
                  local instance_id = path:match("/instances/([^/]+)/")
                  local rate_limited_instance = "<instance-id>"
                  local rate_limited_instance_2 = "<instance-id-2>"
                  local method = request_handle:headers():get(":method")
                  if instance_id then
                    request_handle:logInfo("Extracted Instance ID: " .. instance_id)
                  else
                    request_handle:logInfo("No Instance ID found in path")
                  end
                  if instance_id == rate_limited_instance or instance_id == rate_limited_instance_2 then
                    request_handle:logInfo("method: " .. method)
                    if method == "GET" then
                      request_handle:logInfo("Applying rate limiting for instance_id: " .. instance_id)
                      request_handle:headers():add("x-instance-id", tostring(instance_id))
                      request_handle:logInfo("instance_header: " .. request_handle:headers():get("x-instance-id"))
                    else
                      request_handle:logInfo("Not a GET call for instance_id: " .. tostring(instance_id))
                    end
                  else
                    request_handle:logInfo("No rate limiting for instance_id: " .. tostring(instance_id))
                  end
                end
  workloadSelector:
    labels:
      <gateway-label>: <gateway-label>
```
Note: Since we used local rate limiting via a custom EnvoyFilter, we do not need to deploy Redis, which global rate limiting otherwise requires for metadata such as counters and TTLs.
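For reference, the extraction that the Lua function performs can be reproduced in a few lines of Python; the path below is a made-up example of our `/instances/<id>/` URL scheme:

```python
import re

# Hypothetical request path; the instance ID sits between /instances/ and the next /.
path = "/instances/acme-tenant-1/api/v1/orders"

# Equivalent of the Lua pattern path:match("/instances/([^/]+)/")
match = re.search(r"/instances/([^/]+)/", path)
instance_id = match.group(1) if match else None
print(instance_id)  # -> acme-tenant-1
```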
Step 2: Test the Configuration
Send requests to your service and observe whether the rate limits are triggered. Rate-limited requests are rejected with HTTP 429 and carry the `x-local-rate-limit: true` response header configured above. You can also check the logs from the pods where you have applied rate limiting.
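Here is a minimal test sketch, assuming a hypothetical gateway URL; with the GET bucket of 10 tokens per 60 seconds configured above, roughly the first 10 requests should succeed and the rest should come back as HTTP 429:

```python
import urllib.error
import urllib.request

# Hypothetical endpoint; substitute your gateway host and path
# (and adjust TLS handling for your environment).
URL = "https://<gateway-host>/instances/<instance-id>/api/v1/orders"

statuses = {}
for _ in range(15):  # deliberately exceed the 10-token GET bucket
    try:
        with urllib.request.urlopen(URL) as resp:
            statuses[resp.status] = statuses.get(resp.status, 0) + 1
    except urllib.error.HTTPError as err:
        # Rate-limited requests surface as HTTP 429 with the
        # x-local-rate-limit and retry-after headers we configured.
        statuses[err.code] = statuses.get(err.code, 0) + 1
        if err.code == 429:
            print("retry-after:", err.headers.get("retry-after"))

print(statuses)  # expect roughly {200: 10, 429: 5}
```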
Step 3: Monitor the Requests That Are Rate Limited
Today, we monitor the rate limits from the logs. Envoy's local rate limit filter also exposes counters on the proxy's admin stats endpoint (such as `http_local_rate_limiter.http_local_rate_limit.rate_limited`, derived from the `stat_prefix` configured above). In the future, we plan to enable Prometheus/Grafana dashboards to observe the rate-limited traffic.
Conclusion
Rate limiting in Istio is not just a performance enhancement — it’s a strategic traffic control mechanism that protects your services, ensures fairness, and supports system scalability. Whether you're securing an API gateway or managing internal service communication, rate limiting should be part of your Istio playbook.