TCP Flow Collision Over Multiple LoadBalancers

What happens
In typical high-traffic scenarios, server clusters (e.g., game servers or chat platforms) must handle billions of concurrent TCP connections. To achieve this, they require distributed infrastructure components such as Kubernetes pods and virtual machines. To aggregate traffic, a gateway layer, typically implemented as multiple load balancers, is essential for an efficient framework.
To understand load balancers (LBs), refer to resources like Wikipedia. Two common LB implementations are reverse proxy and DNAT (Destination Network Address Translation). The TCP flow collision problem occurs specifically in DNAT-based load balancers.
Consider this simulation of two flows:
In this scenario:
- Flow1 and Flow2 originate from the same client (`IP1:PORT1`)
- Both target different gateway endpoints (`GWIP1:GWPORT1` vs `GWIP2:GWPORT2`)
- The client’s TCP stack reuses the source port since destinations differ
- With high connection volumes, port reuse probability increases significantly
- Both gateways perform DNAT, rewriting destination to `VMIP1:VMPORT1`
- When hashed to the same backend VM, the flows share identical 5-tuples: `(SrcIP=IP1, SrcPort=PORT1, DstIP=VMIP1, DstPort=VMPORT1, Protocol=TCP)`
- This 5-tuple collision causes flow misidentification at the server.
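The scenario above can be sketched in a few lines of Python. This is only an illustration: the `FiveTuple` type, the `dnat` helper, and all addresses and ports (taken from documentation ranges) are assumptions made for the example, not part of any real load balancer.

```python
from typing import NamedTuple


class FiveTuple(NamedTuple):
    """Hypothetical flow key, as a connection-tracking table might store it."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str = "TCP"


def dnat(flow: FiveTuple, backend_ip: str, backend_port: int) -> FiveTuple:
    """DNAT rewrites only the destination; the client source is left untouched."""
    return flow._replace(dst_ip=backend_ip, dst_port=backend_port)


# Same client IP and source port, two different gateway VIPs (example values).
flow1 = FiveTuple("198.51.100.7", 40000, "203.0.113.1", 443)
flow2 = FiveTuple("198.51.100.7", 40000, "203.0.113.2", 443)

# On the client side the two flows are distinct, so port reuse is legal.
distinct_before = flow1 != flow2

# Both gateways pick the same backend VM and rewrite the destination.
after1 = dnat(flow1, "10.0.0.5", 8080)
after2 = dnat(flow2, "10.0.0.5", 8080)

# At the backend the two flows are now indistinguishable.
collided = after1 == after2
print(distinct_before, collided)
```

The point of the sketch is that the only field distinguishing the two flows, the gateway address, is exactly the field DNAT overwrites.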
When Collisions Occur
TCP flow collisions result from specific coincidences in 4-tuple matching. Their impact varies by deployment scenario:
| Client Type | Collision Risk | Recommendation |
|---|---|---|
| Direct user clients (low connection volume per IP) | Negligible (< 0.001%) | Acceptable risk. Implement standard TCP retry mechanisms. |
| Gateway-concentrated traffic (CDN/anti-DDoS egress points) | High (> 1% possible) | Requires mitigation, especially when few gateway IPs funnel massive traffic. |
TCP Flow Collision Probability Formula
**The results are from DeepSeek-R1. To simplify the results, we assume that all flows come from the same client. Flows evenly distributed across multiple clients give the same outcome.**
The result has not been proven by probability theory. To verify the accuracy of the formula, I simulated the situation with a simple Go program. The results are close and for reference only.
Where:
- N = Total concurrent TCP flows
- R = Number of real servers
- S = Source port range size (typically 64,512)
- L = Number of load balancers (each with unique VIP)
- C = Number of distinct client IPs
Key Assumptions:
- Each LB has a unique Virtual IP (VIP)
- Flows are evenly distributed across LBs ($\frac{N}{L}$ flows per LB)
- Real servers see original client IP/port after DNAT
- Collisions occur when two flows to the same real server have identical:
(client IP, client port, server IP, server port)
Formula Derivation:
- Per-LB flow distribution: the probability that a flow goes to LB $i$ is $P(LB_i) = \frac{1}{L}$
- Collision condition: two flows collide when they have:
  - Same client IP/port (probability $\frac{1}{C \cdot S}$)
  - Different LBs (probability $\frac{L-1}{L}$)
  - Same real server (probability $\frac{1}{R}$)
- Simplified conservative model:
  $$ P_{\text{collision}} \approx \left(1 - \left(1 - \frac{1}{R \cdot S}\right)^{N \cdot \frac{L-1}{L}}\right) \times 100\% $$
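For quick what-if calculations, the simplified model can be evaluated directly. The sketch below just transcribes the formula as written; the function and parameter names are illustrative, and it returns a fraction between 0 and 1 rather than a percentage.

```python
def collision_probability(n_flows: int, n_servers: int,
                          port_range: int, n_lbs: int) -> float:
    """Evaluate 1 - (1 - 1/(R*S)) ** (N * (L-1)/L) from the simplified model.

    n_flows    = N, total concurrent TCP flows
    n_servers  = R, number of real servers
    port_range = S, source port range size
    n_lbs      = L, number of load balancers
    """
    if min(n_flows, n_servers, port_range, n_lbs) < 1:
        raise ValueError("all parameters must be positive integers")
    # Share of the flows that arrive through a different load balancer.
    exposure = n_flows * (n_lbs - 1) / n_lbs
    return 1.0 - (1.0 - 1.0 / (n_servers * port_range)) ** exposure


# Example inputs chosen only for illustration: 100,000 flows, 10 real
# servers, 64,512 source ports, 4 load balancers. The result is roughly 0.11.
p = collision_probability(100_000, 10, 64_512, 4)
print(f"{p:.4f}")
```

With a single load balancer the exponent is zero and the function returns 0, which matches the observation that collisions need at least two gateways.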
Simulation Result
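A minimal Monte Carlo sketch of this kind of check is shown below. It is written in Python and is not the Go program mentioned earlier. It assumes a single client, flows split evenly across load balancers, source ports drawn uniformly at random and unique per VIP, and a uniformly random real server per flow. It reports the fraction of flows that share a (source port, real server) pair with a flow from another load balancer, which is one possible reading of the collision rate.

```python
import random
from collections import Counter
from typing import Optional


def simulate_collision_rate(n_flows: int, n_servers: int, port_range: int,
                            n_lbs: int, seed: Optional[int] = None) -> float:
    """Return the fraction of flows whose backend-side tuple is not unique."""
    rng = random.Random(seed)
    per_lb = n_flows // n_lbs
    if per_lb > port_range:
        raise ValueError("more flows to one VIP than available source ports")

    seen = Counter()
    for _ in range(n_lbs):
        # A client never reuses a source port toward the same VIP,
        # but it may reuse it toward a different VIP.
        for port in rng.sample(range(port_range), per_lb):
            server = rng.randrange(n_servers)  # backend chosen by the LB
            seen[(port, server)] += 1

    total = per_lb * n_lbs
    # Every flow in a bucket of size > 1 collides with at least one other flow.
    colliding = sum(count for count in seen.values() if count > 1)
    return colliding / total if total else 0.0


if __name__ == "__main__":
    # Example parameters chosen only for illustration.
    print(simulate_collision_rate(100_000, 10, 64_512, 4, seed=42))
```

Because ports are unique per VIP in this model, any repeated (port, server) pair necessarily comes from two different load balancers, so no extra bookkeeping of the gateway is needed.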
Mitigation Strategies
Isolate Upstream Gateways
- Mechanism: Assign dedicated real servers per gateway
- Effect: Eliminates cross-gateway 4-tuple collisions
- Deployment: Requires gateway-aware load balancing
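One way to picture gateway-aware balancing is a static partition of the real servers, so that no server receives traffic from more than one gateway. The helper below is a hypothetical sketch; the function name and the round-robin split are assumptions for illustration, not a feature of any specific load balancer.

```python
from typing import List, Sequence


def partition_backends(servers: Sequence[str], n_gateways: int) -> List[List[str]]:
    """Assign each real server to exactly one gateway (round-robin split)."""
    if n_gateways < 1 or n_gateways > len(servers):
        raise ValueError("need between 1 and len(servers) gateways")
    # Gateway i gets servers i, i + n_gateways, i + 2 * n_gateways, ...
    return [list(servers[i::n_gateways]) for i in range(n_gateways)]


pools = partition_backends(["vm1", "vm2", "vm3", "vm4", "vm5"], 2)
# pools[0] is the pool for the first gateway, pools[1] for the second.
print(pools)
```

Since the pools are disjoint, two flows that entered through different gateways can never land on the same real server, so a reused client port cannot produce a matching tuple there.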
Increase Client IP Diversity
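According to the formula in the previous section, the collision probability grows with the total number of concurrent TCP flows. Spreading those flows over more client IPs lowers the number of flows that can share a client IP and port:
- Use a larger client IP pool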
- Enable IPv6 to expand client IP space
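Optimize Load Balancer Count
Increasing the number of load balancers may raise the collision rate; conversely, fewer load balancers reduce it. In the formula above, L enters only through the factor $\frac{L-1}{L}$, which is 0 for a single load balancer, $\frac{1}{2}$ for two, and approaches 1 as L grows.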
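Switch Load Balancers to FullNAT Mode
Migrating load balancers to FullNAT mode eliminates flow collisions fundamentally. In FullNAT the load balancer rewrites the source address as well as the destination, so the real server sees the load balancer's own IP and port instead of the client's, and flows arriving through different load balancers can no longer share a 5-tuple. The cost is higher performance and resource demands on the LB service.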
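The effect of FullNAT on the flow key can be sketched as follows. As before, the `FiveTuple` type, the `full_nat` helper, and every address and port are assumptions for illustration; the sketch also assumes each load balancer translates to its own distinct local IP.

```python
from typing import NamedTuple


class FiveTuple(NamedTuple):
    """Hypothetical flow key as seen on the wire."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str = "TCP"


def full_nat(flow: FiveTuple, lb_local_ip: str, lb_local_port: int,
             backend_ip: str, backend_port: int) -> FiveTuple:
    """FullNAT rewrites both ends: source becomes the LB, destination the backend."""
    return FiveTuple(lb_local_ip, lb_local_port, backend_ip, backend_port, flow.proto)


# Same client IP and source port toward two different VIPs (example values).
flow1 = FiveTuple("198.51.100.7", 40000, "203.0.113.1", 443)
flow2 = FiveTuple("198.51.100.7", 40000, "203.0.113.2", 443)

# Each load balancer substitutes its own local address as the new source.
via_lb1 = full_nat(flow1, "10.0.1.1", 50001, "10.0.0.5", 8080)
via_lb2 = full_nat(flow2, "10.0.2.1", 50001, "10.0.0.5", 8080)

# Even with the same local port, the differing LB source IPs keep the keys apart.
still_distinct = via_lb1 != via_lb2
print(still_distinct)
```

The trade-off visible in the sketch is that the real server no longer sees the client address at all, and each load balancer has to keep per-flow state for the source translation.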