Performance¶
GoCC is designed for high throughput while maintaining simplicity.
Benchmarks¶
Measured on an Apple M4 Pro laptop:
Internal (No HTTP)¶
| Metric | Value |
|---|---|
| Throughput | ~6 million req/s |
| Latency | ~150 ns/request |
This represents the raw actor model performance without network overhead.
With HTTP¶
| Protocol | Endpoint | Throughput |
|---|---|---|
| HTTP/2 | /rate/:key | ~350-450k req/s |
| HTTP/2 | /healthz | ~500k req/s |
| HTTP/1.1 | /rate/:key | ~80k req/s |
| HTTP/1.1 | /healthz | ~85k req/s |
Bottleneck Analysis¶
| Component | Time per Request |
|---|---|
| Rate limiter logic | ~150 ns |
| HTTP/2 overhead | ~2000 ns |
| HTTP/1.1 overhead | ~12000 ns |
| Network (typical) | ~1-10 ms |
The HTTP layer is the bottleneck, not the rate limiting logic.
Why HTTP/2 is Faster¶
| Feature | HTTP/1.1 | HTTP/2 |
|---|---|---|
| Connections | One request per connection (or keep-alive) | Multiplexed streams |
| Headers | Text, repeated | Binary, compressed (HPACK) |
| Pipelining | Not reliable | Native multiplexing |
Use HTTP/2 clients for best performance:
- Go: http.Client with an HTTP/2 transport (see the example after this list)
- CLI: h2load for benchmarking
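As a minimal sketch, here is one way to get an HTTP/2 connection from a Go client. The address and key are placeholders, and whether you need h2c (cleartext HTTP/2, shown here) or TLS depends on how your GoCC endpoint is served; over TLS, the standard http.Transport negotiates HTTP/2 automatically.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"net/http"

	"golang.org/x/net/http2"
)

func main() {
	// h2c (HTTP/2 over cleartext) transport.
	client := &http.Client{
		Transport: &http2.Transport{
			AllowHTTP: true,
			DialTLS: func(network, addr string, _ *tls.Config) (net.Conn, error) {
				return net.Dial(network, addr) // plain TCP, no TLS
			},
		},
	}

	// Placeholder address; substitute your GoCC host and key.
	resp, err := client.Get("http://localhost:8080/rate/user-1")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Proto, resp.Status) // expect "HTTP/2.0"
}
```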
Theoretical Limits¶
Network Bandwidth¶
Assuming 100 Mbit/s per pod:

- Request size: ~300-500 bytes
- Max theoretical: ~25-33k req/s

For example, 100 Mbit/s is ~12.5 MB/s, so ~500-byte requests cap out near 25k req/s of wire capacity. In practice, local/loopback traffic is much faster.
Actor Model¶
With pipelining/batching (not implemented):

- Theoretical: ~300 million req/s
- See snail for batched networking
GoCC uses strict request→response without batching for simplicity.
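To make the strict request→response point concrete, here is a minimal, hypothetical sketch of that pattern; the types are illustrative, not GoCC's actual implementation. Each request carries its own reply channel, and the actor answers it before dequeuing the next request, so there is never a batch in flight.

```go
package main

import "fmt"

// request carries its own reply channel: one response per request.
type request struct {
	key   string
	reply chan bool
}

// actor processes requests strictly one at a time, replying
// before it dequeues the next request (no batching).
func actor(requests <-chan request) {
	counts := map[string]int{}
	for req := range requests {
		counts[req.key]++
		req.reply <- counts[req.key] <= 100 // toy limit: 100 per key
	}
}

func main() {
	reqs := make(chan request)
	go actor(reqs)

	r := request{key: "user-1", reply: make(chan bool, 1)}
	reqs <- r
	fmt.Println("allowed:", <-r.reply)
}
```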
Optimization Decisions¶
What We Optimized¶
- HTTP/2 by default - 4-5x faster than HTTP/1.1
- Sharded managers - Parallel processing across 25 shards (see the sketch after this list)
- Direct response path - Responses bypass manager
- Lazy instance creation - No overhead for unused keys
- Auto-expiration - Memory reclaimed for idle keys
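A minimal sketch of what sharded routing can look like, assuming a simple hash-modulo scheme; the package, helper name, and hash choice are illustrative, not GoCC's actual code:

```go
package gocc // hypothetical package name

import "hash/fnv"

const numShards = 25

// shardFor routes a key to a fixed shard so that all requests for
// the same key hit the same manager, while different keys spread
// across shards and are processed in parallel.
func shardFor(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % numShards)
}
```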
What We Didn't Optimize¶
- No batching - Simplicity over maximum throughput
- No custom protocol - HTTP compatibility
- No connection pooling - Clients handle this
- No pre-allocation - Go's GC handles it well
Scaling Strategies¶
Vertical Scaling¶
More CPU cores = more parallelism:

- Manager shards run in parallel
- Each instance is independent
- HTTP server handles concurrent requests
Horizontal Scaling¶
Multiple GoCC instances:

- Deploy as a Kubernetes StatefulSet
- Consistent hashing distributes keys (see the sketch below)
- No coordination overhead
- See Kubernetes Deployment
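A minimal consistent-hashing sketch under those assumptions: stable StatefulSet pod names form a hash ring, and each key maps to the first pod at or after its hash. This is illustrative only; GoCC's actual key distribution may differ, and production rings usually add virtual nodes for smoother balance.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// podFor maps a key to the first pod at or after the key's position
// on the ring, so most keys keep their pod when pods are added or removed.
func podFor(key string, pods []string) string {
	sort.Slice(pods, func(i, j int) bool { return hash32(pods[i]) < hash32(pods[j]) })
	k := hash32(key)
	for _, p := range pods {
		if hash32(p) >= k {
			return p
		}
	}
	return pods[0] // wrap around the ring
}

func main() {
	pods := []string{"gocc-0:8080", "gocc-1:8080", "gocc-2:8080"}
	fmt.Println(podFor("user-1", pods))
}
```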
Comparison with Alternatives¶
vs. In-Process Rate Limiting¶
```go
import "errors"
import "golang.org/x/time/rate"

// In-process (e.g., golang.org/x/time/rate):
// allow 100 requests/second with a burst of 10.
limiter := rate.NewLimiter(100, 10)
if !limiter.Allow() {
	return errors.New("rate limited")
}
```
| Aspect | In-Process | GoCC |
|---|---|---|
| Latency | ~10 ns | ~2 μs (HTTP/2) |
| Deployment | Per-service | Centralized |
| Configuration | Code change | Hot-reload |
| Cross-service | No | Yes |
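For comparison, a hypothetical GoCC-side check is just an HTTP call. The address and the status-to-decision mapping below are assumptions, not GoCC's documented contract:

```go
import "net/http"

// allow asks GoCC's /rate/:key endpoint for a decision. With native
// FIFO queueing, the call may block until the request is admitted
// or the client's timeout fires.
func allow(client *http.Client, key string) (bool, error) {
	resp, err := client.Get("http://localhost:8080/rate/" + key)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK, nil
}
```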
vs. Redis-Based¶
| Aspect | Redis | GoCC |
|---|---|---|
| Persistence | Yes | No (in-memory) |
| Latency | ~1 ms | ~2 μs (local) |
| Queueing | Limited | Native FIFO |
| Dependencies | Redis server | None |
vs. API Gateway Rate Limiting¶
| Aspect | API Gateway | GoCC |
|---|---|---|
| Latency | Varies | Low |
| Configuration | Gateway-specific | Universal |
| Queueing | Usually no | Yes |
| Protocol | HTTP only | HTTP (extensible) |
Benchmarking Tools¶
HTTP/1.1¶
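Any HTTP/1.1 load generator works here; for example, hey (the port and endpoint are placeholders):

```bash
hey -n 100000 -c 50 http://localhost:8080/healthz
```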
HTTP/2¶
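h2load, mentioned above, speaks HTTP/2 natively; a typical run (the URL is a placeholder) looks like:

```bash
h2load -n 100000 -c 10 -m 32 http://localhost:8080/rate/user-1
```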
Internal¶
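For the internal (no-HTTP) numbers, Go's built-in benchmark runner is the natural fit; the package path is a placeholder:

```bash
go test -bench=. -benchmem ./...
```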
Performance Tips¶
For Clients¶
- Use HTTP/2 - 4-5x faster than HTTP/1.1
- Reuse connections - Avoid connection setup overhead
- Set timeouts - Don't wait forever for queued requests (see the example after this list)
- Batch where possible - Combine related operations
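A minimal sketch of the timeout advice above, using a request-scoped context; the address is a placeholder:

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Bound how long we are willing to wait in GoCC's FIFO queue.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet,
		"http://localhost:8080/rate/user-1", nil)
	if err != nil {
		panic(err)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("timed out or failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```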
For Operators¶
- Deploy close to clients - Minimize network latency
- Size instances appropriately - More CPU = more throughput
- Monitor queue depth - High queues indicate underprovisioning
- Use appropriate window sizes - Smaller windows = faster feedback