Profiling Memory Allocations in a High-Throughput Go Service
Our API was handling 50k requests per second, but p99 latency kept spiking to 200ms. The culprit wasn't slow code—it was the garbage collector pausing everything while it cleaned up millions of tiny allocations we didn't know we were making.
Here's how we found them and what we did about it.
The Symptoms
Classic GC pressure symptoms:
- Latency spikes every few seconds
- CPU usage higher than expected
- Memory usage stable but GC running constantly
# Check GC stats
GODEBUG=gctrace=1 ./myservice
# Output shows frequent GCs:
# gc 1 @0.012s 2%: 0.018+2.3+0.018 ms clock, 0.14+0.23/4.5/0+0.14 ms cpu, 4->4->2 MB, 5 MB goal, 8 P
# gc 2 @0.025s 3%: 0.019+3.1+0.021 ms clock, 0.15+0.31/6.1/0+0.17 ms cpu, 4->5->3 MB, 5 MB goal, 8 P
# gc 3 @0.041s 4%: ...
GC running every 15ms means every request has a chance of hitting a pause.
Finding Allocations with pprof
Heap Profile
package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/* handlers on DefaultServeMux
)

func main() {
    go func() {
        // Exposes /debug/pprof/*
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
    // ... rest of your service
}
Grab a heap profile:
# Allocations since program start
go tool pprof http://localhost:6060/debug/pprof/heap
# Or save for later analysis
curl -o heap.prof http://localhost:6060/debug/pprof/heap
go tool pprof heap.prof
Inside pprof:
(pprof) top 20
Showing nodes accounting for 1.5GB, 89% of 1.7GB total
flat flat% sum% cum cum%
512MB 30.12% 30.12% 512MB 30.12% encoding/json.(*decodeState).literalStore
256MB 15.06% 45.18% 768MB 45.18% myservice/handlers.(*Handler).ProcessRequest
128MB 7.53% 52.71% 128MB 7.53% fmt.Sprintf
The Key Insight: alloc_objects vs inuse_objects
# Total allocations (even if freed) - shows allocation rate
go tool pprof -alloc_objects http://localhost:6060/debug/pprof/heap
# Currently in use - shows memory retention
go tool pprof -inuse_objects http://localhost:6060/debug/pprof/heap
For GC pressure, alloc_objects matters more. You might have low memory usage but high allocation rate, causing constant GC work.
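A contrived illustration of the difference (hotPath is a made-up name): this function retains nothing, so it barely registers in an inuse profile, but it allocates on every iteration and lights up under alloc_objects.

```go
package main

import "fmt"

// hotPath allocates a short-lived string per iteration. Retention is
// zero (each string is garbage by the next iteration), so inuse
// profiles stay flat while alloc profiles show it prominently.
func hotPath(n int) int {
	total := 0
	for i := 0; i < n; i++ {
		s := fmt.Sprintf("%d", i) // allocated, then immediately garbage
		total += len(s)
	}
	return total
}

func main() {
	fmt.Println(hotPath(10)) // digit strings "0".."9", one byte each
}
```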
Common Hidden Allocations
1. String Concatenation
// BAD: allocates a new string (a single-expression chain like this
// compiles to one concatenation call; += in a loop allocates on
// every iteration, which is where the pain really comes from)
func buildKey(prefix, id, suffix string) string {
    return prefix + ":" + id + ":" + suffix
}

// GOOD: strings.Builder pre-allocates
func buildKey(prefix, id, suffix string) string {
    var b strings.Builder
    b.Grow(len(prefix) + len(id) + len(suffix) + 2)
    b.WriteString(prefix)
    b.WriteByte(':')
    b.WriteString(id)
    b.WriteByte(':')
    b.WriteString(suffix)
    return b.String()
}
// ALTERNATIVE for hot paths: pool a bytes.Buffer. (Pooling a
// strings.Builder gains little: Builder.Reset discards its backing
// array. bytes.Buffer.Reset keeps capacity, and String() copies, so
// the pooled buffer is safe to reuse.)
var keyBufferPool = sync.Pool{
    New: func() any {
        return new(bytes.Buffer)
    },
}

func buildKey(prefix, id, suffix string) string {
    b := keyBufferPool.Get().(*bytes.Buffer)
    b.Reset()
    defer keyBufferPool.Put(b)
    b.Grow(len(prefix) + len(id) + len(suffix) + 2)
    b.WriteString(prefix)
    b.WriteByte(':')
    b.WriteString(id)
    b.WriteByte(':')
    b.WriteString(suffix)
    return b.String()
}
2. Slice Appends Without Capacity
// BAD: Multiple reallocations as slice grows
func collectIDs(items []Item) []string {
    var ids []string
    for _, item := range items {
        ids = append(ids, item.ID)
    }
    return ids
}

// GOOD: Pre-allocate
func collectIDs(items []Item) []string {
    ids := make([]string, 0, len(items))
    for _, item := range items {
        ids = append(ids, item.ID)
    }
    return ids
}
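The reallocation cost is easy to see by counting how often cap changes as a slice grows. A small sketch (the exact count depends on the runtime's growth policy, so we only assert a rough range):

```go
package main

import "fmt"

// countGrows appends n elements to a slice with the given initial
// capacity and counts how many times the backing array is reallocated
// (observed as a change in cap).
func countGrows(initialCap, n int) int {
	s := make([]int, 0, initialCap)
	prev, grows := cap(s), 0
	for i := 0; i < n; i++ {
		s = append(s, i)
		if cap(s) != prev {
			grows++
			prev = cap(s)
		}
	}
	return grows
}

func main() {
	fmt.Println("without pre-allocation:", countGrows(0, 1000))    // roughly a dozen
	fmt.Println("with make(..., 0, 1000):", countGrows(1000, 1000)) // 0
}
```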
3. Interface Boxing
// BAD: Each call boxes the int
func logValue(key string, value any) {
    log.Printf("%s: %v", key, value)
}

func process(count int) {
    logValue("count", count) // int -> any allocation
}

// GOOD: Type-specific methods
func logInt(key string, value int) {
    log.Printf("%s: %d", key, value)
}
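testing.AllocsPerRun is a quick way to confirm whether a conversion like this allocates, without writing a full benchmark. One caveat we rely on here: the runtime caches boxed values for small integers (0-255), so a larger value is needed to observe the allocation. A sketch (allocsForBoxing is our helper name):

```go
package main

import (
	"fmt"
	"testing"
)

// Package-level sink so the compiler can't optimize the boxing away.
var sink any

// allocsForBoxing reports the average number of allocations caused by
// storing an int in an interface value.
func allocsForBoxing(v int) float64 {
	return testing.AllocsPerRun(100, func() {
		sink = v // int -> any: heap-allocates unless v hits the 0-255 cache
	})
}

func main() {
	fmt.Println("small int:", allocsForBoxing(7))      // cached: expect 0
	fmt.Println("large int:", allocsForBoxing(123456)) // expect 1
}
```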
4. Closures Capturing Variables
// Both patterns are correct in Go 1.22+, but parameter passing
// can help escape analysis in some cases

// Closure capture (correct, but item may escape to heap)
func processAll(items []Item) {
    var wg sync.WaitGroup
    for _, item := range items {
        wg.Add(1)
        go func() {
            defer wg.Done()
            process(item)
        }()
    }
    wg.Wait()
}

// Parameter passing (may stay on stack in some cases)
func processAll(items []Item) {
    var wg sync.WaitGroup
    for _, item := range items {
        wg.Add(1)
        go func(it Item) {
            defer wg.Done()
            process(it)
        }(item)
    }
    wg.Wait()
}
5. fmt.Sprintf for Simple Conversions
// BAD: fmt.Sprintf allocates (interface boxing plus formatting machinery)
id := fmt.Sprintf("%d", userID)

// GOOD: strconv is far cheaper (allocation-free for values 0-99)
id := strconv.Itoa(userID)

// For int64:
id := strconv.FormatInt(userID, 10)
Escape Analysis: Why Things Allocate
Go decides at compile time whether a variable escapes to the heap. Check with:
go build -gcflags='-m -m' ./... 2>&1 | grep escape
Common reasons for escape:
// Escapes: returned pointer to local variable
func newUser() *User {
    u := User{Name: "test"} // escapes to heap
    return &u
}

// Escapes: assigned to an interface
func process(u User) {
    var i any = u // u escapes
    _ = i
}

// Escapes: captured by closure in a goroutine
func startWorker(data []byte) {
    go func() {
        process(data) // data escapes
    }()
}

// Escapes: too large for the stack (threshold varies by Go version)
func bigArray() {
    data := make([]byte, 10*1024*1024) // escapes, too big
    _ = data
}
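One fix for the returned-pointer case is to let the caller own the value: returning a User by value doesn't force a heap allocation (whether it ultimately stays on the stack still depends on what the caller does with it). A sketch, with a minimal User type for illustration:

```go
package main

import "fmt"

type User struct{ Name string }

// newUserValue returns by value: nothing here forces u onto the heap,
// unlike returning &u, which always escapes.
func newUserValue() User {
	return User{Name: "test"}
}

func main() {
	u := newUserValue()
	fmt.Println(u.Name)
}
```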
sync.Pool: Recycling Allocations
For frequently allocated objects, sync.Pool recycles instances across calls and can sharply reduce the allocation rate:
var bufferPool = sync.Pool{
    New: func() any {
        return make([]byte, 0, 4096)
    },
}

func processRequest(data []byte) []byte {
    buf := bufferPool.Get().([]byte)
    buf = buf[:0] // Reset length, keep capacity
    // Put inside a closure: a plain `defer bufferPool.Put(buf)` would
    // evaluate buf now and return the pre-growth slice to the pool.
    defer func() { bufferPool.Put(buf) }()

    // Use buf...
    buf = append(buf, data...)

    // Important: return a copy, because the pooled buffer will be reused
    result := make([]byte, len(buf))
    copy(result, buf)
    return result
}
Pool Gotchas
// WRONG: Putting different sizes back
var pool = sync.Pool{New: func() any { return make([]byte, 1024) }}

func process(size int) {
    buf := pool.Get().([]byte)
    if size > len(buf) {
        buf = make([]byte, size) // Created larger buffer
    }
    defer pool.Put(buf) // Now the pool holds mixed sizes
}

// RIGHT: Only return buffers of the expected size
func process(size int) {
    buf := pool.Get().([]byte)
    if size > cap(buf) {
        // Oversized: let the GC reclaim it, don't put it back
        buf = make([]byte, size)
    } else {
        buf = buf[:size]
        defer pool.Put(buf[:0])
    }
    // use buf...
}
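A related gotcha, flagged by staticcheck as SA6002: storing a bare []byte in a sync.Pool allocates on every Put, because the slice header is boxed into an interface. Pooling a pointer to the slice avoids that extra allocation. A sketch (buildGreeting is an invented example function):

```go
package main

import (
	"fmt"
	"sync"
)

// Pool *[]byte rather than []byte: a pointer converts to `any` without
// allocating, while a slice header gets boxed on every Put (SA6002).
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, 4096)
		return &b
	},
}

func buildGreeting(name string) string {
	bp := bufPool.Get().(*[]byte)
	buf := (*bp)[:0] // reset length, keep capacity
	buf = append(buf, "hello, "...)
	buf = append(buf, name...)
	out := string(buf) // copy out before the buffer is reused
	*bp = buf          // keep any growth for the next user
	bufPool.Put(bp)    // pointer Put: no boxing allocation
	return out
}

func main() {
	fmt.Println(buildGreeting("gopher")) // prints "hello, gopher"
}
```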
Real Example: JSON Encoding
Our biggest allocation source was JSON encoding in HTTP handlers:
// BEFORE: ~5 allocations per request
func (h *Handler) GetUser(w http.ResponseWriter, r *http.Request) {
    userID := r.PathValue("id") // Go 1.22+ path parameter
    user := h.db.GetUser(r.Context(), userID)
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(user) // Allocates encoder + buffer
}
After profiling:
// AFTER: Pooled encoders
var encoderPool = sync.Pool{
    New: func() any {
        return &pooledEncoder{
            buf: bytes.NewBuffer(make([]byte, 0, 4096)),
        }
    },
}

type pooledEncoder struct {
    buf *bytes.Buffer
}

func (h *Handler) GetUser(w http.ResponseWriter, r *http.Request) {
    userID := r.PathValue("id")
    user := h.db.GetUser(r.Context(), userID)

    enc := encoderPool.Get().(*pooledEncoder)
    enc.buf.Reset()
    defer encoderPool.Put(enc)

    if err := json.NewEncoder(enc.buf).Encode(user); err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    w.Header().Set("Content-Type", "application/json")
    w.Header().Set("Content-Length", strconv.Itoa(enc.buf.Len()))
    w.Write(enc.buf.Bytes())
}
Benchmarking Allocations
Always benchmark before optimizing (running with go test -bench=. -benchmem adds the allocation columns for every benchmark, without editing code):
func BenchmarkBuildKey(b *testing.B) {
    b.ReportAllocs() // Shows allocations per op
    for i := 0; i < b.N; i++ {
        _ = buildKey("user", "12345", "profile")
    }
}
Output:
BenchmarkBuildKey-8 5000000 234 ns/op 64 B/op 2 allocs/op
After optimization:
BenchmarkBuildKey-8 10000000 112 ns/op 32 B/op 1 allocs/op
The Results
After applying these patterns:
| Metric          | Before     | After       |
| --------------- | ---------- | ----------- |
| Allocations/req | ~45        | ~12         |
| GC pause p99    | 50ms       | 2ms         |
| Latency p99     | 200ms      | 35ms        |
| GC frequency    | every 15ms | every 200ms |
Key Takeaways
- Profile first. Don't guess where allocations happen. Use pprof -alloc_objects.
- alloc_objects > inuse_objects for GC pressure. High allocation rate matters even if memory is freed quickly.
- Escape analysis tells you why things allocate. Use -gcflags='-m' to understand.
- sync.Pool is your friend for hot paths. But measure; it has overhead too.
- Pre-allocate slices when you know the size. make([]T, 0, n) is your friend.
- Avoid interface boxing in hot paths. Type-specific functions allocate less.
- String operations are expensive. Use strings.Builder or []byte operations.
- Benchmark with b.ReportAllocs(). Allocations per operation tell you whether you're improving.
Most services don't need this level of optimization. But when you're handling tens of thousands of requests per second, every allocation counts. Profile first, optimize what matters, and always measure the results.