Building a High-Performance Push Notifications Service in Go
Building a push notifications service that handles millions of events reliably requires careful architectural decisions. In this post, I'll walk through how I built a notifications backend in Go that processes events via gRPC and delivers push notifications through AWS SNS.
Architecture Overview
The service follows a simple but effective architecture:
gRPC Events → Event Processor → Deduplication → SNS Publisher → FCM/APNs
↓
Redis
Key components:
- gRPC server for receiving events from upstream services
- Worker pool for concurrent event processing
- Redis-based deduplication to prevent duplicate notifications
- AWS SNS for cross-platform delivery (FCM for Android, APNs for iOS)
Project Structure
I organized the codebase following Go best practices:
├── cmd/
│ ├── server/ # Main application entry point
│ └── tasks/ # CLI for scheduled tasks
├── internal/
│ ├── config/ # Configuration management
│ ├── dedup/ # Deduplication logic
│ ├── grpc/ # gRPC handlers
│ ├── processor/ # Event processing
│ ├── repository/ # Database access
│ ├── sns/ # AWS SNS client
│ └── tasks/ # Scheduled task implementations
├── pkg/
│ └── proto/ # Protocol buffer definitions
└── scripts/
└── benchmark/ # Performance testing tools
Distributed Deduplication with Redis
One of the critical requirements was preventing duplicate notifications. Users receiving the same notification multiple times creates a poor experience.
The challenge: with multiple service instances behind a load balancer, traditional in-memory deduplication doesn't work. The same event could arrive at different instances.
Solution: Atomic Redis operations using Lua scripts.
type Deduplicator struct {
redis redis.UniversalClient
ttl time.Duration
dedupScript *redis.Script
}
func New(redisClient redis.UniversalClient, ttl time.Duration, logger *zap.Logger) *Deduplicator {
// Lua script for atomic check-and-set
// Returns: 0 = duplicate, 1 = new, 2 = correction
script := redis.NewScript(`
local existing = redis.call('GET', KEYS[1])
if existing then
local data = cjson.decode(existing)
if data.h == ARGV[1] then
return 0 -- Duplicate: same hash
else
-- Correction: different hash, update and allow
redis.call('SET', KEYS[1], ARGV[2], 'EX', ARGV[3])
return 2
end
else
-- New: set and allow
redis.call('SET', KEYS[1], ARGV[2], 'EX', ARGV[3])
return 1
end
`)
return &Deduplicator{
redis: redisClient,
ttl: ttl,
dedupScript: script,
}
}
The Lua script runs atomically on Redis, ensuring that even with concurrent requests from multiple instances, only one notification is sent per unique event.
Three deduplication states:
- New (1): First time seeing this event, allow notification
- Duplicate (0): Same event with same content, block
- Correction (2): Same event but content changed (score correction), allow
Worker Pool Pattern
For processing events concurrently while maintaining control over resource usage, I implemented a worker pool:
type EventProcessor struct {
eventChan chan EventJob
workerCount int
wg sync.WaitGroup
// ... other fields
}
func (p *EventProcessor) Start(ctx context.Context) {
p.eventChan = make(chan EventJob, p.workerCount*10)
for i := 0; i < p.workerCount; i++ {
p.wg.Add(1)
go p.worker(ctx, i)
}
}
func (p *EventProcessor) worker(ctx context.Context, id int) {
defer p.wg.Done()
for {
select {
case <-ctx.Done():
return
case job, ok := <-p.eventChan:
if !ok {
return
}
p.processSingleEvent(job.ctx, &job.event)
}
}
}
This pattern provides:
- Bounded concurrency: Control exactly how many goroutines process events
- Backpressure: Buffered channel prevents overwhelming the system
- Graceful shutdown: WaitGroup ensures all in-flight work completes
gRPC Handler with Fire-and-Forget
The gRPC handler receives events and returns immediately, processing asynchronously:
func (h *EventsHandler) NotifyEvents(ctx context.Context, req *pb.EventsRequest) (*emptypb.Empty, error) {
events := make([]models.EventRequest, 0, len(req.Events))
for _, e := range req.Events {
events = append(events, convertProtoToModel(e))
}
// Fire and forget - process in background
go func() {
processCtx := context.Background()
if err := h.processor.ProcessEvents(processCtx, events); err != nil {
h.logger.Error("failed to process events", zap.Error(err))
}
}()
return &emptypb.Empty{}, nil
}
This design choice prioritizes:
- Low latency: Clients don't wait for processing
- Reliability: Processing continues even if client disconnects
- Throughput: Server can accept more requests while processing previous ones
SNS Integration
AWS SNS handles the complexity of delivering to different platforms. The service publishes to SNS topics, which fan out to platform-specific endpoints:
type Publisher struct {
client *sns.Client
logger *zap.Logger
}
func (p *Publisher) Publish(ctx context.Context, topicARN string, message SNSMessage) error {
payload, err := json.Marshal(message)
if err != nil {
return fmt.Errorf("marshal message: %w", err)
}
input := &sns.PublishInput{
TopicArn: aws.String(topicARN),
Message: aws.String(string(payload)),
}
_, err = p.client.Publish(ctx, input)
if err != nil {
return fmt.Errorf("publish to SNS: %w", err)
}
return nil
}
Configuration Management
I used a struct-based configuration with environment variable support:
type Config struct {
GRPCPort int `env:"GRPC_PORT" envDefault:"50052"`
HTTPPort int `env:"HTTP_PORT" envDefault:"8080"`
DatabaseURL string `env:"DATABASE_URL,required"`
RedisURL string `env:"REDIS_URL,required"`
AWSRegion string `env:"AWS_REGION" envDefault:"us-east-1"`
EventWorkers int `env:"EVENT_WORKERS" envDefault:"10"`
DedupTTL time.Duration `env:"DEDUP_TTL" envDefault:"6h"`
}
func Load() (*Config, error) {
cfg := &Config{}
if err := env.Parse(cfg); err != nil {
return nil, fmt.Errorf("parse config: %w", err)
}
return cfg, nil
}
Health Checks
A simple health endpoint for Kubernetes/load balancer probes:
func (s *Server) healthHandler(w http.ResponseWriter, r *http.Request) {
health := map[string]string{"status": "healthy"}
// Check Redis
if err := s.redis.Ping(r.Context()).Err(); err != nil {
health["status"] = "unhealthy"
health["redis"] = err.Error()
}
// Check Database
if err := s.db.Ping(r.Context()); err != nil {
health["status"] = "unhealthy"
health["database"] = err.Error()
}
w.Header().Set("Content-Type", "application/json")
if health["status"] != "healthy" {
w.WriteHeader(http.StatusServiceUnavailable)
}
json.NewEncoder(w).Encode(health)
}
Scheduled Tasks with Cobra
The service includes scheduled tasks for maintenance operations, built with Cobra CLI:
var rootCmd = &cobra.Command{
Use: "tasks",
Short: "Notifications backend scheduled tasks",
}
var pruneCmd = &cobra.Command{
Use: "prune",
Short: "Prune expired data",
}
var pruneMatchTopicsCmd = &cobra.Command{
Use: "match-topics",
Short: "Remove topics for matches that ended over 24 hours ago",
RunE: func(cmd *cobra.Command, args []string) error {
deps, err := initDeps()
if err != nil {
return err
}
defer deps.Close()
runner := tasks.NewRunner(deps.DB, deps.Redis, deps.SNS, deps.Logger)
return runner.PruneMatchTopics(deps.Ctx, dryRun)
},
}
Tasks include:
prune match-topics- Clean up old match topicsprune disabled-devices- Remove inactive devicesgenerate match-topics- Create topics for upcoming matchessubscriptions fix- Repair subscription inconsistencies
Performance Results
Benchmarking on a standard development machine:
╔══════════════════════════════════════════════════════════════╗
║ BENCHMARK RESULTS ║
╠══════════════════════════════════════════════════════════════╣
║ Total events: 5000 ║
║ Concurrency: 10 workers ║
║ Total time: 265ms ║
╠══════════════════════════════════════════════════════════════╣
║ THROUGHPUT ║
╠══════════════════════════════════════════════════════════════╣
║ Events/second: 18,879 ║
╠══════════════════════════════════════════════════════════════╣
║ LATENCY ║
╠══════════════════════════════════════════════════════════════╣
║ Average: 525µs ║
║ P50 (median): 414µs ║
║ P95: 1.16ms ║
║ P99: 2.72ms ║
╠══════════════════════════════════════════════════════════════╣
║ RELIABILITY ║
╠══════════════════════════════════════════════════════════════╣
║ Success rate: 100% ║
╚══════════════════════════════════════════════════════════════╝
Key Takeaways
- Lua scripts in Redis provide atomic operations essential for distributed deduplication
- Worker pools give fine-grained control over concurrency
- Fire-and-forget gRPC handlers maximize throughput for async workloads
- SNS abstracts platform complexity - one publish, delivery to iOS and Android
- Structured configuration makes deployment across environments straightforward
What's Next
Future improvements I'm considering:
- Metrics with Prometheus for better observability
- Rate limiting per device to prevent notification spam
- Message batching for SNS to reduce API calls
- Dead letter queue for failed notifications
The full architecture handles the real-world requirements of a sports notifications app: high throughput during live matches, reliable delivery, and no duplicate notifications. Go's concurrency primitives and the ecosystem of libraries made building this service straightforward.