15 Minutes to 3: Million-Row Excel Exports in Node.js
Cut export time from ~15 minutes to ~3 using queues and workers.
Generating Excel files sounds trivial until your dataset hits a million rows.
At my current role, I worked on a system where users frequently exported large datasets. The implementation worked, but performance was a major issue: ~15 minutes per export under real-world load.
At that scale, it isn't just slow; it's risky:
- Long-running requests -> timeouts
- Memory spikes -> unstable servers
- Poor UX -> users lose trust
This wasn’t something query optimization could fix.
It required rethinking the execution model.
The Original Approach (And Why It Failed)
The export pipeline was entirely request-driven:
- Fetch data
- Transform rows
- Generate Excel
- Write file
- Upload and return URL
What went wrong
- CPU-heavy work blocked the event loop: XLSX generation and row transformations ran on the main thread.
- High memory usage: large datasets stayed in-process, leading to GC pressure and instability.
- Fragile execution model: a single request handled everything, so any failure meant restarting the entire export.
Even after optimizing queries, performance barely improved.
The architecture itself was the bottleneck.
The Shift: Moving Work Out of the Request Cycle
To address this, I worked on redesigning the export pipeline to move heavy processing into background jobs.
The new flow:
- API validates request
- Creates a job payload (filters, metadata, format)
- Enqueues the job using BullMQ
- Returns immediately
This made the API fast and reliable, regardless of dataset size.
Using BullMQ for Orchestration
We used BullMQ to handle:
- Background processing
- Retry strategies with backoff
- Concurrency control
- Progress tracking
This allowed failed jobs to retry safely without affecting user requests.
Parallel Processing with Worker Threads
The biggest bottleneck was CPU-heavy transformation.
To solve this, I implemented parallel processing using Node.js worker threads:
- Split large datasets into batches
- Process batches in parallel across worker threads
- Merge results before writing to the final output
This removed the event loop bottleneck and allowed us to utilize multiple CPU cores effectively.
Stream-Based Excel Generation
Another key improvement was switching to a streaming approach for Excel generation:
- Read data in batches
- Process batches incrementally
- Write rows directly to a stream
- Upload progressively to storage
Why this mattered
- Controlled memory usage
- No large in-memory workbook
- Much better stability under load
The Result
- Export time reduced from ~15 minutes to ~2–3 minutes
- System remained stable under large workloads
- No impact on API responsiveness
- Failures became isolated and recoverable
Key Takeaways
- Background jobs improve both reliability and performance
- Node.js handles heavy workloads well if CPU work is offloaded
- Batch size tuning is critical for performance vs memory trade-offs
- Observability is essential for scaling systems like this
Closing Thoughts
This wasn’t about optimizing a function; it was about choosing the right execution model.
By moving heavy work out of the request cycle, parallelizing CPU-intensive tasks, and adopting streaming, we significantly improved both performance and system reliability.
If your Node.js application handles large report generation inside request handlers, this is one of the highest-impact architectural improvements you can make.