15 Minutes to 3: Million-Row Excel Exports in Node.js
Cut export time from ~15 minutes to ~3 using queues and workers.
Generating Excel files sounds trivial until your dataset hits a million rows.
At my current role, I worked on a system where users frequently exported large datasets. The implementation worked, but performance was a major issue: ~15 minutes per export under real-world load.
At that scale, it isn't just slow; it's risky:
- Long-running requests -> timeouts
- Memory spikes -> unstable servers
- Poor UX -> users lose trust
This wasn’t something query optimization could fix.
It required rethinking the execution model.
The Original Approach (And Why It Failed)
The export pipeline was entirely request-driven:
- Fetch data
- Transform rows
- Generate Excel
- Write file
- Upload and return URL
What went wrong
- CPU-heavy work blocked the event loop: XLSX generation and row transformations ran on the main thread.
- High memory usage: large datasets stayed in-process, leading to GC pressure and instability.
- Fragile execution model: a single request handled everything, so any failure meant restarting the entire export.
Even after optimizing queries, performance barely improved.
The architecture itself was the bottleneck.
The Shift: Moving Work Out of the Request Cycle
To address this, I worked on redesigning the export pipeline to move heavy processing into background jobs.
The new flow:
- API validates request
- Creates a job payload (filters, metadata, format)
- Enqueues the job using BullMQ
- Returns immediately
This made the API fast and reliable, regardless of dataset size.
Using BullMQ for Orchestration
We used BullMQ to handle:
- Background processing
- Retry strategies with backoff
- Concurrency control
- Progress tracking
This allowed failed jobs to retry safely without affecting user requests.
Parallel Processing with Worker Threads
The biggest bottleneck was CPU-heavy transformation.
To solve this, I implemented parallel processing using Node.js worker threads:
- Split large datasets into batches
- Process batches in parallel across worker threads
- Merge results before writing to the final output
This removed the event loop bottleneck and allowed us to utilize multiple CPU cores effectively.
Stream-Based Excel Generation
Another key improvement was switching to a streaming approach for Excel generation:
- Read data in batches
- Process batches incrementally
- Write rows directly to a stream
- Upload progressively to storage
Why this mattered
- Controlled memory usage
- No large in-memory workbook
- Much better stability under load
The Result
- Export time reduced from ~15 minutes to ~2–3 minutes
- System remained stable under large workloads
- No impact on API responsiveness
- Failures became isolated and recoverable
Key Takeaways
- Background jobs improve both reliability and performance
- Node.js handles heavy workloads well if CPU work is offloaded
- Batch size tuning is critical for performance vs memory trade-offs
- Observability is essential for scaling systems like this
Closing Thoughts
This wasn’t about optimizing a function; it was about choosing the right execution model.
By moving heavy work out of the request cycle, parallelizing CPU-intensive tasks, and adopting streaming, we significantly improved both performance and system reliability.
If your Node.js application handles large report generation inside request handlers, this is one of the highest-impact architectural improvements you can make.