I had a Node.js payment API hang for nine seconds on a single request and take down two upstream services with it. The endpoint did one thing: hash a password with bcrypt at cost 14, then run a small SQL update. The SQL was 4 ms. The bcrypt was 9 seconds. Nobody had told the team that bcrypt.hashSync blocks the entire Node.js event loop, and bcrypt at cost 14 is a long block. Every request that arrived during those nine seconds queued up behind it. The fix was one character: dropping Sync. The lesson — that the event loop isn’t an abstraction you can ignore — took longer to absorb.
This is the mental model I use for the event loop. Not the academic version. The version that prevents production incidents — with the libuv internals at the level of detail you actually need to debug a stuck server.
The one-paragraph version
Node.js runs your JavaScript on a single thread. When that thread is busy, nothing else happens — no requests get accepted, no callbacks fire, no timers tick. The “event loop” is the loop that runs after each piece of synchronous work, asking the OS “any I/O ready? any timers due?” and dispatching whatever is ready. Asynchronous code doesn’t make work parallel; it lets the loop continue while waiting. Block the loop and you’ve blocked everything.
The six phases (in the order they run)
- **Timers** — callbacks scheduled by `setTimeout` and `setInterval` that are now due.
- **Pending callbacks** — system-level things deferred from the previous loop (TCP errors like `ECONNREFUSED`; rare in app code).
- **Idle / prepare** — internal libuv housekeeping. You don't interact with this from JS.
- **Poll** — accept new I/O events and execute their callbacks. Calculates how long to block waiting for I/O. This is where most of your async code actually runs.
- **Check** — callbacks scheduled by `setImmediate`.
- **Close callbacks** — cleanup callbacks like `socket.on("close", ...)`.
Between every phase, two microtask queues drain completely:
- **process.nextTick queue** — anything you scheduled with `process.nextTick()`.
- **Promise microtasks** — callbacks chained off resolved promises (`then`, `catch`, `finally`, `await` continuations).
The order matters: process.nextTick drains before promise microtasks. A flood of nextTick calls can starve promise resolution. The official Node.js event loop docs have the canonical phase diagram if you want to print it out.
libuv’s role: the C engine that owns the loop
The “Node.js event loop” is really libuv’s event loop. libuv is the C library that gives Node its cross-platform async I/O — it abstracts over epoll on Linux, kqueue on macOS/BSD, and IOCP on Windows. The phases I just listed map directly onto libuv’s uv_run() function: uv__run_timers, uv__run_pending, uv__run_idle, uv__io_poll, uv__run_check, uv__run_closing_handles, in that order.
One change worth knowing for 2026: libuv 1.45+ (which ships in Node 20+) made timer execution always run after the poll phase. Before that, timers could fire both before and after poll, which made the setTimeout(0) vs setImmediate ordering nondeterministic even inside I/O callbacks. Now setImmediate reliably wins inside an I/O callback. Code that depended on the old behavior breaks silently on upgrade.
setImmediate vs setTimeout(0) vs process.nextTick
Three “do this later” primitives that look interchangeable. They’re not.
| Call | Drains in | Latency in practice | Use it for |
|---|---|---|---|
| `process.nextTick(fn)` | Microtask queue (between phases) | Microseconds — before any I/O | “Run this before the next I/O event” — error emission, defer-until-current-stack-clears |
| `queueMicrotask(fn)` | Promise microtask queue (after nextTick) | Microseconds — after nextTick | Spec-compliant alternative to nextTick that works in browsers too |
| `setImmediate(fn)` | Check phase | Sub-millisecond — after I/O | “Run this after I/O, yielding to the loop” — chunk CPU work without starving I/O |
| `setTimeout(fn, 0)` | Timers phase | 1–4 ms in practice (clamped) | Almost never. Use `setImmediate` instead unless you specifically need timers-phase semantics. |
Real implication: inside an I/O callback (Node 20+, libuv 1.45+), setImmediate(fn) always fires before setTimeout(fn, 0). Outside an I/O callback (e.g., at process startup), timing variability still creeps in.
The pattern that bit me once: a recursive task that scheduled itself with setTimeout(0) for “yielding to the event loop.” It worked. I changed it to setImmediate and the same code ran 4× faster because it stopped waiting for the timers phase. The clamp on setTimeout(0) is real and noticeable on hot paths.
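The I/O-callback ordering is easy to demonstrate (a sketch — reading the script's own file is just a convenient source of I/O; the `order` array is for illustration):

```js
import { readFile } from "node:fs";

const order = [];

// Inside an I/O callback, the check phase (setImmediate) runs before the
// loop wraps back around to the timers phase — so setImmediate always wins.
readFile(new URL(import.meta.url), () => {
  setTimeout(() => order.push("setTimeout 0"), 0);
  setImmediate(() => order.push("setImmediate"));
  setTimeout(() => console.log(order.join(" → ")), 50);
});
// prints "setImmediate → setTimeout 0"
```

Run the same two calls at the top level of the script instead and the order is no longer guaranteed — which is exactly the startup variability described above.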
Why process.nextTick is dangerous
The microtask queue runs to completion between every phase. If you schedule nextTick from inside a nextTick callback, the queue never drains and the event loop never advances. Your process spins forever — accepting no connections, firing no timers, looking healthy in ps but completely dead to your users.
```js
// DON'T DO THIS
function loop() {
  process.nextTick(loop); // I/O is now starved forever
}
loop();
```

The same is true for promise microtasks, but in practice nobody writes a recursive promise resolution by hand. `process.nextTick` is the one to watch, especially if you maintain a library that emits errors via nextTick (a common pattern). Use `setImmediate` instead when you want to yield and let I/O proceed.
How async/await actually works
An await expression splits the function into two: the part before the await (runs synchronously up to the await), and the part after (runs as a promise microtask when the awaited thing resolves).
```ts
async function getUser(id: string) {
  const user = await db.user.findUnique({ where: { id } });
  return user;
}

// Functionally equivalent to:
function getUser(id: string) {
  return db.user.findUnique({ where: { id } }).then((user) => {
    return user;
  });
}
```

The `then` callback is a microtask. It runs between event loop phases — meaning a long chain of `await` resolutions can starve I/O if the work between them isn’t actually async. `for (const x of bigArray) await doNothing(x)` looks async but blocks the loop because the microtasks just keep queueing.
The fix is to interleave a setImmediate every N iterations to yield back to the I/O phase:
```js
// BAD — looks async, actually blocks
for (const item of millionItems) {
  await transform(item); // resolves immediately, microtask floods
}

// GOOD — yields to I/O every 1000 items
for (let i = 0; i < millionItems.length; i++) {
  await transform(millionItems[i]);
  if (i % 1000 === 0) await new Promise((r) => setImmediate(r));
}
```

The bug from the opener, in detail
The payment API code looked like this:
```js
app.post("/auth/register", async (req, res) => {
  const { email, password } = req.body;
  const hash = bcrypt.hashSync(password, 14); // BLOCKS for ~9s at cost 14
  await db.user.create({ data: { email, passwordHash: hash } });
  res.json({ ok: true });
});
```

During those 9 seconds:
- The event loop didn’t run.
- No new requests were accepted (TCP backlog filled, then dropped).
- Timers didn’t fire — including health-check responses to the load balancer.
- The load balancer marked the instance unhealthy and pulled it from rotation.
- The same request, retried by the upstream, hit a different instance which had the same bug.
One request, in synchronous code, took down the entire pool.
The fix:
```js
const hash = await bcrypt.hash(password, 14); // async — uses libuv thread pool
```

`bcrypt.hash` (no Sync) uses libuv’s thread pool. The hashing still takes 9 seconds, but it runs on a worker thread. The event loop stays free. Other requests get served. The single request waits on its own work, but everyone else is fine.
libuv’s thread pool: the hidden parallelism
Node has a thread pool of 4 threads by default, used for:

- File system operations (`fs.readFile`, etc.)
- DNS lookups via `dns.lookup` — but not `dns.resolve*` (those use c-ares directly)
- Crypto: `bcrypt`, `scrypt`, `pbkdf2`, `randomBytes`, `generateKeyPair`
- All of zlib (gzip, brotli, deflate) when called asynchronously
Network I/O does not use the thread pool — it uses the OS’s async primitives (epoll on Linux, kqueue on macOS, IOCP on Windows) directly. This is why a Node.js app can hold 10,000 open sockets without breaking a sweat: each socket isn’t a thread. The same property is what makes Socket.io scale to tens of thousands of long-lived connections — patterns in the WebSockets with Socket.io guide.
If you have a lot of concurrent fs or crypto work, the default 4 threads bottleneck:
```sh
UV_THREADPOOL_SIZE=16 node dist/index.js
```

The cap is 1024 (raised from 128 in libuv 1.30). For most apps, 4 is fine. For services that hash a lot of passwords or read a lot of files, 16 or 32 helps measurably. The pool size is fixed at first I/O — setting UV_THREADPOOL_SIZE after the first fs.readFile has no effect. This catches teams who try to set it dynamically based on runtime conditions.
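You can watch the pool at work with a dependency-free sketch (using built-in `crypto.pbkdf2` rather than bcrypt; the iteration count is arbitrary). Four CPU-heavy hashes overlap on the four default threads while the event loop stays responsive:

```js
import { pbkdf2 } from "node:crypto";

const start = Date.now();
let done = 0;

// Four CPU-heavy hashes run concurrently on the libuv thread pool.
for (let i = 0; i < 4; i++) {
  pbkdf2("password", "salt", 200_000, 64, "sha512", (err) => {
    if (err) throw err;
    done++;
    if (done === 4) console.log(`4 hashes done in ${Date.now() - start} ms`);
  });
}

// The loop is free: this timer fires almost immediately, not after the hashes.
setTimeout(() => console.log(`loop still responsive at ${Date.now() - start} ms`), 0);
```

Bump the loop to 8 iterations and the total time roughly doubles — hashes 5–8 queue behind the first four, which is the bottleneck `UV_THREADPOOL_SIZE` relieves.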
The dns.lookup vs dns.resolve trap
One specific bug worth calling out: dns.lookup() uses the libuv thread pool because it calls into getaddrinfo, which is a blocking C function. dns.resolve*() uses c-ares, which speaks DNS directly and never touches the thread pool. Most application code (including fetch, http.request, anything via undici) goes through dns.lookup by default — meaning a busy app under DNS-resolution load can saturate the same 4-thread pool that your fs reads need.
```js
// Default fetch uses dns.lookup → libuv thread pool
await fetch("https://api.example.com");

// Force c-ares for thread-pool-free DNS
import { Agent } from "undici";
import dns from "node:dns/promises";

const agent = new Agent({
  connect: {
    lookup: (hostname, options, cb) => {
      dns.resolve4(hostname).then(
        (addresses) => cb(null, addresses[0], 4),
        (err) => cb(err)
      );
    },
  },
});
```

The c-ares path scales to thousands of concurrent DNS resolutions without touching the thread pool. Worth knowing the moment your app starts proxying to many distinct hostnames.
How to detect that you’ve blocked the loop
Three tools, in increasing order of effort:
- `perf_hooks` built-in. Measure event loop lag with `monitorEventLoopDelay()`. Anything above 50 ms at p99 is a problem.
- clinic.js doctor. Runs your app under load, classifies the bottleneck (event loop blocked, GC pauses, I/O wait, etc.). 90 seconds of work for a clear answer.
- blocked-at. An npm package that captures stack traces every time the event loop blocks for more than N ms. Useful in production if you’re hunting an intermittent block.
```js
// production-friendly event loop monitoring
import { monitorEventLoopDelay } from "node:perf_hooks";
import { logger } from "./logger.js";

const histogram = monitorEventLoopDelay({ resolution: 10 });
histogram.enable();

setInterval(() => {
  const p99 = histogram.percentile(99) / 1e6; // ns to ms
  const p50 = histogram.percentile(50) / 1e6;
  const max = histogram.max / 1e6;
  if (p99 > 100) {
    logger.warn({ eventLoopP50Ms: p50, eventLoopP99Ms: p99, eventLoopMaxMs: max }, "event loop lag");
  }
  histogram.reset();
}, 10_000);
```

Add this to any production service. The first time it fires, you’ll be glad it’s there. If you ship to Grafana or any APM, expose p50, p99, and max as metrics and alert on the p99 crossing 100 ms.
Things that will block the event loop (memorize this list)
| Blocker | Typical block time | Fix |
|---|---|---|
| `fs.readFileSync` on a large file | 10–500 ms | Use `fs.promises.readFile` or stream |
| `JSON.parse` on a multi-MB payload | 50–500 ms | Streaming parser; streams tutorial |
| `crypto.pbkdf2Sync` / `bcrypt.hashSync` | 100 ms – 10 s (cost-dependent) | Drop the Sync suffix; uses libuv thread pool |
| Catastrophic-backtracking regex | Seconds to minutes | Validate input length, use re2 |
| Tight CPU loop (image processing, ML inference, parsing huge strings) | 50 ms+ | Move to a worker_threads pool or job queue |
| `console.log` to a slow terminal | 1–50 ms (latency, not CPU) | Use pino with async transport |
| Recursive `process.nextTick` | Forever | Use `setImmediate` instead |
| `Array.sort()` on a million-row array | 500 ms+ | Move to a worker thread or paginate the sort |
For CPU work, move it to a worker thread (covered in the cluster vs worker threads guide). For I/O, use the async variants. There’s almost never a reason to use the synchronous version of an I/O call in production; the only legitimate use is at startup, before you start serving traffic.
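For the worker-thread route, here is a minimal single-file sketch using the standard `isMainThread` branch pattern (the million-element sort stands in for whatever CPU work you have):

```js
import { Worker, isMainThread, parentPort, workerData } from "node:worker_threads";
import { fileURLToPath } from "node:url";

let smallest; // filled in on the main thread when the worker replies

if (isMainThread) {
  // Ship the CPU-bound sort to a worker; the event loop never blocks on it.
  const data = Array.from({ length: 1_000_000 }, () => Math.random());
  const worker = new Worker(fileURLToPath(import.meta.url), { workerData: data });
  worker.on("message", (sorted) => {
    smallest = sorted[0];
    console.log("sorted; smallest element:", smallest);
  });
} else {
  // Runs on a separate thread with its own V8 isolate.
  parentPort.postMessage(workerData.sort((a, b) => a - b));
}
```

Note the result crosses the thread boundary via structured clone, which has its own cost for large payloads — for very big arrays, transfer a `SharedArrayBuffer` or an `ArrayBuffer` instead of a plain array.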
Pick X when Y: which “later” primitive to reach for
| Pick this | When you want to… | Example |
|---|---|---|
| `process.nextTick(fn)` | Defer until the current synchronous stack clears, but before any I/O | Emit an error immediately after returning a stream object so listeners attached before nextTick drains can handle it |
| `queueMicrotask(fn)` | Same as nextTick, but runs in browsers too (cross-runtime libraries) | Hono / Web-streams library shared between Node and Cloudflare Workers |
| `setImmediate(fn)` | Yield to the event loop after the current I/O callback finishes | Chunk a CPU loop into 1000-iteration batches without blocking health checks |
| `setTimeout(fn, 0)` | Almost never. Maybe to interleave a low-priority retry on a budget | “Wait at least one timers-phase tick before retrying” — a pattern most code shouldn’t need |
| `worker_threads` | Run CPU work concurrently with the request thread | Image transform, PDF generation, password-hashing pool |
| Job queue (BullMQ / Inngest) | Defer expensive work outside the request | Send 10k transactional emails after a deploy; nightly export jobs |
The actual order: a worked example
This snippet demonstrates the entire ordering rule in one file:
```js
console.log("1: top of stack");

setTimeout(() => console.log("7: setTimeout 0"), 0);
setImmediate(() => console.log("8: setImmediate (top level)"));

Promise.resolve().then(() => console.log("4: promise microtask 1"));
process.nextTick(() => console.log("3: nextTick 1"));
process.nextTick(() => {
  console.log("3.5: nextTick 2 (drains in same batch as 3)");
  Promise.resolve().then(() => console.log("6: promise scheduled from nextTick"));
});
queueMicrotask(() => console.log("5: queueMicrotask"));

console.log("2: bottom of stack");

import("node:fs/promises").then(async (fs) => {
  await fs.readFile(new URL(import.meta.url));
  // Inside an I/O callback: setImmediate beats setTimeout(0)
  setTimeout(() => console.log("10: setTimeout 0 inside I/O"), 0);
  setImmediate(() => console.log("9: setImmediate inside I/O"));
});

// Output on Node 24:
// 1, 2, 3, 3.5, 4, 5, 6, 7, 8, 9, 10
// (the numbers are the print order they appear in the log)
```

Read the rules out of the output: synchronous code first (1, 2), then the nextTick queue drains fully (3, 3.5), then the promise microtask queue drains fully in FIFO order (4, 5, 6 — note that the promise scheduled from inside a nextTick still runs before timers, because microtasks drain to completion). Then timers (7), then check (8). Inside an I/O callback the order flips: setImmediate beats setTimeout(0) deterministically (9 before 10).
What changed in Node 24 (and why it matters)
Three event-loop-relevant improvements landed since Node 20:
- **libuv 1.45+ timer ordering.** Inside I/O callbacks, `setImmediate` now reliably wins over `setTimeout(0)`. Code that relied on the old nondeterministic order will see different behavior on upgrade. Usually for the better; occasionally a test relies on the previous accidental ordering.
- **AsyncLocalStorage performance.** Async context propagation in Node 24 is materially faster than in Node 20 — relevant if you use OpenTelemetry, request-scoped logging, or per-tenant context. The cost was once measurable on hot paths; in 2026 it’s negligible for most apps.
- **`node:test` + `--watch` stable.** Not directly an event-loop change, but the built-in test runner uses the loop in interesting ways (parallel test workers, timer mocking) and is now production-stable. Worth replacing Jest in greenfield work for the simpler dependency story.
FAQ
How does the Node.js event loop work?
Node runs your JavaScript on a single thread. After each chunk of synchronous work, it cycles through six libuv phases — timers, pending callbacks, idle/prepare, poll (I/O), check (setImmediate), close — dispatching ready callbacks at each phase. Microtask queues (process.nextTick first, then promises) drain to completion between every phase. Block the thread and the entire loop stops.
What is non-blocking I/O in Node.js?
Network and disk operations don’t wait for the OS to finish — they register a callback and return immediately. The event loop picks up the callback when the OS signals the operation is done (via epoll/kqueue for network, libuv’s thread pool for disk). The single thread stays free to handle other work. libuv is the C library that implements this abstraction.
What is the difference between setImmediate and process.nextTick?
process.nextTick drains in the microtask queue between phases — before any I/O, before promise microtasks, recursively until empty. setImmediate runs in the check phase, after the current poll phase completes. nextTick can starve I/O if used recursively; setImmediate always lets I/O proceed. Use setImmediate for “yield to the event loop”; use nextTick for “before any I/O happens.”
What blocks the Node.js event loop?
Synchronous CPU work, all *Sync file system calls, synchronous crypto, large JSON.parse, regex with catastrophic backtracking, recursive process.nextTick calls, and synchronous console.log to a slow terminal. Detect blocks with perf_hooks.monitorEventLoopDelay or clinic doctor; fix with async I/O, worker threads, or a job queue.
Is Node.js single-threaded?
The JavaScript runtime is single-threaded per V8 isolate. But Node uses libuv’s thread pool (default 4 threads, raise via UV_THREADPOOL_SIZE) for filesystem, crypto, and dns.lookup work, and offloads network I/O to OS-level async primitives that don’t touch the pool. Worker threads let you spawn additional JavaScript threads explicitly when you need true CPU parallelism.
How do I check event loop lag in production?
Use perf_hooks.monitorEventLoopDelay() with a 10 ms resolution histogram, sample p50/p99/max every 10 seconds, and log when p99 crosses a threshold (50–100 ms is a reasonable warning level). Performance optimization patterns are covered in the Node.js performance optimization piece. Alert on max > 1 second — that’s almost always a synchronous bug.
Why does setImmediate sometimes fire before setTimeout(0) and sometimes after?
Inside an I/O callback (Node 20+, libuv 1.45+), setImmediate always fires first because the check phase runs before the loop wraps back to timers. Outside an I/O callback (e.g., at process startup), the order depends on how long the current tick takes and whether the timers phase is reached first. The fix is: don’t rely on the order. If you need ordering, chain explicitly with promises.
What’s the difference between dns.lookup and dns.resolve?
dns.lookup calls into the OS’s getaddrinfo via libuv’s thread pool — it honors /etc/hosts, mDNS, and any system-level DNS configuration. dns.resolve* speaks DNS directly via c-ares and never touches the thread pool. The first is slower under contention but matches what every other process on the box sees; the second is faster under high concurrency but ignores system DNS configuration. fetch defaults to dns.lookup.