Understanding Node.js writable streams
I'll assume you already know that when working with large amounts of data, you need to use streams to avoid exhausting the available memory.
Working with streams may seem complicated, and many people try to avoid them. But they're extremely powerful, and if you want to master Node.js you can't ignore them, since most of the built-in modules implement streams.
The following snippet is something I've encountered quite often when reviewing code, and many people think that because they're using streams, they're using them correctly. (Of course I've tweaked some things to prove a point.)
```js
const fs = require('fs');

const file = fs.createWriteStream('/tmp/test.txt');

for (let i = 0; i < 1e7; i++) {
  file.write('a');
}
```
The above code will crash with a fatal "JavaScript heap out of memory" error, because it exhausts the maximum memory allowed for a Node.js process. You can raise that limit with the `--max_old_space_size={MB}` flag (e.g. `node --max_old_space_size=4096 app.js`), but that won't fix the code, it will only extend the suffering.
To understand why the code still hits the memory limit even though it uses streams, we have to understand how `.write` works; once we do, it becomes clear why the snippet is poorly coded.

`.write` returns `false` when the internal buffer is full and it's time to flush it. When that happens, we should stop calling `.write` until the `drain` event is emitted. The buffer size is controlled by the `highWaterMark` option, which defaults to `16384` (16 KB).
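To make that behavior visible, here's a minimal sketch (the file path and the tiny `highWaterMark` are illustrative, not defaults) where the buffer fills after just two writes:

```js
const fs = require('fs');

// Tiny highWaterMark so the buffer fills after a few bytes
const file = fs.createWriteStream('/tmp/hwm-demo.txt', { highWaterMark: 4 });

console.log(file.write('ab')); // true:  2 bytes buffered, still below highWaterMark
console.log(file.write('cd')); // false: 4 bytes buffered, stop writing

file.once('drain', () => {
  console.log('buffer flushed, safe to write again');
  file.end();
});
```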
In the original snippet we ignore that return value and keep writing. Because the loop is synchronous, the event loop never gets a chance to actually flush the buffer to disk, so every chunk piles up in memory until the process hits the limit.
So here's the snippet fixed:
```js
const fs = require('fs');

const file = fs.createWriteStream('/tmp/test.txt');

(async () => {
  for (let i = 0; i < 1e7; i++) {
    // Will pause every 16384 iterations until `drain` is emitted
    if (!file.write('a')) {
      await new Promise(resolve => file.once('drain', resolve));
    }
  }
})();
```
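If you need this in more than one place, the pattern can be factored into a small helper. The name `writeAndAwaitDrain` below is just something I made up for this sketch:

```js
const fs = require('fs');

// Resolves immediately while the buffer has room,
// otherwise waits for the stream to emit `drain`
function writeAndAwaitDrain(stream, chunk) {
  return stream.write(chunk)
    ? Promise.resolve()
    : new Promise(resolve => stream.once('drain', resolve));
}

(async () => {
  const file = fs.createWriteStream('/tmp/test.txt');
  for (let i = 0; i < 1e7; i++) {
    await writeAndAwaitDrain(file, 'a');
  }
  file.end(); // close the stream when we're done
})();
```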
Luckily, most of the time we're using streams we use `.pipe`, which handles the backpressure for us automatically, and that's one of the reasons you may never have encountered this error.
```js
const fs = require('fs');

const read = fs.createReadStream('./large-file.txt');
const write = fs.createWriteStream('./large-file-copy.txt');

read.pipe(write);
```
In conclusion: always check the return value of `.write` and wait for the `drain` event when working with writable streams, or use `.pipe`, which handles all of this for you automatically.
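As a side note, if you're on Node.js 10 or newer, `stream.pipeline` is also worth knowing: it handles backpressure just like `.pipe`, and additionally forwards errors and cleans up the streams for you. A minimal sketch of the same file copy:

```js
const fs = require('fs');
const { pipeline } = require('stream');

pipeline(
  fs.createReadStream('./large-file.txt'),
  fs.createWriteStream('./large-file-copy.txt'),
  (err) => {
    if (err) {
      console.error('Copy failed:', err);
    } else {
      console.log('Copy complete');
    }
  }
);
```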