# Tools: Converting Large JSON, NDJSON, CSV and XML Files without Blowing Up Memory
2026-02-14
Most of us have written something like this at some point:

`const data = JSON.parse(hugeString);`

At some point the file grows. 50 MB. 200 MB. 1 GB. 5 GB. And then:

- The tab freezes (in the browser)
- Memory spikes
- The process crashes
- Or worse: everything technically "works" but becomes unusable

## The Real Issue: Buffering vs Streaming

This isn't a JavaScript problem. It's a buffering problem. Most parsing libraries operate in buffer mode:

- Read the entire file into memory
- Parse it completely
- Return the result

That means memory usage scales with file size. Streaming flips the model:

- Read chunks
- Process incrementally
- Emit records progressively
- Keep memory nearly constant

That architectural difference matters far more than micro-optimizations.
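The two models above can be sketched in a few lines of plain JavaScript. This is a generic illustration of the buffered-vs-streaming distinction, not convert-buddy-js internals:

```javascript
// Buffered model: the entire input must be resident before any work happens,
// so memory scales with input size.
function parseBuffered(text) {
  return text.split("\n").filter(Boolean).map((line) => line.toUpperCase());
}

// Streaming model: consume chunk by chunk. Only the current chunk plus one
// partial line is ever held, and results are emitted as they complete.
function* parseStreaming(chunks) {
  let carry = "";
  for (const chunk of chunks) {
    carry += chunk;
    const lines = carry.split("\n");
    carry = lines.pop(); // the last piece may be an incomplete line
    for (const line of lines) if (line) yield line.toUpperCase();
  }
  if (carry) yield carry.toUpperCase(); // flush the final record
}

// Chunks arrive in arbitrary sizes; records come out incrementally.
const chunks = ["a\nb", "c\nd"];
console.log([...parseStreaming(chunks)]); // [ 'A', 'BC', 'D' ]
console.log(parseBuffered(chunks.join(""))); // same result, whole input in memory
```

Both produce identical output; the difference is purely how much input each one must hold at once.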
## Why I Built a Streaming Converter

I've been working on a project called convert-buddy-js, a Rust-based streaming conversion engine compiled to WebAssembly and exposed as a JavaScript library.

## What Does "Low Memory" Actually Mean?

The core goal was simple: keep memory usage flat, even as file size grows. Not "be the fastest library ever." Just predictable. Stable. Bounded. The XML → JSON benchmarks illustrate this well: the difference is architectural. The streaming engine processes elements incrementally instead of constructing large intermediate structures.
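To make "incrementally instead of large intermediate structures" concrete, here is a deliberately naive sketch that walks XML elements one at a time and emits a small object per element, rather than building a tree for the whole document. It is regex-based and fragile — real XML needs a proper streaming parser, and this is not how the engine works internally:

```javascript
// Naive incremental XML element handling: emit one object per <item>
// element as it is encountered. Only the current match is materialized;
// no document-wide tree is ever built.
function* xmlItems(xml) {
  for (const m of xml.matchAll(/<item>\s*<name>([^<]*)<\/name>\s*<\/item>/g)) {
    yield { name: m[1] };
  }
}

const xml = "<items><item><name>a</name></item><item><name>b</name></item></items>";
console.log([...xmlItems(xml)]); // [ { name: 'a' }, { name: 'b' } ]
```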
## CSV → JSON Benchmarks

I benchmarked against:

- Web Workers

A representative neutral case used a 1.26 MB CSV, and the favorable large cases used a 13.52 MB CSV. In most CSV scenarios tested, the streaming approach resulted in roughly 3x–4x throughput improvements, with dramatically lower memory overhead; the full numbers are in the repository.

## Where Streaming Isn't Always Faster

For tiny NDJSON files, native JSON parsing can be faster. When files are extremely small, the overhead of streaming infrastructure can outweigh the benefits. Native JSON.parse is heavily optimized in engines and extremely efficient for small payloads. The goal here isn't to replace native JSON for everything; it's to handle realistic and large workloads predictably.

## NDJSON → JSON Performance

Medium nested NDJSON datasets are where streaming and incremental transformation shine, especially when the workload involves structured transformation rather than just parsing.
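To show what "structured transformation" can mean in practice, here is a hypothetical sketch that flattens nested NDJSON records into flat rows as they stream by, instead of parsing everything first and transforming afterwards. Names and shapes are illustrative, not the library's API:

```javascript
// Flatten one nested object into a single-level row, joining keys with dots.
function flatten(obj, prefix = "") {
  const row = {};
  for (const [k, v] of Object.entries(obj)) {
    const key = prefix ? `${prefix}.${k}` : k;
    if (v && typeof v === "object" && !Array.isArray(v)) {
      Object.assign(row, flatten(v, key));
    } else {
      row[key] = v;
    }
  }
  return row;
}

// Parse and transform one NDJSON line at a time; only one record is
// ever held in memory.
function* ndjsonToRows(lines) {
  for (const line of lines) {
    if (line.trim()) yield flatten(JSON.parse(line));
  }
}

const rows = [...ndjsonToRows(['{"user":{"name":"Alice","age":30}}'])];
console.log(rows); // [ { 'user.name': 'Alice', 'user.age': 30 } ]
```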
## What the Library Looks Like
Install it from npm:

```bash
npm install convert-buddy-js
```
Then stream a conversion, consuming records in batches as they're emitted:

```js
import { ConvertBuddy } from "convert-buddy-js";

const csv = 'name,age,city\nAlice,30,NYC\nBob,25,LA\nCarol,35,SF';

// Configure only what you need. Here we output NDJSON.
const buddy = new ConvertBuddy({ outputFormat: 'ndjson' });

// Stream conversion: records are emitted in batches.
const controller = buddy.stream(csv, {
  recordBatchSize: 2,
  // onRecords can be async: await inside it if you need to (I/O, UI updates, writes...)
  onRecords: async (ctrl, records, stats, total) => {
    console.log('Batch received:', records);
    // Simulate slow async work (writing, rendering, uploading, etc.)
    await new Promise(r => setTimeout(r, 50));
    // Report progress (ctrl.* is the most reliable live state)
    console.log(
      `Progress: ${ctrl.recordCount} records, ${stats.throughputMbPerSec.toFixed(2)} MB/s`
    );
  },
  onDone: (final) => console.log('Done:', final),
  // Enable profiling stats (throughput, latency, memory estimates, etc.)
  profile: true
});

// Optional: await final stats / completion
const final = await controller.done;
console.log('Final stats:', final);
```
## Why Rust + WebAssembly?

Because the core engine is written in Rust and compiled to WebAssembly. Not because it's trendy, but because Rust offers:

- Predictable memory behavior
- Strong streaming primitives
- Deterministic performance
- Easier control over allocations

WebAssembly allows that engine to run safely in the browser without server uploads.
- Easier control over allocations - Files are always < 1MB
- You're already happy with JSON.parse
- You don't care about memory spikes - You process large CSV exports
- You handle XML feeds
- You work with NDJSON streams
- You need conversion in the browser without uploads
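For the large-CSV case, the same incremental pattern applies: convert rows to objects one at a time instead of materializing the whole file. A naive sketch (no quoted-field handling, which real CSV requires, and not the library's parser):

```javascript
// Incremental CSV → records: the first line becomes the header, then each
// subsequent line is converted to an object and emitted immediately.
function* csvRecords(lineIterable) {
  let header = null;
  for (const line of lineIterable) {
    if (!line.trim()) continue;
    const cells = line.split(",");
    if (!header) {
      header = cells; // first non-empty line is the header row
      continue;
    }
    yield Object.fromEntries(header.map((h, i) => [h, cells[i]]));
  }
}

const rows = [...csvRecords(["name,age", "Alice,30", "Bob,25"])];
console.log(rows); // [ { name: 'Alice', age: '30' }, { name: 'Bob', age: '25' } ]
```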
## What I Learned Building It

- Streaming is not just about speed; it's about stability.
- Benchmarks should include losses.
- Native JSON.parse is hard to beat for tiny payloads.
- Memory predictability matters more than peak throughput.

## Closing Thoughts

There are many good parsing libraries in the JavaScript ecosystem. PapaParse is mature. csv-parse is robust. Native JSON.parse is extremely optimized. convert-buddy-js is simply an option focused on:

- Low memory usage
- Format transformation
- Large file handling

If that matches your constraints, it may be useful. If not, the ecosystem already has excellent tools. If you're curious, the full benchmarks and scenarios are available in the repository:

- convert-buddy-js — npm
- brunohanss/convert-buddy

You can get more information or try the interactive browser playground here: https://convert-buddy.app/

And if you have workloads where streaming would make a difference, I'd be interested in feedback.