Transferring Files With gRPC
Is transferring files with gRPC a good idea? Or should that be handled by a separate REST API endpoint? In this post, we will implement a file transfer service with both REST and gRPC, use Kreya to test those APIs, and finally compare their performance to see which one is better.
When handling large files, it is important to stream the file from one place to another. This might sound obvious, but many developers (accidentally) buffer the whole file in memory, potentially leading to out-of-memory errors. For a web server that provides files for download, a correct implementation would stream the files directly from the file system into the HTTP response.
Another problem with very large files is network failures. Imagine you are downloading a 10 GB file on a slow connection, but your connection is interrupted for a second after downloading 90% of it. With REST, this can be solved by sending an HTTP Range header, requesting the rest of the file content without downloading the first 9 GB again. To keep this blog post simple, and since something similar is possible with gRPC, we are going to ignore this problem.
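For illustration, a resumed download using the Range header might look like this (a hypothetical exchange, not taken from the original post; 9 GB is 9663676416 bytes):

```http
GET /files/test-file.bin HTTP/1.1
Range: bytes=9663676416-

HTTP/1.1 206 Partial Content
Content-Range: bytes 9663676416-10737418239/10737418240
```

The server responds with status 206 and only the final 1 GB of the file.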
Handling file transfers with REST (more correctly, plain HTTP) is pretty straightforward in most languages and frameworks. In C#, or rather ASP.NET Core, an example endpoint offering a file for download could look like this:
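The original snippet is not reproduced here; a minimal sketch of such an endpoint using ASP.NET Core minimal APIs (the route and file path are assumptions) could be:

```csharp
var app = WebApplication.CreateBuilder(args).Build();

// Results.File streams the file from disk into the response body;
// the framework never loads the whole file into memory at once.
app.MapGet("/download", () =>
    Results.File("/files/test-file.pdf", contentType: "application/pdf"));

app.Run();
```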
We are effectively telling the framework to stream the file /files/test-file.pdf as the response. Internally, the framework repeatedly reads a small chunk (usually a few KB) from the file and writes it to the response.
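Roughly what the framework does under the hood can be sketched as a copy loop (a simplified illustration; the buffer size and the `httpContext` variable are assumptions):

```csharp
// Simplified version of the framework's internal copy loop.
await using var file = File.OpenRead("/files/test-file.pdf");
var buffer = new byte[16 * 1024]; // a few KB per read
int bytesRead;
while ((bytesRead = await file.ReadAsync(buffer)) > 0)
{
    // Write each chunk to the HTTP response before reading the next one,
    // so at most one buffer's worth of data is held in memory.
    await httpContext.Response.Body.WriteAsync(buffer.AsMemory(0, bytesRead));
}
```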
The whole response body will consist of the file content, and Kreya, our API client, automatically renders it as a PDF. Other information about the file, such as content type or file name, has to be sent via HTTP headers.
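For example, the file name is conventionally sent in the Content-Disposition header. In ASP.NET Core, the `fileDownloadName` parameter sets that header for you (a sketch, not code from the original post):

```csharp
// Content-Type comes from contentType; Content-Disposition
// (attachment; filename=...) comes from fileDownloadName.
app.MapGet("/download", () =>
    Results.File("/files/test-file.pdf",
        contentType: "application/pdf",
        fileDownloadName: "test-file.pdf"));
```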
This is important. If you have a JSON REST API and try to send additional information in the response body like this:
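The snippet is not reproduced here, but it presumably resembles the following sketch, where the file bytes end up inside a JSON object (System.Text.Json serializes a `byte[]` as a Base64 string):

```csharp
// Anti-pattern: embedding the file content in a JSON response.
app.MapGet("/download-json", () => Results.Json(new
{
    FileName = "test-file.pdf",
    ContentType = "application/pdf",
    // Reads the entire file into memory, then Base64-encodes it
    // when the object is serialized to JSON.
    Data = File.ReadAllBytes("/files/test-file.pdf"),
}));
```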
This is bad! The whole file content is Base64-encoded, which takes up roughly 33% more space than the raw file. In most languages and frameworks (without additional workarounds), this also buffers the whole file in memory, since the Base64 encoding is usually not streamed while the JSON response is created. If the file itself is also buffered in memory, you could see memory usage of over twice the file size. This may be fine if your files are only a few KB in size. But even then, if multiple requests hit this endpoint concurrently, you may notice quite a lot of memory pressure.