Making Writers Read



[ 2019-April-16 19:36 ]

(Originally posted on the Bluecore Engineering Blog.) Recently, I ran into a unique challenge: I needed to connect a byte stream Writer to code that reads from a byte stream Reader. They both process a stream of bytes, but in different directions. This felt like tracing a big knot of cables only to find that both cables have plugs on the ends so you can't connect them. I needed a thing-a-ma-bob to turn one of the "plugs" into a "socket."

To turn a "plug" into a "socket," I first needed to understand how Readers and Writers work and how they can be chained together. It turns out they are remarkably similar, so it is possible to make a Writer read. While this was a somewhat unusual situation, I learned a lot about processing byte streams. I wanted to write it down so I don't forget, and hopefully someone else will find this useful.

The Direction of Readers and Writers

Every programming language has a byte stream abstraction in its standard library, such as Go's Reader/Writer and Java's InputStream/OutputStream. The examples in this article are in Go, but other languages work in a similar way. As their names suggest, Readers are for getting bytes into your program and Writers are for sending them out. The Writer interface is:

type Writer interface {
        Write(p []byte) (n int, err error)
}

This means the caller passes in a chunk of bytes, the writer does something with it, then returns the number of bytes processed or an error. The Reader interface is nearly identical:

type Reader interface {
        Read(p []byte) (n int, err error)
}

These interfaces show how similar Readers and Writers are: the arguments and return values are exactly the same. They both take chunks of bytes and do something with them and can be called in a loop to process a stream. The difference is the direction the data flows.

Luckily, which one to use is usually clear. Let's consider a program that takes data from disk, converts each character to uppercase, then sends it over a network. We need a Reader to get data from disk and a Writer to send it over the network. It looks something like the diagram below.

[Diagram: using a Reader to get bytes into the program and a Writer to send them out]

In this example, which one to use is pretty clear. You must use a Reader to get bytes from disk, and you must use a Writer to send bytes out.

When working with byte streams, it is pretty common to transform the stream by converting the bytes from one format to another. For example, you might want to compress data so it takes up less space. When transforming bytes, the direction of the data is usually clear. For example, if we are compressing data, it is almost always when we are sending data out of the program, so compression is a Writer. By contrast, we typically decompress bytes to get the original data into our program, so decompression is a Reader.

I recently found myself in the rare situation of wanting to process data in the "opposite" direction. I had a network server with a complicated function that took a Reader and processed all the data in it. This function assumed the bytes were base64 encoded.

I was modifying the server so the function would process unencoded bytes instead. This meant I needed to add some logic to encode the bytes if they weren't already encoded, inserting a Base64 Encoder between the source stream and the function.

However, a Base64 Encoder is a Writer, not a Reader, so it can't be plugged in directly. I needed to take a Writer and convert it into a Reader.

One solution is to avoid the problem: read the entire source stream into memory, transform it, then call the function with the now base64-encoded data. However, this is inefficient. It must store the entire input in memory, while a streaming version uses a fixed amount of memory to process a stream of any length.

Making a Writer Read

The Go standard library provides a solution to this problem: the somewhat obscure io.Pipe. It creates a Reader and a Writer that are connected. The magic is that calling write on the pipe blocks until another thread calls read, and vice-versa. When there are two threads that want to exchange data, the Pipe copies bytes from one buffer to the other.

This means we can start a new thread that reads from the original Reader and copies the bytes into the Base64 Encoder. The Base64 Encoder then writes the encoded data to the Writer side of the Pipe. Finally, we pass the Reader end of the Pipe into the original complicated function; when it reads from it, it gets base64-encoded bytes.

There are a few tricky parts to ensure everything completes in order and that errors are handled correctly. I've created a GitHub repository to demonstrate how this works.

A Common Abstraction: Transformer

This led me to wonder: If Readers and Writers can be converted into each other, shouldn't there be some Transformer interface that could be used for both purposes? Something implementing this interface could be used as both a Reader and a Writer, without needing a Pipe. Compression libraries like zlib or Zstandard provide exactly this sort of transform function for compression and decompression. So why isn't this more common? The issue is complexity, as it frequently is in engineering.

To understand why, let's take a look at an example of what the Transformer interface could look like in Go:

type Transformer interface {
        Transform(output []byte, input []byte) (written int, read int, err error)
}

The Transform function reads as many bytes as it can from the input buffer, processes them, and writes as many bytes as it can to the output buffer. It returns the number of bytes written and the number of bytes read, or an error. The caller then does something with the bytes in the output buffer, refills the input buffer once it is consumed, and calls Transform again, repeating to process a stream of any length.

Since the caller needs to manage both an input and an output buffer, using this interface is more complex than a Reader or a Writer, where there is only one buffer. At the same time, this interface is more powerful, because we can use it to implement both a Reader and a Writer. Unfortunately, most transformations have some natural direction, as discussed previously. This means the Transformer is more complex than Readers and Writers for typical use cases, but simpler for some rare cases.

On the whole, Readers and Writers make the normal case simple, while the unusual cases are still possible (even if it takes an entire blog post for me to understand how to do it). I think this is a great example of the kind of complexity trade-off we should strive for when engineering software.
