Go Gotcha: Don't take the address of loop variables

about | archive


[ 2015-August-03 19:36 ]

A few months ago I asked a friend who works at Dropbox how writing many of their services in Go was working for them. He said it had worked out pretty well, but being a new language there are a few rough edges that aren't generally widely known. One of these "gotchas" puzzled me: If you take the address of a loop variable and put it in a container, the container is filled with multiple pointers to the last element of the iteration. When I looked at an example, it seemed straightforward, and I didn't understand how this could be a common problem. However, I recently was debugging a small program, and I discovered that I had written a version of this exact bug. This "gotcha" is obvious in toy examples, but it can be harder to spot in bigger loops, particularly when changing some existing code. As an experiment, I decided to see how to write an automated checker for this possible bug. In this article, I'll show what this bug looks like and describe a simple automated checker that finds it. I think a correct version of this check would be a useful addition to one of the existing warning tools, like go vet or golint, since others have run into the same issue (1, 2, 3).

In Go, variables introduced in a for statement are "re-used in each iteration" (see the specification). This means taking the address of one of these variable always produces a pointer to the same location, which is re-used on each iteration. Consider the following snippet, which creates a slice of pointers to structs from a slice of structs (go playground):


values := []MyStruct{MyStruct{1}, MyStruct{2}, MyStruct{3}}
output := []*MyStruct{}
for _, v := range values {
  output1 = append(output, &v)
}
fmt.Println("output: ", output)

This produces [MyStruct{3} MyStruct{3} MyStruct{3}], but it should produce [MyStruct{1} MyStruct{2} MyStruct{3}]. To fix it, we have two options, depending on the semantics we want. If the output should contain pointers to a copy of the values, we need to take the address of a new variable (playground). If the output should contain pointers to the original values, we need to take the address of an index expression (playground).

This problem becomes harder to spot within more complicated expressions (e.g. &v.member also doesn't work), or in longer loops. However, the real challenge is that there are legitimate uses of this expression. In particular, if the pointer is only used inside a single loop iteration, then the value will be what we expect. This means we can accidentally "add" this bug to some code that previously worked, by accidentally holding on to a pointer for too long. In my case, this is exactly how I ran into this bug.

A bad automated checker

I wanted to trying finding this bug automatically, so I looked at the source code for go vet's range loop check, as well as errcheck for an example of a standalone tool. My tool is called loopcheck (github source code), but is really only a proof of concept. It is simplistic and flawed, so should not be used as a real tool. It prints a warning for any uses of the address operator (&), where the operand contains a reference to a variable declared in the loop. It prints the following warning for the original example:

example1.go:9: takes address of loop variable: &v
  range at line 8: for _, v := range values

However, it also produces warnings for complex expressions that are safe (see example.go in the loopcheck source). Ideally, a checker would look for uses that truly "escape" the loop body, so it wouldn't need to warn on valid uses. However, that would require a more sophisticated analysis partial or whole program analysis (see Go Oracle).

Conclusions?