Ken Thompson's "Trusting Trust" Quine

about | archive


[ 2010-May-09 14:51 ]

One of the papers I discussed this week as part of teaching MIT's Computer Systems Engineering was Ken Thompson's Reflections on Trusting Trust [doi]. This short paper discusses an interesting security attack via the compiler. However, it starts with a quine, a program that prints its own source code. Russ Cox recently described quines in great detail, including zip files that contain themselves. However, after reading this paper, I wanted to reproduce Ken Thompson's original quine. This documents how I did it, and includes source code.

  1. Bootstrap the quine by saving this as quine.c:
    #include <stdio.h>
    int main() {
            int i;
            printf("char s[] = {\n");
            for(i = 0; s[i]; i++) printf("\t%d,\n", s[i]);
            printf("%s", s);
            return 0;
    }
  2. Save the following Python script as quine.py. Run python quine.py >> quine.c.
    header = "\t0\n};\n\n"
    d = header + open("quine.c").read()
    for c in d:
        print "\t%s," % repr(c)
    print "\t0"
  3. Now you have the contents of the s array, appended to quine.c. Edit quine.c to make the array:
    char s[] = {
        // generated output goes here
    };
    
    #include <stdio.h>
        // continues ...
  4. Compile this to produce quine1: gcc -o quine1 quine.c
  5. Run quine1 to produce quine2.c: ./quine1 > quine2.c
  6. Compile quine2.c to produce quine2: gcc -o quine2 quine2.c
  7. Run quine2 to produce quine3.c, which is exactly the same as quine2.c: ./quine2 > quine3.c
  8. Compare the two: cmp quine2.c quine3.c or diff -u quine2.c quine3.c. They should be identical.

You can skip the "quine generating program" quine.c by changing quine.py to generate the numbers directly, but this is different from Thompson's version in his paper:

header = "\t0\n};\n\n"
d = header + open("quine.c").read()
for c in d:
    print "\t%d," % ord(c)
print "\t0"

The quine works because the s array contains a representation of the program, starting at the 0 terminator, which is the header inserted by the Python script, through to the end of the program. The program itself prints the definition of the s array (char s[] = {), then loops through each character in s, up to but not including the 0, printing its representation ("\t%d,\n"). This correctly reproduces the first part of the program. Finally, the program prints s, which outputs the 0 line and the rest of the program. Russ Cox explains this better than I do.

The "Trusting Trust" attack is a compiler that not only inserts a back door into the login program, but also inserts itself when compiling the compiler. This means that if you compile the compiler, you get a compiler that produces bad login binaries, without the back door being included in the source code. This quine is presented at the beginning of the paper because you need one to make this attack work. The attack is a security back door that outputs itself.