Frameworks: Necessary for large-scale software (evanjones.ca)

[ 2014-October-25 15:07 ]

The word "framework" is often a derogatory term, suggesting over-complicated and unnecessary software. I still don't really understand why Guice is useful (as I wrote a year ago), or what Spring even does. However, since joining Twitter three months ago, I've decided that good frameworks are necessary for large-scale software engineering. When hundreds of engineers work on a single project, there need to be common high-level building blocks. The two big companies I've worked at (Google and Twitter) both have a common platform for building backend services that I would call a framework. The Google and Twitter frameworks are hundreds of thousands of lines of code that allow engineers to quickly write services without thinking about the common stuff like logging, debugging, and monitoring (Twitter's is open source). Just as important as reusing code, the frameworks allow people to reuse knowledge, since an engineer familiar with the framework can quickly understand, debug, and modify other people's applications.

The challenge is that frameworks are complicated, and make applications harder to understand when you are not familiar with them. This is why I believe that frameworks frequently cost more time to learn than they save in development. I think this is a significant cause of the Go programming language community's scorn for frameworks. For example, in a recent go-nuts mailing list post, Andrew Gerrand said "Go philosophically prefers small, simple pieces, so naturally the Go community leans away from frameworks." (This article was inspired by a Twitter discussion I had about this statement with Dave Cheney and Andrew Gerrand.) I think "leaning away from frameworks" is the right default. You shouldn't use any dependency until you know that it is going to pull its own weight.

However, when your organization has tens or hundreds of independent applications or modules, you need more structure. You want one way of solving common problems that is the same across all applications. You want the common plumbing code to be written once, and reused everywhere. For large-scale software engineering, you need a framework.

What is a framework anyway?

One cause for disagreement is that "framework" is an imprecise term, and people have different definitions. My definition is fairly broad: a framework is a library that you use by writing code that the framework calls (e.g. via callbacks or interfaces). (This is sometimes called Inversion of Control.) Frequently, frameworks also connect related components together in a sensible way, but that isn't a requirement. This broad definition comes from looking at a list of projects that call themselves frameworks, ranging from low-level network plumbing through to complex distributed systems: Ruby on Rails (Ruby), Django (Python), Flask (Python), Hadoop (Java), Spring (Java), Guice (Java), Dropwizard (Java), Play (Java/Scala), Netty (Java), AngularJS (JavaScript), Revel (Go).

Good frameworks hide lots of complexity behind simple, flexible abstractions

When I look at that list, the ones that I think of as "good" frameworks put a large amount of code and complexity behind a substantially simpler API. This allows the developer to learn the "surface" of the framework much more easily than re-implementing the subset that they need. For example, writing a new Hadoop task is not trivial if you've never done it before. However, while it may take a day to learn the basic commands and APIs, there is a ton of magic behind it, and it can be reused in many different ways. The challenge for framework designers is finding the right abstractions. The right approach is probably to write a few applications in order to deeply understand the common problems, then pull those out in a reusable way. (Martin Fowler calls this harvesting a framework.)

The best frameworks also tend to be modular, allowing users to start with a small core and plug in components as needed. For example, Netty is a Java framework for non-blocking network applications. The core is mostly a wrapper around Java's built-in java.nio API. However, higher-level protocol handlers like HTTP and memcached clients and servers have been built on top of this core, and can be easily used inside an application that is already using Netty. Dropwizard and Twitter-Server are similar, in that they connect a set of independent libraries, all of which can be used separately.

Go: net/http is a great micro-framework

Turning back to Go, I would call the standard library net/http package a framework. To use it, you implement callbacks that are called by the server. The package does quite a bit on your behalf: it accepts new connections, spawns goroutines, parses and routes HTTP requests, and catches panics. One of the reasons most Go programs don't need much else is that net/http is a great framework. It exposes an easy-to-understand API: register callbacks for request paths. Under that simple API is a lot of complexity: tens of thousands of lines of network protocol handling. That API is useful for a huge number of applications, and it can be easily extended by chaining together handlers or by directly using the lower-level components.

However, large applications will need additional pieces, like templates, storage, and monitoring. For many applications, Go's standard library will suffice. But as organizations build multiple large applications, it is more efficient if these applications all make the same decisions about how to do each of these higher-level tasks. As a result, I suspect large organizations using Go will build frameworks that look similar to Google's and Twitter's frameworks, to connect these pieces together with sensible defaults. This is necessary for large-scale software engineering, even with Go. However, I think the Go language and philosophy will encourage people to build better frameworks, and the standard net/http package is a great example of what a framework should be.