Andrew Wilcox

The Documentation Trick for Naming Things In Code

Well I’m sitting around trying to write a letter
I’m wracking my brains trying to think
Of another word for horse
I ask my brain for some assistance.
And he says: “Huh...Let’s see...How about, uh, 'cow'?
That’s close!”
– Laurie Anderson, Baby Doll

Naming: packages, interfaces, APIs, classes, methods, functions, templates, variables, constants... For people to understand what’s going on, everything we write in software needs a clear, descriptive, concise, non-confusing name.

Which usually is either super easy or super hard.

I ask my brain, “so, what should we call this?” Quite often a reasonable answer pops out. Other times I’ve called something “Foo” in my code until I can think of a name for it later.

And sometimes it’s impossible to come up with a name at first. Code is a cooperative system of modules / classes / functions, and the roles and responsibilities of particular parts often only become clear once other parts of the system are figured out. Figuring out a good name for a part of the system can even drive refactoring as roles are clarified, and that refactoring can expose other clumps of functionality, which can then be encapsulated and named in turn.

There are only two hard things in Computer Science: cache invalidation and naming things.
– Phil Karlton (via Martin Fowler)

Once we do manage to come up with a good name for something, in retrospect it seems obvious. Of course that’s the reasonable name for it. As in, why would we call it anything else? Whatever struggle and hard creative effort that was needed to land on the right name is now invisible.

Techniques for coming up with good names can’t be seen in the end product (the final code or API) in the same way that, say, algorithms can. I can look at code, see what algorithm is being used, and use that algorithm in my own project. But seeing a good choice of name doesn’t tell me how someone came up with that name.

So I thought I’d explain the documentation trick, which is a pretty cool technique when it can be used.

The approach is to write up some documentation describing the functionality that needs a better name, and then let that natural language guide us as to what naming to use.

What’s especially fun about this trick is that it can work even if someone else has done all the hard work of writing up the descriptive documentation :D

I’ll give you a couple of examples from Meteor. Meteor, as an open-source project, benefits from Linus’s Law: “given enough eyeballs, all bugs are shallow” (that is, “given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix will be obvious to someone”). In this case, there were a couple of namings that I was able to suggest.

In Meteor version 0.6.2 and before, here’s what the documentation looked like for options to the Meteor.Collection constructor:

Options

manager   Object

The Meteor connection that will manage this collection, defaults to Meteor if null. Unmanaged (name is null) collections cannot specify a manager.

The first time I saw this, I didn’t know what a “manager” did or what it was for. Eventually I figured out that what you’d pass for the “manager” option is the object returned by Meteor.connect(). That is, it’s a way to open a collection hosted on a different server.

Can we come up with a better name for the option? Let’s take another look at the description...

The Meteor connection that will

The technique is to observe how the object is described or used, and then see if that natural language can be used for the name. In this case, how about calling the option itself “connection”? Indeed, here’s how the option now appears in Meteor:

Options

connection   Object

The Meteor connection that will manage this collection...
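
To see how the rename reads at the call site, here is a minimal sketch. The `connect` and `Collection` functions below are hypothetical stand-ins I wrote for illustration, not Meteor’s implementation, and accepting the old option name as a fallback is my own assumption about how such a rename might be eased in.

```javascript
// Hypothetical stand-ins for illustration only; not Meteor's real code.

// Pretend to open a connection to another server.
function connect(url) {
  return { url: url };
}

// A toy collection that remembers which connection it belongs to.
function Collection(name, options) {
  options = options || {};
  this.name = name;
  // Prefer the new option name; fall back to the old one (an assumption).
  this.connection = options.connection || options.manager || null;
}

var remote = connect("https://example.com");

// Old naming: the reader has to guess what a "manager" is.
var before = new Collection("posts", { manager: remote });

// New naming: the option says what you pass in.
var after = new Collection("posts", { connection: remote });
```

Both calls do the same thing in this sketch; only the second one tells the reader that the value is a connection.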

You can see why I call this a “trick”: it’s almost cheating! The Meteor team came up with the vision, the design, the implementation, the tests, the examples, the documentation... and my own contribution was to notice how language was already being used.

An analogy might be that of an editor for a book: an editor, as a second pair of eyes, can see and suggest improvements... but it’s still the author's work.

As another Meteor example, consider this naming in the first draft of the “Deps” package:

Deps.Variable

A Variable represents an atomic unit of reactive data that a computation might depend on. Reactive data sources such as Session or Minimongo internally create different Variable objects for different pieces of data, each of which may be depended on by multiple computations. When the data changes, the computations are invalidated.

Variables don't store data, they just track the set of computations to invalidate if something changes. Typically, a data value will be accompanied by a Variable that affords dependency tracking...

This already represents an evolution, as it was originally called a “_ContextSet”. So you can see how the concept is emerging from an initially internal data structure to an explicit, reusable part of the API.

But “Variable” isn’t quite hitting the mark yet; a “variable” usually means a container for a value, so it’s an awkward name for something that doesn’t have a value of its own (“Variables don’t store data”).

Are there clues in the draft for a better name? Further on was a code example:

var weatherDeps = new Deps.Variable;

Notice how the variable got called a weatherDeps instead of a weatherVar. We could “fix” this by naming the instance after the class...

var weatherVar = new Deps.Variable;

but what if we went the other way, and instead named the class after the instance...?

var weatherDeps = new Deps.Deps;

Um, well. What is a “Deps” anyway?

Deps

Meteor has a simple dependency tracking system which allows it to automatically rerun templates and other computations...

Hmm, OK, so how does naming it a “Dependency” work out?

var weatherDeps = new Deps.Dependency;

Deps.Dependency

Dependencies don't store data themselves, they just track the set of computations to invalidate if their associated data changes. Typically, a data value will be accompanied by a Dependency that affords dependency tracking...

Cool. The latest documentation makes it more clear that the thing that depends on a Dependency is a computation:

Dependencies don't store data, they just track the set of computations to invalidate if something changes. Typically, a data value will be accompanied by a Dependency object that tracks the computations that depend on it...

This may not be the last word in the evolution of the API. But we can see the incremental progress: saying “dependencies don’t store data themselves” makes more sense than “variables don’t store data themselves”.
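
To make “doesn’t store data, just tracks computations” concrete, here is a minimal sketch of such an object. It is a toy I wrote for illustration and not Meteor’s Deps implementation: the `dependents` array and the explicit callback passed to `depend` are my own simplifications.

```javascript
// Hypothetical sketch only; this is not Meteor's Deps implementation.
function Dependency() {
  // The computations to invalidate when the associated data changes.
  this.dependents = [];
}

// Register a computation (here just a callback) as depending on the data.
Dependency.prototype.depend = function (computation) {
  if (this.dependents.indexOf(computation) === -1) {
    this.dependents.push(computation);
  }
};

// Signal that the associated data changed: rerun every dependent.
Dependency.prototype.changed = function () {
  var toRun = this.dependents.slice();
  toRun.forEach(function (computation) {
    computation();
  });
};

// The data value lives next to the Dependency, not inside it.
var weather = "sunny";
var weatherDeps = new Dependency();
var log = [];

weatherDeps.depend(function () {
  log.push("forecast is " + weather);
});

weather = "rainy";
weatherDeps.changed();
```

Notice that `weather` is an ordinary variable holding the value, while `weatherDeps` holds nothing but the list of things that depend on it, which is why “Dependency” fits better than “Variable”.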

The documentation trick looks for mismatches between the API and documentation. Normally of course we strive to make the documentation congruent with the API. This trick goes the other way: to look at the documentation and usage examples for places where the description or usage isn’t quite matching up with the API, and then see if it is suggesting a better naming to us. If so, perhaps we can change the API instead of fixing the documentation! :-)