Weblog entry #81 for dkg

Please use unambiguous tag names in your DVCS
Posted by dkg on Fri 6 May 2011 at 21:05
One of the nice features of a distributed version control system (DVCS) like git is the ability to tag specific states of your project, and to cryptographically sign your tags.

Many projects use simple tag names with the version string like "0.35". This is a plea to ensure that the tags you make explicitly reference the project you're working on. For example, if you are releasing version 0.35 of project Foo, please make your tag "foo_0.35" for security and for future disambiguation.

There is more than one reason to care about unambiguous tags. I'll give two reasons below, but they come from the same fundamental observation: All git repositories are, in some sense, the same git repository; some just store different commits than others.

It's entirely possible to merge two disjoint git repositories, and to have two unassociated threads of development in the same repo. You can also merge threads of development later if the projects converge.

Avoid tag replay attacks
Let's assume Alice works on two projects, Foo and Bar. She wraps up work on a new version of Foo, and creates and signs a simple tag "0.32". She publishes this tag to the Foo project's public git repo.

Bob is trying to attack the Bar project, which is currently at version 0.31. Bob can actually merge Alice's work on Foo into the Bar repository, including Alice's new tag.

Now it looks like there is a tag for version 0.32 of project Bar, and it has been cryptographically signed by Alice, a known active developer!

If she had named her tag "foo_0.32" (and if all Bar tags were of the form "bar_X.XX"), it would be clear that this tag did not belong to the Bar project.
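
As a rough sketch of how a tool watching the Bar repo might apply that convention, here is a small shell function. The name tag_belongs_to and the exact "project_X.YY" pattern are assumptions made for illustration; nothing in git enforces them.

```shell
# Hypothetical helper: succeed only if tag name $2 has the form
# "<project>_<dotted numeric version>", where <project> is $1.
tag_belongs_to() {
    case "$2" in
        "$1"_*) ;;          # starts with the project prefix, keep checking
        *) return 1 ;;      # bare "0.32" or another project's prefix
    esac
    # What follows the prefix must be digits separated by dots.
    printf '%s\n' "${2#"$1"_}" | grep -Eq '^[0-9]+(\.[0-9]+)*$'
}

# Example: a Bar build script would skip Alice's Foo tag.
#   tag_belongs_to bar foo_0.32 || echo "not a Bar release, ignoring"
```

A check like this only looks at the name, so it complements signature verification rather than replacing it.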

Be able to merge projects and keep full history
Consider two related projects with separate development histories that later decide to merge (e.g. a library and its most important downstream application). If they merge their git repos, but both projects have a tag "0.1", then one tag must be removed to make room for the other.

If all tags were unambiguously named, the two repos could be merged cleanly without discarding or rewriting history.
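
To see ahead of time whether two repos would clash, one could compare their tag lists. The sketch below assumes each list was saved to a file with one tag name per line (for example from "git tag -l" run in each repo); the function name colliding_tags is made up for this example.

```shell
# Hypothetical helper: print every tag name that appears in both lists.
# $1 and $2 are files with one tag name per line.
colliding_tags() {
    # -F: fixed strings, -x: whole-line match, -f: read patterns from file $1
    grep -Fxf "$1" "$2" | sort -u
}

# Example:
#   (cd lib && git tag -l) > lib-tags.txt
#   (cd app && git tag -l) > app-tags.txt
#   colliding_tags lib-tags.txt app-tags.txt
```

With simple names like "0.1" in both projects the function prints the clashes; with prefixed names it prints nothing. It compares names only, so a name that points at the same object in both repos is still reported.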

I noticed this because of a general principle i try to keep in mind: when making a cryptographic signature, ensure that the thing you are signing is context-independent -- that is, that it cannot be easily misinterpreted when placed in a different context. For example, do not sign e-mails that just say "I approve" -- say specifically what you are approving of in the body of the signed mail. Otherwise, someone could re-send your "I approve" e-mail In-Reply-To a topic that you do not really approve of.

By extension, signing a simple tag is like saying "the source tree in this specific state (and with this specific history) is version 0.3". A close inspection of the state and the history by a sensitive/intelligent human skilled in the art of looking at source code can probably figure out what the project is from its contents. But it's much less ambiguous to say "the source tree in this specific state (and with this specific history) is version 0.3 of project Foo".

Once you start using unambiguous tag names, you make it safe for people to set up simple tools that do automated scans of your repository and can take action when a new signed tag appears. And you respect (and help preserve) the history of any future project that gets merged with yours.


Comments on this Entry

Posted by Anonymous (89.27.xx.xx) on Fri 6 May 2011 at 23:08
If the merge of a signed tag was not fast-forward, there should be a merge commit on top of that, shouldn't there?

Giving tags appropriate names is a good idea, there's no point backing it up with paranoia.

Posted by dkg (2001:0xx:0xx:0xxx:0xxx:0xxx:xx) on Sat 7 May 2011 at 17:01
I don't see why the fast-forward-ness of a merge is necessarily relevant. For example, an automated system processing git updates could just as easily explicitly do checkouts instead of merges if it's processing a newly-discovered signed tag.

I'm not arguing for paranoia; i'm arguing for robust systems with predictable and useful behavior, even (in the worst case) despite malicious agents trying to interfere. This is plan-for-the-worst-case engineering. It doesn't actually take much extra work to do it, and the end result is better for everyone.

Posted by Anonymous (24.6.xx.xx) on Fri 6 May 2011 at 23:20
If you're doing something as advanced as merging between projects -- which frankly, I've never even contemplated before -- shouldn't you make sure *not* to merge the tags with it? I mean, you have to explicitly push tags up to a remote, you'd think that would be an option with merging too, no?

Posted by dkg (2001:0xx:0xx:0xxx:0xxx:0xxx:xx) on Sat 7 May 2011 at 17:13
If i was merging projects, i'd prefer to keep all the old tags of both projects as a way of explicitly representing the history. Why would you want to drop them? Are there other parts of history that you'd like to drop when you merge projects?

Posted by Anonymous (76.206.xx.xx) on Fri 6 May 2011 at 23:49
It is very easy to rename a signed tag. For example:

git push . v0.15:refs/tags/the-best-release-ever

Oh no! But consider what happens when you verify that tag with "git tag -v": it shows the actual payload that was signed. In other words, it is the annotation (which should certainly explain context) and the content of the commit that git allows a person to sign, not the name used to refer to it.

Posted by dkg (2001:0xx:0xx:0xxx:0xxx:0xxx:xx) on Sat 7 May 2011 at 15:34
Wow, thanks for pointing this out; i hadn't realized that the tag name listed by git could be entirely unrelated to the signed tag data. This makes git tag -l seem less useful to me, but I suppose i need to use git tag -v $(git tag -l) anyway if i actually care about the cryptographic properties of the tag.

I still stand by my suggestion of choosing unambiguous names, though -- having an unambiguous tag name makes it easier for automated tools which process tags to ensure that they've gotten the right thing (without requiring them to parse the human-readable commentary in tag annotation).
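
One way a tool could catch the renaming trick without parsing the annotation is to compare the ref name against the "tag" header stored inside the tag object itself, which is part of what gets signed. This is only a sketch: the function names are invented here, and the raw tag object is assumed to arrive on stdin (for instance from "git cat-file tag NAME").

```shell
# Print the value of the "tag" header from a raw tag object on stdin.
# Headers end at the first empty line, so text in the annotation that
# happens to start with "tag " is never reached.
embedded_tag_name() {
    sed -n -e '/^$/q' -e 's/^tag //p'
}

# $1 = name the tag is published under; stdin = raw tag object.
# Succeeds only if the signed payload carries the same name.
ref_matches_payload() {
    [ "$(embedded_tag_name)" = "$1" ]
}

# Example:
#   git cat-file tag the-best-release-ever |
#       ref_matches_payload the-best-release-ever ||
#       echo "ref name differs from the name in the tag object"
```

This says nothing about whether the signature is good; that still needs git tag -v (or an equivalent) on the same tag.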

Posted by Anonymous (71.58.xx.xx) on Sat 14 May 2011 at 16:04
If you have someone with commit access to the (a) public repository trying to actively subvert your project, you have much, much bigger problems than ambiguous tag names.

Posted by dkg (2001:0xx:0xx:0xxx:0xxx:0xxx:xx) on Sun 15 May 2011 at 18:13
If you have that happening, i agree that it's problematic. That's one of the problems that signed tags have the potential to address, though. But it will only address the problem properly if you use context-independent tag names.

Posted by Anonymous (69.209.xx.xx) on Wed 18 May 2011 at 17:32
In the context of git, I hope you mean "it will only address the problem properly if you use context-independent tag descriptions". If I try to pass off GNU Interactive Tools 4.3.9 as a futuristic version of git, someone can tell me, "no, that says 'GNU interactive tools', not git, and it's signed by Ian Beckwith, not Junio C Hamano".

Otherwise there is just the illusion of security. That a name points to a signed tag means nothing --- in fact, although tag objects store the original name, git allows you to use whatever name you wish to refer to an existing signed tag.

Posted by dkg (216.254.xx.xx) on Wed 18 May 2011 at 19:06
Actually, i do think that the names should be independent, because names are concise and simple for automated systems to inspect and act on. I certainly wouldn't want an automated system (e.g. a build-daemon) to parse and interpret the entire human-readable tag description.

The fact that a tag is signed by some arbitrary person shouldn't be relevant; any decent automated system would need to know whose signatures it was looking for, so something signed by Ian Beckwith wouldn't be accepted for a system tuned to look for signatures from Junio C Hamano.
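
A minimal sketch of that "know whose signatures you want" step, assuming the tool has GnuPG status lines on stdin (git verify-tag --raw writes them to stderr) and a file of trusted key fingerprints, one per line. The function names and the fingerprint file are hypothetical.

```shell
# Print the signing-key fingerprint from the first VALIDSIG status line.
validsig_fingerprint() {
    awk '$1 == "[GNUPG:]" && $2 == "VALIDSIG" { print $3; exit }'
}

# $1 = file of trusted fingerprints, one per line; stdin = status lines.
# Succeeds only if a VALIDSIG line names one of the listed fingerprints.
signed_by_trusted_key() {
    fpr=$(validsig_fingerprint)
    [ -n "$fpr" ] && grep -Fxq "$fpr" "$1"
}

# Example (stderr goes into the pipe, stdout is discarded):
#   git verify-tag --raw bar_0.32 2>&1 >/dev/null |
#       signed_by_trusted_key trusted-fingerprints.txt
```

The fingerprint on the VALIDSIG line is that of the key that made the signature, which may be a subkey, so the trusted list would need to hold whichever fingerprints the project actually signs with.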

As a developer, i hereby commit to never working on a project with a name that precisely collides with another name of a project i've ever worked on. I don't think this is such a radical position to take. (though maybe Ian Beckwith wouldn't want to adopt it since his project's name was poached by a higher-profile project).

Posted by Anonymous (69.209.xx.xx) on Sat 21 May 2011 at 03:07
Great, provided your automated system knows where to look.

Many projects use signed tags with annotations that are concise and easy for automated systems to act on. "GNU Interactive Tools 4.3.9" would be an example of a good annotation for humans and computers.

As you just observed, the problem with declaring "I will never work on a project whose name conflicts with another project's" is that you can't always know in advance.

Now, none of what I said above is a reason _not_ to use tags with names like "xorg-server-1.0.1". The only reason I've been repeating this pedantic point is that security requires that your readers understand the system that's being used. Advice that can be easily misread as "please choose an unambiguous name for your git tag refs so the signatures can be more meaningful" has the danger of undermining that. :(

Posted by Anonymous (91.153.xx.xx) on Tue 17 May 2011 at 23:28
When merging two projects it's always possible to rename tags easily with git filter-branch.

Moreover, there will be support for remote namespaces for tags, just like for branches, so conflicts can be resolved.

See message-id AANLkTi=yFwOAQMHhvLsB1_xmYOE9HHP2YB4H4TQzwwc8@mail.gmail.com ([1.8.0] Remote tag namespace). (I can't post links)
