Yes, there was a bug in my vint64 encapsulation commit. I will neither confirm nor deny any conjecture that I left it in there deliberately to see who would be sharp enough to spot it. I will however note that it is a perfect tutorial example for how you should spot bugs, and why revisions with a simple and provable relationship to their ancestors are best.
To do large code changes correctly, factor them into a series of smaller steps such that each revision has a well-defined and provable relationship to the last.
(This is the closest I’ve ever come to a 1-sentence answer to the question “How the fsck do you manage to code with such ridiculously high speed and low defect frequency?” I was asked this yet again recently, and trying to translate the general principle into actionable advice has been on my mind. I have two particular NTPsec contributors in mind…)
So here’s a case study, and maybe your chance to catch me in a mistake.
The net benefit of having anything that can be called a software development methodology is in inverse proportion to the quality of your developers – and possibly inverse-squared.
Any software-development methodology works well on sufficiently small projects, but all scale very badly to large ones. The good ones scale slightly less badly.
One thing all methodologies tagged “agile” have in common is that they push developers away from the median. Competent developers work better; mediocre developers don’t change; incompetent ones get worse.
Software metrics have their uses. Unfortunately, their principal use is to create the illusion that you know what’s going on.
Structured development methodologies have their uses, too. Unfortunately, their principal use is to create an illusion of control.
Trust simple, crude metrics over complex ones because the simple ones are less brittle. KLOC is best, though a poor best.
Agile development is efficient only in environments where the cost of flag days is low. Otherwise, slow down, and take time to think and write about your architecture.
Good programmers are difficult to control; great ones are nearly impossible to control. Different methodologies require different kinds and degrees of control; match them to your developers wisely.
Process is not a good substitute for judgment; when you have to use it as one, your project is probably too large. Sometimes this is unavoidable, but don’t fool yourself about what that will cost you.
The difference between O(n**2) and O(n log n) really matters. It’s why lots of small teams working on coordinated small projects work better than one big team building a monolith.
A million dollars is roughly a 55-gallon oil drum full of five-dollar bills. Large-scale software development is such a difficult and lossy process that it’s like setting fire to several of these oil drums and hoping a usable product flutters out of the smoke. If this drives you to despair, find a different line of work.
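The oil-drum comparison survives a back-of-envelope check. This is a minimal sketch that assumes the commonly cited U.S. banknote dimensions of 6.14 × 2.61 × 0.0043 inches and perfect packing with no air gaps; real bundles would take somewhat more room.

```python
# Back-of-envelope check: how much volume do a million dollars in $5 bills occupy?
# Assumption: U.S. banknote size of 6.14 x 2.61 x 0.0043 inches, packed with no gaps.
BILL_LENGTH_IN = 6.14
BILL_WIDTH_IN = 2.61
BILL_THICKNESS_IN = 0.0043
CUBIC_INCHES_PER_US_GALLON = 231  # exact, by definition of the U.S. gallon


def gallons_of_bills(total_dollars: int, denomination: int) -> float:
    """Return the packed volume, in U.S. gallons, of total_dollars in bills."""
    bills = total_dollars // denomination
    volume_in3 = bills * BILL_LENGTH_IN * BILL_WIDTH_IN * BILL_THICKNESS_IN
    return volume_in3 / CUBIC_INCHES_PER_US_GALLON


if __name__ == "__main__":
    print(f"{gallons_of_bills(1_000_000, 5):.1f} gallons")  # 59.7 gallons
```

Two hundred thousand bills come to roughly 60 gallons, so "roughly a 55-gallon drum" is a fair figure.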
Haven’t been blogging for a while because I’ve been deep in coding and HOWTO-writing. Follows the (slightly edited) text of an email I wrote to the NTPsec devel list that I think might be of interest to a lot of my audience.
One of the questions I get a lot is: How do you do it? And what is “it”, anyway? The question seems like an inquiry into the mental stance that a systems architect has to have to do his job.
So, um, this is it. If you read carefully, I think you’ll learn a fair bit even if you haven’t a clue about NTP itself.
While most of the NTPsec team was off at Penguicon, the NTP Classic people shipped a release patched for eleven security vulnerabilities in their code. Which might have been pretty embarrassing, if those vulnerabilities were in our code, too. People would be right to wonder, given NTPsec’s security focus, why we didn’t catch all these sooner.
In fact, we actually did pre-empt most of them. The attack surface that eight of these eleven security bugs penetrate isn’t present at all in NTPsec. The vulnerabilities were in bloat and obsolete features we’ve long since removed, like the Mode 7 control channel.
I’m making a big deal about this because it illustrates a general point. One of the most effective ways to harden your code against attack – perhaps the most effective – is to reduce its attack surface.
Thus, NTPsec’s strategy all along has centered on aggressive cruft removal. This strategy has been working extremely well. Back in January our 0.1 release dodged two CVEs because of code we had already removed. This time it was eight foreclosed – and I’m pretty sure it won’t be the last time, either, if only because on Sunday I ripped out Autokey, a notorious nest of bugs.
Simplify, cut, discard. It’s often better hardening than anything else you can do. The percentage of NTP Classic code removed from NTPsec is up to 58% now, and could easily hit two-thirds before we’re done.
The British have a phrase, “Too clever by half.” It needs to go global, especially among hackers. It can have any of several closely related meanings: the one I mean to focus on here has to do with overconfidence in one’s intelligence or skill, and the particular bad consequences that can have. It’s related to Nassim Taleb’s concept of a “fragilista”.
I’ve been getting deeper into timekeeping and calendar-related software the last few years. Besides my work on GPSD, I’m now the tech lead of NTPsec. Accordingly, I have learned a great deal about time mensuration and the many odd problems that beset calendricists. I could tell you more about the flakiness of timezones, leap seconds, and the error budget of UTC than you probably want to know.
Paradoxically, I find that studying the glitches in the system (some of which are quite maddening from a software engineer’s point of view) has left me more opposed to efforts to simplify them out of existence. I am against, as a major example, the efforts to abolish leap seconds.
Last week I decided the time had come to bite the bullet and systematically port the fairly large volume of Python code I maintain from Python 2 to Python 3.
I straightaway ran into a problem, which is that for my purposes the Web resources on how to do this are pretty awful. And not just in the general, unsurprising sense of being way too full of theory and generality and way too light on practical advice, either.
No, there’s a more specific problem as well. I write systems programs, things like SRC and reposurgeon that have to be able to do string-bashing-like things on binary data without upchucking or (worse) silently mangling that data.
Due to the Python 3 decision that strings are sequences of Unicode code points rather than bytes, this is significantly more difficult in Python 3 than it was in Python 2.
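Two standard-library techniques are commonly used to get Python 2-style string bashing on arbitrary bytes back in Python 3. The sketch below shows both; it is not a claim about what SRC or reposurgeon actually do. Latin-1 maps each byte value 0–255 to the code point with the same number, so any byte string decodes and re-encodes losslessly. The `surrogateescape` error handler decodes valid UTF-8 normally and smuggles undecodable bytes through as lone surrogates.

```python
# Two ways to do string-style processing on arbitrary binary data in Python 3
# without mangling it. Neither is claimed to be what SRC or reposurgeon use.


def via_latin1(data: bytes) -> str:
    # Latin-1 maps byte values 0-255 one-to-one onto code points U+0000-U+00FF,
    # so any byte string decodes without error and re-encodes to the same bytes.
    return data.decode("latin-1")


def from_latin1(text: str) -> bytes:
    return text.encode("latin-1")


def via_surrogateescape(data: bytes) -> str:
    # Valid UTF-8 decodes normally; each undecodable byte becomes a lone
    # surrogate (U+DC80-U+DCFF) that encodes back to the original byte.
    return data.decode("utf-8", errors="surrogateescape")


def from_surrogateescape(text: str) -> bytes:
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    blob = b"author: Jos\xe9 <jose@example.com>\n\xff\xfe binary tail"
    # str methods work on the decoded text, and the round trip is byte-exact.
    edited = via_latin1(blob).replace("author:", "committer:")
    assert from_latin1(edited) == blob.replace(b"author:", b"committer:")
    assert from_surrogateescape(via_surrogateescape(blob)) == blob
    # Often simplest of all: stay in bytes, which support replace(), split(),
    # startswith() and friends directly.
    print(blob.split(b"\n")[0])
```

Latin-1 is the blunter tool: it never fails, but any non-ASCII UTF-8 text shows up as mojibake while you work on it. `surrogateescape` keeps real UTF-8 readable at the cost of lone surrogates that must never be encoded any other way.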
Do we make too many of our software tools automatons when they should be judgment amplifiers? And why don’t we write more DSLs?
Back in the Renaissance there was a literary tradition of explaining natural philosophy via conversations among imaginary characters. I’m going to revive that this evening because I had an IRC conversation this afternoon, about the design insights behind reposurgeon, that pretty much begs to be presented this way.
The person of “Simplicio” was Galileo’s invention in his Dialogue Concerning the Two Chief World Systems. Here he represents four different people, but almost everything he says is something one of them in fact said or very plausibly might have. I’ve cleaned it up, edited, and amplified only a little.
For those of you coming in late, reposurgeon is a tool I wrote for editing version-control histories. It has many applications, including highest-quality repository conversions. Simplicio needed to excise some security-sensitive credentials from a DHS code repository – not just from the tip version but from the entire history. Reposurgeon is pretty much the only practical way to do this.
So, without further ado…
I made a really common and insidious programming mistake recently. I’m going to explain it in detail because every programmer in the world needs the reminder not to do this, and I hope confessing that even “ESR” falls into such a trap will make the less experienced properly wary of it.
Our sutra for today expounds on the sayings of the masters Donald Knuth and Ken Thompson, who in their wisdom have observed “Premature optimization is the root of all evil” and “When in doubt, use brute force.”
The SCCS back end to SRC doesn’t support named symbolic references to numbered revisions, because SCCS masters don’t include a symbol table. This is one of the things RCS added.
Goddess help me, I’ve figured out how to shoehorn in this feature. And probably should not do it.
I just released version 1.7 of SRC, Simple Revision Control.
For those of you late to the party, SRC is a simple version control system for directories full of small standalone files like FAQs, scripts in your ~/bin, dotfiles, and so forth – cases where you don’t want multi-file changesets. It’s actually a Python wrapper around RCS (or, optionally, SCCS) but gives you integer sequential version numbers, lockless operation, and a modern low-friction UI modeled on Subversion’s.
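The post doesn’t say how SRC maps its integer version numbers onto RCS’s dotted revision IDs. Here is a minimal sketch of the idea, assuming a linear trunk whose RCS IDs run 1.1, 1.2, 1.3 and so on; the function names are hypothetical. It also shows the trap of sorting the IDs as strings, which puts 1.10 before 1.2.

```python
# Hypothetical sketch: present RCS-style dotted revision IDs on a linear trunk
# as sequential integers (1, 2, 3, ...). Not SRC's actual code.


def revision_key(rev_id: str) -> tuple[int, ...]:
    # Compare dotted IDs numerically: "1.10" must sort after "1.9",
    # which a plain string sort gets wrong.
    return tuple(int(part) for part in rev_id.split("."))


def sequential_numbers(rev_ids: list[str]) -> dict[int, str]:
    """Map integers 1..N onto trunk revision IDs in commit order."""
    ordered = sorted(rev_ids, key=revision_key)
    return {n: rev for n, rev in enumerate(ordered, start=1)}


if __name__ == "__main__":
    # Lexical order here is 1.1, 1.10, 1.11, 1.2, ... which is wrong.
    revs = sorted(f"1.{i}" for i in range(1, 12))
    table = sequential_numbers(revs)
    print(table[10])
```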
With 1.7, I think it’s finished – the last two user-visible features I had planned were SCCS support and DOT visualization, and those are done now.
I believe SRC is now feature-complete for its functional niche. Am I mistaken? Is anything missing? Did I do anything that seems wrong?
I know SRC has had real users since about 0.3. If you are an SRC user, please check in in the comments. Most importantly, tell me if you need any feature it doesn’t have. I’m also curious if the actual use cases are any different than I expected, and I am all agog to know if anyone actually has a use for the SCCS support.
I needed a break from serious work yesterday, so SRC now speaks SCCS as well as RCS. This wasn’t difficult; I had factored SRC carefully in anticipation when I originally wrote it.
I can’t say I think this feature will be actually useful; SCCS is pretty primitive, and the SRC support has some annoying limitations as a result. But some hacks you do just because you can, and this is one of them.
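What “carefully factored” can look like in general: all engine-specific details sit behind one small interface, so adding a second engine means writing one new class. This is a minimal sketch with hypothetical class and method names, not SRC’s actual structure. The flags are the standard ones for printing a given revision to standard output (`co -q -pREV` for RCS, `get -s -p -rSID` for SCCS), and the master-file locations assume the conventional `RCS/` and `SCCS/` subdirectories.

```python
# Hypothetical sketch of a version-storage backend abstraction. Class and method
# names are invented; the command-line flags are standard RCS and SCCS options.
import os
from abc import ABC, abstractmethod


class Backend(ABC):
    """Everything above this interface is engine-agnostic."""

    @abstractmethod
    def master_path(self, workfile: str) -> str:
        """Where the engine keeps the history file for workfile."""

    @abstractmethod
    def cat_command(self, workfile: str, rev: str) -> list[str]:
        """argv that writes revision rev of workfile to standard output."""


class RCSBackend(Backend):
    def master_path(self, workfile: str) -> str:
        head, tail = os.path.split(workfile)
        return os.path.join(head, "RCS", tail + ",v")

    def cat_command(self, workfile: str, rev: str) -> list[str]:
        # co -pREV prints that revision to stdout instead of writing the file.
        return ["co", "-q", "-p" + rev, workfile]


class SCCSBackend(Backend):
    def master_path(self, workfile: str) -> str:
        head, tail = os.path.split(workfile)
        return os.path.join(head, "SCCS", "s." + tail)

    def cat_command(self, workfile: str, rev: str) -> list[str]:
        # get -p likewise prints the retrieved text; -s silences chatter.
        return ["get", "-s", "-p", "-r" + rev, self.master_path(workfile)]


if __name__ == "__main__":
    for backend in (RCSBackend(), SCCSBackend()):
        print(backend.cat_command("notes.txt", "1.2"))
```

The user-facing layer only ever calls methods like `cat_command()`, which is what makes a “just because you can” backend cheap to add.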
If you were reading A&D a year ago, you may recall that I invented a new version-control system to occupy an odd little niche that none of the existing ones serve very well.
Well, actually, it’s a shell around a very old version-control system that makes a reasonably fast version-storage manager but has a crappy UI. Thus, SRC – RCS reloaded, with a mission to serve cases where you don’t want per-directory changesets but prefer each file to have its own separate change history. Like a directory full of separate FAQs, or your ~/bin full of little scripts.
SRC gives you a modern UI in the svn/hg/git style (but much, much simpler than git’s) and lockless operation. It has full embedded documentation and an Emacs VC backend. If your little project goes multi-file, you can instantly fast-export to git.
Today I shipped Version 1.0. This could have happened sooner, but I’ve been focusing on NTPsec pretty hard in the last year. There was one odd bug in the behavior of multi-file commands that I just hadn’t got around to fixing. (Yes, you can do multi-file commands, but the files still have separate histories.)
The whole thing is just 2KLOC of Python, and that’s with the rather extensive embedded documentation. The sort of person who frequents this blog might find the FAQ entertaining.
I struck a small blow for better security today.
It started last night on an IRC channel with A&D regular Susan Sons admonishing the regulars to rotate their ssh keys regularly – that is, generate and export new key pairs so that if someone cracks the crypto on one out of your sight, it won’t be replayable forever.
This is one of those security tasks that doesn’t get done often enough because it’s a fiddly pain in the ass. But (I thought to myself) I have a tool that reduces the pain. Maybe I should try to eliminate it? And started hacking.
The tool was, until yesterday, named ssh-installkeys. It’s a script wrapper written in Python that uses a Python expect engine to log into remote sites and install (or remove) ssh public keys. What makes it useful is that it remembers a lot of annoying details like fixing file and directory permissions so your ssh server won’t see a potential vulnerability and complain. Also, unlike some competing tools, it only requires you to enter your password once per update.
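The permission fix-up is the detail sshd is fussiest about: with its usual StrictModes checking it refuses keys whose directory or file is writable by others. Here is a minimal local sketch of the install and remove steps, assuming the conventional modes of 0700 for the `.ssh` directory and 0600 for `authorized_keys`. The function names are hypothetical, and sshexport itself does the equivalent on the remote side through its expect session.

```python
# Hypothetical sketch: install or remove a public key in an .ssh directory and
# tighten permissions the way sshd's StrictModes check expects. Runs locally.
import os
import stat
import tempfile


def _read_keys(keys_path: str) -> list[str]:
    if not os.path.exists(keys_path):
        return []
    with open(keys_path) as f:
        return f.read().splitlines()


def _write_keys(keys_path: str, keys: list[str]) -> None:
    with open(keys_path, "w") as f:
        f.write("".join(k + "\n" for k in keys))
    os.chmod(keys_path, 0o600)  # owner read/write only: rw-------


def install_public_key(ssh_dir: str, pubkey_line: str) -> str:
    """Add pubkey_line to ssh_dir/authorized_keys if absent; return that path."""
    os.makedirs(ssh_dir, exist_ok=True)
    os.chmod(ssh_dir, 0o700)  # owner-only directory: rwx------
    keys_path = os.path.join(ssh_dir, "authorized_keys")
    keys = _read_keys(keys_path)
    key = pubkey_line.strip()
    if key not in keys:
        keys.append(key)  # rewriting avoids gluing onto a line with no newline
    _write_keys(keys_path, keys)
    return keys_path


def remove_public_key(ssh_dir: str, pubkey_line: str) -> bool:
    """Drop pubkey_line from authorized_keys; return True if it was present."""
    keys_path = os.path.join(ssh_dir, "authorized_keys")
    keys = _read_keys(keys_path)
    kept = [k for k in keys if k != pubkey_line.strip()]
    if len(kept) == len(keys):
        return False
    _write_keys(keys_path, kept)
    return True


if __name__ == "__main__":
    demo_dir = os.path.join(tempfile.mkdtemp(), ".ssh")
    path = install_public_key(demo_dir, "ssh-ed25519 AAAAexample demo@host")
    print(oct(stat.S_IMODE(os.stat(path).st_mode)))
```

Installing the same key twice is harmless, which matters for a tool meant to be rerun against many sites.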
Some time ago I taught this code to log its installations in a config file so you have a record of where you have remote-installed keys. I realized that with a little work this meant I could support a rotate option – mass-install new keys on every site you have recorded. And did that.
I’ve been meaning for some time to change the tool’s name; ssh-installkeys is too long and clumsy. So it’s now sshexport. I also updated it to know about, and generate, ed25519 keys (that being the new hotness in ssh crypto).
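The rotate operation described above reduces to two steps: generate a fresh ed25519 key pair, then reinstall its public half on every recorded site. Here is a minimal sketch that assumes a hypothetical plain-text log with one `user@host` per line (the post doesn’t describe sshexport’s real format) and takes the per-site installer as a parameter. `ssh-keygen -t ed25519 -f FILE -N "" -C COMMENT` uses standard OpenSSH options; everything else here is invented for illustration.

```python
# Hypothetical sketch of "rotate": generate a fresh ed25519 key pair, then
# reinstall the new public key on every site recorded during earlier installs.
# The record format (one user@host per line) and function names are assumptions.
import subprocess
from typing import Callable, Iterable


def keygen_command(key_path: str, comment: str) -> list[str]:
    # Standard OpenSSH flags: quiet, key type, output file, empty passphrase, comment.
    return ["ssh-keygen", "-q", "-t", "ed25519", "-f", key_path, "-N", "", "-C", comment]


def generate_key(key_path: str, comment: str) -> str:
    """Run ssh-keygen (key_path must not already exist); return the public key."""
    subprocess.run(keygen_command(key_path, comment), check=True)
    with open(key_path + ".pub") as f:
        return f.read().strip()


def recorded_sites(record_text: str) -> list[str]:
    """Parse the install log, skipping blanks and '#' comments, dropping duplicates."""
    sites: list[str] = []
    for line in record_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and line not in sites:
            sites.append(line)
    return sites


def rotate(sites: Iterable[str], pubkey: str,
           install: Callable[[str, str], bool]) -> list[str]:
    """Install pubkey on each site; return the sites where installation failed."""
    return [site for site in sites if not install(site, pubkey)]


if __name__ == "__main__":
    print(" ".join(keygen_command("id_ed25519_new", "rotated key")))
```

Returning the failures rather than stopping at the first one suits a mass operation: one unreachable host shouldn’t leave the rest on the old key.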
In order to reduce the pain, sshexport can now store your passwords in its list of recorded sites, so you only have to enter the password the first time you install keys and all later rotations are no-hands operations. This doesn’t actually pose much additional security risk because by hypothesis anyone who can read this file has read access to your current private ssh keys already. The correct security measure is whatever you already do to protect other sensitive data in your dot directories, like GPG directories and web passwords stored by your browser. I use drive encryption.
The result is pretty good. Not perfect; the big missing feature is that it doesn’t know how to update your keys on sites like GitLab. That would take a custom method for each such site, probably implemented with curl. Perhaps in a future release.
I released reposurgeon 3.30 today. It has been five years and a month since the first public release.
In those five years, the design concept seems to have proved out very well, finding use in many repository conversions. But the project exhibits an unusual sociology; I don’t get lots of casual contributors, only a few exceptional ones.
You’ve heard me uttering teasers about it for months. Now it’s here. The repository is available for cloning; we’re shipping the 0.9.0 beta of NTPsec. You can browse the web pages or clone the git repository by one of several methods. You can “wget https://github.com/NTPsec/ntpsec/archive/NTPsec_0_9_0.tar.gz” to get a tarball.
This is an initial beta and has some rough edges, mostly due to the rather traumatic (but utterly necessary) replacement of the autoconf build system. Also, our range of ports is still narrow; if you’re on anything but Linux or a recent FreeBSD the build may not work for you yet. These things will be fixed.
However, the core function – syncing your clock via NTP – is solid, and using 0.9.0 for production might be judged a bit adventurous but wouldn’t be crazy. The next few beta releases will rapidly get more polished. Expect them to come quickly, like within weeks.
Most of the changes are under the hood and not user-visible. A few auxiliary tools have been renamed, most notably sntp to ntpdig. If you read documentation, you will notice that what’s there has been massively revised and improved.
The most important change you can’t see is that the code has been very seriously security-hardened, not only by plugging all publicly disclosed holes but by internal preventive measures to close off entire classes of vulnerabilities (by, for example, replacing all function calls that can produce buffer overruns with memory-safe equivalents).
We’ve already established good relations with security-research and InfoSec communities. Near-future releases will include security fixes currently under embargo.
If you consider this work valuable, please support it by contributing at my Patreon page.
Here’s where I attempt to revive and popularize a fine old word in a new context.
hieratic, adj. Of or concerning priests; priestly. Often used of the ancient Egyptian writing system of abridged hieroglyphics used by priests.
Earlier today I was criticizing the waf build system in email. I wanted to say that its documentation exhibits a common flaw, which is that it reads less like an explanation than like a memory aid for people who are already initiates of its inner mysteries. But this was not the main thrust of my argument; I wanted to observe it as an aside.
In the wake of the Ars Technica article on NTP vulnerabilities, and Slashdot coverage, there has been sharply increased public interest in the work NTPsec is doing.
A lot of people have gotten the idea that I’m engaged in a full rewrite of the code, however, and that’s not accurate. What’s actually going on is more like a really massive cleanup and hardening effort. To give you some idea how massive, I report that the codebase is now down to about 43% of the size we inherited – in absolute numbers, down from 227KLOC to 97KLOC.
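KLOC is crude, but that crudeness is also what makes it cheap to measure and hard to game. Here is a minimal sketch of a counter that tallies non-blank lines in C and Python sources; which files and lines count is an assumption, since tools differ. It also checks the shrinkage figure above: 97 out of 227 KLOC is about 42.7%, which rounds to 43%.

```python
# Crude KLOC counter: non-blank lines in files with the given suffixes.
# What counts as a "line of code" varies between tools; this is a sketch.
import os


def count_lines(root: str, suffixes: tuple[str, ...] = (".c", ".h", ".py")) -> int:
    """Count non-blank lines in matching files under root (comments included)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffixes):
                with open(os.path.join(dirpath, name), errors="replace") as f:
                    total += sum(1 for line in f if line.strip())
    return total


def percent_remaining(before_kloc: float, after_kloc: float) -> float:
    """Size after cleanup as a percentage of the original size."""
    return 100.0 * after_kloc / before_kloc


if __name__ == "__main__":
    print(f"{percent_remaining(227, 97):.1f}%")  # 42.7%
```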
Details, possibly interesting, follow. But this is more than a summary of work; I’m going to use it to talk about good software-engineering practice by example.
NTPsec is preparing for a release, which brought a question to the forefront of my mind. Are tarballs obsolete?