Are tarballs obsolete?

NTPsec is preparing for a release, which brought a question to the forefront of my mind. Are tarballs obsolete?

The center of the open-source software-release ritual used to be making a tarball, dropping it somewhere publicly accessible, and telling the world to download it.

But that was before two things happened: pervasive binary-package managers and pervasive git. Now I wonder if it doesn’t make more sense to just say “Here’s the name of the release tag; git clone and checkout”.
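Concretely, that flow is just the following (a sketch using a throwaway local repository standing in for the real clone URL; the tag name is invented):

```shell
set -e
# Release manager side: the "release" is nothing more than a tag on the public repo.
rm -rf /tmp/upstream /tmp/work
git init -q /tmp/upstream
cd /tmp/upstream
git config user.email rm@example.org
git config user.name "release manager"
git commit -q --allow-empty -m "prepare 0.9.0"
git tag NTPsec-0.9.0

# Consumer side: clone, then check out the announced tag.
git clone -q /tmp/upstream /tmp/work
git -C /tmp/work checkout -q NTPsec-0.9.0
git -C /tmp/work describe --tags        # confirms which release you have
```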

Pervasive binary package managers mean that, generally speaking, people no longer download source code unless they’re either (a) interested in modifying it, or (b) a distributor intending to binary-package it. A repository clone is certainly better for (a) and as good or better for (b).

(Yes, I know about source-based distributions, you can pipe down now. First, they’re too tiny a minority to affect my thinking. Secondly, it would be trivial for their build scripts to include a clone and pull.)

Pervasive git means clones are easy and fast even for projects with a back history as long as NTPsec’s. And we’ve long since passed the point where disk storage is an issue.

Here’s an advantage of the clone/pull distribution system: every clone is implicitly validated by its SHA1 hash chain. It would be much more difficult to insert malicious code into the back history of a repo than it is to bogotify a tarball, because people trying to push to the tip of a modified branch would notice sooner.
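A toy illustration of what that validation buys, deliberately corrupting one object in a scratch repo (all names here are invented for the demonstration):

```shell
set -e
rm -rf /tmp/chain && git init -q /tmp/chain && cd /tmp/chain
git config user.email dev@example.org
git config user.name dev
echo one > f && git add f
git commit -q -m "first"

# Locate the loose object holding f's contents and append garbage to it,
# simulating on-disk tampering with history.
obj=$(git rev-parse HEAD:f)
path=.git/objects/$(echo "$obj" | cut -c1-2)/$(echo "$obj" | cut -c3-)
chmod u+w "$path" && printf 'X' >> "$path"

# Any integrity walk now fails loudly; a clone would refuse the object too.
git fsck >/dev/null 2>&1 || echo "fsck detected the tampering"
```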

What use cases are tarballs still good for? Discuss…

70 thoughts on “Are tarballs obsolete?”

  1. By merely existing, NTPsec is already asking a lot of distros: asking them to take a chance on a young and daring fork of a long-beloved project. Don’t make it any harder for distros to get on board.

    Maybe try it with gpsd? In that case distros are stuck doing it your way.

    RGDS
    GARY

  2. Some obvious downsides to git-only releases:
    – needs a running git server or equivalent
    – needs network connection to same
    – suffers from the temptation to specify releases by tag names rather than hashes, making it non-reproducible (due to possible retagging)
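A toy demonstration of that last hazard (repo path, tag, and messages all invented): a tag can be silently moved, but a commit hash recorded at release time cannot.

```shell
set -e
rm -rf /tmp/demo && git init -q /tmp/demo && cd /tmp/demo
git config user.email dev@example.org
git config user.name dev
git commit -q --allow-empty -m "the real 1.0"
real=$(git rev-parse HEAD)        # the hash recorded at release time
git tag v1.0
git commit -q --allow-empty -m "something else entirely"
git tag -f v1.0                   # the tag silently moves...
git rev-parse v1.0                # ...so "v1.0" now names a different commit
git cat-file -t "$real"           # but the recorded hash still pins the original
```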

  3. “suffers from the temptation to specify releases by tag names rather than hashes” – which temptation can be neatly observed in the article: “Here’s the name of the release tag”

    Of course, it doesn’t help that ESR seems to love to rebase, so the original commit might not exist anymore. I think that may demonstrate the wider issue that you’ve hit one example of – not everyone uses git the same way.

    @Justin Andrusk A freestart collision is not a second preimage. That said, it’s interesting that your immediate instinct is to recommend a different plain digest list, rather than a digital signature.

  4. The use of SHA-1 in git is not yet direly broken. The latest and greatest attack on SHA-1 generates collisions for a slightly modified version of the hash function, and they estimate that it would cost about $100k to generate a collision for the real thing. That still wouldn’t be enough; to mess with the history of a git repo would require either (a) a particularly advanced kind of second-preimage attack, which seems a long way off, or (b) a legitimate-looking commit accepted from a malicious programmer, plus a chosen-prefix collision in SHA-1. These are much harder to find than regular collisions.

    I know this isn’t all that reassuring, but if you compare the difficulty of the above to the difficulty of messing with an unsigned tarball or inserting some underhanded code into a commit, I’d say it compares favorably.

  5. That’s at least a 5 year old question.

    FWIW, github can autogenerate a tarball from a tag. I don’t think they are alone in this. It seems to happen automagically on request. A dev can also specifically create a release with a tarball, and then you have additional features available like signing it.

    @Random832: Unless rebase is used to modify an existing tag, that’s probably an orthogonal discussion to this.

  6. I was under the impression that tags could be freely modified (without a rebase or anything); my point is that rebase/reposurgeon/filter-branch/etc would prevent commit hashes from being used for releases either.

  7. git “clones are easy and fast” IFF you have reliable, high-speed internet access.

    If you publish a tarball over http, someone can start to download that, have the download interrupted, and resume the download where they left off. Repeatedly, if necessary.

    `git clone` however, appears to have no such facility since it creates non-deterministic-by-design pack files for the transfer, and does not provide a way to request objects by hash (only by ref) so that the client could walk back through the DAG and incrementally build up a clone. Nor does it provide a way to request a byte range of an object specified by hash.[1] In fact, if the clone fails 99% of the way through, git will delete all the data it just fetched.

    You may decide this concern is not applicable to your use-case. After all, I’d expect distributors to have fat pipes. Potential contributors might not, for a variety of reasons. With determination and resources they could overcome lousy internet access, but each additional hurdle reduces the size of that group. Ideally, “someone” would fix the root cause of git assuming you can clone an entire repo in a single attempt. I hope you will at least give some thought to the limitation before consciously dismissing it.

    [1] If I have missed a way to fetch byte ranges of objects specified by hash from git, I’d love to hear about it.
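One partial workaround, not true resumability but at least incremental: shallow-clone first, then deepen the history in small steps, each of which is a separate, smaller transfer. A sketch against a local stand-in repo (all paths invented):

```shell
set -e
rm -rf /tmp/big /tmp/thin
git init -q /tmp/big && cd /tmp/big
git config user.email dev@example.org
git config user.name dev
for i in 1 2 3 4 5; do git commit -q --allow-empty -m "commit $i"; done

# Smallest possible first transfer: one commit, no history.
# (file:// forces the real transport so --depth is honored locally.)
git clone -q --depth 1 file:///tmp/big /tmp/thin

# Then pull older history in chunks sized to survive a flaky connection.
git -C /tmp/thin fetch -q --deepen=2
git -C /tmp/thin rev-list --count HEAD   # 1 + 2 = 3 commits now present
```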

  8. Well, I for one still make LZMA tarballs and so does Arch (yes, I know that qualifies as a binary package, but it’s superior to dealing with .deb and its layouts if you ask me) due to the minimal size and the ability to strip out stuff an end user wouldn’t want. Like build-system input, git buildup (which is particularly bad in the projects you deal with, from what I’ve seen, Eric), .gitignored files. Whatever.

    I also often have to pull and build myself even if the web interface has snapshots enabled, since work requires it at some point or another as a submodule, so I guess my use case is kind of hybrid. For what it’s worth, though, LZMA really kicks ass in general. I have to say something some would consider, uh… heretical, but .tar.xz is pretty much the new generation of tarball. More and more people are using it in this capacity.

    I wonder how Slackware is doing by the way. Heh.

  9. I still do tar xzvf;./configure;make when my package management gets a bit behind. Often there’s a deb available but it requires upgrading hundreds of system libraries, a process which I expect will break something. Source tends to be a lot more flexible about the versions of its dependencies.

    Doing this from git is an option, but it’s nice to have a simple way to get exactly what I need, with some promise that it’s a working version.

  10. Here is how obsolete tarballs are in my mind:

    I wasn’t planning on NTPsec even doing a tarball drop, and I wasn’t even consciously aware of my deciding not to, until reading this.

    GitHub will dynamically generate tarballs on request, like so:
    curl -sL https://github.com/user-or-org/repo/archive/sha1-or-ref.tar.gz

    That’s all we need.

    I know enough about the dark grey world that I would not trust some naked tarball that someone gave me actually is a snapshot of what is in the canonical DVCS repo, without actually pulling from the DVCS and crosschecking myself, and if one is going to do that, just pull from the DVCS in the first place.

  11. Eric:

    I’m a younger user with a marginal programming skillset (I have the feeling I could probably develop that skillset well beyond marginal, but that’s where I am now). My use case, when I download sources from a project’s website instead of binaries through my distribution’s repositories, is generally that I have been unable to find binary packages in the repositories or online, or, if I’ve found them, that they are an old version that does not contain a feature I want to play with. This doesn’t happen especially often, but often enough.

    I tend to prefer tarball downloads to repository clones because I’m in an age group where I learned “download archive and unpack” in childhood. It’s familiar and uses well rehearsed muscle memory.

    Meanwhile, as my programming skillset is marginal, I don’t feel quite confident working with version control. I can read directions and can read man pages, so if a project doesn’t make tarballs available I *can* clone the repository (and have done so on a few occasions), but, rationally or not, I feel like a bit of an Aunt Tilly.

    I don’t know if people like me are a common use case or not, but that’s just my $0.02.

  12. Mark: “I would not trust some naked tarball that someone gave me actually is a snapshot of what is in the canonical DVCS repo”

    Even when the someone is the official maintainer of the Linux distribution you use?

  13. Unique tarball releases that aren’t a 1:1 correspondence to the source repository as of any given tag (such as those our host has been doing for his projects) pretty much have only one advantage I can think of: they can include pre-generated configure scripts and *.in files for all the autotools scaffolding that are largely backwards- and forwards-incompatible cross-version, whereas the generated files can usually continue running for a good while. Given the distaste for autotools among this crowd, this is not a compelling argument in favor of these tarball distributions.

    Packagers and distributions are usually in favor of tarballs rather than checking out from a VCS. It reduces build-time dependencies (typically only tar and xz or gzip) and provides easier checksum-based verification for tampering; although even this is somewhat negated by the increasing trend to use OpenPGP signatures.

    Personally, I prefer to not even make the dedicated tarball distributions anymore. Git can generate them, and as noted a few times in these comments, GitHub (among others) can automatically provide them. Contrary to the apparently believe, this is not a GitHub feature; it is a function of the git archive command (e.g., git archive -o reposurgeon-3.29.tar.gz --prefix reposurgeon-3.29/ 3.29), and that actually makes it more powerful. We can choose to not blindly trust a service like GitHub. If Eric signs tags with an OpenPGP signature (quite easily done with git tag -s), the tarball can be generated by everyone in a deterministic fashion via git-archive, and we can verify whether or not it matches. Most web services for Git support downloading tags (or any tree) by dynamically generating the tarballs on the fly, which is still useful for keeping build-time dependencies low (no Git required), and the packager themselves can go to the effort of verifying a repository’s OpenPGP signature and the checksum of a generated tarball from such. With sha256, this is pretty well-secured against any malice on the end of a Git hosting service.
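To make the determinism claim concrete, here is a sketch with a scratch repo (names invented; in real use the tag would be GPG-signed and the sha256 published alongside the release announcement):

```shell
set -e
rm -rf /tmp/proj && git init -q /tmp/proj && cd /tmp/proj
git config user.email rel@example.org
git config user.name rel
echo 'hello' > README && git add README
git commit -q -m "release 3.29" && git tag 3.29

# Two independently generated archives of the same tag are byte-identical,
# so a downstream packager can regenerate the tarball and compare checksums.
git archive --prefix=proj-3.29/ -o /tmp/a.tar 3.29
git archive --prefix=proj-3.29/ -o /tmp/b.tar 3.29
cmp /tmp/a.tar /tmp/b.tar && sha256sum /tmp/a.tar
```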

  14. That was supposed to be “apparently popular belief” — sorry for the lack of proofreading. WordPress seems to have ate some of the HTML I used too (I can never figure out what’s allowed or disallowed…), but the message is still clear. :)

  15. On my projects I link directly to the GitLab-generated tarball of the release tag. That way I don’t have to create a tarball, but users who want the stable release tarball can download it without Git, if they want to.

    Only thing I have to do (apart from pushing new tags) is to update the download link (which I can automate).

  16. Tarball has the advantage that it can be put on CD / DVD, on a USB disk, on BitTorrent (resumable), on FTP(S) (also resumable). You can add generated files to the tarball (a generated Makefile, a generated ./configure script if any, documentation in ready-to-read format: man, html, info, etc.) which are not and should not be put under version control. Also, the user does not need to have Git installed.

    Auto-generated tarballs from the hosting site or web interface, be it gitweb, GitHub or GitLab, don’t allow adding generated files; and if the name of the generated tarball is not projectname-version.tar.gz then it doesn’t count. sha-1-of-version.tar.gz is horrible, especially since such names are not ordered, so you cannot guess from the names which file is the later version.

    As to signing (you GPG sign tags for releases, don’t you?), there are always detached GPG signatures (*.sig or *.asc files).

  17. What about the case of a corporate network behind a firewall? A tarball is easier to import than punching a hole in the firewall for github.

  18. Tarballs are very important for people like me living on mobile internet where every single byte is expensive.

  19. Jakub, I’m having trouble parsing your second paragraph. Git-generated tarballs are most certainly deterministic.

  20. Traditional releases are important for cross-organization collaboration.

    Let me unpack that.

    When you work within a closely knit development team, your best bet is to have a shared repository and to just use HEAD of that repository all the time. Among other things, if one of you makes a change that causes harm to the other people, then you revert the change and then talk through how to fix things in a way that makes everyone happy. Life got way easier for GWT developers once we started using Google’s main repository instead of our own; there was some worry around the team that we would be breaking all of Google all the time, but it turns out that it’s not really a problem so long as you are sure to be available for an hour or so after a commit so that you can back things out if need be. Once your change has been in for a day or two, it is everyone *else’s* responsibility not to break you.

    This kind of collaboration doesn’t work for collaboration across multiple organizations, though. You fundamentally don’t have the high-bandwidth, highly responsive interaction between each other that people within the same organization have. In fact, that’s probably the key characteristic of “being part of an organization” at all: you talk to each other, all the time, in a high-bandwidth fashion.

    For multiple organizations, it’s important to have an actual release. Then the people in one of them can adopt artifacts from the other one by retrieving a release and checking it into their own VCS system.

    Git has a couple of problems for using it as your release mechanism. One problem is that you might want to change in the future to a different system! The other problem is that Git tags are really easy to move. There needs to be some sort of publication mechanism, where once you’ve put the thing out there, it’s very unlikely to change. Even if you informally commit to never moving certain Git tags, a consumer of your GitHub repository has no way to be sure about it.

  21. > I know enough about the dark grey world that I would not trust some naked tarball that someone gave me actually is a snapshot of what is in the canonical DVCS repo

    I don’t know why you should trust that, or should even want it to be so.

    The point is that the tarball is an artifact of its own, standing apart from the repo, and the project’s release manager stands by it (and publishes a SHA512-sum of it, a digital signature, etc, whatever security you want.)

    The fact that git uses SHA-1 (whatever you think of the present and future security of SHA-1 itself) isn’t really a defense against being given a *whole fake git repo* by a MITM. And if the git repo’s origin can be validated by an HTTPS certificate, so can the tarball.

  22. @Mike Swanson “they can include pre-generated” ANYTHING, not just autoconf crap.

    For example, ESR’s own tarball for showkey, to pick a simple example, includes the manpage in nroff format, whereas the repo only contains an XML file which requires a whole docbook toolchain that I don’t want to mess with.

  23. Git commits and tags can be signed by GPG keys, which I would consider an important aspect of making a git-based release, especially for something as important as ntpd.

    Unfortunately, this ends up poorly exposed by many things, including the git command-line API, GitHub (no indication whatsoever that a commit/tag is signed, let alone by who/what), etc., but it can still be useful to packagers.

  24. Lagg,

    Slackware is doing just fine with many contented users including myself. There has been renewed interest in it from the contingent of Linux users intent on resisting with all their might the fact that systemd is the new standard.

  25. Unfortunately, this ends up poorly exposed by many things, including the git command-line API

    Both “git tag” and “git log” make it trivial to display this information and verify the signature. I don’t know why you think it’s poorly exposed here…

  26. A tarball is easier to import than punching a hole in the firewall for github.

    If your firewall allows HTTP but blocks SSH, your firewall is broken.

  27. “Both “git tag” and “git log” make it trivial to display this information ”

    “Trivial”, in my opinion, would be displaying it without a command-line parameter at all. It’s merely “easy”, in that you must pass a command-line parameter or (I assume) tweak a configuration flag. (Unless they’ve changed this recently.)

    The net result of it being merely “easy” instead of “trivial” is that the average git user, when presented with a repository with signed commits, will never realize anything has been signed, unless explicitly told.
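For what it’s worth, a per-user config flip gets most of the way to “trivial” (assuming git ≥ 2.10, which added log.showSignature; a scratch config file is used here so nothing global is touched by the demo):

```shell
# In real use this would be: git config --global log.showSignature true
# After that, plain 'git log' and 'git show' verify and display signatures
# without any extra command-line parameter.
git config --file /tmp/demo-gitconfig log.showSignature true
git config --file /tmp/demo-gitconfig --get log.showSignature
```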

  28. @Mike Swanson: I forgot that WordPress doesn’t escape < and >, and it removes unknown HTML tags.

    It was meant to read that projectname-version.tar.gz is good, sha-1-of-version.tar.gz is bad.

  29. > The other problem is that Git tags are really easy to move.

    Well, that’s maybe technically true (though not entirely: changing a tag in a remote repository is not that easy, IIRC), but tags, at least signed and annotated tags which are meant for collaboration (as opposed to lightweight tags), really should not change.

  30. Nb. with a tarball and a spec file (generated from spec.in, for example setting the version number) you can very, very easily generate an RPM package for RedHat derivatives (RHEL, CentOS, Fedora, Scientific Linux, SuSE,…).

    I guess that the DEB path (Debian, Ubuntu) also assumes having a tarball…

  31. > Git commits and tags can be signed by GPG keys

    I wonder how reposurgeon deals with this. Or rebase, or filter-branch, or anything else.

    Maybe it’d be better if Git provided a way to sign a tree, or a combination of a tree and message without caring about (or vouching for) what the history behind it is. The hash chain’s strength is also its weakness here. Then if that data doesn’t change, the existing signature can be copied as-is.

  32. @Random832: You can tag, and thus sign, a tree in Git, but in many places Git assumes that you are signing commits. The very first tag in Git history, created with proto-Git, is IIRC such a tag – pointing to a tree (and providing a test case for UIs and tools ;-)

    But rebase / filter-branch / reposurgeon often changes the tree too. A signed tag is there so you know “this is version ZZZ, and I know its contents”.

  33. Often, but not always (I don’t think a “normal” rebase with no conflicts, no edits, and no discards, will result in the final tree being different from the original final tree, even if the trees in the middle are lost). Plus, in principle, you could stash the original tree in a merge bubble, but you could maybe do that with the original commit too.

  34. > Git commits and tags can be signed by GPG keys

    I wonder how reposurgeon deals with this. Or rebase, or filter-branch, or anything else.

    Largely by ignoring the problem and preserving the signature as-is. If you don’t do anything that messes with the history up to the point of a signature, everything will still work out fine when it goes back out as a repository. If you do, well… they now all fail to verify and you probably should either re-sign (by retagging) or remove the signature entirely.

  35. >> Git commits and tags can be signed by GPG keys

    > I wonder how reposurgeon deals with this. Or rebase, or filter-branch, or anything else.

    The git filter-branch command has --tag-name-filter… which is usually used to keep the same names for tags. git-filter-branch always strips signatures from rewritten commits.

    reposurgeon could, in general, even re-sign tags (with a specified private key, which might be different from the original key), though I guess that it doesn’t have this capability.

  36. git clone --depth 1 XXX is fast. Try doing a git clone of something like one of your BSD conversions. Many don’t know about that and might not notice until they get overage fees or their disk fills up or it is taking over 10 hours.

    Normally it MUST be an encrypted channel, either ssh or https, but that really isn’t needed (over tarballs and hash-sums). I’m not sure I can create a shell script with git that would be the equivalent of “wget http://XXX.tar.bz2…”. Note that you open up the authentication, certificate, and protocol nonsense here: either git ignores the CA chain (making it useless) or requires you to install or register (ssh) a cert for each repo.

    The git cloned repo also includes git metadata (so things like git diff, git pull, git merge, etc) work, which probably aren’t of interest just for a build.

    Git has a “submodule”, so you could build a system with a bunch of subtrees – subrepos, but then you need to execute the extra commands to handle them. A tarball can have a script or readme to say to unpack another tarball at a specific location. Maybe something similar can be done for a git with dependencies, but it might prove confusing for a while.

    Most places have NOT set up git servers, so is everything going to github (with their community standards)?

    For me, Raspberry Pi has been a nightmare because of disparate gits. The “binary” release tree (GitHub: Hexxeh/rpi-firmware) has an arcane mechanism to find the tag for the Raspbian source which it was built from. I tried automating it, but it took forever to get the repos (even with limiting --depth). This was just to build a module for some hardware that wasn’t in the main tree. Every time rpi-firmware updated, I’d lose a half dozen supported devices. Then some were patches, which made things worse.

    “Pointing at a tag” isn’t as simple as it sounds. Most would just git clone, then checkout, then build, so it might be the current alpha or beta instead of the release tag.

    When you get dozens of messages about NTPsec not working because they didn’t compile the exact release tag, are you going to patiently explain to everyone where things went wrong?

    Tarballs are static – fixed – there is no question what version. They contain no metadata or long strings of patches to create different versions. They can be bit-torrented if large. Packages like .deb files are merely encapsulated tarballs.

  37. It was mentioned earlier, but I’ll say it short and sweet and loud:

    Dependencies, Dependencies, Dependencies!

    It’s much better these days, but as a person who is primarily a user of Unix/Linux software, both tarballs and repo clones suffer the same problem: building the software on my machine almost invariably means a bunch of work hunting down at least half a dozen different libraries that are missing from my system for whatever reason. It seems to me that dev/hacker types tend to have these things installed already and often aren’t aware that the rest of us need to go through this suffering.

    Binary packages, RPM’s and .deb’s save us from that work, and that’s why they’re so highly valued.

    But my smell of it is that the cool kids these days are leaving behind even package management and are all about containers; loads of stuff on the public Docker registry nowadays. NOT the right thing for low-level plumbing like NTPsec, but for applications it’s all the rage.

  38. “Binary packages, RPM’s and .deb’s save us from that work, and that’s why they’re so highly valued.”

    This a thousand times over. Speaking as an engineer/admin/that guy who makes stuff work, it is incredibly frustrating to be dealing with software that is only available as a source repo. Having a real package with real dependency resolution can be the difference between “apt-get install blah blah okay done” and spending days figuring out what the software expects to be present before it will stop screaming ERROR 26 SYSTEM RETURNED CTHULU. It’s bad enough when the offending developers are two cubes over; it’s worse for 3rd-party material.

    Nearly as irritating are python packages that are only available via pip rather than in a system-native form. Sure, it’s cross-platform and relatively easy on the devs. It also means I now have two competing and incompatible package managers to worry about. A recent funhouse episode: On the current Ubuntu LTS, pip installing urllib (or maybe requests, I forget which) will cheerfully break the system-installed pip.

    This is especially offensive for libraries, because now if I write software depending on the library then I can’t provide native packages of my own, unless I feel like repackaging your library for you.

    I’m not a professional dev, but it seems to me that if someone *knows* their package hasn’t made it into the distro pipeline yet, they should at least provide built packages with dependencies for the most common target platforms. Why github et al don’t provide support for apt/yum repo hosting alongside tarballs and whatever else is beyond me.

  39. ESR, I think you’re blinded to the state of the masses by the environment you work in.
    There are still lots of people who build things from source but don’t know how to use git. It’s more mentally complex than “download, extract”: an average user can click a link, save, open a file manager, and use an archiver to extract it; with git, an ‘average user’ has to open the package manager, install git, open a terminal, and run git clone someurl, and they have a little more work to do before they can point to some sort of results (i.e., clicking a link gives you a copy of the source, vs. copy-pasting some command into a terminal gets you the source).
    Unless you have a fairly new project, git downloads a lot of history that only developers are likely to be interested in and uses a lot of extra disk space compared to what the user is interested in.
    As a user, if I find a project with a newish tarball release under 2 MB, I might well grab it just in case the project is something that interests me, in which case I often end up compiling it; if I find a project with a git repo saying they ‘released’ something, I probably will not clone unless I find a specific statement that it solves a specific problem that I’ve encountered better than the alternatives.
    As a packager, I’ll comment that while distros can deal with snapshots of git trees, it’s usually with a significant bit of resentment.
    As a programmer, git works well…but I probably won’t start working on a project if I’m not interested as a user, so in the end you’re not likely to get contributions from me without either offering a tarball or making a very good case for your software.

    Long story short…if you don’t have release tarballs, I’m more likely to look at the NTP over TLS that I’ve heard is in openntpd than to look at NTPsec. And if I don’t look at NTPsec, you can probably guess that I won’t be packaging it for Alpine Linux.
    I am aware that github can automatically generate tarballs for tags; using a properly formatted tag (package-version or similar) and advertising the URL where these tarballs can be found is enough of a release for me.
    (FWIW, I’m not threatening to retaliate for an absence of tarballs by refusing to package ntpsec, just saying that the way I work, I probably won’t end up packaging it unless you do.)

  40. Is there a good comparison of different NTP daemon projects: ntpd, NTPsec, ntimed, openntpd,… (and tricks like tlsdate)?

  41. I long for a GPLv2.01 – one that replaces clause 3 (about how the software can be distributed) and mandating that a source code control system be used instead. The minimally GPLv2 compliant tarball dumps I’ve had to sort through in my time, really sucked.

  42. You know, now I take it back about distributing via containers. My friendly neighborhood Docker expert educated me today about how yes, indeed, plumbing services such as NTPsec do indeed run quite successfully in containers. So yeah.

  43. Dave: Fat chance. In RMS’s mythology, you need the full protections of GPLv3 in order to protect your precious bodily fluids, er, software freedom. Nothing less will do. If he could, he’d revoke GPLv2 outright and force everyone to v3.

  44. @ESR, shouldn’t this post be in “Software” category (rather than “General”), and perhaps also tagged “NTP” and/or “OSS”?

  45. Several years ago I wrote a script that turns a URL for a tarball into a git remote branch with TAR_HEAD as an alias. As in:

    git init; git add-tarball http://foo.com/bar-1.23.tar.xz

    Remote branch “tarballs/bar-1.23” is created with the tarball contents as tree. TAR_HEAD points to this commit, like FETCH_HEAD does after git fetch $remote $branch. The commit log message records details like URL, tarball size, tarball SHA1, etc. which are really useful years later when I’ve forgotten what project this tarball belonged to and how it got onto my hard drive.

    Then I just check out the tarball branch, or merge it into my local branch if it was based on an earlier tarball.

    After I wrote that script, I purged my memory of tarballs as a thing people used to transport source (or even binaries) around. I just treat tarball releases as a kind of remote repo where upstream has terrible commit log message discipline.
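A rough, hypothetical reconstruction of what such a script does (the real one fetches a URL and uses a remote-style ref; every name here is invented for illustration, and a locally built tarball stands in for the download):

```shell
set -e
# Build a stand-in for the downloaded tarball.
rm -rf /tmp/src && mkdir -p /tmp/src && echo 'hello' > /tmp/src/README
tar -czf /tmp/bar-1.23.tar.gz -C /tmp src

rm -rf /tmp/imp && git init -q /tmp/imp && cd /tmp/imp
git config user.email import@example.org
git config user.name import

# Unpack the tarball contents as the working tree and commit them,
# recording provenance details in the log message.
tar -xzf /tmp/bar-1.23.tar.gz --strip-components=1
git add -A
git commit -q -m "Import bar-1.23
url: /tmp/bar-1.23.tar.gz
size: $(wc -c < /tmp/bar-1.23.tar.gz) bytes
sha1: $(sha1sum /tmp/bar-1.23.tar.gz | cut -d' ' -f1)"

# Branch naming mimics the 'tarballs/<name>' convention described above.
git branch tarballs/bar-1.23
git log --oneline tarballs/bar-1.23
```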

    git’s packing algorithm usually compresses better than gzip, and better than everything else too once you download multiple revisions of tarball.

  46. git’s packing algorithm usually compresses better than gzip

    Git *uses* gzip, although where you’re probably getting this impression comes from Git’s deduplication and deltas before it actually does the compression.

    wget-ing a tarball URL is much easier than requiring different git releases that keep on changing over and over again.

    Speed is not always an advantage. Often, people have put more effort into making a tarball work, whereas you don’t always get that with git-based projects.

  48. “Secondly, it would be trivial for their build scripts to include a clone and pull.” — Gentoo already supports that

  49. Gentoo developer here.

    Offering git as the only option is problematic. While Gentoo supports pulling code from git, having to do so means that lots of users will be fetching from a single server, rather than allowing a tarball to be distributed to (local) mirrors. Mirrors increase reliability in addition to offering better download speeds.

    In practice, if you don’t make tarballs, you’ll just force us to make and distribute our own tarballs. That increases the work required downstream to ship your software. Also, what happens when a second source-based distro does the same? Now there are multiple groups making and distributing (potentially different!) tarballs for your software.

    Just ship tarballs.

  50. In practice, if you don’t make tarballs, you’ll just force us to make and distribute our own tarballs. That increases the work required downstream to ship your software.

    Really, as long as git-archive is used, this isn’t really all that difficult. And as stated, hosts such as GitHub (among others, including any that are hosted by gitweb) automatically use git-archive on the backend anyway.

  51. Tarballs are the way to go, with version numbers.

    I used to maintain what was possibly the least popular Linux distribution ever, and it was built around several pieces of software that were under very active development. The ones that were distributed solely from git were nightmares to maintain. I don’t fault those guys for managing their projects the way that worked best for them, but it was hell on their downstream (me and a few other guys).

    For infrastructure software, I really want to be running foo-x.y, not foo-fe329ada. Or was that the old one? foo-65b77dca? foo-1cb919fc?

  52. Given that git provides tools for exporting a commit as a tarball, and pretty much every git hosting service exports those, I don’t see that you need to choose. Just include both URLs in the release announcement.

    Me, I usually *prefer* to download the repository. It also has the nice property of subtly encouraging me to hack on it locally (and make good commits while I do!) because I already have version control set up. Which in turn encourages me to submit good commits if they’re worthy of wider use.

    Regarding things like prepackaging docbook output, that can easily be done in a repo, too. Just make a special release branch where you commit those, and leave them out of the master branch.

  53. > For infrastructure software, I really want to be running foo-x.y, not foo-fe329ada. Or was that the old one? foo-65b77dca? foo-1cb919fc

    It’s foo-x.y-10-g65b77dc not foo-x.y-7-gab8f334 ;-)

  54. This does bring up a point: Git hashes are useful and, arguably, the only meaningful way to specify a repository point (certainly if you’re not doing something like maintaining one authoritative distribution repository)…but they’re utterly useless for telling you where the version you’re running lies in the chain of released versions of the software.

    In practice, you wind up having to tag each release…and if you’re going to fix it in time like that anyway, why not tarball?
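A hedged illustration of that tagging workflow (repo and tag names here are made up): once a release is tagged, git-describe names every later commit relative to the nearest tag, which is where strings like foo-x.y-10-g65b77dc come from.

```shell
#!/bin/sh
set -e
# Throwaway repo: tag a release, add two commits, see what describe says.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
ci() { git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "$1"; }
ci "release work"
git tag -a v1.2 -m "release 1.2"   # annotated tags are what describe prefers
ci "fix one"
ci "fix two"
git describe   # prints v1.2-2-g<short-hash>: 2 commits past tag v1.2
```

So the hash alone is meaningless for ordering, but tag-plus-describe recovers a human-readable position in the release chain.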

  55. git doesn’t include permissions or ownership… not that ownership is important in this case.

    tarballs have the benefit that you can do a sha256, md5, or any hash…. even multiple hashes. You can also GPG sign them. That’s better for reproducibility than a git hash.
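A minimal sketch of that release ritual (the tarball here is a stand-in file, and the gpg steps are commented out because they require a signing key):

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
cd "$tmp"
echo "pretend release contents" > foo-1.0.tar.gz   # stand-in for a real tarball
# Publish as many digests as you like alongside the tarball...
sha256sum foo-1.0.tar.gz > foo-1.0.tar.gz.sha256
md5sum    foo-1.0.tar.gz > foo-1.0.tar.gz.md5
# ...plus a detached signature (needs a GPG key, shown for reference):
#   gpg --armor --detach-sign foo-1.0.tar.gz        # writes foo-1.0.tar.gz.asc
#   gpg --verify foo-1.0.tar.gz.asc foo-1.0.tar.gz
# Downstream verifies the digest:
sha256sum -c foo-1.0.tar.gz.sha256
```

(For completeness: git can also GPG-sign tags with `git tag -s`, so the two approaches aren’t mutually exclusive.)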

  56. @Tom Limoncelli – git does have a field for permissions, though it doesn’t actually support anything except for 644 (non-executable) or 755 (executable).
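That stored mode is easy to inspect with `git ls-files -s`, which prints the mode git recorded for each tracked file (throwaway repo below; filenames are illustrative):

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
echo "data"      > plain.txt
echo "#!/bin/sh" > run.sh
chmod +x run.sh
git add plain.txt run.sh
# Git records only two regular-file modes: 100644 (non-executable)
# and 100755 (executable). Ownership is not stored at all.
git ls-files -s
```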

  57. > It’s foo-x.y-10-g65b77dc not foo-x.y-7-gab8f334 ;-)

    Must be a recent change. My build box has a directory full of foo-hash directories that were unpacked from tarballs downloaded from github between 2012 and 2014 (inclusive, heh).

  58. Tarballs remains the way to go for various reasons.

    1) not everything is in a git repository
    2) not everything is in a public VCS, even in open source
    3) many distributions take care of reproducibility of builds; one part of that means archiving sources from upstream, which is far easier to do with tarballs (keep in mind that upstream can rewrite the complete VCS history if they want…)
    4) it’s a far simpler, more stable, and more backward-compatible interface than git or any other VCS
    5) it’s less resource intensive than git/cvs/hg/svn (doesn’t make a big difference on one piece of software, but on thousands like in a distribution…)
    6) it’s the de facto interface between upstream and downstream, and it works quite well
    7) it’s a clear, simple and idiot proof way to deliver easily identifiable versions of your software
    8) it’s cheap to automate tarball generation

  59. C’mon Eric – you’ve been around long enough to know the answer to that.

    You might want to ask the Apache Foundation because all their zillion products still seem to come packaged that way. Why would I want to clone their stuff anyway, I just want a copy of version-xyz to install…

    So I’d say tarballs with a good signature/checksum file are still the most os-neutral scm-tool-neutral packaging method out there.

  60. Must be a recent change. My build box has a directory full of foo-hash directories that were unpacked from tarballs downloaded from github between 2012 and 2014 (inclusive, heh).

    That’s still how they’re generated by GitHub for any random commit. You really should be using tags if available, and if they’re not, go pester the problem projects.

    It would be nice if GitHub used git-describe to name them instead, but it doesn’t. Possibly because it opens up whole new issues about the tagging conventions in any particular repository. (Do they have tags? Are they annotated or lightweight tags? Do they all start with a v, just the number, RCS conventions, something else? Do they even _sort_ naturally?)

  61. Actually, the tarballs have version numbers. It’s been a couple of years, so I don’t remember if they downloaded like that, or if I had to rename them. The directories that unpack from them are all foo-hash.

    It is quite possible that the maintainers of the projects (multiple projects) were unaware of the proper usage of git and/or github and could have generated more useful tarballs.

    Nearly everyone who downloads this is going to want it in the form of a directory named ntpsec-x.y/, because that is how most users and distribution maintainers manage software. If the project does it once (per release), it saves thousands of people from having to do it themselves.

    So, as one of those thousands of people, let me say that if you could wrap your free gift in the way that makes it easiest for me to open, I’d sure appreciate it.

  62. Pingback: Are tarballs obsolete? | No. Betteridge’s Law

  63. Git/Svn repos are much like websites: they vanish over time. Will they still be there 20 years from now, in 2035? I suspect not. Tarballs will still be floating around on some ftp or http server, or on a disk, DVD-ROM, or Blu-ray disc. So the code is not lost to time.

    Tarballs are also included in source RPMs (debs too?) together with the distributor’s own patches to the source code, allowing the binary package to be recreated at any time in the future.

    I prefer tarballs to be available. But then, I’m one of those who long for the good ol’ days when *NIX admins were real men and compiled their OS from source, and any UNIX admin worth their salt knew how to program in C.

    – bln
