Skip to content

elfshaker/elfshaker

Repository files navigation

elfshaker

400 GiB -> 100 MiB, with 1s access time†; when applied to clang builds.

elfshaker is a low-footprint, high-performance version control system fine-tuned for binaries.

  • elfshaker is a CLI tool written in the Rust programming language.

  • It stores snapshots of directories into highly-compressed pack files and provides fast on-demand access to the stored files. It is particularly good for storing lots of similar files, for example object files from an incremental build.

  • It allows few-second access to any commit of clang with the manyclangs project. For example, this accelerates bisection of LLVM by a factor of 60x! This is done by extracting builds of LLVM on-demand from locally stored elfshaker packs, each of which contains ~1,800 builds and is about 100 MiB in size, even though the full originals would take TiBs to store! Extracting a single build takes 2-4s on modern hardware.

†Applicability

Or, "how on earth do you get such a phenomenal result?".

It works particularly well for our presented use case because storing pre-link object files has these properties:

  • There are many files,
  • Most of them don't change very often so there are a lot of duplicate files,
  • When they do change, the deltas of the binaries are not huge.

We achieve this in manyclangs by compiling object code with the -ffunction-sections and -fdata-sections compiler flags. This has the effect that if you 'insert' a function into a translation unit, the insertion does not cause all of the addresses to change across the whole object file.

If you looked at the binary delta on the linked executable from such a change, it will be large, because all of the absolute addresses after the insertion will change, and references to those addresses will change. These address changes are not handled well by compression algorithms, resulting in a poor compression ratio. The effect of this is large: if you compress many revisions of clang executables together, you will see a compression ratio of something like 20%. This is pretty good! But elfshaker achieves a ratio of something closer to 0.01% (or 10,000x), amortized across many builds.

Quickstart

  1. Consult our installation and usage guide, make sure you know what you're doing.
  2. elfshaker store <snapshot> -- capture the state of the current working directory into a named snapshot <snapshot>.
  3. elfshaker pack <pack name> -- capture all 'loose' snapshots into a single pack file (this is what gets you the compression win).
  4. elfshaker extract <snapshot> -- restore the state of a previous snapshot into the current working directory.

For more detail, take a look at our workflow documentation and watch the talk from the 2021 LLVM Developers' Meeting:

2021 LLVM Dev Mtg “Manyclangs: Fast bisection with a small storage cost

System Compatibility

The following platforms are used for our CI tests:

  • Ubuntu 20.04 LTS

But we aim to support all popular Linux platforms, macOS and Windows in production.

We officially support the following architectures:

  • AArch64
  • x86-64

Current Status

The file format and directory structure is stable. We intend that pack files created with the current elfshaker version will remain compatible with future versions. Please kick the tyres and do your own validation, and file bugs if you find any. We have done a fair amount of validation for our use cases but there may be things we haven't needed yet, so please start a discussion and file issues/patches.

Contributing

Contributions are highly appreciated. Refer to our Contributing guide.

Contact

The best way to reach us to join the elfshaker/community on Gitter. The original authors of elfshaker are Peter Waller (@peterwaller-arm) <peter.waller@arm.com> and Vesko Karaganev (@veselink1) <vesko.karaganev@gmail.com> and you may also contact us via email.

Security

Refer to our Security policy.

License

elfshaker is licensed under the Apache License 2.0.