From 94e77ad95407993afd18a06c85d77486dff28fa1 Mon Sep 17 00:00:00 2001 From: Jeremy Soller <jeremy@system76.com> Date: Sat, 14 Mar 2020 11:36:45 -0600 Subject: [PATCH] news/pkgar-introduction --- content/news/pkgar-introduction.md | 123 +++++++++++++++++++++++++++++ 1 file changed, 123 insertions(+) create mode 100644 content/news/pkgar-introduction.md diff --git a/content/news/pkgar-introduction.md b/content/news/pkgar-introduction.md new file mode 100644 index 00000000..3a4fe93f --- /dev/null +++ b/content/news/pkgar-introduction.md @@ -0,0 +1,123 @@ ++++ +title = "pkgar introduction" +author = "jackpot51" +date = "2020-03-14T10:00:00-07:00" ++++ + +It has been a while since the +[last Redox OS news](https://www.redox-os.org/news/focusing-on-rustc/), +and I think it is good to provide an update on how things are progressing. + +The dynamic linking support in relibc got to the point where rustc could be +loaded, but hangs occur after loading the LLVM codegen library. Debugging this +issue has been difficult, so I am taking some time to consider other aspects of +Redox OS. Recently, I have been working on a new package format, called +[pkgar](https://gitlab.redox-os.org/redox-os/pkgar). + +## What is `pkgar`? + +`pkgar`, short for package archive, is a file format, library, and command line +executable for creating and extracting cryptographically secure collections of +files, primarly for use in package management on Redox OS. The technical details +are still in development, so I think it is good to instead review the goals of +`pkgar` and some examples that demonstrate its design principles. + +The goals of `pkgar` are as follows: +- Atomic - updates are done atomically if possible +- Economical - transfer of data must only occur when hashes change, allowing for + network and data usage to be minimized +- Fast - encryption and hashing algorithms are chosen for performance, and + packages can potentially be extracted in parallel +- Minimal - unlike other formats such as `tar`, the metadata included in a + `pkgar` file is only what is required to extract the package +- Relocatable - packages can be installed to any directory, by any user, + provided the user can verify the package signature and has access to that + directory. +- Secure - packages are always cryptographically secure, and verification of all + contents must occur before installation of a package completes. + +To demonstrate how the format's design achieves these goals, let's look at some +examples. + +## Example 1: Newly installed package + +In this example, a package is installed that has never been installed on the +system, from a remote repository. We assume that the repository's public key is +already installed on disk, and that the URL to the package's `pkgar` is known. + +First, a small, fixed-size header portion of the `pkgar` is downloaded. This is +currently 136 bytes in size. It contains a NaCL signature, NaCL public key, +BLAKE3 hash of the entry metadata, and 64-bit count of entry metadata structs. + +Before this header can be used, it is verified. The public key must match the +one installed on disk. The signature of the struct must verify against the +public key. If this is true, the hash and entry count are considered valid. + +The entry metadata can now be downloaded to a temporary file. During the +download, the BLAKE3 hash is calculated. If this hash matches the hash in the +header, the metadata is considered valid and is moved atomically to the correct +location for future use. Both the header and metadata are stored in this file. + +Each entry metadata struct contains a BLAKE3 hash of the entry data, a 64-bit +offset of the file data in the data portion of the `pkgar`, a 64-bit size of the +file data, a 32-bit mode identifying Unix permissions, and up to a 256-byte +relative path for the file. + +For each entry, before downloading the file data, the path can be validated for +install permissions. The file data is downloaded to a temporary file, with no +read, write, or execute permissions. While the download is happening, the BLAKE3 +hash is calculated. If this hash matches, the file data is considered valid. + +After downloading all entries, the temporary files have their permissions set +as indicated by the mode in the metadata. They are then moved atomically to the +correct location. At this point, the package is successfully installed. + +## Example 2: Updated package + +In this example, a package is updated, and only one file changes. This is to +demonstrate the capabilities of `pkgar` to minimize disk writes and network +traffic. + +First, the header is downloaded. The header is verified as before. Since a file +has changed, the metadata hash will have changed. The metadata will be +downloaded and verified. Both header and metadata will be atomically updated on +disk. + +The entry metadata will be compared to the previous entry metadata. The hash for +one specific file will have changed. Only the contents for that file will be +downloaded to a temporary file, and verified. Once that is complete, it will be +atomically updated on disk. The package update is successfully completed, and +only the header, entry metadata, and the files that have changed were +downloaded and written. + +## Example 3: Package verification + +In this example, a package is verified against the metadata saved on disk. It is +possible to reconstruct a package from an installed system, for example, in +order to install that package from a live disk. + +First, the header is verified as before. The entry metadata is then verified. +If there is a mismatch, an error is thrown and the package could be reinstalled. + +The entry metadata will be compared to the files on disk. The mode of each file +will be compared to the metadata mode. Then the hash of the file data will be +compared to the hash in the metadata. If there is a mismatch, again, an error +is thrown and the package could be reinstalled. + +It would be possible to perform this process while copying the package to a new +target. This allows the installation of a package from a live disk to a new +install without having to store the entire package contents. + +## Conclusion + +As the examples show, the design of `pkgar` is meant to provide the best +possible package management experience on Redox OS. At no point should invalid +data be installed on disk in accessible files, and installation should be +incredibly fast and efficient. + +Work still continues on determining the repository format, as well as +integrating `pkgar` into the current package management tools. The source for +`pkgar` is fairly lightweight, I highly recommend reading it and contributing +on the Redox OS GitLab: https://gitlab.redox-os.org/redox-os/pkgar. Feel free to +reach out to https://twitter.com/redox_os and https://twitter.com/jeremy_soller +if you have questions. -- GitLab