Diffstat (limited to 'doc/manual/src/architecture/store')
-rw-r--r-- | doc/manual/src/architecture/store/building.md | 15
-rw-r--r-- | doc/manual/src/architecture/store/drvs/ca.md | 0
-rw-r--r-- | doc/manual/src/architecture/store/drvs/drvs.md | 59
-rw-r--r-- | doc/manual/src/architecture/store/drvs/ia.md | 0
-rw-r--r-- | doc/manual/src/architecture/store/input-addressing.md | 1
-rw-r--r-- | doc/manual/src/architecture/store/nar.md | 1
-rw-r--r-- | doc/manual/src/architecture/store/object-ca.md | 1
-rw-r--r-- | doc/manual/src/architecture/store/objects.md | 73
-rw-r--r-- | doc/manual/src/architecture/store/paths.md | 78
-rw-r--r-- | doc/manual/src/architecture/store/related-work.md | 37
-rw-r--r-- | doc/manual/src/architecture/store/relocatability.md | 15
-rw-r--r-- | doc/manual/src/architecture/store/store.md | 27
12 files changed, 307 insertions, 0 deletions
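The reference scanning described in `building.md` below can be sketched in Haskell. In that scheme, Nix searches the output's file contents only for the "hash parts" of store paths it thinks the output might plausibly reference. This is a simplified illustration, not Nix's implementation. It reuses the `FileSystemObject` shape from `objects.md`, with `String` standing in for `ByteString`. The helper names `allData` and `scanForReferences` are hypothetical. Scanning symlink targets alongside file contents is also an assumption of this sketch, because the draft text only mentions file contents.

```haskell
import qualified Data.Map as Map
import Data.List (isInfixOf)

-- File system model following objects.md; String stands in for ByteString.
data Executable = Executable | NonExecutable
  deriving (Eq, Show)

data FileSystemObject
  = Regular Executable String
  | Directory (Map.Map String FileSystemObject)
  | SymLink String
  deriving (Eq, Show)

-- Every piece of data reachable from the root, recursively: file contents
-- and (an assumption of this sketch) symlink targets.
allData :: FileSystemObject -> [String]
allData (Regular _ bytes)    = [bytes]
allData (SymLink target)     = [target]
allData (Directory children) = concatMap allData (Map.elems children)

-- Hypothetical helper: given the hash parts of candidate store paths,
-- return the candidates that actually occur somewhere in the data.
scanForReferences :: [String] -> FileSystemObject -> [String]
scanForReferences candidateHashes root =
  [ h | h <- candidateHashes, any (h `isInfixOf`) (allData root) ]
```

Only the candidates are ever searched for. A hash that appears in the data but is missing from the candidate list is never reported as a reference.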
diff --git a/doc/manual/src/architecture/store/building.md b/doc/manual/src/architecture/store/building.md
new file mode 100644
index 000000000..f4f2649a3
--- /dev/null
+++ b/doc/manual/src/architecture/store/building.md
@@ -0,0 +1,15 @@
+# Building
+
+## Scanning for references
+
+Earlier, in the section on [store objects](./objects.md), we talked abstractly about scanning for references.
+Now we can make this concrete.
+
+After the derivation's command is run, Nix needs to process the "raw" output directories to turn them into proper store objects.
+There are a few steps to this, but let's start with the simple case of a single input-addressed output.
+
+\[Overview of things that need to happen.]
+
+For example, if Nix thinks `/nix/store/asdfasdfasdf-foo` and `/nix/store/qwerqwerqwer-bar` are paths the data might plausibly reference, Nix will recursively scan the contents of all files for the "hash parts" `asdfasdfasdf` and `qwerqwerqwer`.
+
+\[Explain why whitelist.]
diff --git a/doc/manual/src/architecture/store/drvs/ca.md b/doc/manual/src/architecture/store/drvs/ca.md
new file mode 100644
index 000000000..e69de29bb
--- /dev/null
+++ b/doc/manual/src/architecture/store/drvs/ca.md
diff --git a/doc/manual/src/architecture/store/drvs/drvs.md b/doc/manual/src/architecture/store/drvs/drvs.md
new file mode 100644
index 000000000..766a7b47f
--- /dev/null
+++ b/doc/manual/src/architecture/store/drvs/drvs.md
@@ -0,0 +1,59 @@
+# Derivations
+
+Derivations are recipes to create store objects.
+
+Derivations are the heart of Nix.
+Other systems (like Git or IPFS) also store and transfer immutable data, but they don't concern themselves with *how* that data was created.
+This is where Nix comes in.
+
+Derivations produce data by running arbitrary commands, like Make or Ninja rules.
+Unlike those systems, derivations do not produce arbitrary files, but only specific store objects.
+They cannot modify the store in any way, other than creating those store objects.
+This rigid specification of what they do is what allows Nix's caching to be so simple and yet robust.
+
+Based on the above, we can conceptually break derivations down into 3 parts:
+
+1. What command will be run?
+
+2. What existing store objects are needed as inputs?
+
+3. What store objects will be produced as outputs?
+
+## What command will be run?
+
+The original core of Nix was very simple about this, in the mold of traditional Unix.
+Commands consist of 3 parts:
+
+1. Path to executable
+
+2. Arguments (except for `argv[0]`, which is taken from the path in the usual way)
+
+3. Environment variables
+
+## What existing store objects are needed as inputs?
+
+The previous sub-section raises the question "how can we be sure the path to the executable points to what we think it does?"
+It's a good question!
+
+## What store objects will be produced as outputs?
+
+## Extra extensions
+
+### `__structuredAttrs`
+
+Historically speaking, most users of Nix used GNU Bash running a script as the command, regardless of what they were doing.
+Bash variables are automatically created from environment variables, but Bash also supports array and string-keyed map variables in addition to string variables.
+People also usually create derivations using languages which also support these richer data types.
+A way was thus desired to get this data from the language "planning" the derivation to Bash, the language evaluated at "run time".
+
+`__structuredAttrs` does this by smuggling a map of named, richer data inside the core derivation format.
+At run time, this becomes two things:
+
+1. A JSON file containing that map.
+2. A Bash script setting those variables.
+
+The Bash command can be passed a script which will "source" that Nix-created Bash script, setting those variables with the richer data.
+The outer script can then do whatever it likes with those richer variables as input.
+
+However, since derivations can already contain arbitrary input sources, the vast majority of what `__structuredAttrs` does can be handled by upper layers.
+We might consider implementing `__structuredAttrs` in higher layers in the future, and simplifying the store layer.
diff --git a/doc/manual/src/architecture/store/drvs/ia.md b/doc/manual/src/architecture/store/drvs/ia.md
new file mode 100644
index 000000000..e69de29bb
--- /dev/null
+++ b/doc/manual/src/architecture/store/drvs/ia.md
diff --git a/doc/manual/src/architecture/store/input-addressing.md b/doc/manual/src/architecture/store/input-addressing.md
new file mode 100644
index 000000000..1333ed77b
--- /dev/null
+++ b/doc/manual/src/architecture/store/input-addressing.md
@@ -0,0 +1 @@
+TODO
diff --git a/doc/manual/src/architecture/store/nar.md b/doc/manual/src/architecture/store/nar.md
new file mode 100644
index 000000000..1333ed77b
--- /dev/null
+++ b/doc/manual/src/architecture/store/nar.md
@@ -0,0 +1 @@
+TODO
diff --git a/doc/manual/src/architecture/store/object-ca.md b/doc/manual/src/architecture/store/object-ca.md
new file mode 100644
index 000000000..1333ed77b
--- /dev/null
+++ b/doc/manual/src/architecture/store/object-ca.md
@@ -0,0 +1 @@
+TODO
diff --git a/doc/manual/src/architecture/store/objects.md b/doc/manual/src/architecture/store/objects.md
new file mode 100644
index 000000000..e4f49a170
--- /dev/null
+++ b/doc/manual/src/architecture/store/objects.md
@@ -0,0 +1,73 @@
+# Store Objects
+
+Data in Nix is chiefly organized into *store objects*.
+A store object is a pair of
+
+  - A (root) file system object
+  - A set of references to store objects
+
+## File system objects
+
+The Nix store uses a simple file system model.
+
+    data FileSystemObject
+      = Regular Executable ByteString
+      | Directory (Map FileName FileSystemObject)
+      | SymLink ByteString
+
+    data Executable
+      = Executable
+      | NonExecutable
+
+That is, every file system object falls into one of these three cases:
+
+  - File: an executable flag, and arbitrary data
+
+  - Directory: mapping of names to child file system objects.
+
+  - Symlink: may point anywhere.
+
+    In particular, symlinks that do not point within the containing root file system object or that of another store object referenced by the containing store object are allowed, but might not function as intended.
+
+A bare file or symlink as the "root" file system object is allowed.
+
+### Comparison with Git
+
+This is close to Git's model, but with one crucial difference:
+Git puts the "permission" info within the directory map's values instead of making it part of the file (blob, in its parlance) object.
+
+    data GitObject
+      = Blob ByteString
+      | Tree (Map FileName (Permission, GitObject))
+
+    data Permission
+      = Directory -- iff paired with tree
+      -- iff paired with blob, one of:
+      | RegFile
+      | ExecutableFile
+      | Symlink
+
+So long as the root object is a directory, the representations are isomorphic.
+There is no "wiggle room" in the Git model, since whenever the permission info wouldn't matter (e.g. the child object being mapped to is a directory), the permission info must be a sentinel value.
+
+However, if the root object is a blob, there is loss of fidelity.
+Since the permission info is used to distinguish executable files, non-executable files, and symlinks, but there isn't a "parent" directory of the root to contain that info, these 3 cases cannot be distinguished.
+
+Git's model matches Unix tradition, but Nix's model is more natural.
+
+## References
+
+Store objects can refer to both other store objects and themselves.
+
+Self-references may seem pointless, but tracking them is in fact useful.
+We can best explain why later, after more concepts have been established.
+
+References are normally calculated to record the presence of textual references in store objects' file system objects.
+This process will be described precisely in the section on [building](./building.md), once more concepts are explained, as building is the primary way new store objects with non-trivial references are created.
+
+However, scanning for references is not mandatory.
+Store objects are allowed to have official references that *don't* correspond to store paths contained in their contents,
+and they are also allowed to *not* have references that *do* correspond to store paths contained in their contents.
+Taken together, this means there is no actual rule relating the store paths contained in the contents to the store paths deemed references.
+
+This is why it is necessary for correctness, and not just performance, that Nix remember the references of each store object, rather than try to recompute them on the fly by scanning their contents.
diff --git a/doc/manual/src/architecture/store/paths.md b/doc/manual/src/architecture/store/paths.md
new file mode 100644
index 000000000..cf51eb866
--- /dev/null
+++ b/doc/manual/src/architecture/store/paths.md
@@ -0,0 +1,78 @@
+# Store Paths
+
+A store path is a pair of a 20-byte digest and a name.
+
+## String representation
+
+A store path is rendered as the concatenation of
+
+  - a store directory
+
+  - a path separator (`/`)
+
+  - the digest rendered as Base-32 (20 arbitrary bytes become 32 ASCII characters)
+
+  - a hyphen (`-`)
+
+  - the name
+
+Let's take the store path from the very beginning of this manual as an example:
+
+    /nix/store/b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z-firefox-33.1
+
+This parses like so:
+
+    /nix/store/b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z-firefox-33.1
+    ^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^
+    store dir  digest                           name
+
+We can then discard the store dir to recover the conceptual pair that is a store path:
+
+    {
+      digest: "b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z",
+      name: "firefox-33.1",
+    }
+
+### Where did the "store directory" come from?
+
+You may have noticed that the above references a "store directory", but that is *not* part of the definition of a store path.
+We can discard it when parsing, but what about when printing?
+We need to get a store directory from *somewhere*.
+
+The answer is that the store directory is a property of the store that contains the store path.
+The explanation for this is simple enough: a store is notionally mounted as a directory at some location, and the store object's root file system object is likewise mounted at this path within that directory.
+
+This does, however, mean the string representation of a store path is not derived just from the store path itself, but is in fact "context dependent".
+
+## The digest
+
+The calculation of the digest is quite complicated for historical reasons.
+The details of the algorithms will be discussed later once more concepts have been introduced.
+For now, we just concern ourselves with the *key properties* of those algorithms.
+
+::: {.note}
+**Historical note** The 20-byte restriction exists because digests were originally SHA-1 hashes.
+This is no longer true, but longer hashes and other information are still boiled down to 20 bytes.
+:::
+
+Store paths are either *content-addressed* or *input-addressed*.
+
+::: {.note}
+The former is a standard term used elsewhere.
+The latter is our own creation to evoke a contrast with content addressing.
+:::
+
+Content addressing means that the store path digest ultimately derives from the referred-to store object's contents, namely its file system objects and references.
+There is more than one *method* of content addressing, however.
+Still, if one knows the content-addressing method that was used
+(or guesses it; there aren't that many yet!),
+one can recalculate the store path and thus verify the store object.
+
+Input addressing means that the store path digest derives from how the store object was produced, namely the "inputs" and plan that it was built from.
+Store paths of this sort can *not* be validated from the content of the store object.
+Rather, the store object might come with the store path it expects to be referred to by, and a signature of that path, the contents of the store path, and other metadata.
+The signature indicates that someone is vouching for the store object really being the result of a plan with that digest.
+
+While metadata explaining which method was used is included in the digest calculation, this only serves to thwart pre-image attacks.
+That metadata is scrambled with everything else so that it is difficult to tell how a given store path was produced short of a brute-force search.
+In the parlance of referencing schemes, this means that store paths are not "self-describing".
diff --git a/doc/manual/src/architecture/store/related-work.md b/doc/manual/src/architecture/store/related-work.md
new file mode 100644
index 000000000..b64b41988
--- /dev/null
+++ b/doc/manual/src/architecture/store/related-work.md
@@ -0,0 +1,37 @@
+# Advanced Topic: Related Work
+
+## Bazel
+
+TODO Skylark and layering.
+
+TODO being monadic, if RFC 92.
+
+## Build Systems à la Carte
+
+TODO user-chosen keys vs keys chosen automatically?
+Purity in face of dynamic tasks (no conflicts, guaranteed).
+
+TODO Does Nix constitute a different way to be monadic?
+Purity of keys, as mentioned.
+Dynamic tasks/keys vs dynamic dependencies.
+(Not sure yet.)
+
+## Lazy evaluation
+
+We clearly have thunks that produce thunks, but less clearly functions that produce functions.
+
+Do we have open terms?
+
+Do we have a thunks vs expressions distinction?
+Cf. John Shutt's modern fexprs, when the syntax can "leak".
+
+## Machine models
+
+TODO
+Derivations as store objects via drv files makes Nix a "von Neumann" architecture.
+Can also imagine a "Harvard" architecture where derivations are stored separately?
+Can we in general imagine N heaps for N different sorts of objects?
+
+TODO
+Also, leaning on the notion of "builtin builders" more, having multiple different sorts of user-defined builders too.
+The builder is a black box as far as the Nix model is concerned.
diff --git a/doc/manual/src/architecture/store/relocatability.md b/doc/manual/src/architecture/store/relocatability.md
new file mode 100644
index 000000000..c7f869135
--- /dev/null
+++ b/doc/manual/src/architecture/store/relocatability.md
@@ -0,0 +1,15 @@
+# Advanced Topic: Store object relocation
+
+Now that we know the fundamentals of the design of the Nix store, let's explore one consequence of that design: the question of when it is permissible to relocate a store object to a store with a different mount point.
+
+Recall from the section on [store paths](./paths.md) that concrete store paths look like `<store-dir>/<hash>-<name>`.
+
+~~The two final restrictions of the previous section yield an alternative view of the same information.~~
+Rather than associating store dirs with the references, we can say a store object itself has a store dir if and only if it has at least one reference.
+
+This corresponds to the observation that a store object with references, i.e. with a store directory under this interpretation, is confined to stores sharing that same store directory, but a store object without any references, i.e. and thus without a store directory, can exist in any store.
+
+Lastly, this illustrates the purpose of tracking self-references.
+Store objects without self-references or other references are relocatable, while store objects with self-references aren't.
+This is used to tell apart e.g. source code, which can be stored anywhere, and pesky non-relocatable executables, which assume they are installed to a certain path.
+\[The default method of calculating references by scanning for store paths handles these two example cases surprisingly well.\]
diff --git a/doc/manual/src/architecture/store/store.md b/doc/manual/src/architecture/store/store.md
new file mode 100644
index 000000000..d4add52f5
--- /dev/null
+++ b/doc/manual/src/architecture/store/store.md
@@ -0,0 +1,27 @@
+# Store
+
+A Nix store is a collection of *store objects* referred to by *store paths*.
+Every store also has a "store directory path", which is a path prefix used for various purposes.
+
+There are many types of stores, but all of them at least respect this model.
+Some, however, offer additional functionality.
+
+## A Rosetta stone for the Nix store
+
+The design of Nix is comparable to that of other build systems, and even to programming languages in general.
+Here is a rough [Rosetta stone](https://en.m.wikipedia.org/wiki/Rosetta_Stone) for build system terminology.
+If you are familiar with one of these columns, this might help the following sections make more sense.
+
+generic build system | Nix | Bazel | Build Systems à la Carte | lazy programming language
+-- | -- | -- | -- | --
+data (build input, build result) | component | file (source, target) | value | value
+build plan | derivation graph | action graph | `Tasks` | thunk
+build step | derivation | rule | `Task` | thunk
+build instructions | builder | (depends on action type) | `Task` | function
+build | build | build | `Build` applied to arguments | evaluation
+persistence layer | store | file system | `Store` | heap
+
+(N.B. Bazel terms are taken from https://docs.bazel.build/versions/main/glossary.html.)
+
+Plenty more could be said elaborating these comparisons.
+We will save that for the end of this chapter, in the [Related Work](./related-work.md) section.
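`paths.md` in this change makes two points about store paths. A store path is rendered as store directory, `/`, digest, `-`, name. The store directory comes from the containing store rather than from the store path itself. Both points can be sketched in Haskell. This is an illustrative sketch only. The names `StorePath`, `renderStorePath` and `parseStorePath` are hypothetical. The base-32 alphabet below (digits plus lowercase letters without `e`, `o`, `t`, `u`) is our understanding of the one Nix uses, and it is stated here as an assumption. Real Nix also restricts which characters a name may contain, but this sketch does not check names.

```haskell
import Data.List (stripPrefix)

-- A store path is conceptually a (digest, name) pair; the store directory
-- is deliberately not part of it.
data StorePath = StorePath
  { spDigest :: String -- 32 base-32 characters encoding 20 bytes
  , spName   :: String
  } deriving (Eq, Show)

-- Assumed Nix base-32 alphabet: digits and lowercase letters except e, o, t, u.
nixBase32Chars :: String
nixBase32Chars = "0123456789abcdfghijklmnpqrsvwxyz"

-- Printing needs a store directory, which must be supplied by the containing store.
renderStorePath :: FilePath -> StorePath -> String
renderStorePath storeDir (StorePath digest name) =
  storeDir ++ "/" ++ digest ++ "-" ++ name

-- Parsing discards the store directory and recovers the conceptual pair.
-- Name validation is omitted in this sketch.
parseStorePath :: FilePath -> String -> Maybe StorePath
parseStorePath storeDir s = do
  rest <- stripPrefix (storeDir ++ "/") s
  let (digest, afterDigest) = splitAt 32 rest
  name <- stripPrefix "-" afterDigest
  if length digest == 32 && all (`elem` nixBase32Chars) digest && not (null name)
    then Just (StorePath digest name)
    else Nothing
```

Rendering the same `StorePath` with `"/nix/store"` and with another store directory yields two different strings for one store path. This is the "context dependence" that `paths.md` describes.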