author Valentin Gagarin <valentin.gagarin@tweag.io> 2022-06-09 11:07:50 +0200
committer Valentin Gagarin <valentin.gagarin@tweag.io> 2022-08-04 12:37:47 +0200
commit fa7ad4593d09d04afe1d215d2da09be02c2e2836 (patch)
tree 3aadef4d97f08046e45cc4fb14469fb21f7a735f
parent f632816cbaefbba9cc27a8e0de6cffd39fa7a8dd (diff)
explain store directory
-rw-r--r-- doc/manual/src/SUMMARY.md.in | 6
-rw-r--r-- doc/manual/src/architecture/store/objects.md | 48
-rw-r--r-- doc/manual/src/architecture/store/paths.md | 123
-rw-r--r-- doc/manual/src/architecture/store/store.md | 21
4 files changed, 91 insertions, 107 deletions
diff --git a/doc/manual/src/SUMMARY.md.in b/doc/manual/src/SUMMARY.md.in
index 5e2d26bf7..997d75444 100644
--- a/doc/manual/src/SUMMARY.md.in
+++ b/doc/manual/src/SUMMARY.md.in
@@ -17,8 +17,10 @@
- [Upgrading Nix](installation/upgrading.md)
- [Architecture](architecture/architecture.md)
- [Store](architecture/store/store.md)
- - [Store Object](architecture/store/objects.md)
- - [Store Path](architecture/store/paths.md)
+ - [Store Path](architecture/store/path.md)
+ - [Digest](architecture/store/path.md#digest)
+ - [Input Addressing](architecture/store/path.md#input-addressing)
+ - [Content Addressing](architecture/store/path.md#content-addressing)
- [Package Management](package-management/package-management.md)
- [Basic Package Management](package-management/basic-package-mgmt.md)
- [Profiles](package-management/profiles.md)
diff --git a/doc/manual/src/architecture/store/objects.md b/doc/manual/src/architecture/store/objects.md
deleted file mode 100644
index 8ab0b9368..000000000
--- a/doc/manual/src/architecture/store/objects.md
+++ /dev/null
@@ -1,48 +0,0 @@
-# Store Object
-
-Nix organizes the data it manages into *store objects*.
-A store object is the pair of
-
- - a [file system object](#file-system-object)
- - a set of [references](#reference) to store objects.
-
-We call a store object's outermost file system object the *root*.
-
-```haskell
-data StoreOject = StoreObject {
- root :: FileSystemObject
-, references :: Set StoreObject
-}
-```
-
-## File system object {#file-system-object}
-
-The Nix store uses a simple file system model.
-
-Every file system object is one of the following:
- - File: an executable flag, and arbitrary data for contents
- - Directory: mapping of names to child file system objects
- - [Symbolic link](https://en.m.wikipedia.org/wiki/Symbolic_link): may point anywhere.
-
-```haskell
-data FileSystemObject
- = File { isExecutable :: Bool, contents :: Bytes }
- | Directory { entries :: Map FileName FileSystemObject }
- | SymLink { target :: Path }
-```
-
-A bare file or symlink can be a root file system object.
-
-Symlinks pointing outside of their own root, or to a store object without a matching reference, are allowed, but might not function as intended.
-
-### Reference scanning
-
-While references could be arbitrary paths, Nix requires them to be store paths to ensure correctness.
-Anything outside a given store is not under control of Nix, and therefore cannot be guaranteed to be present when needed.
-
-However, having references match store paths in files is not enforced by the data model:
-Store objects could have excess or incomplete references with respect to store paths found in their file contents.
-
-Scanning files therefore allows reliably capturing run time dependencies without declaring them explicitly.
-Doing it at build time and persisting references in the store object avoids repeating this time-consuming operation.
-
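Note on the "Reference scanning" section above (commentary, not part of the patch): the idea of capturing run time dependencies by scanning file contents can be illustrated with a minimal Python sketch. It assumes that scanning means searching the raw bytes of a file for the digests of candidate store paths; the function name `scan_references` and the candidate-set parameter are hypothetical and are not taken from the Nix code base.

```python
from typing import Iterable, Set


def scan_references(contents: bytes, candidate_digests: Iterable[str]) -> Set[str]:
    """Return the candidate digests that occur literally in `contents`.

    Assumption: a reference is detected when the base-32 digest of a
    candidate store path appears as a byte substring of the file.
    """
    found = set()
    for digest in candidate_digests:
        # Digests are plain ASCII, so a byte-level substring search suffices.
        if digest.encode("ascii") in contents:
            found.add(digest)
    return found
```

For example, a script whose first line is `#!/nix/store/b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z-firefox-33.1/bin/firefox` would yield that digest as a reference, provided it is among the candidates. A scan like this can report excess references (a digest that appears but is never used) and miss encoded or compressed ones, which matches the caveat in the text that the data model does not enforce agreement between references and file contents.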
diff --git a/doc/manual/src/architecture/store/paths.md b/doc/manual/src/architecture/store/paths.md
index 402e55e69..4867e7fd3 100644
--- a/doc/manual/src/architecture/store/paths.md
+++ b/doc/manual/src/architecture/store/paths.md
@@ -1,78 +1,103 @@
# Store Path
-A store path is a pair of a 20-byte digest and a name.
+Nix implements [references](store.md#reference) to [store objects](store.md#store-object) as *store paths*.
-## String representation
+Store paths are pairs of
-A store path is rendered as the concatenation of
+- a 20-byte [digest](#digest) for identification
+- a symbolic name for people to read.
- - a store directory
+Example:
- - a path-separator (`/`)
+ {
+ digest: "b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z",
+ name: "firefox-33.1",
+ }
- - the digest rendered as Base-32 (20 arbitrary bytes becomes 32 ASCII chars)
+It is rendered to a file system path as the concatenation of
- - a hyphen (`-`)
+ - [store directory](#store-directory)
+ - path-separator (`/`)
+ - [digest](#digest) rendered in [base-32](https://en.m.wikipedia.org/wiki/Base32) (20 arbitrary bytes become 32 ASCII characters)
+ - hyphen (`-`)
+ - name
- - the name
+Example:
-Let's take the store path from the very beginning of this manual as an example:
+ /nix/store/b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z-firefox-33.1
+ |--------| |------------------------------| |----------|
+ store directory digest name
- /nix/store/b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z-firefox-33.1
+## Store Directory {#store-directory}
-This parses like so:
+Every [store](./store.md) has a store directory.
- /nix/store/b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z-firefox-33.1
- ^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^
- store dir digest name
+If the store has a [file system representation](./store.md#files-and-processes), this directory contains the store’s [file system objects](#file-system-object), which can be addressed by [store paths](#store-path).
-We then can discard the store dir to recover the conceptual pair that is a store path:
+This means a store path is not just derived from the referenced store object itself, but depends on the store the store object is in.
- {
- digest: "b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z",
- name: "firefox-33.1",
- }
+::: {.note}
+The store directory defaults to `/nix/store`, but is in principle arbitrary.
+:::
-### Where did the "store directory" come from?
+It is important which store a given store object belongs to:
+Files in the store object can contain store paths, and processes may read these paths.
+Nix can only guarantee [referential integrity](store.md#closure) if store paths do not cross store boundaries.
-If you notice, the above references a "store directory", but that is *not* part of the definition of a store path.
-We can discard it when parsing, but what about when printing?
-We need to get a store directory from *somewhere*.
+Therefore one can only copy store objects if
-The answer is, the store directory is a property of the store that contains the store path.
-The explanation for this is simple enough: a store is notionally mounted as a directory at some location, and the store object's root file system likewise mounted at this path within that directory.
+- the source and target stores' directories match
-This does, however, mean the string representation of a store path is not derived just from the store path itself, but is in fact "context dependent".
+ or
-## The digest
+- the store object in question has no references, that is, contains no store paths.
-The calculation of the digest is quite complicated for historical reasons.
-The details of the algorithms will be discussed later once more concepts have been introduced.
-For now, we just concern ourselves with the *key properties* of those algorithms.
+To move a store object to a store with a different store directory, it has to be rebuilt, together with all its dependencies.
+It is in general not enough to replace the store directory string in file contents, as this may break internal offsets or content hashes.
-::: {.note}
-**Historical note** The 20 byte restriction is because originally a digests were SHA-1 hashes.
-This is no longer true, but longer hashes and other information are still boiled down to 20 bytes.
-:::
+# Digest {#digest}
+
+In a [store path](#store-path), the [digest][digest] is the output of a [cryptographic hash function][hash] of either all *inputs* involved in building the referenced store object or its actual *contents*.
-Store paths are either *content-addressed* or *input-addressed*.
+Store objects are therefore said to be either [input-addressed](#input-addressing) or [content-addressed](#content-addressing).
::: {.note}
-The former is a standard term used elsewhere.
-The later is our own creation to evoke a contrast with content addressing.
+**Historical note**: The 20 byte restriction is because originally digests were [SHA-1][sha-1] hashes.
+This is no longer true, but longer hashes and other information are still truncated to 20 bytes for compatibility.
:::
-Content addressing means that the store path digest ultimately derives from referred store object's contents, namely its file system objects and references.
-There is more than one *method* of content-addressing, however.
-Still, if one does know the content addressing schema that was used,
-(or guesses, there isn't that many yet!)
-one can recalculate the store path and thus verify the store object.
+[digest]: https://en.m.wiktionary.org/wiki/digest#Noun
+[hash]: https://en.m.wikipedia.org/wiki/Cryptographic_hash_function
+[sha-1]: https://en.m.wikipedia.org/wiki/SHA-1
+
+
+### Reference scanning
+
+While references could be arbitrary paths, Nix requires them to be store paths to ensure correctness.
+Anything outside a given store is not under control of Nix, and therefore cannot be guaranteed to be present when needed.
+
+However, having references match store paths in files is not enforced by the data model:
+Store objects could have excess or incomplete references with respect to store paths found in their file contents.
+
+Scanning files therefore allows reliably capturing run time dependencies without declaring them explicitly.
+Doing it at build time and persisting references in the store object avoids repeating this time-consuming operation.
+
+## Input Addressing {#input-addressing}
+
+Input addressing means that the digest derives from how the store object was produced, namely its build inputs and build plan.
+
+To compute the hash of a store object one needs a deterministic serialisation, i.e., a binary string representation which only changes if the store object changes.
+
+Nix has a custom serialisation format called Nix Archive (NAR).
+
+Store object references of this sort can *not* be validated from the content of the store object.
+Rather, a cryptographic signature has to be used to indicate that someone is vouching for the store object really being produced from a build plan with that digest.
+
+## Content Addressing {#content-addressing}
+
+Content addressing means that the digest derives from the store object's contents, namely its file system objects and references.
+If one knows content addressing was used, one can recalculate the reference and thus verify the store object.
-Input addressing means that the store path digest derives from how the store path was produced, namely the "inputs" and plan that it was built from.
-Store paths of this sort can *not* be validated from the content of the store object.
-Rather, the store object might come with the store path it expects to be referred to by, and a signature of that path, the contents of the store path, and other metadata.
-The signature indicates that someone is vouching for the store object really being the results of a plan with that digest.
+Content addressing is currently only used for the special cases of source files and "fixed-output derivations", where the contents of a store object are known in advance.
+Content addressing of build results is still an [experimental feature subject to some restrictions](https://github.com/tweag/rfcs/blob/cas-rfc/rfcs/0062-content-addressed-paths.md).
-While metadata is included in the digest calculation explaining which method it was calculated by, this only serves to thwart pre-image attacks.
-That metadata is scrambled with everything else so that it is difficult to tell how a given store path was produced short of a brute-force search.
-In the parlance of referencing schemes, this means that store paths are not "self-describing".
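Note on the revised paths.md above (commentary, not part of the patch): the rendering rule, store directory, `/`, 32-character base-32 digest, `-`, name, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not Nix's implementation: the names `StorePath`, `render_store_path` and `parse_store_path` are hypothetical, and the sketch checks only the digest length, not the base-32 alphabet.

```python
from dataclasses import dataclass

# 20 digest bytes are rendered as 32 base-32 characters (per the text above).
DIGEST_CHARS = 32


@dataclass(frozen=True)
class StorePath:
    digest: str  # already rendered in base-32
    name: str    # symbolic name, e.g. "firefox-33.1"


def render_store_path(store_dir: str, path: StorePath) -> str:
    """Concatenate store directory, '/', digest, '-', and name."""
    return f"{store_dir}/{path.digest}-{path.name}"


def parse_store_path(store_dir: str, rendered: str) -> StorePath:
    """Recover the (digest, name) pair, discarding the store directory."""
    prefix = store_dir + "/"
    if not rendered.startswith(prefix):
        raise ValueError(f"{rendered!r} is not inside store directory {store_dir!r}")
    base = rendered[len(prefix):]
    digest = base[:DIGEST_CHARS]
    separator = base[DIGEST_CHARS:DIGEST_CHARS + 1]
    name = base[DIGEST_CHARS + 1:]
    # Split at a fixed offset, because the name may itself contain hyphens.
    if len(digest) != DIGEST_CHARS or separator != "-" or not name:
        raise ValueError(f"{rendered!r} does not have the form <digest>-<name>")
    return StorePath(digest=digest, name=name)
```

Rendering the same `StorePath` against two different store directories gives two different file system paths, which is the point the "Store Directory" section makes: the rendered path depends on the store the object is in, not only on the object.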
diff --git a/doc/manual/src/architecture/store/store.md b/doc/manual/src/architecture/store/store.md
index 68bdadc4a..21a876f75 100644
--- a/doc/manual/src/architecture/store/store.md
+++ b/doc/manual/src/architecture/store/store.md
@@ -67,18 +67,19 @@ As it keeps track of references, it can [garbage-collect][garbage-collection] un
[ store ] --> collect garbage --> [ store' ]
-## Closure
+## Closure {#closure}
-Nix stores have the *closure property*: for each store object in the store, all the store objects it references must also be in the store.
+Nix stores ensure [referential integrity][referential-integrity]: for each store object in the store, all the store objects it references must also be in the store.
-Adding, building, copying and deleting store objects must be done in a way that obeys this property:
+The set of all store objects reachable by following references from a given initial set of store objects is called a *closure*.
+
+Adding, building, copying and deleting store objects must be done in a way that preserves referential integrity:
- A newly added store object cannot have references, unless it is a build task.
- Build results must only refer to store objects in the closure of the build inputs.
Building a store object will add appropriate references, according to the build task.
- These references can only come from declared build inputs.
- Store objects being copied must refer to objects already in the destination store.
@@ -86,16 +87,15 @@ Adding, building, copying and deleting store objects must be done in a way that
- We can only safely delete store objects which are not reachable from any reference still in use.
- Garbage collection will delete those store objects that cannot be reached from any reference in use.
-
<!-- more details in section on garbage collection, link to it once it exists -->
+[referential-integrity]: https://en.m.wikipedia.org/wiki/Referential_integrity
[garbage-collection]: https://en.m.wikipedia.org/wiki/Garbage_collection_(computer_science)
[immutable-object]: https://en.m.wikipedia.org/wiki/Immutable_object
[opaque-data-type]: https://en.m.wikipedia.org/wiki/Opaque_data_type
[unique-identifier]: https://en.m.wikipedia.org/wiki/Unique_identifier
-## Files and Processes
+## Files and Processes {#files-and-processes}
Nix maps between its store model and the [Unix paradigm][unix-paradigm] of [files and processes][file-descriptor], by encoding immutable store objects and opaque identifiers as file system primitives: files and directories, and paths.
That allows processes to resolve references contained in files and thus access the contents of store objects.
@@ -103,11 +103,16 @@ That allows processes to resolve references contained in files and thus access t
Store objects are therefore implemented as the pair of
- a [file system object](fso.md) for data
- - a set of *store paths* for references.
+ - a set of [store paths](paths.md) for references.
[unix-paradigm]: https://en.m.wikipedia.org/wiki/Everything_is_a_file
[file-descriptor]: https://en.m.wikipedia.org/wiki/File_descriptor
+The following diagram shows a radical simplification of how Nix interacts with the operating system:
+It uses files as build inputs, and build outputs are files again.
+On the operating system, files are either "dead" data, or "live" as processes, which in turn operate on files, or can bring them to life.
+A build function also amounts to an operating system process (not depicted).
+
```
+-----------------------------------------------------------------+
| Nix |