diff options
Diffstat (limited to 'doc/manual/src')
-rw-r--r-- | doc/manual/src/contributing/testing.md | 25 |
1 files changed, 25 insertions, 0 deletions
diff --git a/doc/manual/src/contributing/testing.md b/doc/manual/src/contributing/testing.md index e5f80a928..a5253997d 100644 --- a/doc/manual/src/contributing/testing.md +++ b/doc/manual/src/contributing/testing.md @@ -86,6 +86,31 @@ GNU gdb (GDB) 12.1 One can debug the Nix invocation in all the usual ways. For example, enter `run` to start the Nix invocation. +### Characterization testing + +Occasionally, Nix utilizes a technique called [Characterization Testing](https://en.wikipedia.org/wiki/Characterization_test) as part of the functional tests. +This technique is to include the exact output/behavior of a former version of Nix in a test in order to check that Nix continues to produce the same behavior going forward. + +For example, this technique is used for the language tests, to check both the printed final value if evaluation was successful, and any errors and warnings encountered. + +It is frequently useful to regenerate the expected output. +To do that, rerun the failed test with `_NIX_TEST_ACCEPT=1`. +(At least, this is the convention we've used for `tests/lang.sh`. +If we add more characterization testing we should always strive to be consistent.) + +An interesting situation to document is the case when these tests are "overfitted". +The language tests are, again, an example of this. +The expected successful output of evaluation is supposed to be highly stable – we do not intend to make breaking changes to (the stable parts of) the Nix language. +However, the errors and warnings during evaluation (successful or not) are not stable in this way. +We are free to change how they are displayed at any time. + +It may be surprising that we would test non-normative behavior like diagnostic outputs. +Diagnostic outputs are indeed not a stable interface, but they still are important to users. +By recording the expected output, the test suite guards against accidental changes, and ensure the *result* (not just the code that implements it) of the diagnostic code paths are under code review. +Regressions are caught, and improvements always show up in code review. + +To ensure that characterization testing doesn't make it harder to intentionally change these interfaces, there always must be an easy way to regenerate the expected output, as we do with `_NIX_TEST_ACCEPT=1`. + ## Integration tests The integration tests are defined in the Nix flake under the `hydraJobs.tests` attribute. |