1 files changed, 25 insertions, 0 deletions
diff --git a/doc/manual/src/contributing/testing.md b/doc/manual/src/contributing/testing.md
index e5f80a928..a5253997d 100644
--- a/doc/manual/src/contributing/testing.md
+++ b/doc/manual/src/contributing/testing.md
@@ -86,6 +86,31 @@ GNU gdb (GDB) 12.1
 One can debug the Nix invocation in all the usual ways.
 For example, enter `run` to start the Nix invocation.
 
+### Characterization testing
+
+Occasionally, Nix uses a technique called [characterization testing](https://en.wikipedia.org/wiki/Characterization_test) as part of the functional tests.
+The technique records the exact output or behavior of an earlier version of Nix in a test, to check that Nix keeps producing the same behavior going forward.
+
+For example, this technique is used for the language tests, to check both the printed final value when evaluation succeeds and any errors and warnings encountered.
+
+It is frequently useful to regenerate the expected output.
+To do that, rerun the failed test with `_NIX_TEST_ACCEPT=1`.
+(At least, this is the convention we've used for `tests/lang.sh`.
+If we add more characterization tests, we should keep to the same convention.)
+
+One situation worth documenting is when these tests are "overfitted".
+The language tests are, again, an example of this.
+The expected successful output of evaluation is supposed to be highly stable – we do not intend to make breaking changes to (the stable parts of) the Nix language.
+However, the errors and warnings during evaluation (successful or not) are not stable in this way.
+We are free to change how they are displayed at any time.
+
+It may be surprising that we would test non-normative behavior like diagnostic outputs.
+Diagnostic outputs are indeed not a stable interface, but they are still important to users.
+By recording the expected output, the test suite guards against accidental changes and ensures that the *result* of the diagnostic code paths (not just the code that implements it) is under code review.
+Regressions are caught, and improvements always show up in code review.
+
+To ensure that characterization testing doesn't make it harder to intentionally change these interfaces, there must always be an easy way to regenerate the expected output, as we do with `_NIX_TEST_ACCEPT=1`.
+
 ## Integration tests
 
 The integration tests are defined in the Nix flake under the `hydraJobs.tests` attribute.