author Jade Lovelace <lix@jade.fyi> 2024-08-07 02:00:50 -0700
committer Jade Lovelace <lix@jade.fyi> 2024-08-07 02:52:00 -0700
commit 1437d3df15c1efae3164ae45c3285bd9959def5f
tree e2eac9bba68e1976d4ce747102a3ee4664a93ce6
parent 529eed74c477eee8567f28379210cd47f0b4e18f
darwin: workaround PROC_PIDLISTFDS on processes with no fds
This has been causing various seemingly spurious CI failures as well as
some failures on people running tests on beta builds.

    lix> ++(nix-collect-garbage-dry-run.sh:20) nix-store --gc --print-dead
    lix> ++(nix-collect-garbage-dry-run.sh:20) wc -l
    lix> finding garbage collector roots...
    lix> error: Listing pid 87261 file descriptors: Undefined error: 0

There is no real way to write a proper test for this, other than to
start a process like the following:

    int main(void) {
        for (int i = 0; i < 1000; ++i) {
            close(i);
        }
        sleep(10000);
    }

and then let Lix's gc look at it. I have a relatively high confidence
this *will* fix the problem since I have manually confirmed the
behaviour of the libproc call is as-unexpected, and it would perfectly
explain the observed symptom.

Fixes: https://git.lix.systems/lix-project/lix/issues/446
Change-Id: I67669b98377af17895644b3bafdf42fc33abd076
-rw-r--r-- doc/manual/rl-next/haunted-gc-macos.md | 15
-rw-r--r-- src/libstore/platform/darwin.cc | 17
2 files changed, 31 insertions, 1 deletions
diff --git a/doc/manual/rl-next/haunted-gc-macos.md b/doc/manual/rl-next/haunted-gc-macos.md
new file mode 100644
index 000000000..3ce912b2d
--- /dev/null
+++ b/doc/manual/rl-next/haunted-gc-macos.md
@@ -0,0 +1,15 @@
+---
+synopsis: "Fix unexpectedly-successful GC failures on macOS"
+cls: 1723
+issues: fj#446
+credits: jade
+category: Fixes
+---
+
+Has the following happened to you on macOS? This failure has been successfully eliminated, thanks to our successful deployment of advanced successful-failure detection technology (it's just `if (failed && errno == 0)`. Patent pending<sup>not really</sup>):
+
+```
+$ nix-store --gc --print-dead
+finding garbage collector roots...
+error: Listing pid 87261 file descriptors: Undefined error: 0
+```
diff --git a/src/libstore/platform/darwin.cc b/src/libstore/platform/darwin.cc
index 1b591fde3..1f7e9be23 100644
--- a/src/libstore/platform/darwin.cc
+++ b/src/libstore/platform/darwin.cc
@@ -56,12 +56,27 @@ void DarwinLocalStore::findPlatformRoots(UncheckedRoots & unchecked)
while (fdBufSize > fds.size() * sizeof(struct proc_fdinfo)) {
// Reserve some extra size so we don't fail too much
fds.resize((fdBufSize + fdBufSize / 8) / sizeof(struct proc_fdinfo));
+ errno = 0;
fdBufSize = proc_pidinfo(
pid, PROC_PIDLISTFDS, 0, fds.data(), fds.size() * sizeof(struct proc_fdinfo)
);
+ // errno == 0???! Yes, seriously. This is because macOS has a
+ // broken syscall wrapper for proc_pidinfo that has no way of
+ // dealing with the system call successfully returning 0. It
+ // takes the -1 error result from the errno-setting syscall
+ // wrapper and turns it into a 0 result. But what if the system
+ // call actually returns 0? Then you get an errno of success.
+ //
+ // https://github.com/apple-opensource/xnu/blob/4f43d4276fc6a87f2461a3ab18287e4a2e5a1cc0/libsyscall/wrappers/libproc/libproc.c#L100-L110
+ // https://git.lix.systems/lix-project/lix/issues/446#issuecomment-5483
+ // FB14695751
if (fdBufSize <= 0) {
- throw SysError("Listing pid %1% file descriptors", pid);
+ if (errno == 0) {
+ break;
+ } else {
+ throw SysError("Listing pid %1% file descriptors", pid);
+ }
}
}
fds.resize(fdBufSize / sizeof(struct proc_fdinfo));
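
The pattern the patch relies on (clear `errno` before the call, then treat "failed-looking result with `errno == 0`" as an empty success) can be tried off macOS with a small simulation. The sketch below is only an illustration under stated assumptions: `fakeListFds`, `FakeOutcome`, and `listFdBytes` are hypothetical names that do not exist in Lix or libproc, the 64-byte buffer and 8-byte result are arbitrary, and `ESRCH` is just a plausible error code chosen for the failing case.

```cpp
#include <cerrno>
#include <system_error>
#include <vector>

// Hypothetical stand-in for a libproc-style wrapper (NOT a real API).
// As described in the patch comment, the wrapper reports failure as 0
// instead of -1, so "error" and "success with nothing to list" collide.
enum class FakeOutcome { Entries, Empty, Failure };

int fakeListFds(FakeOutcome outcome, void * buf, int bufSize)
{
    (void) buf; // the simulation never writes to the buffer
    switch (outcome) {
    case FakeOutcome::Failure:
        errno = ESRCH; // the underlying call failed and set errno
        return 0;      // ...but the wrapper folds -1 into 0
    case FakeOutcome::Empty:
        return 0;      // genuine success: nothing listed, errno untouched
    case FakeOutcome::Entries:
        return bufSize < 8 ? bufSize : 8; // pretend up to 8 bytes were written
    }
    return 0;
}

// Returns the number of bytes "listed"; 0 means the process has no open fds.
// Throws std::system_error only when the call really failed.
int listFdBytes(FakeOutcome outcome)
{
    std::vector<char> buf(64);
    errno = 0; // clear any stale value so errno == 0 afterwards is meaningful
    int n = fakeListFds(outcome, buf.data(), static_cast<int>(buf.size()));
    if (n <= 0) {
        if (errno == 0) {
            return 0; // the call succeeded and simply returned nothing
        }
        throw std::system_error(errno, std::generic_category(), "listing file descriptors");
    }
    return n;
}
```

Resetting `errno` before the call matters because successful calls generally leave it untouched; without the reset, a leftover value from an earlier failure would make an empty result look like an error.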