path: root/src/libutil/monitor-fd.hh
Age | Commit message | Author
2024-07-13 | daemon: remove workaround for macOS kernel bug that seems fixed | Jade Lovelace
This was filed as https://github.com/nixos/nix/issues/7584, but as far as I can tell, the previous solution of POLLHUP works just fine on macOS 14. I've also tested on an ancient machine with macOS 10.15.7, which also has POLLHUP work correctly.

It's possible this might regress some older versions of macOS that have a kernel bug, but I went looking through the history on the sources and didn't find anything that looked terribly convincingly like a bug fix between 2020 and today. If such a broken version exists, it seems pretty reasonable to suggest simply updating the OS.

Change-Id: I178a038baa000f927ea2cbc4587d69d8ab786843
2024-07-13 | daemon: fix a crash bug "FATAL: exception not rethrown" | Jade Lovelace
This is caused by pthread_cancel effectively throwing a not-specifically-identifiable C++ exception into the targeted thread, which, if it is not rethrown, terminates the process entirely. This is rather "impolite" behaviour, we would say.

But thread cancellation is *always* busted, and we should simply not use it where unnecessary. It's particularly unnecessary when what we *actually* need it for is, err, interrupting a poll(2). That can in turn be achieved by simply listening to more stuff in the poll, namely, a pipe, which we send a character to when needing to stop the thread.

While looking at this code, we also investigated whether any of the poll() madness is required, or was even *ever* required. Curiously we found in the XNU kernel source code that the thing about needing to listen to POLLHUP is probably *correct*, but switching it to POLLRDNORM should not have made any difference at all. We've left a FIXME to look into that further because what's written here is super janky.

https://github.com/apple-oss-distributions/xnu/blob/94d3b452840153a99b38a3a9659680b2a006908e/bsd/kern/sys_generic.c#L1751-L1758

This is the crash on some Hydra machines:

Thread 1 (Thread 0x7f56b77776c0 (LWP 955542) (Exiting)):
0 0x00007f56b8e9b7dc in __pthread_kill_implementation () from /nix/store/m71p7f0nymb19yn1dascklyya2i96jfw-glibc-2.39-52/lib/libc.so.6
1 0x00007f56b8e49516 in raise () from /nix/store/m71p7f0nymb19yn1dascklyya2i96jfw-glibc-2.39-52/lib/libc.so.6
2 0x00007f56b8e31935 in abort () from /nix/store/m71p7f0nymb19yn1dascklyya2i96jfw-glibc-2.39-52/lib/libc.so.6
3 0x00007f56b8e327f3 in __libc_message_impl.cold () from /nix/store/m71p7f0nymb19yn1dascklyya2i96jfw-glibc-2.39-52/lib/libc.so.6
4 0x00007f56b8e8e8e9 in __libc_fatal () from /nix/store/m71p7f0nymb19yn1dascklyya2i96jfw-glibc-2.39-52/lib/libc.so.6
5 0x00007f56b8ea23c4 in unwind_cleanup () from /nix/store/m71p7f0nymb19yn1dascklyya2i96jfw-glibc-2.39-52/lib/libc.so.6
6 0x00007f56b9d2a1b8 in nix::triggerInterrupt() [clone .cold] () from /nix/store/sahgw550p621m9dy1pd7whl9c5g1g0p7-lix-2.90.0-rc1/lib/liblixutil.so
7 0x00007f56b990ac9d in std::thread::_State_impl<std::thread::_Invoker<std::tuple<nix::MonitorFdHup::MonitorFdHup(int)::{lambda()#1}> > >::_M_run() () from /nix/store/sahgw550p621m9dy1pd7whl9c5g1g0p7-lix-2.90.0-rc1/lib/liblixstore.so
8 0x00007f56b90e86d3 in execute_native_thread_routine () from /nix/store/c6r62m84hywf4i6qq1h28f13zv38yqyp-gcc-13.3.0-lib/lib/libstdc++.so.6
9 0x00007f56b8e99a42 in start_thread () from /nix/store/m71p7f0nymb19yn1dascklyya2i96jfw-glibc-2.39-52/lib/libc.so.6
10 0x00007f56b8f1905c in clone3 () from /nix/store/m71p7f0nymb19yn1dascklyya2i96jfw-glibc-2.39-52/lib/libc.so.6

As for testing, we've started a daemon with this change and verified it deals with HUPs correctly on x86_64-linux, but I don't think we can easily test the destructor behaviour without whatever Hydra was doing that broke.

Change-Id: I29c7de0425674494b6e43c075810126c3ff77363
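The pipe-based approach described in this commit message can be sketched roughly as follows. This is a minimal illustration and not the actual MonitorFdHup code: the class name `HupWatcher`, the `onHup` callback, and the `hungUp()` accessor are hypothetical, and the sketch assumes POSIX `poll(2)` semantics where a closed peer is reported as POLLHUP.

```cpp
#include <atomic>
#include <cerrno>
#include <functional>
#include <stdexcept>
#include <thread>
#include <poll.h>
#include <unistd.h>

// Hypothetical sketch: watch `fd` for a hangup on a background thread and
// stop that thread by writing one byte to a pipe instead of pthread_cancel.
class HupWatcher {
    int stopPipe[2];                  // [0] is polled by the thread, [1] is written by the destructor
    std::atomic<bool> sawHup{false};
    std::thread thread;

public:
    HupWatcher(int fd, std::function<void()> onHup) {
        if (pipe(stopPipe) != 0)
            throw std::runtime_error("pipe() failed");
        thread = std::thread([this, fd, onHup] {
            while (true) {
                struct pollfd fds[2] = {};
                fds[0].fd = fd;
                // Linux reports POLLHUP regardless of `events`; the commit
                // message suggests XNU wants it requested explicitly.
                fds[0].events = POLLHUP;
                fds[1].fd = stopPipe[0];
                fds[1].events = POLLIN;
                int n = poll(fds, 2, -1);
                if (n < 0) {
                    if (errno == EINTR) continue;
                    return;
                }
                if (fds[1].revents) return;   // stop was requested
                if (fds[0].revents & (POLLHUP | POLLERR | POLLNVAL)) {
                    sawHup = true;
                    onHup();
                    return;
                }
            }
        });
    }

    ~HupWatcher() {
        char c = 'x';
        ssize_t r = write(stopPipe[1], &c, 1);   // wakes the poll() if it is still waiting
        (void) r;
        thread.join();
        close(stopPipe[0]);
        close(stopPipe[1]);
    }

    bool hungUp() const { return sawHup; }
};
```

Because the thread exits by returning from its lambda, the destructor only needs a plain `join()`, and no exception is injected into the thread.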
2024-03-11 | util.hh: split out signals stuff | Jade Lovelace
Copies part of the changes of ac89bb064aeea85a62b82a6daf0ecca7190a28b7

Change-Id: I9ce601875cd6d4db5eb1132d7835c5bab9f126d8
2023-03-31 | Ensure all headers have `#pragma once` and are in API docs | John Ericson
`///@file` makes them show up in the internal API docs. A tiny few were missing `#pragma once`.
2023-01-11 | MonitorFdHup: Make it work on macOS again | Eelco Dolstra
It appears that on current macOS versions, our use of poll() to detect client disconnects no longer works. As a workaround, poll() for POLLRDNORM, since this *will* wake up when the client has disconnected. The downside is that it also wakes up when input is available. So just sleep for a bit in that case. This means that on macOS, a client disconnect may take up to a second to be detected, but that's better than not being detected at all. Fixes #7584.
2018-02-14 | monitor-fds: Fix on macOS. | Shea Levy
Fixes #1871.
2017-01-26 | Fix interrupt handling | Eelco Dolstra
2014-12-14 | Pedantry | Eelco Dolstra
2014-12-09 | Explicitly include required C headers | Marko Durkovic
2014-07-24 | Use pthread_cancel instead of a signal | Eelco Dolstra
Signal handlers are process-wide, so sending SIGINT to the monitor thread will cause the normal SIGINT handler to run. This sets the isInterrupted flag, which is not what we want. So use pthread_cancel instead.
2014-07-24 | Fix bogus pass by reference | Eelco Dolstra
http://hydra.nixos.org/build/12711659
2014-07-24 | More debugging | Eelco Dolstra
2014-07-24 | Add some assertions | Eelco Dolstra
2014-07-23 | nix-daemon: Use a thread instead of SIGPOLL to catch client disconnects | Eelco Dolstra
The thread calls poll() to wait until a HUP (or other error event) happens on the client connection. If so, it sends SIGINT to the main thread, which is then cleaned up normally. This is much nicer than messing around with SIGPOLL.
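The original design described here might look something like the sketch below. It is an illustration under assumptions, not the code from this commit: `monitorClient`, `onInterrupt`, and `gotInterrupt` are hypothetical names, and the handler stands in for whatever the daemon's real SIGINT handler does.

```cpp
#include <cerrno>
#include <csignal>
#include <thread>
#include <poll.h>
#include <pthread.h>

// Hypothetical stand-in for the daemon's SIGINT handler.
volatile std::sig_atomic_t gotInterrupt = 0;
void onInterrupt(int) { gotInterrupt = 1; }

// Hypothetical sketch: poll `fd` on a separate thread and, once the client
// hangs up (or another error event occurs), send SIGINT to `target`.
std::thread monitorClient(int fd, pthread_t target) {
    return std::thread([fd, target] {
        struct pollfd pfd = {};
        pfd.fd = fd;
        pfd.events = 0;   // hangup and error events are reported regardless
        while (true) {
            int n = poll(&pfd, 1, -1);
            if (n < 0) {
                if (errno == EINTR) continue;
                return;
            }
            if (pfd.revents & (POLLHUP | POLLERR | POLLNVAL)) {
                pthread_kill(target, SIGINT);
                return;
            }
        }
    });
}
```

The main thread would install its handler with `std::signal(SIGINT, onInterrupt)` and pass `pthread_self()` as `target` before starting the monitor.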