Age | Commit message (Collapse) | Author |
|
This was filed as https://github.com/nixos/nix/issues/7584, but as far
as I can tell, the previous solution of POLLHUP works just fine on macOS
14. I've also tested on an ancient machine with macOS 10.15.7, which
also has POLLHUP work correctly.
It's possible this might regress some older versions of macOS that have
a kernel bug, but I went looking through the history on the sources and
didn't find anything that looked terribly convincingly like a bug fix
between 2020 and today. If such a broken version exists, it seems pretty
reasonable to suggest simply updating the OS.
Change-Id: I178a038baa000f927ea2cbc4587d69d8ab786843
|
|
This is caused by pthread_cancel effectively throwing a
not-specifically-identifiable C++ exception into the targeted thread,
which, if it is not rethrown, terminates the process entirely.
This is rather "impolite" behaviour, we would say. But thread
cancellation is *always* busted, and we should simply not use it where
unnecessary. It's particularly unnecessary when what we *actually* need
it for is, err, interrupting a poll(2).
That can in turn be achieved by simply listening to more stuff in the
poll, namely, a pipe, which we send a character to when needing to
stop the thread.
While looking at this code, we also investigated whether any of the
poll() madness is required, or was even *ever* required. Curiously we
found in the XNU kernel source code that the thing about needing to
listen to POLLHUP is probably *correct*, but switching it to POLLRDNORM
should not have made any difference at all. We've left a FIXME to look
into that further because what's written here is super janky.
https://github.com/apple-oss-distributions/xnu/blob/94d3b452840153a99b38a3a9659680b2a006908e/bsd/kern/sys_generic.c#L1751-L1758
This is the crash on some Hydra machines:
Thread 1 (Thread 0x7f56b77776c0 (LWP 955542) (Exiting)):
0 0x00007f56b8e9b7dc in __pthread_kill_implementation () from /nix/store/m71p7f0nymb19yn1dascklyya2i96jfw-glibc-2.39-52/lib/libc.so.6
1 0x00007f56b8e49516 in raise () from /nix/store/m71p7f0nymb19yn1dascklyya2i96jfw-glibc-2.39-52/lib/libc.so.6
2 0x00007f56b8e31935 in abort () from /nix/store/m71p7f0nymb19yn1dascklyya2i96jfw-glibc-2.39-52/lib/libc.so.6
3 0x00007f56b8e327f3 in __libc_message_impl.cold () from /nix/store/m71p7f0nymb19yn1dascklyya2i96jfw-glibc-2.39-52/lib/libc.so.6
4 0x00007f56b8e8e8e9 in __libc_fatal () from /nix/store/m71p7f0nymb19yn1dascklyya2i96jfw-glibc-2.39-52/lib/libc.so.6
5 0x00007f56b8ea23c4 in unwind_cleanup () from /nix/store/m71p7f0nymb19yn1dascklyya2i96jfw-glibc-2.39-52/lib/libc.so.6
6 0x00007f56b9d2a1b8 in nix::triggerInterrupt() [clone .cold] () from /nix/store/sahgw550p621m9dy1pd7whl9c5g1g0p7-lix-2.90.0-rc1/lib/liblixutil.so
7 0x00007f56b990ac9d in std::thread::_State_impl<std::thread::_Invoker<std::tuple<nix::MonitorFdHup::MonitorFdHup(int)::{lambda()#1}> > >::_M_run() () from /nix/store/sahgw550p621m9dy1pd7whl9c5g1g0p7-lix-2.90.0-rc1/lib/liblixstore.so
8 0x00007f56b90e86d3 in execute_native_thread_routine () from /nix/store/c6r62m84hywf4i6qq1h28f13zv38yqyp-gcc-13.3.0-lib/lib/libstdc++.so.6
9 0x00007f56b8e99a42 in start_thread () from /nix/store/m71p7f0nymb19yn1dascklyya2i96jfw-glibc-2.39-52/lib/libc.so.6
10 0x00007f56b8f1905c in clone3 () from /nix/store/m71p7f0nymb19yn1dascklyya2i96jfw-glibc-2.39-52/lib/libc.so.6
As for testing, we've started a daemon with this change and verified it
deals with HUPs correctly on x86_64-linux, but I don't think we can
easily test the destructor behaviour without whatever Hydra was
doing that broke.
Change-Id: I29c7de0425674494b6e43c075810126c3ff77363
|
|
Copies part of the changes of ac89bb064aeea85a62b82a6daf0ecca7190a28b7
Change-Id: I9ce601875cd6d4db5eb1132d7835c5bab9f126d8
|
|
`///@file` makes them show up in the internal API dos. A tiny few were
missing `#pragma once`.
|
|
It appears that on current macOS versions, our use of poll() to detect
client disconnects no longer works. As a workaround, poll() for
POLLRDNORM, since this *will* wake up when the client has
disconnected. The downside is that it also wakes up when input is
available. So just sleep for a bit in that case. This means that on
macOS, a client disconnect may take up to a second to be detected,
but that's better than not being detected at all.
Fixes #7584.
|
|
Fixes #1871.
|
|
|
|
|
|
|
|
Signal handlers are process-wide, so sending SIGINT to the monitor
thread will cause the normal SIGINT handler to run. This sets the
isInterrupted flag, which is not what we want. So use pthread_cancel
instead.
|
|
http://hydra.nixos.org/build/12711659
|
|
|
|
|
|
The thread calls poll() to wait until a HUP (or other error event)
happens on the client connection. If so, it sends SIGINT to the main
thread, which is then cleaned up normally. This is much nicer than
messing around with SIGPOLL.
|