author | Ed Santiago <santiago@redhat.com> | 2021-03-11 13:12:11 -0700 |
---|---|---|
committer | Ed Santiago <santiago@redhat.com> | 2021-03-11 16:21:51 -0700 |
commit | 660a72993c425d9242641a846596b5ca33d6368f (patch) | |
tree | 6a5ccf294bdbbca5c056df8dcffad242f370789b /test/system | |
parent | 8d33bfabaa20612718d909494c2ceec26482d279 (diff) | |
sdnotify tests: try real hard to kill socat processes
podman gating tests are hanging in the new Fedora CI setup;
long and tedious investigation suggests that 'socat' processes
are being left unkilled, which then causes BATS to hang when
it (presumably) runs a final 'wait' in its end cleanup.
The two principal changes are to exec socat in a subshell
with fd3 closed, and to pkill its child processes before
killing the process itself. I don't know if both are needed.
The pkill definitely is; the exec may just be superstition.
Since I've wasted more than a day of PTO time on this, I'm
okay with a little superstition. What I do know is that with
these two changes, my reproducer fails to reproduce in over
one hour of trying (normally it fails within 5 minutes).
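
The two changes described above can be tried in isolation. This is a minimal sketch, not the test suite's code: `sleep` stands in for the socat listener, fd 3 is opened by hand to imitate BATS, and `start_worker`, `stop_worker`, `WORKER_PID`, and `STATE_FILE` are hypothetical names. It assumes `bash` is available, and `pkill` for the child cleanup.

```shell
#!/usr/bin/env bash
# Sketch only: 'sleep' stands in for the socat listener, and fd 3 is opened by
# hand to imitate BATS. Every name here is hypothetical, not from the test suite.

exec 3>/dev/null                # imitate BATS, which keeps fd 3 open
STATE_FILE=$(mktemp)

start_worker() {
    # Subshell + exec: '3>&-' closes fd 3 for the exec'd process only, so the
    # worker and everything it forks start without it.
    (exec bash -c '
        sleep 30 </dev/null >/dev/null 2>&1 &     # a child for pkill -P to find
        if { true >&3; } 2>/dev/null; then s=open; else s=closed; fi
        echo "$s" >"$1"                           # report whether fd 3 is open
        wait
    ' worker "$STATE_FILE" 3>&-) &
    WORKER_PID=$!
}

stop_worker() {
    if [ -n "$WORKER_PID" ]; then
        # Children first, then the process itself. pkill exits nonzero when
        # it matches nothing, so the failure is deliberately ignored.
        pkill -P "$WORKER_PID" || true
        kill "$WORKER_PID" 2>/dev/null || true
        wait "$WORKER_PID" 2>/dev/null || true
    fi
    WORKER_PID=
}

# The calling shell still has fd 3; only the worker lost it.
if { true >&3; } 2>/dev/null; then PARENT_FD3=open; else PARENT_FD3=closed; fi

start_worker
# Wait (up to about 5 seconds) for the worker to report in.
for i in $(seq 50); do [ -s "$STATE_FILE" ] && break; sleep 0.1; done
WORKER_FD3=$(cat "$STATE_FILE")

SAVED_PID=$WORKER_PID
stop_worker
if kill -0 "$SAVED_PID" 2>/dev/null; then WORKER_ALIVE=yes; else WORKER_ALIVE=no; fi
rm -f "$STATE_FILE"

echo "parent fd3: $PARENT_FD3, worker fd3: $WORKER_FD3, worker alive: $WORKER_ALIVE"
```

The redirection sits inside the parentheses so that it applies to the exec'd command alone; the shell that launched the subshell keeps its own fd 3 untouched.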
AND, update: only rawhide (f35) leaves stray socat processes
behind. f33 and ubuntu do not, so there 'pkill -P' finds nothing
to kill and exits nonzero (hence the '|| true').
I really have no idea what's going on.
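
The 'pkill -P' failure on systems without stray children is ordinary pkill behavior: procps pkill exits with status 1 when no process matches, and BATS runs test code with errexit, so an unguarded call fails the test. A small sketch of that, assuming `pkill` is installed and using `sleep` as a hypothetical stand-in for a listener that has no children:

```shell
#!/usr/bin/env bash
# Sketch only: 'sleep' is a hypothetical stand-in for a process with no children.
sleep 30 </dev/null >/dev/null 2>&1 &
LEAF_PID=$!

# Nothing has LEAF_PID as its parent, so pkill matches nothing and fails.
pkill -P "$LEAF_PID"
RC_BARE=$?

# Same call with the guard used in the patch: the failure is swallowed.
pkill -P "$LEAF_PID" || true
RC_GUARDED=$?

# Clean up the stand-in process.
kill "$LEAF_PID"
wait "$LEAF_PID" 2>/dev/null || true

echo "bare: $RC_BARE, guarded: $RC_GUARDED"
```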
Signed-off-by: Ed Santiago <santiago@redhat.com>
Diffstat (limited to 'test/system')
-rw-r--r-- | test/system/260-sdnotify.bats | 13 |
1 files changed, 11 insertions, 2 deletions
diff --git a/test/system/260-sdnotify.bats b/test/system/260-sdnotify.bats
index a5fa0f4e6..8bf49eb1d 100644
--- a/test/system/260-sdnotify.bats
+++ b/test/system/260-sdnotify.bats
@@ -42,14 +42,22 @@ function _start_socat() {
     _SOCAT_LOG="$PODMAN_TMPDIR/socat.log"
 
     rm -f $_SOCAT_LOG
-    socat unix-recvfrom:"$NOTIFY_SOCKET",fork \
-          system:"(cat;echo) >> $_SOCAT_LOG" &
+    # Execute in subshell so we can close fd3 (which BATS uses).
+    # This is a superstitious ritual to try to avoid leaving processes behind,
+    # and thus prevent CI hangs.
+    (exec socat unix-recvfrom:"$NOTIFY_SOCKET",fork \
+          system:"(cat;echo) >> $_SOCAT_LOG" 3>&-) &
     _SOCAT_PID=$!
 }
 
 # Stop the socat background process and clean up logs
 function _stop_socat() {
     if [[ -n "$_SOCAT_PID" ]]; then
+        # Kill all child processes, then the process itself.
+        # This is a superstitious incantation to avoid leaving processes behind.
+        # The '|| true' is because only f35 leaves behind socat processes;
+        # f33 (and perhaps others?) behave nicely. ARGH!
+        pkill -P $_SOCAT_PID || true
         kill $_SOCAT_PID
     fi
     _SOCAT_PID=
@@ -57,6 +65,7 @@ function _stop_socat() {
     if [[ -n "$_SOCAT_LOG" ]]; then
         rm -f $_SOCAT_LOG
     fi
+    _SOCAT_LOG=
 }
 
 # Check that MAINPID=xxxxx points to a running conmon process