Ensure Conmon is alive before waiting for exit file

This came out of a conversation with Valentin about systemd-managed Podman. He discovered that unit files did not properly handle cases where Conmon was dead: the ExecStopPost `podman rm --force` line was not actually removing the container, but, interestingly, adding a `podman cleanup --rm` line would remove it. Both commands do essentially the same thing, except that `podman cleanup --rm` does not force-remove running containers.

If Conmon is killed with SIGKILL, it has no chance to kill the container it manages, so the container process keeps running without a Conmon instance. We can still kill the container itself with `podman stop`, because that involves only the OCI runtime, not Conmon (`podman rm --force` and `podman stop` use the same code to kill the container). The problem comes when we want to get the container's exit code: we expect Conmon to write an exit file, which it obviously will not do, being dead. The first `podman rm` fails because of this, but, importantly, after failing to retrieve the exit code it still sets the container's status to Exited, so the second command, `podman cleanup`, succeeds.

To make sure the first `podman rm --force` succeeds, we need to catch the case where Conmon is already dead and, instead of waiting for an exit file that will never come, immediately set the Stopped state and return an error that can be caught and handled.

Signed-off-by: Matthew Heon <mheon@redhat.com>
author: Matthew Heon <mheon@redhat.com> 2020-06-08 13:34:12 -0400
committer: Matthew Heon <mheon@redhat.com> 2020-06-08 13:48:29 -0400
commit: 9d964ffb9fc98ed13f6fec974c917b84c175d39a (patch)
tree: d8865dd121fb910363c17c11e6822811388562d1 /libpod/oci_conmon_linux.go
parent: b8acc851bb472c7ea9674bf1bf8ca3812ff2ab24 (diff)
download: podman-9d964ffb9fc98ed13f6fec974c917b84c175d39a.tar.gz
podman-9d964ffb9fc98ed13f6fec974c917b84c175d39a.tar.bz2
podman-9d964ffb9fc98ed13f6fec974c917b84c175d39a.zip
1 files changed, 25 insertions, 0 deletions
diff --git a/libpod/oci_conmon_linux.go b/libpod/oci_conmon_linux.go
index 9c92b036e..0921a532b 100644
--- a/libpod/oci_conmon_linux.go
+++ b/libpod/oci_conmon_linux.go
@@ -669,6 +669,31 @@ func (r *ConmonOCIRuntime) CheckpointContainer(ctr *Container, options Container
 	return utils.ExecCmdWithStdStreams(os.Stdin, os.Stdout, os.Stderr, nil, r.path, args...)
 }
 
+func (r *ConmonOCIRuntime) CheckConmonRunning(ctr *Container) (bool, error) {
+	if ctr.state.ConmonPID == 0 {
+		// If the container is running or paused, assume Conmon is
+		// running. We didn't record Conmon PID on some old versions, so
+		// that is likely what's going on...
+		// Unusual enough that we should print a warning message though.
+		if ctr.ensureState(define.ContainerStateRunning, define.ContainerStatePaused) {
+			logrus.Warnf("Conmon PID is not set, but container is running!")
+			return true, nil
+		}
+		// Container's not running, so conmon PID being unset is
+		// expected. Conmon is not running.
+		return false, nil
+	}
+
+	// We have a conmon PID. Ping it with signal 0.
+	if err := unix.Kill(ctr.state.ConmonPID, 0); err != nil {
+		if err == unix.ESRCH {
+			return false, nil
+		}
+		return false, errors.Wrapf(err, "error pinging container %s conmon with signal 0", ctr.ID())
+	}
+	return true, nil
+}
+
 // SupportsCheckpoint checks if the OCI runtime supports checkpointing
 // containers.
 func (r *ConmonOCIRuntime) SupportsCheckpoint() bool {