summaryrefslogtreecommitdiff
path: root/libpod/oci_conmon_linux.go
Commit message (Collapse)AuthorAge
* Merge pull request #14654 from cdoern/cgroupopenshift-ci[bot]2022-06-27
|\ | | | | podman cgroup enhancement
| * podman cgroup enhancementcdoern2022-06-24
| | | | | | | | | | | | | | | | | | | | | | currently, setting any sort of resource limit in a pod does nothing. With the newly refactored creation process in c/common, podman ca now set resources at a pod level meaning that resource related flags can now be exposed to podman pod create. cgroupfs and systemd are both supported with varying completion. cgroupfs is a much simpler process and one that is virtually complete for all resource types, the flags now just need to be added. systemd on the other hand has to be handeled via the dbus api meaning that the limits need to be passed as recognized properties to systemd. The properties added so far are the ones that podman pod create supports as well as `cpuset-mems` as this will be the next flag I work on. Signed-off-by: Charlie Doern <cdoern@redhat.com>
* | libpod: fix wait and exit-code logicValentin Rothberg2022-06-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit addresses three intertwined bugs to fix an issue when using Gitlab runner on Podman. The three bug fixes are not split into separate commits as tests won't pass otherwise; avoidable noise when bisecting future issues. 1) Podman conflated states: even when asking to wait for the `exited` state, Podman returned as soon as a container transitioned to `stopped`. The issues surfaced in Gitlab tests to fail [1] as `conmon`'s buffers have not (yet) been emptied when attaching to a container right after a wait. The race window was extremely narrow, and I only managed to reproduce with the Gitlab runner [1] unit tests. 2) The clearer separation between `exited` and `stopped` revealed a race condition predating the changes. If a container is configured for autoremoval (e.g., via `run --rm`), the "run" process competes with the "cleanup" process running in the background. The window of the race condition was sufficiently large that the "cleanup" process has already removed the container and storage before the "run" process could read the exit code and hence waited indefinitely. Address the exit-code race condition by recording exit codes in the main libpod database. Exit codes can now be read from a database. When waiting for a container to exit, Podman first waits for the container to transition to `exited` and will then query the database for its exit code. Outdated exit codes are pruned during cleanup (i.e., non-performance critical) and when refreshing the database after a reboot. An exit code is considered outdated when it is older than 5 minutes. While the race condition predates this change, the waiting process has apparently always been fast enough in catching the exit code due to issue 1): `exited` and `stopped` were conflated. The waiting process hence caught the exit code after the container transitioned to `stopped` but before it `exited` and got removed. 3) With 1) and 2), Podman is now waiting for a container to properly transition to the `exited` state. Some tests did not pass after 1) and 2) which revealed the third bug: `conmon` was executed with its working directory pointing to the OCI runtime bundle of the container. The changed working directory broke resolving relative paths in the "cleanup" process. The "cleanup" process error'ed before actually cleaning up the container and waiting "main" process ran indefinitely - or until hitting a timeout. Fix the issue by executing `conmon` with the same working directory as Podman. Note that fixing 3) *may* address a number of issues we have seen in the past where for *some* reason cleanup processes did not fire. [1] https://gitlab.com/gitlab-org/gitlab-runner/-/issues/27119#note_970712864 Signed-off-by: Valentin Rothberg <vrothberg@redhat.com> [MH: Minor reword of commit message] Signed-off-by: Matthew Heon <mheon@redhat.com>
* | conmon: silence json-file errorValentin Rothberg2022-06-23
|/ | | | | | | We should just silently fall through. The log was flooding the system-service logs when running Gitlab runner. Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
* Merge pull request #14596 from ↵openshift-ci[bot]2022-06-15
|\ | | | | | | | | giuseppe/move-conmon-different-cgroup-system-service libpod: improve check to create conmon cgroup
| * libpod: improve check to create conmon cgroupGiuseppe Scrivano2022-06-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 1951ff168a63157fa2f4711fde283edfc4981ed3 introduced a check so that conmon is not moved to a new cgroup when podman is running inside of a systemd service. This is helpful to integrate podman in systemd so that the spawned conmon lives in the same cgroup as the service that created it. Unfortunately this breaks when podman daemon is running in a systemd service since the same check is in place thus all the conmon processes end up in the same cgroup as the podman daemon. When the podman daemon systemd service stops the conmon processes are also terminated as well as the containers they monitor. Improve the check to exclude podman running as a daemon. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2052697 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* | golangci-lint: enable nolintlintPaul Holzinger2022-06-14
| | | | | | | | | | | | | | | | | | | | The nolintlint linter does not deny the use of `//nolint` Instead it allows us to enforce a common nolint style: - force that a linter name must be specified - do not add a space between `//` and `nolint` - make sure nolint is only used when there is actually a problem Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* | Do not error on signalling a just-stopped containerMatthew Heon2022-06-09
|/ | | | | | | | | | | | | | | | | | | | Previous PR #12394 tried to address this, but made a mistake: containers that have just exited do not move to the Exited state but rather the Stopped state - as such, the code would never have run (there is no way we start `podman kill`, and the container transitions to Exited while we are doing it - that requires holding the container lock, which Kill already does). Fix the code to check Stopped as well (we omit Exited entirely but it's a cheap check and our state logic could change in the future). Also, return an error, instead of exiting cleanly - the Kill failed, after all. ErrCtrStateInvalid is already handled by the sig-proxy logic so there won't be issues. [NO NEW TESTS NEEDED] This fixes a race that I cannot reproduce myself, and I have no idea how we'd repro in CI. Signed-off-by: Matthew Heon <mheon@redhat.com>
* sdnotify: send MAINPID only onceValentin Rothberg2022-05-12
| | | | | | | | | Send the main PID only once. Previously, `(*Container).start()` and the conmon handler sent them ~simultaneously and went into a race. I noticed the issue while debugging a WIP PR. Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
* enable gocritic linterPaul Holzinger2022-04-26
| | | | | | | | | | | | | | | | | | | | | | The linter ensures a common code style. - use switch/case instead of else if - use if instead of switch/case for single case statement - add space between comment and text - detect the use of defer with os.Exit() - use short form var += "..." instead of var = var + "..." - detect problems with append() ``` newSlice := append(orgSlice, val) ``` This could lead to nasty bugs because the orgSlice will be changed in place if it has enough capacity too hold the new elements. Thus we newSlice might not be a copy. Of course most of the changes are just cosmetic and do not cause any logic errors but I think it is a good idea to enforce a common style. This should help maintainability. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* enable unparam linterPaul Holzinger2022-04-25
| | | | | | | The unparam linter is useful to detect unused function parameters and return values. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* Merge pull request #13093 from 0xC0ncord/selinux-conmon-agnosticOpenShift Merge Robot2022-04-12
|\ | | | | selinux: remove explicit range transition when starting conmon
| * selinux: remove explicit range transition when starting conmonKenton Groombridge2022-03-08
| | | | | | | | | | | | | | | | | | | | | | | | Do not explicitly transition to s0 when starting conmon. Instead, the policy should implement this behavior. [NO NEW TESTS NEEDED] This is dependent on the SELinux policy to implement the desired behavior. Additionally, entirely custom SELinux policies may choose to implement the behavior differently. Signed-off-by: Kenton Groombridge <me@concord.sh>
* | Add option for pod logs to display different colors per container.gcalin2022-03-29
| | | | | | | | | | Signed-off-by: Krzysztof Baran <krysbaran@gmail.com> Signed-off-by: gcalin <caling@protonmail.com>
* | readConmonPipeData: try to improve errorPaul Holzinger2022-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue #10927 reports `container create failed (no logs from conmon): EOF` errors. Since we do not know the root cause it would be helpful to try to get as much info as possible out of the error. (buffer).ReadBytes() will return the bytes read even when an error occurs. So when we get an EOF we could still have some valuable information in the buffer. Lets try to unmarshal them and if this fails we add the bytes to the error message. This does not fix the issue but it might help us getting a better error. [NO NEW TESTS NEEDED] Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* | linter: enable nilerrValentin Rothberg2022-03-22
| | | | | | | | | | | | | | A number of cases looked suspicious, so I marked them with `FIXME`s to leave some breadcrumbs. Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
* | linter: enable wastedassignValentin Rothberg2022-03-22
| | | | | | | | Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
* | bump golangci-lint to v1.45.0Valentin Rothberg2022-03-21
| | | | | | | | | | | | | | | | | | | | * supports Go 1.18 * disable a number of new linters * fix minor stylecheck issues [NO NEW TESTS NEEDED] Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
* | Merge pull request #13552 from vrothberg/go1.18OpenShift Merge Robot2022-03-18
|\ \ | | | | | | go fmt: use go 1.18 conditional-build syntax
| * | go fmt: use go 1.18 conditional-build syntaxValentin Rothberg2022-03-18
| |/ | | | | | | Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
* / podman machine: remove hostip from portPaul Holzinger2022-03-17
|/ | | | | | | | | | | | | | | | Inside the podman machine vm we always remove the hostip from the port mapping because this should only be used on the actual host. Otherwise you run into issues when we would bind 127.0.0.1 or try to bind a host address that is not available in the VM. This was already done for cni/netavark ports and slirp4netns but not for the port bindings inside libpod which are only used as root. [NO NEW TESTS NEEDED] We still do not have machine tests! Fixes #13543 Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* Propagate $CONTAINERS_CONF to conmonDavid Gibson2022-02-18
| | | | | | | | | | | | | | | | | | | | | The CONTAINERS_CONF environment variable can be used to override the configuration file, which is useful for testing. However, at the moment this variable is not propagated to conmon. That means in particular, that conmon can't propagate it back to podman when invoking its --exit-command. The mismatch in configuration between the starting and cleaning up podman instances can cause a variety of errors. This patch also adds two related test cases. One checks explicitly that the correct CONTAINERS_CONF value appears in conmon's environment. The other checks for a possible specific impact of this bug: if we use a nonstandard name for the runtime (even if its path is just a regular crun), then the podman container cleanup invoked at container exit will fail. That has the effect of meaning that a container started with -d --rm won't be correctly removed once complete. Fixes #12917 Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* compat attach: fix write on closed channelValentin Rothberg2022-01-18
| | | | | | | | | Waiting on an initialized sync.WaitGroup returns immediately. Hence, move the goroutine to wait and close *after* reading the logs. Fixes: #12904 Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
* bump go module to version 4Valentin Rothberg2022-01-18
| | | | | | | | | | | | | Automated for .go files via gomove [1]: `gomove github.com/containers/podman/v3 github.com/containers/podman/v4` Remaining files via vgrep [2]: `vgrep github.com/containers/podman/v3` [1] https://github.com/KSubedi/gomove [2] https://github.com/vrothberg/vgrep Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
* Standardize on capatalized CgroupsDaniel J Walsh2022-01-14
| | | | Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* add OCI Runtime name to errorsDaniel J Walsh2022-01-06
| | | | | | | | | It would be easier to diagnose OCI runtime errors if the error actually had the name of the OCI runtime that produced the error. [NO NEW TESTS NEEDED] Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Update vendor or containers/common moving pkg/cgroups thereDaniel J Walsh2021-12-07
| | | | | | | [NO NEW TESTS NEEDED] This is just moving pkg/cgroups out so existing tests should be fine. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* checkpoint do not modify XDG_RUNTIME_DIRPaul Holzinger2021-11-24
| | | | | | | | | | We should not modify the XDG_RUNTIME_DIR env value during runtime of libpod, this can cause hard to find bugs. Only set it for the OCI runtime, this matches the other commands such as start, stop, kill... [NO NEW TESTS NEEDED] Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* libpod: leave thread locked on errorsGiuseppe Scrivano2021-11-24
| | | | | | | | | | if the SELinux label could not be restored correctly, leave the OS thread locked so that it is terminated once it returns to the threads pool. [NO NEW TESTS NEEDED] the failure is hard to reproduce Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* Unset SocketLabel after system finishes checkpointingDaniel J Walsh2021-11-23
| | | | | | | | | | | | This should fix the SELinux issue we are seeing with talking to /run/systemd/private. Fixes: https://github.com/containers/podman/issues/12362 Also unset the XDG_RUNTIME_DIR if set, since we don't know when running as a service if this will cause issue.s Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Warn on failing to update container statusMatthew Heon2021-11-23
| | | | | | | | | | | | | failed to send a signal to the container's PID1, but ignored the results of that update. That's generally bad practice, since even if we can't directly take action on an error, we should still make an effort to report it for debugging purposes. I used Infof instead of something more serious to avoid duplicate reporting to the user if something has gone seriously wrong. [NO NEW TESTS NEEDED] this is just adding additional error reporting. Signed-off-by: Matthew Heon <mheon@redhat.com>
* oci: ack crun output when container is not thereAditya Rajan2021-11-23
| | | | | | | | | `crun status ctrid` outputs `No such file or directory` when container is not there so podman much ack it. [NO NEW TESTS NEEDED] Signed-off-by: Aditya Rajan <arajan@redhat.com>
* oci: exit gracefully if container is already deadAditya Rajan2021-11-23
| | | | | | | | | | While trying to kill a container with a `signal` we cant do anything if container is already dead so `exit` gracefully instead of trying to delete container again. Get container status from runtime. [ NO NEW TESTS NEEDED ] Signed-off-by: Aditya Rajan <arajan@redhat.com>
* Merge pull request #12354 from Luap99/exit-commandOpenShift Merge Robot2021-11-18
|\ | | | | Do not store the exit command in container config
| * Do not store the exit command in container configPaul Holzinger2021-11-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a problem with creating and storing the exit command when the container was created. It only contains the options the container was created with but NOT the options the container is started with. One example would be a CNI network config. If I start a container once, then change the cni config dir with `--cni-config-dir` ans start it a second time it will start successfully. However the exit command still contains the wrong `--cni-config-dir` because it was not updated. To fix this we do not want to store the exit command at all. Instead we create it every time the conmon process for the container is startet. This guarantees us that the container cleanup process is startet with the correct settings. [NO NEW TESTS NEEDED] Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* | Add --file-locks checkpoint/restore optionRadostin Stoyanov2021-11-18
|/ | | | | | | | CRIU supports checkpoint/restore of file locks. This feature is required to checkpoint/restore containers running applications such as MySQL. Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
* Added optional container restore statisticsAdrian Reber2021-11-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds the parameter '--print-stats' to 'podman container restore'. With '--print-stats' Podman will measure how long Podman itself, the OCI runtime and CRIU requires to restore a checkpoint and print out these information. CRIU already creates process restore statistics which are just read in addition to the added measurements. In contrast to just printing out the ID of the restored container, Podman will now print out JSON: # podman container restore --latest --print-stats { "podman_restore_duration": 305871, "container_statistics": [ { "Id": "47b02e1d474b5d5fe917825e91ac653efa757c91e5a81a368d771a78f6b5ed20", "runtime_restore_duration": 140614, "criu_statistics": { "forking_time": 5, "restore_time": 67672, "pages_restored": 14 } } ] } The output contains 'podman_restore_duration' which contains the number of microseconds Podman required to restore the checkpoint. The output also includes 'runtime_restore_duration' which is the time the runtime needed to restore that specific container. Each container also includes 'criu_statistics' which displays the timing information collected by CRIU. Signed-off-by: Adrian Reber <areber@redhat.com>
* Added optional container checkpointing statisticsAdrian Reber2021-11-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds the parameter '--print-stats' to 'podman container checkpoint'. With '--print-stats' Podman will measure how long Podman itself, the OCI runtime and CRIU requires to create a checkpoint and print out these information. CRIU already creates checkpointing statistics which are just read in addition to the added measurements. In contrast to just printing out the ID of the checkpointed container, Podman will now print out JSON: # podman container checkpoint --latest --print-stats { "podman_checkpoint_duration": 360749, "container_statistics": [ { "Id": "25244244bf2efbef30fb6857ddea8cb2e5489f07eb6659e20dda117f0c466808", "runtime_checkpoint_duration": 177222, "criu_statistics": { "freezing_time": 100657, "frozen_time": 60700, "memdump_time": 8162, "memwrite_time": 4224, "pages_scanned": 20561, "pages_written": 2129 } } ] } The output contains 'podman_checkpoint_duration' which contains the number of microseconds Podman required to create the checkpoint. The output also includes 'runtime_checkpoint_duration' which is the time the runtime needed to checkpoint that specific container. Each container also includes 'criu_statistics' which displays the timing information collected by CRIU. Signed-off-by: Adrian Reber <areber@redhat.com>
* oci: rename sub-cgroup to runtime instead of supervisorGiuseppe Scrivano2021-10-28
| | | | | | | | | | | | | we are having a hard time figuring out a failure in the CI: https://github.com/containers/podman/issues/11191 Rename the sub-cgroup created here, so we can be certain the error is caused by this part. [NO NEW TESTS NEEDED] we need this for the CI. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* Warn if podman stop timeout expires that sigkill was sentDaniel J Walsh2021-10-08
| | | | | | | | | Note: the Warning message will not come to podman-remote. It would be difficult to plumb, and not really worth the effort. Fixes: https://github.com/containers/podman/issues/11854 Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Merge pull request #11878 from mheon/stop_stoppingOpenShift Merge Robot2021-10-06
|\ | | | | Allow `podman stop` to be run on Stopping containers
| * Ensure `podman ps --sync` functionsMatthew Heon2021-10-06
| | | | | | | | | | | | | | | | | | | | | | | | The backend for `ps --sync` has been nonfunctional for a long while now - probably since v2.0. It's questionable how useful the flag is in modern Podman (the original case it was intended to catch, Conmon gone via SIGKILL, should be handled now via pinging the process with a signal to ensure it's still alive) but having the ability to force a refresh of container state from the OCI runtime is still useful. Signed-off-by: Matthew Heon <mheon@redhat.com>
* | libpod: fix race when closing STDINPaul Holzinger2021-10-06
|/ | | | | | | | | | | | | | | | There is a race where `conn.Close()` was called before `conn.CloseWrite()`. In this case `CloseWrite` will fail and an useless error is printed. To fix this we move the the `CloseWrite()` call to the same goroutine to remove the race. This ensures that `CloseWrite()` is called before `Close()` and never afterwards. Also fixed podman-remote run where the STDIN was never was closed. This is causing flakes in CI testing. [NO TESTS NEEDED] Fixes #11856 Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* Merge pull request #11390 from giuseppe/logging-passthroughOpenShift Merge Robot2021-09-29
|\ | | | | logging: new mode -l passthrough
| * logging: new mode -l passthroughGiuseppe Scrivano2021-09-27
| | | | | | | | | | | | | | | | | | | | it allows to pass the current std streams down to the container. conmon support: https://github.com/containers/conmon/pull/289 [NO TESTS NEEDED] it needs a new conmon. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* | libpod: do not call (*container).Config()Valentin Rothberg2021-09-28
|/ | | | | | | | | | | | Access the container's config field directly inside of libpod instead of calling `Config()` which in turn creates expensive JSON deep copies. Accessing the field directly drops memory consumption of a simple `podman run --rm busybox true` from 1245kB to 410kB. [NO TESTS NEEDED] Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
* standardize logrus messages to upper caseDaniel J Walsh2021-09-22
| | | | | | | | Remove ERROR: Error stutter from logrus messages also. [ NO TESTS NEEDED] This is just code cleanup. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* libpod: rootful close binded portsPaul Holzinger2021-09-14
| | | | | | | | | | | | | | | | | For rootful users ports are forwarded via iptables. To make sure no other process tries to use them, libpod will bind the ports and pass the fds to conmon. There seems to be race when a container is restarted because libpod tries to bind the port before the conmon process exited. The problem only hapens with the podman service because it keeps the connection open. Once we have the fd and passed it to conmon the podman service should close the connection. To verify run `sudo ss -tulpn` and check that only the conmon process keeps the port open. Previously you would also see the podman server process listed. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* Fix conmon attach socket buffer sizePaul Holzinger2021-09-09
| | | | | | | | | | | | | | | The conmon buffer size is 8192, however the attach socket needs two extra bytes. The first byte of each message will be the STREAM type. The last byte is a null byte. So when we want to read 8192 message bytes we need to read 8193 bytes since the first one is special. check https://github.com/containers/conmon/blob/1ef246896b4f6566964ed861b98cd32d0e7bf7a2/src/ctr_stdio.c#L101-L107 This problem can be seen in podman-remote run/exec when it prints output with 8192 or more bytes. The output will miss the 8192 byte. Fixes #11496 Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* pass LISTEN_* environment into containerValentin Rothberg2021-08-31
| | | | | | | | | | | | | | | Make sure that Podman passes the LISTEN_* environment into containers. Similar to runc, LISTEN_PID is set to 1. Also remove conditionally passing the LISTEN_FDS as extra files. The condition was wrong (inverted) and introduced to fix #3572 which related to running under varlink which has been dropped entirely with Podman 3.0. Note that the NOTIFY_SOCKET and LISTEN_* variables are cleared when running `system service`. Fixes: #10443 Signed-off-by: Daniel J Walsh <dwalsh@redhat.com> Signed-off-by: Valentin Rothberg <rothberg@redhat.com>