aboutsummaryrefslogtreecommitdiff
path: root/libpod/oci_conmon_linux.go
Commit message (Collapse)AuthorAge
* Store cgroup manager on a per-container basisMatthew Heon2020-10-08
| | | | | | | | | | | | | | | | | | | | | When we create a container, we assign a cgroup parent based on the current cgroup manager in use. This parent is only usable with the cgroup manager the container is created with, so if the default cgroup manager is later changed or overridden, the container will not be able to start. To solve this, store the cgroup manager that created the container in container configuration, so we can guarantee a container with a systemd cgroup parent will always be started with systemd cgroups. Unfortunately, this is very difficult to test in CI, due to the fact that we hard-code cgroup manager on all invocations of Podman in CI. Fixes #7830 Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* Merge pull request #7929 from kolyshkin/nits-errOpenShift Merge Robot2020-10-06
|\ | | | | Nits
| * Lowercase some errorsKir Kolyshkin2020-10-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit is courtesy of ``` for f in $(git ls-files *.go | grep -v ^vendor/); do \ sed -i 's/\(errors\..*\)"Error /\1"error /' $f; done for f in $(git ls-files *.go | grep -v ^vendor/); do \ sed -i 's/\(errors\..*\)"Failed to /\1"failed to /' $f; done ``` etc. Self-reviewed using `git diff --word-diff`, found no issues. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
| * Remove excessive error wrappingKir Kolyshkin2020-10-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case os.Open[File], os.Mkdir[All], ioutil.ReadFile and the like fails, the error message already contains the file name and the operation that fails, so there is no need to wrap the error with something like "open %s failed". While at it - replace a few places with os.Open, ioutil.ReadAll with ioutil.ReadFile. - replace errors.Wrapf with errors.Wrap for cases where there are no %-style arguments. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
* | Support max_size logoptionsDaniel J Walsh2020-10-05
|/ | | | | | | | Docker supports log-opt max_size and so does conmon (ALthough poorly). Adding support for this allows users to at least make sure their containers logs do not become a DOS vector. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* HTTP Attach: Wait until both STDIN and STDOUT finishMatthew Heon2020-09-24
| | | | | | | | | | | | | | | In the old code, there was a chance that we could return when only one of STDIN or STDOUT had finished - this could lead to us dropping either input to the container, or output from it, in the case that one stream terminated early. To resolve this, use separate channels to return STDOUT and STDIN errors, and track which ones have returned cleanly to ensure that we need bith in order to return from the HTTP attach function and pass control back to the HTTP handler (which would assume we exited cleanly and close the client's attach connection). Signed-off-by: Matthew Heon <mheon@redhat.com>
* Fix a bug where log-driver json-file was made no logsMatthew Heon2020-09-23
| | | | | | | | | When we added the None log driver, it was accidentally added in the middle of a set of Fallthrough stanzas which all should have led to k8s-file, so that JSON file logging accidentally caused no logging to be selected instead of k8s-file. Signed-off-by: Matthew Heon <mheon@redhat.com>
* Merge pull request #7403 from QiWang19/runtime-flagOpenShift Merge Robot2020-09-11
|\ | | | | Add global options --runtime-flags
| * Add global options --runtime-flagsQi Wang2020-09-04
| | | | | | | | | | | | Add global options --runtime-flags for setting options to container runtime. Signed-off-by: Qi Wang <qiwan@redhat.com>
* | rootless: support `podman network create` (CNI-in-slirp4netns)Akihiro Suda2020-09-09
|/ | | | | | | | | | | | | | | | | Usage: ``` $ podman network create foo $ podman run -d --name web --hostname web --network foo nginx:alpine $ podman run --rm --network foo alpine wget -O - http://web.dns.podman Connecting to web.dns.podman (10.88.4.6:80) ... <h1>Welcome to nginx!</h1> ... ``` See contrib/rootless-cni-infra for the design. Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
* Fix up some error messagesMatthew Heon2020-08-27
| | | | | | | | | We have a lot of 'cannot stat %s' errors in our codebase. These are terrible and confusing and utterly useless without context. Add some context to a few of them so we actually know what part of the code is failing. Signed-off-by: Matthew Heon <mheon@redhat.com>
* Send HTTP Hijack headers after successful attachMatthew Heon2020-08-27
| | | | | | | | | | | | | | | | | | | | | | | | | Our previous flow was to perform a hijack before passing a connection into Libpod, and then Libpod would attach to the container's attach socket and begin forwarding traffic. A problem emerges: we write the attach header as soon as the attach complete. As soon as we write the header, the client assumes that all is ready, and sends a Start request. This Start may be processed *before* we successfully finish attaching, causing us to lose output. The solution is to handle hijacking inside Libpod. Unfortunately, this requires a downright extensive refactor of the Attach and HTTP Exec StartAndAttach code. I think the result is an improvement in some places (a lot more errors will be handled with a proper HTTP error code, before the hijack occurs) but other parts, like the relocation of printing container logs, are just *bad*. Still, we need this fixed now to get CI back into good shape... Fixes #7195 Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* fix close fds of run --preserve-fdsQi Wang2020-07-30
| | | | | | | Test flakes mentioned in #6987 might be caused by uncorrect closing of file descriptor. Fix the code to close file descriptors for podman run since it may close those used by other processes. Signed-off-by: Qi Wang <qiwan@redhat.com>
* Switch all references to github.com/containers/libpod -> podmanDaniel J Walsh2020-07-28
| | | | Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Remove all instances of named return "err" from LibpodMatthew Heon2020-07-09
| | | | | | | | | | | | | | | | | This was inspired by https://github.com/cri-o/cri-o/pull/3934 and much of the logic for it is contained there. However, in brief, a named return called "err" can cause lots of code confusion and encourages using the wrong err variable in defer statements, which can make them work incorrectly. Using a separate name which is not used elsewhere makes it very clear what the defer should be doing. As part of this, remove a large number of named returns that were not used anywhere. Most of them were once needed, but are no longer necessary after previous refactors (but were accidentally retained). Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* Implement --sdnotify cmdline option to control sd-notify behaviorJoseph Gooch2020-07-06
| | | | | | | | | | | | | | | | | | | --sdnotify container|conmon|ignore With "conmon", we send the MAINPID, and clear the NOTIFY_SOCKET so the OCI runtime doesn't pass it into the container. We also advertise "ready" when the OCI runtime finishes to advertise the service as ready. With "container", we send the MAINPID, and leave the NOTIFY_SOCKET so the OCI runtime passes it into the container for initialization, and let the container advertise further metadata. This is the default, which is closest to the behavior podman has done in the past. The "ignore" option removes NOTIFY_SOCKET from the environment, so neither podman nor any child processes will talk to systemd. This removes the need for hardcoded CID and PID files in the command line, and the PIDFile directive, as the pid is advertised directly through sd-notify. Signed-off-by: Joseph Gooch <mrwizard@dok.org>
* move go module to v2Valentin Rothberg2020-07-06
| | | | | | | | | | | | | | | With the advent of Podman 2.0.0 we crossed the magical barrier of go modules. While we were able to continue importing all packages inside of the project, the project could not be vendored anymore from the outside. Move the go module to new major version and change all imports to `github.com/containers/libpod/v2`. The renaming of the imports was done via `gomove` [1]. [1] https://github.com/KSubedi/gomove Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
* podman: add new cgroup mode splitGiuseppe Scrivano2020-06-25
| | | | | | | | | | | | | | | | | | | When running under systemd there is no need to create yet another cgroup for the container. With conmon-delegated the current cgroup will be split in two sub cgroups: - supervisor - container The supervisor cgroup will hold conmon and the podman process, while the container cgroup is used by the OCI runtime (using the cgroupfs backend). Closes: https://github.com/containers/libpod/issues/6400 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* Add --preservefds to podman runQi Wang2020-06-19
| | | | | | Add --preservefds to podman run. close https://github.com/containers/libpod/issues/6458 Signed-off-by: Qi Wang <qiwan@redhat.com>
* Do not share container log driver for execMatthew Heon2020-06-17
| | | | | | | | | | | | | | | | | | | | | When the container uses journald logging, we don't want to automatically use the same driver for its exec sessions. If we do we will pollute the journal (particularly in the case of healthchecks) with large amounts of undesired logs. Instead, force exec sessions logs to file for now; we can add a log-driver flag later (we'll probably want to add a `podman logs` command that reads exec session logs at the same time). As part of this, add support for the new 'none' logs driver in Conmon. It will be the default log driver for exec sessions, and can be optionally selected for containers. Great thanks to Joe Gooch (mrwizard@dok.org) for adding support to Conmon for a null log driver, and wiring it in here. Fixes #6555 Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* Ensure Conmon is alive before waiting for exit fileMatthew Heon2020-06-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This came out of a conversation with Valentin about systemd-managed Podman. He discovered that unit files did not properly handle cases where Conmon was dead - the ExecStopPost `podman rm --force` line was not actually removing the container, but interestingly, adding a `podman cleanup --rm` line would remove it. Both of these commands do the same thing (minus the `podman cleanup --rm` command not force-removing running containers). Without a running Conmon instance, the container process is still running (assuming you killed Conmon with SIGKILL and it had no chance to kill the container it managed), but you can still kill the container itself with `podman stop` - Conmon is not involved, only the OCI Runtime. (`podman rm --force` and `podman stop` use the same code to kill the container). The problem comes when we want to get the container's exit code - we expect Conmon to make us an exit file, which it's obviously not going to do, being dead. The first `podman rm` would fail because of this, but importantly, it would (after failing to retrieve the exit code correctly) set container status to Exited, so that the second `podman cleanup` process would succeed. To make sure the first `podman rm --force` succeeds, we need to catch the case where Conmon is already dead, and instead of waiting for an exit file that will never come, immediately set the Stopped state and remove an error that can be caught and handled. Signed-off-by: Matthew Heon <mheon@redhat.com>
* Add exit commands to exec sessionsMatthew Heon2020-05-20
| | | | | | | | | | | These are required for detached exec, where they will be used to clean up and remove exec sessions when they exit. As part of this, move all Exec related functionality for the Conmon OCI runtime into a separate file; the existing one was around 2000 lines. Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* oci conmon: tell conmon to log container namePeter Hunt2020-05-20
| | | | | | | | | | specifying `-n=ctr-name` tells conmon to log CONTAINER_NAME=name if the log driver is journald add this, and a test! also, refactor the args slice creation to not append() unnecessarily. Signed-off-by: Peter Hunt <pehunt@redhat.com>
* Drop a debug line which could print very large messagesMatthew Heon2020-05-15
| | | | Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* Remove duplicated exec handling codeMatthew Heon2020-05-14
| | | | | | | | | | | During the initial workup of HTTP exec, I duplicated most of the existing exec handling code so I could work on it without breaking normal exec (and compare what I was doing to the nroaml version). Now that it's done and working, we can switch over to the refactored version and ditch the original, removing a lot of duplicated code. Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* Fix start order for APIv2 exec start endpointMatthew Heon2020-05-14
| | | | | | This makes the endpoint (mostly) functional. Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* Add an initial implementation of HTTP-forwarded execMatthew Heon2020-05-14
| | | | | | | | | | | | | | | This is heavily based off the existing exec implementation, but does not presently share code with it, to try and ensure we don't break anything. Still to do: - Add code sharing with existing exec implementation - Wire in the frontend (exec HTTP endpoint) - Move all exec-related code in oci_conmon_linux.go into a new file - Investigate code sharing between HTTP attach and HTTP exec. Signed-off-by: Matthew Heon <mheon@redhat.com>
* WIP V2 attach bindings and testJhon Honce2020-05-13
| | | | | | | | * Add ErrLostSync to report lost of sync when de-mux'ing stream * Add logus.SetLevel(logrus.DebugLevel) when `go test -v` given * Add context to debugging messages Signed-off-by: Jhon Honce <jhonce@redhat.com>
* Fix errors found in coverity scanDaniel J Walsh2020-05-01
| | | | Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* podman v2 remove bloat v2Brent Baude2020-04-16
| | | | | | rid ourseleves of libpod references in v2 client Signed-off-by: Brent Baude <bbaude@redhat.com>
* Add support for selecting kvm and systemd labelsDaniel J Walsh2020-04-15
| | | | | | | | | | | | In order to better support kata containers and systemd containers container-selinux has added new types. Podman should execute the container with an SELinux process label to match the container type. Traditional Container process : container_t KVM Container Process: containre_kvm_t PID 1 Init process: container_init_t Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Improve APIv2 support for AttachMatthew Heon2020-04-13
| | | | | | | | | | | | | | A few major fixes here: - Support for attaching to Configured containers, to match Docker behavior. - Support for stream parameter has been improved (we now properly handle cases where it is not set). - Initial support for logs parameter has been added. - Setting attach streams when the container has a terminal is now supported. - Errors are properly reported once the hijack has begun. Signed-off-by: Matthew Heon <mheon@redhat.com>
* refactor infoBrent Baude2020-04-06
| | | | | | the current implementation of info, while typed, is very loosely done so. we need stronger types for our apiv2 implmentation and bindings. Signed-off-by: Brent Baude <bbaude@redhat.com>
* Merge pull request #5707 from adrianreber/crun-checkpoint-1OpenShift Merge Robot2020-04-03
|\ | | | | Prepare for crun checkpoint support
| * checkpoint: handle XDG_RUNTIME_DIRAdrian Reber2020-04-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For (almost) all commands which podman passes on to a OCI runtime XDG_RUNTIME_DIR is set to the same value. This does not happen for the checkpoint command. Using crun to checkpoint a container without this change will lead to crun using XDG_RUNTIME_DIR of the currently logged in user and so it will not find the container Podman wants to checkpoint. This bascially just copies a few lines from on of the other commands to handle 'checkpoint' as all the other commands. Thanks to Giuseppe for helping me with this. For 'restore' it is not needed as restore goes through conmon and for calling conmon Podman already configures XDG_RUNTIME_DIR correctly. Signed-off-by: Adrian Reber <areber@redhat.com>
| * checkpoint: change runtime checkpoint support testAdrian Reber2020-04-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Podman was checking if the runtime support checkpointing by running 'runtime checkpoint -h'. That works for runc. crun, however, does not use '-h, --help' for help output but, '-?, --help'. This commit switches both checkpoint support detection from 'runtime checkpoint -h' to 'runtime checkpoint --help'. Podman can now correctly detect if 'crun' also support checkpointing. Signed-off-by: Adrian Reber <areber@redhat.com>
* | Pass path environment down to the OCI runtimeDaniel J Walsh2020-04-03
|/ | | | Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* attach: fix hang if control path is deletedGiuseppe Scrivano2020-04-02
| | | | | | | | if the control path file is deleted, libpod hangs waiting for a reader to open it. Attempt to open it as non blocking until it returns an error different than EINTR or EAGAIN. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* Add support for containers.confDaniel J Walsh2020-03-27
| | | | | | | vendor in c/common config pkg for containers.conf Signed-off-by: Qi Wang qiwan@redhat.com Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Ensure that exec sends resize eventsMatthew Heon2020-03-25
| | | | | | | | | | | | | | | We previously tried to send resize events only after the exec session successfully started, which makes sense (we might drop an event or two that came in before the exec session started otherwise). However, the start function blocks, so waiting actually means we send no resize events at all, which is obviously worse than losing a few.. Sending resizes before attach starts seems to work fine in my testing, so let's do that until we get bug reports that it doesn't work. Fixes #5584 Signed-off-by: Matthew Heon <mheon@redhat.com>
* Add structure for new exec session tracking to DBMatthew Heon2020-03-18
| | | | | | | | | | | | | | | | | | | | | | | As part of the rework of exec sessions, we need to address them independently of containers. In the new API, we need to be able to fetch them by their ID, regardless of what container they are associated with. Unfortunately, our existing exec sessions are tied to individual containers; there's no way to tell what container a session belongs to and retrieve it without getting every exec session for every container. This adds a pointer to the container an exec session is associated with to the database. The sessions themselves are still stored in the container. Exec-related APIs have been restructured to work with the new database representation. The originally monolithic API has been split into a number of smaller calls to allow more fine-grained control of lifecycle. Support for legacy exec sessions has been retained, but in a deprecated fashion; we should remove this in a few releases. Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* podman: avoid conmon zombie on execGiuseppe Scrivano2020-03-18
| | | | | | | conmon forks itself, so make sure we reap the first process and not leave a zombie process. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* update systemd & dbus dependenciesValentin Rothberg2020-03-10
| | | | | | | | Update the outdated systemd and dbus dependencies which are now provided as go modules. This will further tighten our dependencies and releases and pave the way for the upcoming auto-update feature. Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
* Revert "exec: get the exit code from sync pipe instead of file"Matthew Heon2020-03-09
| | | | | | | | | This reverts commit 4b72f9e4013411208751df2a92ab9f322d4da5b2. Continues what began with revert of d3d97a25e8c87cf741b2e24ac01ef84962137106 in previous commit. Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* Revert "Exec: use ErrorConmonRead"Matthew Heon2020-03-09
| | | | | | | | | This reverts commit d3d97a25e8c87cf741b2e24ac01ef84962137106. This does not resolve the issues we expected it would, and has some unexpected side effects with the upcoming exec rework. Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* Exec: use ErrorConmonReadPeter Hunt2020-03-03
| | | | | | | | Before, we were using -1 as a bogus value in podman to signify something went wrong when reading from a conmon pipe. However, conmon uses negative values to indicate the runtime failed, and return the runtime's exit code. instead, we should use a bogus value that is actually bogus. Define that value in the define package as MinInt32 (-1<< 31 - 1), which is outside of the range of possible pids (-1 << 31) Signed-off-by: Peter Hunt <pehunt@redhat.com>
* exec: get the exit code from sync pipe instead of filePeter Hunt2020-03-03
| | | | | | | | Before, we were getting the exit code from the file, in which we waited an arbitrary amount of time (5 seconds) for the file, and segfaulted if we didn't find it. instead, we should be a bit more certain conmon has sent the exit code. Luckily, it sends the exit code along the sync pipe fd, so we can read it from there Adapt the ExecContainer interface to pass along a channel to get the pid and exit code from conmon, to be able to read both from the pipe Signed-off-by: Peter Hunt <pehunt@redhat.com>
* Ensure that exec sessions inherit supplemental groupsMatthew Heon2020-02-28
| | | | | | | | This corrects a regression from Podman 1.4.x where container exec sessions inherited supplemental groups from the container, iff the exec session did not specify a user. Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* rootless: fix a regression when using -dGiuseppe Scrivano2020-02-18
| | | | | | | | | | | | when using -d and port mapping, make sure the correct fd is injected into conmon. Move the pipe creation earlier as the fd must be known at the time we create the container through conmon. Closes: https://github.com/containers/libpod/issues/5167 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* Merge pull request #4861 from giuseppe/add-cgroups-disabled-conmonOpenShift Merge Robot2020-01-22
|\ | | | | oci_conmon: do not create a cgroup under systemd