summaryrefslogtreecommitdiff
path: root/libpod/oci.go
Commit message (Collapse)AuthorAge
* read conmon output and convert to json in two stepsbaude2018-10-23
| | | | | | | | | when reading the output from conmon using the JSON methods, it appears that JSON marshalling is higher in pprof than it really is because the pipe is "waiting" for a response. this gives us a clearer look at the real CPU/time consumers. Signed-off-by: baude <bbaude@redhat.com>
* oci: cleanup process statusGiuseppe Scrivano2018-10-23
| | | | | | | | I've seen a runc zombie process hanging around, it is caused by not cleaning up the "$OCI status" process. Also adjust another location that has the same issue. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* Move rootless directory handling to the libpod/pkg/util directoryDaniel J Walsh2018-10-22
| | | | | | This should allow us to share this code with buildah. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Merge pull request #1570 from giuseppe/fix-gvisorOpenShift Merge Robot2018-10-04
|\ | | | | podman: allow usage of gVisor as OCI runtime
| * oci: split the stdout and stderr pipesGiuseppe Scrivano2018-10-03
| | | | | | | | | | | | | | read the OCI status from stdout, not the combined stdout+stderr stream. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
| * oci: always set XDG_RUNTIME_DIRGiuseppe Scrivano2018-10-03
| | | | | | | | | | | | | | | | | | Fix an issue when using gVisor that couldn't start the container since the XDG_RUNTIME_DIR env variable used for the "create" and "start" commands is different. Set the environment variable for each command so that the OCI runtime gets always the same value. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* | Add support to checkpoint/restore containersAdrian Reber2018-10-03
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | runc uses CRIU to support checkpoint and restore of containers. This brings an initial checkpoint/restore implementation to podman. None of the additional runc flags are yet supported and container migration optimization (pre-copy/post-copy) is also left for the future. The current status is that it is possible to checkpoint and restore a container. I am testing on RHEL-7.x and as the combination of RHEL-7 and CRIU has seccomp troubles I have to create the container without seccomp. With the following steps I am able to checkpoint and restore a container: # podman run --security-opt="seccomp=unconfined" -d registry.fedoraproject.org/f27/httpd # curl -I 10.22.0.78:8080 HTTP/1.1 403 Forbidden # <-- this is actually a good answer # podman container checkpoint <container> # curl -I 10.22.0.78:8080 curl: (7) Failed connect to 10.22.0.78:8080; No route to host # podman container restore <container> # curl -I 10.22.0.78:8080 HTTP/1.1 403 Forbidden I am using CRIU, runc and conmon from git. All required changes for checkpoint/restore support in podman have been merged in the corresponding projects. To have the same IP address in the restored container as before checkpointing, CNI is told which IP address to use. If the saved network configuration cannot be found during restore, the container is restored with a new IP address. For CRIU to restore established TCP connections the IP address of the network namespace used for restore needs to be the same. For TCP connections in the listening state the IP address can change. During restore only one network interface with one IP address is handled correctly. Support to restore containers with more advanced network configuration will be implemented later. v2: * comment typo * print debug messages during cleanup of restore files * use createContainer() instead of createOCIContainer() * introduce helper CheckpointPath() * do not try to restore a container that is paused * use existing helper functions for cleanup * restructure code flow for better readability * do not try to restore if checkpoint/inventory.img is missing * git add checkpoint.go restore.go v3: * move checkpoint/restore under 'podman container' v4: * incorporated changes from latest reviews Signed-off-by: Adrian Reber <areber@redhat.com>
* Merge pull request #1531 from mheon/add_exited_stateOpenShift Merge Robot2018-10-03
|\ | | | | Add ContainerStateExited and OCI delete() in cleanup()
| * Fix Wait() to allow Exited state as well as StoppedMatthew Heon2018-10-02
| | | | | | | | Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
* | Add container runlabel commandbaude2018-09-28
|/ | | | | | | | | | | | | Execute the command as described by a container image. The value of the label is processed into a command by: 1. Ensuring the first argument of the command is podman. 2. Substituting any variables with those defined by the environment or otherwise. If no label exists in the container image, nothing is done. podman container runlabel LABEL IMAGE extra_args Signed-off-by: baude <bbaude@redhat.com>
* Add a way to disable port reservationMatthew Heon2018-09-13
| | | | | | | | | | | We've increased the default rlimits to allow Podman to hold many ports open without hitting limits and crashing, but this doesn't solve the amount of memory that holding open potentially thousands of ports will use. Offer a switch to optionally disable port reservation for performance- and memory-constrained use cases. Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
* rootless, exec: use the new function to join the usernsGiuseppe Scrivano2018-08-29
| | | | | | | | | | since we have a way for joining an existing userns use it instead of nsenter. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #1371 Approved by: rhatdan
* rootless: don't use kill --allGiuseppe Scrivano2018-08-26
| | | | | | | | | | | | | The OCI runtime might use the cgroups to see what PIDs are inside the container, but that doesn't work with rootless containers. Closes: https://github.com/containers/libpod/issues/1337 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #1331 Approved by: rhatdan
* rootless: exec handle processes that create an user namespaceGiuseppe Scrivano2018-08-26
| | | | | | | | | | | | | | Manage the case where the main process of the container creates and joins a new user namespace. In this case we want to join only the first child in the new hierarchy, which is the user namespace that was used to create the container. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #1331 Approved by: rhatdan
* rootless: fix execGiuseppe Scrivano2018-08-26
| | | | | | | | | | | | | | | | | | | | | We cannot re-exec into a new user namespace to gain privileges and access an existing as the new namespace is not the owner of the existing container. "unshare" is used to join the user namespace of the target container. The current implementation assumes that the main process of the container didn't create a new user namespace. Since in the setup phase we are not running with euid=0, we must skip the setup for containers/storage. Closes: https://github.com/containers/libpod/issues/1329 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #1331 Approved by: rhatdan
* switch projectatomic to containersDaniel J Walsh2018-08-16
| | | | | | | | | | Need to get some small changes into libpod to pull back into buildah to complete buildah transition. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com> Closes: #1270 Approved by: mheon
* oci.go: syslog: fix debug formattingValentin Rothberg2018-08-09
| | | | | | | Signed-off-by: Valentin Rothberg <vrothberg@suse.com> Closes: #1242 Approved by: rhatdan
* Pass newly-added --log-level flag to ConmonMatthew Heon2018-08-08
| | | | | | | Signed-off-by: Matthew Heon <matthew.heon@gmail.com> Closes: #1232 Approved by: rhatdan
* network: add support for rootless network with slirp4netnsGiuseppe Scrivano2018-07-31
| | | | | | | | | | | slirp4netns is required to setup the network namespace: https://github.com/rootless-containers/slirp4netns Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #1156 Approved by: rhatdan
* oci: keep exposed ports busy and leak the fd into conmonGiuseppe Scrivano2018-07-19
| | | | | | | | | | | | | Bind all the specified TCP and UDP ports so that another process cannot reuse them. The fd of the listener is then leaked into conmon so that the socket is kept busy until the container exits. Closes: https://github.com/projectatomic/libpod/issues/210 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #1100 Approved by: mheon
* Record whether the container has exitedMatthew Heon2018-07-13
| | | | | | | | Use this to supplement exit codes returned from containers, to make sure we know when exit codes are invalid (as the container has not yet exited) Signed-off-by: Matthew Heon <mheon@redhat.com>
* rootless: propagate errors from GetRootlessRuntimeDir()Giuseppe Scrivano2018-07-11
| | | | Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* pkg/ctime: Factor libpod/finished* into a separate packageW. Trevor King2018-07-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This removes some boilerplate from the libpod package, so we can focus on container stuff there. And it gives us a tidy sub-package for focusing on ctime extraction, so we can focus on unit testing and portability of the extraction utility there. For the unsupported implementation, I'm falling back to Go's ModTime [1]. That's obviously not the creation time, but it's likely to be closer than the uninitialized Time structure from cc6f0e85 (more changes to compile darwin, 2018-07-04, #1047). Especially for our use case in libpod/oci, where we're looking at write-once exit files. The test is more complicated than I initially expected, because on Linux filesystem timestamps come from a truncated clock without interpolation [2] (and network filesystems can be completely decoupled [3]). So even for local disks, creation times can be up to a jiffie earlier than 'before'. This test ensures at least monotonicity by creating two files and ensuring the reported creation time for the second is greater than or equal to the reported creation time for the first. It also checks that both creation times are within the window from one second earlier than 'before' through 'after'. That should be enough of a window for local disks, even if the kernel for those systems has an abnormally large jiffie. It might be ok on network filesystems, although it will not be very resilient to network clock lagging behind the local system clock. [1]: https://golang.org/pkg/os/#FileInfo [2]: https://groups.google.com/d/msg/linux.kernel/mdeXx2TBYZA/_4eJEuJoAQAJ Subject: Re: Apparent backward time travel in timestamps on file creation Date: Thu, 30 Mar 2017 20:20:02 +0200 Message-ID: <tqMPU-1Sb-21@gated-at.bofh.it> [3]: https://groups.google.com/d/msg/linux.kernel/mdeXx2TBYZA/cTKj4OBuAQAJ Subject: Re: Apparent backward time travel in timestamps on file creation Date: Thu, 30 Mar 2017 22:10:01 +0200 Message-ID: <tqOyl-36A-1@gated-at.bofh.it> Signed-off-by: W. Trevor King <wking@tremily.us> Closes: #1050 Approved by: mheon
* more changes to compile darwinbaude2018-07-05
| | | | | | | | | | | | | | | | this should represent the last major changes to get darwin to **compile**. again, the purpose here is to get darwin to compile so that we can eventually implement a ci task that would protect against regressions for darwin compilation. i have left the manual darwin compilation largely static still and in fact now only interject (manually) two build tags to assist with the build. trevor king has great ideas on how to make this better and i will defer final implementation of those to him. Signed-off-by: baude <bbaude@redhat.com> Closes: #1047 Approved by: rhatdan
* rootless: set XDG_RUNTIME_DIR also for state and execGiuseppe Scrivano2018-07-05
| | | | | | | Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #1048 Approved by: mheon
* changes to allow for darwin compilationbaude2018-06-29
| | | | | | | Signed-off-by: baude <bbaude@redhat.com> Closes: #1015 Approved by: baude
* Add `podman container cleanup` to CLIDaniel J Walsh2018-06-29
| | | | | | | | | | | | | When we run containers in detach mode, nothing cleans up the network stack or the mount points. This patch will tell conmon to execute the cleanup code when the container exits. It can also be called to attempt to cleanup previously running containers. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com> Closes: #942 Approved by: mheon
* conmon no longer writes to syslogDaniel J Walsh2018-06-29
| | | | | | | | | | | If the caller sets up the app to be in logrus.DebugLevel, then we will add the --syslog flag to conmon to get all of the messages. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com> Closes: #1014 Approved by: TomSweeneyRedHat
* oci: set XDG_RUNTIME_DIR to the runtime from GetRootlessRuntimeDir()Giuseppe Scrivano2018-06-27
| | | | | | | Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #936 Approved by: rhatdan
* oci: do not set the cgroup path in Rootless modeGiuseppe Scrivano2018-06-15
| | | | | | | Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #871 Approved by: mheon
* oci: pass XDG_RUNTIME_DIR down to the OCI runtimeGiuseppe Scrivano2018-06-15
| | | | | | | Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #871 Approved by: mheon
* Remove SELinux transition rule after conmon is started.Daniel J Walsh2018-06-06
| | | | | | | | | | | | We have an issue where iptables command is being executed by podman and attempted to run with a different label. This fix changes podman to only change the label on the conmon command and then set the SELinux interface back to the default. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com> Closes: #906 Approved by: giuseppe
* Catch does not exist errorDaniel J Walsh2018-05-31
| | | | | | | | | | There was a new line at the end of does not exist which was causing this to fail. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com> Closes: #863 Approved by: baude
* We need to change the SELinux label of the conmon process to s0Daniel J Walsh2018-05-31
| | | | | | | | | | | | | If SELinux is enabled, we are leaking in pipes into the container owned by conmon. The container processes are not allowed to use these pipes, if the calling process is fully ranged. By changing the level of the conmon process to s0, this allows container processes to use the pipes. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com> Closes: #854 Approved by: mheon
* Use container cleanup() functions when removingMatthew Heon2018-05-17
| | | | | | | | | | | | Instead of manually calling the individual functions that cleanup uses to tear down a container's resources, just call the cleanup function to make sure that cleanup only needs to happen in one place. Signed-off-by: Matthew Heon <matthew.heon@gmail.com> Closes: #790 Approved by: rhatdan
* Place Conmon and Container in separate CGroupsMatthew Heon2018-05-11
| | | | | | | Signed-off-by: Matthew Heon <matthew.heon@gmail.com> Closes: #507 Approved by: baude
* Add --cgroup-manager flag to Podman binaryMatthew Heon2018-05-11
| | | | | | | Signed-off-by: Matthew Heon <matthew.heon@gmail.com> Closes: #507 Approved by: baude
* Major fixes to systemd cgroup handlingMatthew Heon2018-05-11
| | | | | | | Signed-off-by: Matthew Heon <matthew.heon@gmail.com> Closes: #507 Approved by: baude
* Should not error out if container no longer exists in ociDaniel J Walsh2018-05-04
| | | | | | | | | | This prevents you from cleaning up the container database, if some how runc and friends db gets screwed up. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com> Closes: #725 Approved by: mheon
* podman, userNS: configure an intermediate mount namespaceGiuseppe Scrivano2018-05-04
| | | | | | | Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #690 Approved by: mheon
* Prevent a potential race when stopping containersMatthew Heon2018-04-04
| | | | | | | | | | | If sending a signal fails, check if the container is alive. If it is not, it probably stopped on its own before we could send the signal, so don't error out. Signed-off-by: Matthew Heon <matthew.heon@gmail.com> Closes: #591 Approved by: rhatdan
* podman: new option --conmon-pidfile=Giuseppe Scrivano2018-03-29
| | | | | | | | | | | | | | | | | | | so that it is possible to use systemd to automatically restart the container: [Service] Type=forking PIDFile=/run/awesome-service.pid ExecStart=/usr/bin/podman run --conmon-pidfile=/run/awesome-service.pid --name awesome -d IMAGE /usr/bin/do-something ExecStopPost=/usr/bin/podman rm awesome Restart=always Closes: https://github.com/projectatomic/libpod/issues/534 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #549 Approved by: rhatdan
* Include error in error messageMatthew Heon2018-03-02
| | | | | | | Signed-off-by: Matthew Heon <matthew.heon@gmail.com> Closes: #438 Approved by: rhatdan
* Instead of erroring on exit file not being found, warnMatthew Heon2018-03-02
| | | | | | | | | | | Erroring can cause us to get into an state where a container which has no exit file cannot be shown in PS, cannot be removed, etc. Signed-off-by: Matthew Heon <matthew.heon@gmail.com> Closes: #438 Approved by: rhatdan
* Replace usage of runc with runtimeMatthew Heon2018-03-01
| | | | | | | Signed-off-by: Matthew Heon <matthew.heon@gmail.com> Closes: #412 Approved by: baude
* Convert exec session tracking to use a dedicated structMatthew Heon2018-03-01
| | | | | | | | | | | This will behave better if we need to add anything to it at a later date - we can add fields to the struct without breaking existing BoltDB databases. Signed-off-by: Matthew Heon <matthew.heon@gmail.com> Closes: #412 Approved by: baude
* Handle removing containers with active exec sessionsMatthew Heon2018-03-01
| | | | | | | | | | | | | For containers without --force set, an error will be returned For containers with --force, all pids in the container will be stopped, first with SIGTERM and then with SIGKILL after a timeout (this mimics the behavior of stopping a container). Signed-off-by: Matthew Heon <matthew.heon@gmail.com> Closes: #412 Approved by: baude
* Add tracking for exec session IDsMatthew Heon2018-03-01
| | | | | | | | | | | Exec sessions now have an ID generated and assigned to their PID and stored in the database state. This allows us to track what exec sessions are currently active. Signed-off-by: Matthew Heon <matthew.heon@gmail.com> Closes: #412 Approved by: baude
* Rework exec to enable splitting to retrieve exec PIDMatthew Heon2018-03-01
| | | | | | | Signed-off-by: Matthew Heon <matthew.heon@gmail.com> Closes: #412 Approved by: baude
* Ensure we don't repeatedly poll disk for exit codesMatthew Heon2018-02-20
| | | | | | | | | | | | Change logic for refreshing our state using runc to only poll for conmon exit files when we first transition to the Stopped state. After that, we should already have the exit code stored in the database, so we don't need to look it up again. Signed-off-by: Matthew Heon <matthew.heon@gmail.com> Closes: #363 Approved by: TomSweeneyRedHat