summaryrefslogtreecommitdiff
path: root/libpod/container_internal.go
Commit message (Collapse)AuthorAge
* hooks: Add pre-create hooks for runtime-config manipulationW. Trevor King2019-01-08
| | | | | | | | | | | | | | | | | | | | | | | There's been a lot of discussion over in [1] about how to support the NVIDIA folks and others who want to be able to create devices (possibly after having loaded kernel modules) and bind userspace libraries into the container. Currently that's happening in the middle of runc's create-time mount handling before the container pivots to its new root directory with runc's incorrectly-timed prestart hook trigger [2]. With this commit, we extend hooks with a 'precreate' stage to allow trusted parties to manipulate the config JSON before calling the runtime's 'create'. I'm recycling the existing Hook schema from pkg/hooks for this, because we'll want Timeout for reliability and When to avoid the expense of fork/exec when a given hook does not need to make config changes [3]. [1]: https://github.com/opencontainers/runc/pull/1811 [2]: https://github.com/opencontainers/runc/issues/1710 [3]: https://github.com/containers/libpod/issues/1828#issuecomment-439888059 Signed-off-by: W. Trevor King <wking@tremily.us>
* Convert pods to SHM locksMatthew Heon2019-01-04
| | | | Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
* Convert containers to SHM lockingMatthew Heon2019-01-04
| | | | Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
* Log container command before starting the containerMatthew Heon2019-01-02
| | | | | | | Runc does not produce helpful error messages when the container's command is not found, so print the command ourselves. Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* Fixes to handle /dev/shm correctly.Daniel J Walsh2018-12-24
| | | | | | | | | | | | | | | | | | We had two problems with /dev/shm, first, you mount the container read/only then /dev/shm was mounted read/only. This is a bug a tmpfs directory should be read/write within a read-only container. The second problem is we were ignoring users mounted /dev/shm from the host. If user specified podman run -d -v /dev/shm:/dev/shm ... We were dropping this mount and still using the internal mount. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Switch all referencs to image.ContainerConfig to image.ConfigDaniel J Walsh2018-12-21
| | | | | | This will more closely match what Docker is doing. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Prevent a second lookup of user for image volumesMatthew Heon2018-12-11
| | | | | | | | | | Instead of forcing another user lookup when mounting image volumes, just use the information we looked up when we started generating the spec. This may resolve #1817 Signed-off-by: Matthew Heon <mheon@redhat.com>
* Fix errors where OCI hooks directory does not existMatthew Heon2018-12-07
| | | | Signed-off-by: Matthew Heon <mheon@redhat.com>
* bind mount /etc/resolv.conf|hosts in podsbaude2018-12-06
| | | | | | | containers inside pods need to make sure they get /etc/resolv.conf and /etc/hosts bind mounted when network is expected Signed-off-by: baude <bbaude@redhat.com>
* libpod/container_internal: Deprecate implicit hook directoriesW. Trevor King2018-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Part of the motivation for 800eb863 (Hooks supports two directories, process default and override, 2018-09-17, #1487) was [1]: > We only use this for override. The reason this was caught is people > are trying to get hooks to work with CoreOS. You are not allowed to > write to /usr/share... on CoreOS, so they wanted podman to also look > at /etc, where users and third parties can write. But we'd also been disabling hooks completely for rootless users. And even for root users, the override logic was tricky when folks actually had content in both directories. For example, if you wanted to disable a hook from the default directory, you'd have to add a no-op hook to the override directory. Also, the previous implementation failed to handle the case where there hooks defined in the override directory but the default directory did not exist: $ podman version Version: 0.11.2-dev Go Version: go1.10.3 Git Commit: "6df7409cb5a41c710164c42ed35e33b28f3f7214" Built: Sun Dec 2 21:30:06 2018 OS/Arch: linux/amd64 $ ls -l /etc/containers/oci/hooks.d/test.json -rw-r--r--. 1 root root 184 Dec 2 16:27 /etc/containers/oci/hooks.d/test.json $ podman --log-level=debug run --rm docker.io/library/alpine echo 'successful container' 2>&1 | grep -i hook time="2018-12-02T21:31:19-08:00" level=debug msg="reading hooks from /usr/share/containers/oci/hooks.d" time="2018-12-02T21:31:19-08:00" level=warning msg="failed to load hooks: {}%!(EXTRA *os.PathError=open /usr/share/containers/oci/hooks.d: no such file or directory)" With this commit: $ podman --log-level=debug run --rm docker.io/library/alpine echo 'successful container' 2>&1 | grep -i hook time="2018-12-02T21:33:07-08:00" level=debug msg="reading hooks from /usr/share/containers/oci/hooks.d" time="2018-12-02T21:33:07-08:00" level=debug msg="reading hooks from /etc/containers/oci/hooks.d" time="2018-12-02T21:33:07-08:00" level=debug msg="added hook /etc/containers/oci/hooks.d/test.json" time="2018-12-02T21:33:07-08:00" level=debug msg="hook test.json matched; adding to stages [prestart]" time="2018-12-02T21:33:07-08:00" level=warning msg="implicit hook directories are deprecated; set --hooks-dir="/etc/containers/oci/hooks.d" explicitly to continue to load hooks from this directory" time="2018-12-02T21:33:07-08:00" level=error msg="container create failed: container_linux.go:336: starting container process caused "process_linux.go:399: container init caused \"process_linux.go:382: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: oh, noes!\\\\n\\\"\"" (I'd setup the hook to error out). You can see that it's silenly ignoring the ENOENT for /usr/share/containers/oci/hooks.d and continuing on to load hooks from /etc/containers/oci/hooks.d. When it loads the hook, it also logs a warning-level message suggesting that callers explicitly configure their hook directories. That will help consumers migrate, so we can drop the implicit hook directories in some future release. When folks *do* explicitly configure hook directories (via the newly-public --hooks-dir and hooks_dir options), we error out if they're missing: $ podman --hooks-dir /does/not/exist run --rm docker.io/library/alpine echo 'successful container' error setting up OCI Hooks: open /does/not/exist: no such file or directory I've dropped the trailing "path" from the old, hidden --hooks-dir-path and hooks_dir_path because I think "dir(ectory)" is already enough context for "we expect a path argument". I consider this name change non-breaking because the old forms were undocumented. Coming back to rootless users, I've enabled hooks now. I expect they were previously disabled because users had no way to avoid /usr/share/containers/oci/hooks.d which might contain hooks that required root permissions. But now rootless users will have to explicitly configure hook directories, and since their default config is from ~/.config/containers/libpod.conf, it's a misconfiguration if it contains hooks_dir entries which point at directories with hooks that require root access. We error out so they can fix their libpod.conf. [1]: https://github.com/containers/libpod/pull/1487#discussion_r218149355 Signed-off-by: W. Trevor King <wking@tremily.us>
* Merge pull request #1317 from rhatdan/privilegedOpenShift Merge Robot2018-11-30
|\ | | | | Disable mount options when running --privileged
| * Disable mount options when running --privilegedDaniel J Walsh2018-11-28
| | | | | | | | | | | | | | | | We now default to setting storage options to "nodev", when running privileged containers, we need to turn this off so the processes can manipulate the image. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* | /dev/shm should be mounted even in rootless mode.Daniel J Walsh2018-11-28
| | | | | | | | | | | | | | | | Currently we are mounting /dev/shm from disk, it should be from a tmpfs. User Namespace supports tmpfs mounts for nonroot users, so this section of code should work fine in bother root and rootless mode. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* | Fix golang formatting issuesbaude2018-11-28
| | | | | | | | | | | | | | Whe running unittests on newer golang versions, we observe failures with some formatting types when no declared correctly. Signed-off-by: baude <bbaude@redhat.com>
* | Merge pull request #1848 from adrianreber/masterOpenShift Merge Robot2018-11-28
|\ \ | | | | | | Add tcp-established to checkpoint/restore
| * | Use also a struct to pass options to Restore()Adrian Reber2018-11-28
| |/ | | | | | | | | | | | | | | | | | | | | | | | | This is basically the same change as ff47a4c2d5485fc49f937f3ce0c4e2fd6bdb1956 (Use a struct to pass options to Checkpoint()) just for the Restore() function. It is used to pass multiple restore options to the API and down to conmon which is used to restore containers. This is for the upcoming changes to support checkpointing and restoring containers with '--tcp-established'. Signed-off-by: Adrian Reber <areber@redhat.com>
* / network: allow slirp4netns mode also for root containersGiuseppe Scrivano2018-11-28
|/ | | | Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* libpod should know if the network is disabledDaniel J Walsh2018-11-13
| | | | | | | | | | | | | | | | | /etc/resolv.conf and /etc/hosts should not be created and mounted when the network is disabled. We should not be calling the network setup and cleanup functions when it is disabled either. In doing this patch, I found that all of the bind mounts were particular to Linux along with the generate functions, so I moved them to container_internal_linux.go Since we are checking if we are using a network namespace, we need to check after the network namespaces has been created in the spec. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Merge pull request #1764 from rhatdan/nopasswdOpenShift Merge Robot2018-11-07
|\ | | | | Don't fail if /etc/passwd or /etc/group does not exists
| * Don't fail if /etc/passwd or /etc/group does not existsDaniel J Walsh2018-11-07
| | | | | | | | | | | | | | | | | | | | | | Container images can be created without passwd or group file, currently if one of these containers gets run with a --user flag the container blows up complaining about t a missing /etc/passwd file. We just need to check if the error on read is ENOEXIST then allow the read to return, not fail. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* | Merge pull request #1771 from baude/prepareOpenShift Merge Robot2018-11-07
|\ \ | | | | | | move defer'd function declaration ahead of prepare error return
| * | move defer'd function declaration ahead of prepare error returnbaude2018-11-07
| |/ | | | | | | Signed-off-by: baude <bbaude@redhat.com>
* | Merge pull request #1689 from mheon/add_runc_timeoutOpenShift Merge Robot2018-11-07
|\ \ | | | | | | Do not call out to runc for sync
| * | Print error status code if we fail to parse itMatthew Heon2018-11-07
| | | | | | | | | | | | | | | | | | | | | | | | When we read the conmon error status file, if Atoi fails to parse the string we read from the file as an int, print the string as part of the error message so we know what might have gone wrong. Signed-off-by: Matthew Heon <mheon@redhat.com>
| * | Properly set Running state when starting containersMatthew Heon2018-11-07
| | | | | | | | | | | | Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
| * | Retrieve container PID from conmonMatthew Heon2018-11-07
| | | | | | | | | | | | | | | | | | | | | Instead of running a full sync after starting a container to pick up its PID, grab it from Conmon instead. Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
| * | EXPERIMENTAL: Do not call out to runc for syncMatthew Heon2018-11-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When syncing container state, we normally call out to runc to see the container's status. This does have significant performance implications, though, and we've seen issues with large amounts of runc processes being spawned. This patch attempts to use stat calls on the container exit file created by Conmon instead to sync state. This massively decreases the cost of calling updateContainer (it has gone from an almost-unconditional fork/exec of runc to a single stat call that can be avoided in most states). Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
| * | Actually save changes from post-stop syncMatthew Heon2018-11-07
| |/ | | | | | | | | | | | | | | | | | | After stopping containers, we run updateContainerStatus to sync our state with runc (pick up exit code, for example). Then we proceed to not save this to the database, requiring us to grab it again on the next sync. This should remove the need to read the exit file more than once. Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
* / Add hostname to /etc/hostsQi Wang2018-11-07
|/ | | | Signed-off-by: Qi Wang <qiwan@redhat.com>
* get user and group information using securejoin and runc's user librarybaude2018-10-29
| | | | | | | | | | | for the purposes of performance and security, we use securejoin to contstruct the root fs's path so that symlinks are what they appear to be and no pointing to something naughty. then instead of chrooting to parse /etc/passwd|/etc/group, we now use the runc user/group methods which saves us quite a bit of performance. Signed-off-by: baude <bbaude@redhat.com>
* run prepare in parallelbaude2018-10-25
| | | | | | | run prepare() -- which consists of creating a network namespace and mounting the container image is now run in parallel. This saves 25-40ms. Signed-off-by: baude <bbaude@redhat.com>
* Allow containers/storage to handle on SELinux labelingDaniel J Walsh2018-10-23
| | | | Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Merge pull request #1609 from giuseppe/fix-volume-rootlessMatthew Heon2018-10-16
|\ | | | | volume: resolve symlink paths in volumes
| * volume: resolve symlinks in pathsGiuseppe Scrivano2018-10-14
| | | | | | | | | | | | | | | | | | | | ensure the volume paths are resolved in the mountpoint scope. Otherwise we might end up using host paths. Closes: https://github.com/containers/libpod/issues/1608 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
| * volume: write the correct ID of the container in error messagesGiuseppe Scrivano2018-10-14
| | | | | | | | Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* | Touchup fileo typoTomSweeneyRedHat2018-10-15
|/ | | | Signed-off-by: TomSweeneyRedHat <tsweeney@redhat.com>
* Generate a passwd file for users not in containerDaniel J Walsh2018-10-12
| | | | | | | If someone runs podman as a user (uid) that is not defined in the container we want generate a passwd file so that getpwuid() will work inside of container. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Ensure resolv.conf has the right label and pathMatthew Heon2018-10-04
| | | | | | | | Adds a few missing things from writeStringToRundir() to the new resolv.conf function, specifically relabelling and returning a path compatible with rootless podman Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
* Drop libnetwork vendor and move the code into pkg/Matthew Heon2018-10-04
| | | | | | | | | | | The vendoring issues with libnetwork were significant (it was dragging in massive amounts of code) and were just not worth spending the time to work through. Highly unlikely we'll ever end up needing to update this code, so move it directly into pkg/ so we don't need to vendor libnetwork. Make a few small changes to remove the need for the remainder of libnetwork. Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
* Switch to using libnetwork's resolvconf packageMatthew Heon2018-10-04
| | | | | | | | | Libnetwork provides a well-tested package for generating resolv.conf from the host's that has some features our current implementation does not. Swap to using their code and remove our built-in implementation. Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
* Add support to checkpoint/restore containersAdrian Reber2018-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | runc uses CRIU to support checkpoint and restore of containers. This brings an initial checkpoint/restore implementation to podman. None of the additional runc flags are yet supported and container migration optimization (pre-copy/post-copy) is also left for the future. The current status is that it is possible to checkpoint and restore a container. I am testing on RHEL-7.x and as the combination of RHEL-7 and CRIU has seccomp troubles I have to create the container without seccomp. With the following steps I am able to checkpoint and restore a container: # podman run --security-opt="seccomp=unconfined" -d registry.fedoraproject.org/f27/httpd # curl -I 10.22.0.78:8080 HTTP/1.1 403 Forbidden # <-- this is actually a good answer # podman container checkpoint <container> # curl -I 10.22.0.78:8080 curl: (7) Failed connect to 10.22.0.78:8080; No route to host # podman container restore <container> # curl -I 10.22.0.78:8080 HTTP/1.1 403 Forbidden I am using CRIU, runc and conmon from git. All required changes for checkpoint/restore support in podman have been merged in the corresponding projects. To have the same IP address in the restored container as before checkpointing, CNI is told which IP address to use. If the saved network configuration cannot be found during restore, the container is restored with a new IP address. For CRIU to restore established TCP connections the IP address of the network namespace used for restore needs to be the same. For TCP connections in the listening state the IP address can change. During restore only one network interface with one IP address is handled correctly. Support to restore containers with more advanced network configuration will be implemented later. v2: * comment typo * print debug messages during cleanup of restore files * use createContainer() instead of createOCIContainer() * introduce helper CheckpointPath() * do not try to restore a container that is paused * use existing helper functions for cleanup * restructure code flow for better readability * do not try to restore if checkpoint/inventory.img is missing * git add checkpoint.go restore.go v3: * move checkpoint/restore under 'podman container' v4: * incorporated changes from latest reviews Signed-off-by: Adrian Reber <areber@redhat.com>
* Merge pull request #1578 from baude/addubuntuciOpenShift Merge Robot2018-10-03
|\ | | | | Add Ubuntu-18.04 to CI testing
| * Add ability for ubuntu to be testedbaude2018-10-03
| | | | | | | | | | | | | | | | unfortunately the papr CI system cannot test ubuntu as a VM; therefore, this PR still keeps travis. but it does include fixes that will be required for running on modern versions of ubuntu. Signed-off-by: baude <bbaude@redhat.com>
* | selinux: drop superflous relabelGiuseppe Scrivano2018-10-03
|/ | | | | | | | | | The same relabel is already done in writeStringToRundir so we don't need to do it twice. The version in writeStringToRundir takes into account the correct file path when using user namespaces. Closes: https://github.com/containers/libpod/pull/1584 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* Merge pull request #1531 from mheon/add_exited_stateOpenShift Merge Robot2018-10-03
|\ | | | | Add ContainerStateExited and OCI delete() in cleanup()
| * Fix Wait() to allow Exited state as well as StoppedMatthew Heon2018-10-02
| | | | | | | | Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
| * Fix cleanupRuntime to only save if container is validMatthew Heon2018-10-02
| | | | | | | | | | | | | | | | | | | | | | We call cleanup() (which calls cleanupRuntime()) as part of removing containers, after the container has already been removed from the database. cleanupRuntime() tries to update and save the state, which obviously fails if the container no longer exists. Make the save() conditional on the container not being in the process of being removed. Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
| * Add ContainerStateExited and OCI delete() in cleanup()Matthew Heon2018-10-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To work better with Kata containers, we need to delete() from the OCI runtime as a part of cleanup, to ensure resources aren't retained longer than they need to be. To enable this, we need to add a new state to containers, ContainerStateExited. Containers transition from ContainerStateStopped to ContainerStateExited via cleanupRuntime which is invoked as part of cleanup(). A container in the Exited state is identical to Stopped, except it has been removed from the OCI runtime and thus will be handled differently when initializing the container. Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
* | Merge pull request #1562 from mheon/update_install_instructionsOpenShift Merge Robot2018-10-02
|\ \ | |/ |/| Update docs to build a runc that works with systemd
| * Update docs to build a runc that works with systemdMatthew Heon2018-10-01
| | | | | | | | | | | | | | | | | | | | | | Runc disables systemd cgroup support when build statically, so don't tell people to do that now that we're defaulting to systemd for cgroup management. Also, fix some error messages to use the proper ID() call for containers. Signed-off-by: Matthew Heon <matthew.heon@gmail.com>