summaryrefslogtreecommitdiff
path: root/libpod/container_internal_linux.go
Commit message (Collapse)AuthorAge
* Merge pull request #1940 from wking/numeric-gidOpenShift Merge Robot2018-12-05
|\ | | | | libpod/container_internal_linux: Allow gids that aren't in the group file
| * libpod/container_internal_linux: Allow gids that aren't in the group fileW. Trevor King2018-12-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an image config sets config.User [1] to a numeric group (like 1000:1000), but those values do not exist in the container's /etc/group, libpod is currently breaking: $ podman run --rm registry.svc.ci.openshift.org/ci-op-zvml7cd6/pipeline:installer --help error creating temporary passwd file for container 228f6e9943d6f18b93c19644e9b619ec4d459a3e0eb31680e064eeedf6473678: unable to get gid 1000 from group file: no matching entries in group file However, the OCI spec requires converters to copy numeric uid and gid to the runtime config verbatim [2]. With this commit, I'm frontloading the "is groupspec an integer?" check and only bothering with lookup.GetGroup when it was not. I've also removed a few .Mounted checks, which are originally from 00d38cb3 (podman create/run need to load information from the image, 2017-12-18, #110). We don't need a mounted container filesystem to translate integers. And when the lookup code needs to fall back to the mounted root to translate names, it can handle erroring out internally (and looking it over, it seems to do that already). [1]: https://github.com/opencontainers/image-spec/blame/v1.0.1/config.md#L118-L123 [2]: https://github.com/opencontainers/image-spec/blame/v1.0.1/conversion.md#L70 Signed-off-by: W. Trevor King <wking@tremily.us>
* | libpod/container_internal: Deprecate implicit hook directoriesW. Trevor King2018-12-03
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Part of the motivation for 800eb863 (Hooks supports two directories, process default and override, 2018-09-17, #1487) was [1]: > We only use this for override. The reason this was caught is people > are trying to get hooks to work with CoreOS. You are not allowed to > write to /usr/share... on CoreOS, so they wanted podman to also look > at /etc, where users and third parties can write. But we'd also been disabling hooks completely for rootless users. And even for root users, the override logic was tricky when folks actually had content in both directories. For example, if you wanted to disable a hook from the default directory, you'd have to add a no-op hook to the override directory. Also, the previous implementation failed to handle the case where there hooks defined in the override directory but the default directory did not exist: $ podman version Version: 0.11.2-dev Go Version: go1.10.3 Git Commit: "6df7409cb5a41c710164c42ed35e33b28f3f7214" Built: Sun Dec 2 21:30:06 2018 OS/Arch: linux/amd64 $ ls -l /etc/containers/oci/hooks.d/test.json -rw-r--r--. 1 root root 184 Dec 2 16:27 /etc/containers/oci/hooks.d/test.json $ podman --log-level=debug run --rm docker.io/library/alpine echo 'successful container' 2>&1 | grep -i hook time="2018-12-02T21:31:19-08:00" level=debug msg="reading hooks from /usr/share/containers/oci/hooks.d" time="2018-12-02T21:31:19-08:00" level=warning msg="failed to load hooks: {}%!(EXTRA *os.PathError=open /usr/share/containers/oci/hooks.d: no such file or directory)" With this commit: $ podman --log-level=debug run --rm docker.io/library/alpine echo 'successful container' 2>&1 | grep -i hook time="2018-12-02T21:33:07-08:00" level=debug msg="reading hooks from /usr/share/containers/oci/hooks.d" time="2018-12-02T21:33:07-08:00" level=debug msg="reading hooks from /etc/containers/oci/hooks.d" time="2018-12-02T21:33:07-08:00" level=debug msg="added hook /etc/containers/oci/hooks.d/test.json" time="2018-12-02T21:33:07-08:00" level=debug msg="hook test.json matched; adding to stages [prestart]" time="2018-12-02T21:33:07-08:00" level=warning msg="implicit hook directories are deprecated; set --hooks-dir="/etc/containers/oci/hooks.d" explicitly to continue to load hooks from this directory" time="2018-12-02T21:33:07-08:00" level=error msg="container create failed: container_linux.go:336: starting container process caused "process_linux.go:399: container init caused \"process_linux.go:382: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: oh, noes!\\\\n\\\"\"" (I'd setup the hook to error out). You can see that it's silenly ignoring the ENOENT for /usr/share/containers/oci/hooks.d and continuing on to load hooks from /etc/containers/oci/hooks.d. When it loads the hook, it also logs a warning-level message suggesting that callers explicitly configure their hook directories. That will help consumers migrate, so we can drop the implicit hook directories in some future release. When folks *do* explicitly configure hook directories (via the newly-public --hooks-dir and hooks_dir options), we error out if they're missing: $ podman --hooks-dir /does/not/exist run --rm docker.io/library/alpine echo 'successful container' error setting up OCI Hooks: open /does/not/exist: no such file or directory I've dropped the trailing "path" from the old, hidden --hooks-dir-path and hooks_dir_path because I think "dir(ectory)" is already enough context for "we expect a path argument". I consider this name change non-breaking because the old forms were undocumented. Coming back to rootless users, I've enabled hooks now. I expect they were previously disabled because users had no way to avoid /usr/share/containers/oci/hooks.d which might contain hooks that required root permissions. But now rootless users will have to explicitly configure hook directories, and since their default config is from ~/.config/containers/libpod.conf, it's a misconfiguration if it contains hooks_dir entries which point at directories with hooks that require root access. We error out so they can fix their libpod.conf. [1]: https://github.com/containers/libpod/pull/1487#discussion_r218149355 Signed-off-by: W. Trevor King <wking@tremily.us>
* Merge pull request #1880 from baude/f29fixesOpenShift Merge Robot2018-11-28
|\ | | | | Fix golang formatting issues
| * Fix golang formatting issuesbaude2018-11-28
| | | | | | | | | | | | | | Whe running unittests on newer golang versions, we observe failures with some formatting types when no declared correctly. Signed-off-by: baude <bbaude@redhat.com>
* | Merge pull request #1846 from cgwalters/netns-dns-localhostOpenShift Merge Robot2018-11-28
|\ \ | |/ |/| Use host's resolv.conf if no network namespace enabled
| * Use host's resolv.conf if no network namespace enabledColin Walters2018-11-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | My host system runs Fedora Silverblue 29 and I have NetworkManager's `dns=dnsmasq` setting enabled, so my `/etc/resolv.conf` only has `127.0.0.1`. I also run my development podman containers with `--net=host` for various reasons. If we have a host network namespace, there's no reason not to just use the host's nameserver configuration either. This fixes e.g. accessing content on a VPN, and is also faster since the container is using cached DNS. I know this doesn't solve the bigger picture issue of localhost-DNS conflicting with bridged networking, but that's far more involved, probably requiring a DNS proxy in the container. This patch makes my workflow a lot nicer and was easy to write. Signed-off-by: Colin Walters <walters@verbum.org>
* | Use also a struct to pass options to Restore()Adrian Reber2018-11-28
|/ | | | | | | | | | | | | This is basically the same change as ff47a4c2d5485fc49f937f3ce0c4e2fd6bdb1956 (Use a struct to pass options to Checkpoint()) just for the Restore() function. It is used to pass multiple restore options to the API and down to conmon which is used to restore containers. This is for the upcoming changes to support checkpointing and restoring containers with '--tcp-established'. Signed-off-by: Adrian Reber <areber@redhat.com>
* Merge pull request #1850 from vrothberg/mount-propagationOpenShift Merge Robot2018-11-27
|\ | | | | set root propagation based on volume properties
| * set root propagation based on volume propertiesValentin Rothberg2018-11-26
| | | | | | | | | | | | | | | | | | | | | | | | Set the root propagation based on the properties of volumes and default mounts. To remain compatibility, follow the semantics of Docker. If a volume is shared, keep the root propagation shared which works for slave and private volumes too. For slave volumes, it can either be shared or rshared. Do not change the root propagation for private volumes and stick with the default. Fixes: #1834 Signed-off-by: Valentin Rothberg <vrothberg@suse.com>
* | Merge pull request #1734 from rhatdan/networkOpenShift Merge Robot2018-11-27
|\ \ | |/ |/| libpod should know if the network is disabled
| * libpod should know if the network is disabledDaniel J Walsh2018-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | /etc/resolv.conf and /etc/hosts should not be created and mounted when the network is disabled. We should not be calling the network setup and cleanup functions when it is disabled either. In doing this patch, I found that all of the bind mounts were particular to Linux along with the generate functions, so I moved them to container_internal_linux.go Since we are checking if we are using a network namespace, we need to check after the network namespaces has been created in the spec. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* | Added option to keep containers running after checkpointingAdrian Reber2018-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CRIU supports to leave processes running after checkpointing: -R|--leave-running leave tasks in running state after checkpoint runc also support to leave containers running after checkpointing: --leave-running leave the process running after checkpointing With this commit the support to leave a container running after checkpointing is brought to Podman: --leave-running, -R leave the container running after writing checkpoint to disk Now it is possible to checkpoint a container at some point in time without stopping the container. This can be used to rollback the container to an early state: $ podman run --tmpfs /tmp --name podman-criu-test -d docker://docker.io/yovfiatbeb/podman-criu-test $ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample 3 $ podman container checkpoint -R -l $ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample 4 $ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample 5 $ podman stop -l $ podman container restore -l $ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample 4 So after checkpointing the container kept running and was stopped after some time. Restoring this container will restore the state right at the checkpoint. Signed-off-by: Adrian Reber <areber@redhat.com>
* | Use a struct to pass options to Checkpoint()Adrian Reber2018-11-20
|/ | | | | | | | | For upcoming changes to the Checkpoint() functions this commit switches checkpoint options from a boolean to a struct, so that additional options can be passed easily to Checkpoint() without changing the function parameters all the time. Signed-off-by: Adrian Reber <areber@redhat.com>
* Accurately update state if prepare() partially failsMatthew Heon2018-11-08
| | | | | | | | | We are seeing some issues where, when part of prepare() fails (originally noticed due to a bad static IP), the other half does not successfully clean up, and the state can be left in a bad place (not knowing about an active SHM mount for example). Signed-off-by: Matthew Heon <mheon@redhat.com>
* Merge pull request #1771 from baude/prepareOpenShift Merge Robot2018-11-07
|\ | | | | move defer'd function declaration ahead of prepare error return
| * move defer'd function declaration ahead of prepare error returnbaude2018-11-07
| | | | | | | | Signed-off-by: baude <bbaude@redhat.com>
* | rootless: mount /sys/fs/cgroup/systemd from the hostGiuseppe Scrivano2018-11-07
| | | | | | | | | | | | systemd requires /sys/fs/cgroup/systemd to be writeable. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* | rootless: don't bind mount /sys/fs/cgroup/systemd in systemd modeGiuseppe Scrivano2018-11-07
|/ | | | | | | it is not writeable by non-root users so there is no point in having access to it from a container. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* get user and group information using securejoin and runc's user librarybaude2018-10-29
| | | | | | | | | | | for the purposes of performance and security, we use securejoin to contstruct the root fs's path so that symlinks are what they appear to be and no pointing to something naughty. then instead of chrooting to parse /etc/passwd|/etc/group, we now use the runc user/group methods which saves us quite a bit of performance. Signed-off-by: baude <bbaude@redhat.com>
* Merge pull request #1699 from baude/rundOpenShift Merge Robot2018-10-25
|\ | | | | run performance improvements
| * run prepare in parallelbaude2018-10-25
| | | | | | | | | | | | | | run prepare() -- which consists of creating a network namespace and mounting the container image is now run in parallel. This saves 25-40ms. Signed-off-by: baude <bbaude@redhat.com>
* | Increase security and performance when looking up groupsbaude2018-10-25
|/ | | | | | | | | | We implement the securejoin method to make sure the paths to /etc/passwd and /etc/group are not symlinks to something naughty or outside the container image. And then instead of actually chrooting, we use the runc functions to get information about a user. The net result is increased security and a a performance gain from 41ms to 100us. Signed-off-by: baude <bbaude@redhat.com>
* Use the CRIU version check in checkpoint/restoreAdrian Reber2018-10-23
| | | | | | | | The newly introduced CRIU version check is now used to make sure checkpointing and restoring is only used if the CRIU version is new enough. Signed-off-by: Adrian Reber <areber@redhat.com>
* Fix CGroup paths used for systemd CGroup mountMatthew Heon2018-10-17
| | | | | | | We already have functions for retrieving the container's CGroup path, so use them instead of manually generating a path. Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
* Mount proper cgroup for systemd to manage inside of the container.Daniel J Walsh2018-10-15
| | | | | | | | | | | | We are still requiring oci-systemd-hook to be installed in order to run systemd within a container. This patch properly mounts /sys/fs/cgroup/systemd/libpod_parent/libpod-UUID on /sys/fs/cgroup/systemd inside of container. Since we need the UUID of the container, we needed to move Systemd to be a config option of the container. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Add support to checkpoint/restore containersAdrian Reber2018-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | runc uses CRIU to support checkpoint and restore of containers. This brings an initial checkpoint/restore implementation to podman. None of the additional runc flags are yet supported and container migration optimization (pre-copy/post-copy) is also left for the future. The current status is that it is possible to checkpoint and restore a container. I am testing on RHEL-7.x and as the combination of RHEL-7 and CRIU has seccomp troubles I have to create the container without seccomp. With the following steps I am able to checkpoint and restore a container: # podman run --security-opt="seccomp=unconfined" -d registry.fedoraproject.org/f27/httpd # curl -I 10.22.0.78:8080 HTTP/1.1 403 Forbidden # <-- this is actually a good answer # podman container checkpoint <container> # curl -I 10.22.0.78:8080 curl: (7) Failed connect to 10.22.0.78:8080; No route to host # podman container restore <container> # curl -I 10.22.0.78:8080 HTTP/1.1 403 Forbidden I am using CRIU, runc and conmon from git. All required changes for checkpoint/restore support in podman have been merged in the corresponding projects. To have the same IP address in the restored container as before checkpointing, CNI is told which IP address to use. If the saved network configuration cannot be found during restore, the container is restored with a new IP address. For CRIU to restore established TCP connections the IP address of the network namespace used for restore needs to be the same. For TCP connections in the listening state the IP address can change. During restore only one network interface with one IP address is handled correctly. Support to restore containers with more advanced network configuration will be implemented later. v2: * comment typo * print debug messages during cleanup of restore files * use createContainer() instead of createOCIContainer() * introduce helper CheckpointPath() * do not try to restore a container that is paused * use existing helper functions for cleanup * restructure code flow for better readability * do not try to restore if checkpoint/inventory.img is missing * git add checkpoint.go restore.go v3: * move checkpoint/restore under 'podman container' v4: * incorporated changes from latest reviews Signed-off-by: Adrian Reber <areber@redhat.com>
* Disable problematic SELinux code causing runc issuesMatthew Heon2018-09-25
| | | | | | | Signed-off-by: Matthew Heon <matthew.heon@gmail.com> Closes: #1541 Approved by: baude
* Add --mount option for `create` & `run` commandDaniel J Walsh2018-09-21
| | | | | | | | Signed-off-by: Kunal Kushwaha <kushwaha_kunal_v7@lab.ntt.co.jp> Signed-off-by: Daniel J Walsh <dwalsh@redhat.com> Closes: #1524 Approved by: mheon
* Add new field to libpod to indicate whether or not to use labellingDaniel J Walsh2018-09-20
| | | | | | | | | | | | | | | Also update some missing fields libpod.conf obtions in man pages. Fix sort order of security options and add a note about disabling labeling. When a process requests a new label. libpod needs to reserve all labels to make sure that their are no conflicts. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com> Closes: #1406 Approved by: mheon
* Bind Mounts should be mounted read-only when in read-only modeDaniel J Walsh2018-09-20
| | | | | | | | | | We don't want to allow users to write to /etc/resolv.conf or /etc/hosts if in read only mode. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com> Closes: #1510 Approved by: TomSweeneyRedHat
* Fix Mount PropagationGiuseppe Scrivano2018-08-27
| | | | | | | | | Default mount propagation inside of containes should be private Signed-off-by: Daniel J Walsh <dwalsh@redhat.com> Closes: #1305 Approved by: mheon
* Fixing network ns segfaulthaircommander2018-08-23
| | | | | | | | | As well as small style corrections, update pod_top_test to use CreatePod, and move handling of adding a container to the pod's namespace from container_internal_linux to libpod/option. Signed-off-by: haircommander <pehunt@redhat.com> Closes: #1187 Approved by: mheon
* Change pause container to infra containerhaircommander2018-08-23
| | | | | | | Signed-off-by: haircommander <pehunt@redhat.com> Closes: #1187 Approved by: mheon
* Added option to share kernel namespaces in libpod and podmanhaircommander2018-08-23
| | | | | | | | | A pause container is added to the pod if the user opts in. The default pause image and command can be overridden. Pause containers are ignored in ps unless the -a option is present. Pod inspect and pod ps show shared namespaces and pause container. A pause container can't be removed with podman rm, and a pod can be removed if it only has a pause container. Signed-off-by: haircommander <pehunt@redhat.com> Closes: #1187 Approved by: mheon
* podman: fix --uts=hostGiuseppe Scrivano2018-08-17
| | | | | | | | | | | | | | | Do not set any hostname value in the OCI configuration when --uts=host is used and the user didn't specify any value. This prevents an error from the OCI runtime as it cannot set the hostname without a new UTS namespace. Differently, the HOSTNAME environment variable is always set. When --uts=host is used, HOSTNAME gets the value from the host. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #1280 Approved by: baude
* switch projectatomic to containersDaniel J Walsh2018-08-16
| | | | | | | | | | Need to get some small changes into libpod to pull back into buildah to complete buildah transition. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com> Closes: #1270 Approved by: mheon
* We need to sort mounts so that one mount does not over mount another.Daniel J Walsh2018-08-10
| | | | | | | | | | | | | | | | | Currently we add mounts from images, volumes and internal. We can accidently over mount an existing mount. This patch sorts the mounts to make sure a parent directory is always mounted before its content. Had to change the default propagation on image volume mounts from shared to private to stop mount points from leaking out of the container. Also switched from using some docker/docker/pkg to container/storage/pkg to remove some dependencies on Docker. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com> Closes: #1243 Approved by: mheon
* Support multiple networksbaude2018-07-12
| | | | | | | | | | | | | This is a refresh of Dan William's PR #974 with a rebase and proper vendoring of ocicni and containernetworking/cni. It adds the ability to define multiple networks as so: podman run --network=net1,net2,foobar ... Signed-off-by: baude <bbaude@redhat.com> Closes: #1082 Approved by: baude
* Add --volumes-from flag to podman run and createumohnani82018-07-09
| | | | | | | | | | podman now supports --volumes-from flag, which allows users to add all the volumes an existing container has to a new one. Signed-off-by: umohnani8 <umohnani@redhat.com> Closes: #931 Approved by: mheon
* Remove per-container CGroup parentsMatthew Heon2018-07-06
| | | | | | | | | | | | | | | | | | | Originally, it seemed like a good idea to place Conmon and the container it managed under a shared CGroup, so we could manage the two together. It's become increasingly clear that this is a potential performance sore point, gains us little practical benefit in managing Conmon, and adds extra steps to container cleanup that interfere with Conmon postrun hooks. Revert back to a shared CGroup for conmon processes under the CGroup parent. This will retain per-pod conmon CGroups as well if the pod is set to create a CGroup and act as CGroup parent for its containers. Signed-off-by: Matthew Heon <matthew.heon@gmail.com> Closes: #1051 Approved by: umohnani8
* more changes to compile darwinbaude2018-07-05
| | | | | | | | | | | | | | | | this should represent the last major changes to get darwin to **compile**. again, the purpose here is to get darwin to compile so that we can eventually implement a ci task that would protect against regressions for darwin compilation. i have left the manual darwin compilation largely static still and in fact now only interject (manually) two build tags to assist with the build. trevor king has great ideas on how to make this better and i will defer final implementation of those to him. Signed-off-by: baude <bbaude@redhat.com> Closes: #1047 Approved by: rhatdan
* changes to allow for darwin compilationbaude2018-06-29
Signed-off-by: baude <bbaude@redhat.com> Closes: #1015 Approved by: baude