aboutsummaryrefslogtreecommitdiff
path: root/libpod/container_internal_linux.go
Commit message (Collapse)AuthorAge
* Merge pull request #2585 from giuseppe/build-honor-netOpenShift Merge Robot2019-03-12
|\ | | | | build: honor --net
| * slirp4netns: add builtin DNS server to resolv.confGiuseppe Scrivano2019-03-11
| | | | | | | | Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* | Ensure that tmpfs mounts do not have symlinksMatthew Heon2019-03-11
|/ | | | | | | | | | | | When mounting a tmpfs, runc attempts to make the directory it will be mounted at. Unfortunately, Golang's os.MkdirAll deals very poorly with symlinks being part of the path. I looked into fixing this in runc, but it's honestly much easier to just ensure we don't trigger the issue on our end. Fixes BZ #1686610 Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* Don't delete another container's resolv and hosts filesMatthew Heon2019-03-10
| | | | | | | | | | | The logic of deleting and recreating /etc/hosts and /etc/resolv.conf only makes sense when we're the one that creates the files - when we don't, it just removes them, and there's nothing left to use. Fixes #2602 Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* Move secrets package to buildahDaniel J Walsh2019-03-08
| | | | | | | | Trying to remove circular dependencies between libpod and buildah. First step to move pkg content from libpod to buildah. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Append hosts to dependency container's /etc/hosts filePeter Hunt2019-03-05
| | | | | | Before, any container with a netNS dependency simply used its dependency container's hosts file, and didn't abide its configuration (mainly --add-host). Fix this by always appending to the dependency container's hosts file, creating one if necessary. Signed-off-by: Peter Hunt <pehunt@redhat.com>
* Verify that used OCI runtime supports checkpointAdrian Reber2019-03-01
| | | | | | | | | | To be able to use OCI runtimes which do not implement checkpoint/restore this adds a check to the checkpoint code path and the checkpoint/restore tests to see if it knows about the checkpoint subcommand. If the used OCI runtime does not implement checkpoint/restore the tests are skipped and the actual 'podman container checkpoint' returns an error. Signed-off-by: Adrian Reber <areber@redhat.com>
* Label CRIU log files correctlyAdrian Reber2019-02-26
| | | | | | | | | | | | | | | CRIU creates a log file during checkpointing in .../userdata/dump.log. The problem with this file is, is that CRIU injects a parasite code into the container processes and this parasite code also writes to the same log file. At this point a process from the inside of the container is trying to access the log file on the outside of the container and SELinux prohibits this. To enable writing to the log file from the injected parasite code, this commit creates an empty log file and labels the log file with c.MountLabel(). CRIU uses existing files when writing it logs so the log file label persists and now, with the correct label, SELinux no longer blocks access to the log file. Signed-off-by: Adrian Reber <areber@redhat.com>
* Merge pull request #2358 from rhatdan/namespaceOpenShift Merge Robot2019-02-25
|\ | | | | Fix up handling of user defined network namespaces
| * Fix up handling of user defined network namespacesDaniel J Walsh2019-02-23
| | | | | | | | | | | | | | | | | | If user specifies network namespace and the /etc/netns/XXX/resolv.conf exists, we should use this rather then /etc/resolv.conf Also fail cleaner if the user specifies an invalid Network Namespace. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* | In shared networkNS /etc/resolv.conf&/etc/hosts should be sharedDaniel J Walsh2019-02-23
|/ | | | | | | | | | | We should just bind mount the original containers /etc/resolv.conf and /etchosts into the new container. Changes in the resolv.conf and hosts should be seen by all containers, This matches Docker behaviour. In order to make this work the labels on these files need to have a shared SELinux label. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* OpenTracing support added to start, stop, run, create, pull, and psSebastian Jug2019-02-18
| | | | | | Drop context.Context field from cli.Context Signed-off-by: Sebastian Jug <sejug@redhat.com>
* Fix volume handling in podmanDaniel J Walsh2019-02-14
| | | | | | | | | | | | | | | | | | iFix builtin volumes to work with podman volume Currently builtin volumes are not recored in podman volumes when they are created automatically. This patch fixes this. Remove container volumes when requested Currently the --volume option on podman remove does nothing. This will implement the changes needed to remove the volumes if the user requests it. When removing a volume make sure that no container uses the volume. Signed-off-by: Daniel J Walsh dwalsh@redhat.com Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Merge pull request #2138 from giuseppe/rootless-pod-fixOpenShift Merge Robot2019-01-11
|\ | | | | rootless: fix usage of create --pod=new:FOO
| * spec: add nosuid,noexec,nodev to ro bind mountGiuseppe Scrivano2019-01-11
| | | | | | | | | | | | | | runc fails to change the ro mode of a rootless bind mount if the other flags are not kept. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* | Replace tab with spaces in MarshalIndent in libpodMatthew Heon2019-01-10
| | | | | | | | | | | | | | | | | | The json-iterator package will panic on attempting to use MarshalIndent with a non-space indentation. This is sort of silly but swapping from tabs to spaces is not a big issue for us, so let's work around the silly panic. Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* | Move all libpod/ JSON references over to jsoniterMatthew Heon2019-01-10
|/ | | | Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* apparmor: apply default profile at container initializationValentin Rothberg2019-01-09
| | | | | | | | | | | | | | | | | | | Apply the default AppArmor profile at container initialization to cover all possible code paths (i.e., podman-{start,run}) before executing the runtime. This allows moving most of the logic into pkg/apparmor. Also make the loading and application of the default AppArmor profile versio-indepenent by checking for the `libpod-default-` prefix and over-writing the profile in the run-time spec if needed. The intitial run-time spec of the container differs a bit from the applied one when having started the container, which results in displaying a potentially outdated AppArmor profile when inspecting a container. To fix that, load the container config from the file system if present and use it to display the data. Fixes: #2107 Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
* Merge pull request #2061 from adrianreber/static-ipOpenShift Merge Robot2019-01-09
|\ | | | | Use existing interface to request IP address during restore
| * Use existing interface to request IP address during restoreAdrian Reber2019-01-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The initial implementation to request the same IP address for a container during a restore was based on environment variables influencing CNI. With this commit the IP address selection switches to Podman's internal static IP API. This commit does a comment change in libpod/container_easyjson.go to avoid unnecessary re-generation of libpod/container_easyjson.go during build as this fails in CI. The reason for this is that make sees that libpod/container_easyjson.go needs to be re-created. The commit, however, only changes a part of libpod/container.go which is marked as 'ffjson: skip'. Signed-off-by: Adrian Reber <areber@redhat.com>
* | hooks: Add pre-create hooks for runtime-config manipulationW. Trevor King2019-01-08
|/ | | | | | | | | | | | | | | | | | | | | | | There's been a lot of discussion over in [1] about how to support the NVIDIA folks and others who want to be able to create devices (possibly after having loaded kernel modules) and bind userspace libraries into the container. Currently that's happening in the middle of runc's create-time mount handling before the container pivots to its new root directory with runc's incorrectly-timed prestart hook trigger [2]. With this commit, we extend hooks with a 'precreate' stage to allow trusted parties to manipulate the config JSON before calling the runtime's 'create'. I'm recycling the existing Hook schema from pkg/hooks for this, because we'll want Timeout for reliability and When to avoid the expense of fork/exec when a given hook does not need to make config changes [3]. [1]: https://github.com/opencontainers/runc/pull/1811 [2]: https://github.com/opencontainers/runc/issues/1710 [3]: https://github.com/containers/libpod/issues/1828#issuecomment-439888059 Signed-off-by: W. Trevor King <wking@tremily.us>
* Fixes to handle /dev/shm correctly.Daniel J Walsh2018-12-24
| | | | | | | | | | | | | | | | | | We had two problems with /dev/shm, first, you mount the container read/only then /dev/shm was mounted read/only. This is a bug a tmpfs directory should be read/write within a read-only container. The second problem is we were ignoring users mounted /dev/shm from the host. If user specified podman run -d -v /dev/shm:/dev/shm ... We were dropping this mount and still using the internal mount. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Containers sharing a netns should share resolv/hostsMatthew Heon2018-12-11
| | | | | | | | | | | | | When sharing a network namespace, containers should also share resolv.conf and /etc/hosts in case a container process made changes to either (for example, if I set up a VPN client in container A and join container B to its network namespace, I expect container B to use the DNS servers from A to ensure it can see everything on the VPN). Resolves: #1546 Signed-off-by: Matthew Heon <mheon@redhat.com>
* Prevent a second lookup of user for image volumesMatthew Heon2018-12-11
| | | | | | | | | | Instead of forcing another user lookup when mounting image volumes, just use the information we looked up when we started generating the spec. This may resolve #1817 Signed-off-by: Matthew Heon <mheon@redhat.com>
* bind mount /etc/resolv.conf|hosts in podsbaude2018-12-06
| | | | | | | containers inside pods need to make sure they get /etc/resolv.conf and /etc/hosts bind mounted when network is expected Signed-off-by: baude <bbaude@redhat.com>
* Merge pull request #1940 from wking/numeric-gidOpenShift Merge Robot2018-12-05
|\ | | | | libpod/container_internal_linux: Allow gids that aren't in the group file
| * libpod/container_internal_linux: Allow gids that aren't in the group fileW. Trevor King2018-12-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an image config sets config.User [1] to a numeric group (like 1000:1000), but those values do not exist in the container's /etc/group, libpod is currently breaking: $ podman run --rm registry.svc.ci.openshift.org/ci-op-zvml7cd6/pipeline:installer --help error creating temporary passwd file for container 228f6e9943d6f18b93c19644e9b619ec4d459a3e0eb31680e064eeedf6473678: unable to get gid 1000 from group file: no matching entries in group file However, the OCI spec requires converters to copy numeric uid and gid to the runtime config verbatim [2]. With this commit, I'm frontloading the "is groupspec an integer?" check and only bothering with lookup.GetGroup when it was not. I've also removed a few .Mounted checks, which are originally from 00d38cb3 (podman create/run need to load information from the image, 2017-12-18, #110). We don't need a mounted container filesystem to translate integers. And when the lookup code needs to fall back to the mounted root to translate names, it can handle erroring out internally (and looking it over, it seems to do that already). [1]: https://github.com/opencontainers/image-spec/blame/v1.0.1/config.md#L118-L123 [2]: https://github.com/opencontainers/image-spec/blame/v1.0.1/conversion.md#L70 Signed-off-by: W. Trevor King <wking@tremily.us>
* | libpod/container_internal: Deprecate implicit hook directoriesW. Trevor King2018-12-03
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Part of the motivation for 800eb863 (Hooks supports two directories, process default and override, 2018-09-17, #1487) was [1]: > We only use this for override. The reason this was caught is people > are trying to get hooks to work with CoreOS. You are not allowed to > write to /usr/share... on CoreOS, so they wanted podman to also look > at /etc, where users and third parties can write. But we'd also been disabling hooks completely for rootless users. And even for root users, the override logic was tricky when folks actually had content in both directories. For example, if you wanted to disable a hook from the default directory, you'd have to add a no-op hook to the override directory. Also, the previous implementation failed to handle the case where there hooks defined in the override directory but the default directory did not exist: $ podman version Version: 0.11.2-dev Go Version: go1.10.3 Git Commit: "6df7409cb5a41c710164c42ed35e33b28f3f7214" Built: Sun Dec 2 21:30:06 2018 OS/Arch: linux/amd64 $ ls -l /etc/containers/oci/hooks.d/test.json -rw-r--r--. 1 root root 184 Dec 2 16:27 /etc/containers/oci/hooks.d/test.json $ podman --log-level=debug run --rm docker.io/library/alpine echo 'successful container' 2>&1 | grep -i hook time="2018-12-02T21:31:19-08:00" level=debug msg="reading hooks from /usr/share/containers/oci/hooks.d" time="2018-12-02T21:31:19-08:00" level=warning msg="failed to load hooks: {}%!(EXTRA *os.PathError=open /usr/share/containers/oci/hooks.d: no such file or directory)" With this commit: $ podman --log-level=debug run --rm docker.io/library/alpine echo 'successful container' 2>&1 | grep -i hook time="2018-12-02T21:33:07-08:00" level=debug msg="reading hooks from /usr/share/containers/oci/hooks.d" time="2018-12-02T21:33:07-08:00" level=debug msg="reading hooks from /etc/containers/oci/hooks.d" time="2018-12-02T21:33:07-08:00" level=debug msg="added hook /etc/containers/oci/hooks.d/test.json" time="2018-12-02T21:33:07-08:00" level=debug msg="hook test.json matched; adding to stages [prestart]" time="2018-12-02T21:33:07-08:00" level=warning msg="implicit hook directories are deprecated; set --hooks-dir="/etc/containers/oci/hooks.d" explicitly to continue to load hooks from this directory" time="2018-12-02T21:33:07-08:00" level=error msg="container create failed: container_linux.go:336: starting container process caused "process_linux.go:399: container init caused \"process_linux.go:382: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: oh, noes!\\\\n\\\"\"" (I'd setup the hook to error out). You can see that it's silenly ignoring the ENOENT for /usr/share/containers/oci/hooks.d and continuing on to load hooks from /etc/containers/oci/hooks.d. When it loads the hook, it also logs a warning-level message suggesting that callers explicitly configure their hook directories. That will help consumers migrate, so we can drop the implicit hook directories in some future release. When folks *do* explicitly configure hook directories (via the newly-public --hooks-dir and hooks_dir options), we error out if they're missing: $ podman --hooks-dir /does/not/exist run --rm docker.io/library/alpine echo 'successful container' error setting up OCI Hooks: open /does/not/exist: no such file or directory I've dropped the trailing "path" from the old, hidden --hooks-dir-path and hooks_dir_path because I think "dir(ectory)" is already enough context for "we expect a path argument". I consider this name change non-breaking because the old forms were undocumented. Coming back to rootless users, I've enabled hooks now. I expect they were previously disabled because users had no way to avoid /usr/share/containers/oci/hooks.d which might contain hooks that required root permissions. But now rootless users will have to explicitly configure hook directories, and since their default config is from ~/.config/containers/libpod.conf, it's a misconfiguration if it contains hooks_dir entries which point at directories with hooks that require root access. We error out so they can fix their libpod.conf. [1]: https://github.com/containers/libpod/pull/1487#discussion_r218149355 Signed-off-by: W. Trevor King <wking@tremily.us>
* Merge pull request #1880 from baude/f29fixesOpenShift Merge Robot2018-11-28
|\ | | | | Fix golang formatting issues
| * Fix golang formatting issuesbaude2018-11-28
| | | | | | | | | | | | | | Whe running unittests on newer golang versions, we observe failures with some formatting types when no declared correctly. Signed-off-by: baude <bbaude@redhat.com>
* | Merge pull request #1846 from cgwalters/netns-dns-localhostOpenShift Merge Robot2018-11-28
|\ \ | |/ |/| Use host's resolv.conf if no network namespace enabled
| * Use host's resolv.conf if no network namespace enabledColin Walters2018-11-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | My host system runs Fedora Silverblue 29 and I have NetworkManager's `dns=dnsmasq` setting enabled, so my `/etc/resolv.conf` only has `127.0.0.1`. I also run my development podman containers with `--net=host` for various reasons. If we have a host network namespace, there's no reason not to just use the host's nameserver configuration either. This fixes e.g. accessing content on a VPN, and is also faster since the container is using cached DNS. I know this doesn't solve the bigger picture issue of localhost-DNS conflicting with bridged networking, but that's far more involved, probably requiring a DNS proxy in the container. This patch makes my workflow a lot nicer and was easy to write. Signed-off-by: Colin Walters <walters@verbum.org>
* | Use also a struct to pass options to Restore()Adrian Reber2018-11-28
|/ | | | | | | | | | | | | This is basically the same change as ff47a4c2d5485fc49f937f3ce0c4e2fd6bdb1956 (Use a struct to pass options to Checkpoint()) just for the Restore() function. It is used to pass multiple restore options to the API and down to conmon which is used to restore containers. This is for the upcoming changes to support checkpointing and restoring containers with '--tcp-established'. Signed-off-by: Adrian Reber <areber@redhat.com>
* Merge pull request #1850 from vrothberg/mount-propagationOpenShift Merge Robot2018-11-27
|\ | | | | set root propagation based on volume properties
| * set root propagation based on volume propertiesValentin Rothberg2018-11-26
| | | | | | | | | | | | | | | | | | | | | | | | Set the root propagation based on the properties of volumes and default mounts. To remain compatibility, follow the semantics of Docker. If a volume is shared, keep the root propagation shared which works for slave and private volumes too. For slave volumes, it can either be shared or rshared. Do not change the root propagation for private volumes and stick with the default. Fixes: #1834 Signed-off-by: Valentin Rothberg <vrothberg@suse.com>
* | Merge pull request #1734 from rhatdan/networkOpenShift Merge Robot2018-11-27
|\ \ | |/ |/| libpod should know if the network is disabled
| * libpod should know if the network is disabledDaniel J Walsh2018-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | /etc/resolv.conf and /etc/hosts should not be created and mounted when the network is disabled. We should not be calling the network setup and cleanup functions when it is disabled either. In doing this patch, I found that all of the bind mounts were particular to Linux along with the generate functions, so I moved them to container_internal_linux.go Since we are checking if we are using a network namespace, we need to check after the network namespaces has been created in the spec. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* | Added option to keep containers running after checkpointingAdrian Reber2018-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CRIU supports to leave processes running after checkpointing: -R|--leave-running leave tasks in running state after checkpoint runc also support to leave containers running after checkpointing: --leave-running leave the process running after checkpointing With this commit the support to leave a container running after checkpointing is brought to Podman: --leave-running, -R leave the container running after writing checkpoint to disk Now it is possible to checkpoint a container at some point in time without stopping the container. This can be used to rollback the container to an early state: $ podman run --tmpfs /tmp --name podman-criu-test -d docker://docker.io/yovfiatbeb/podman-criu-test $ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample 3 $ podman container checkpoint -R -l $ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample 4 $ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample 5 $ podman stop -l $ podman container restore -l $ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample 4 So after checkpointing the container kept running and was stopped after some time. Restoring this container will restore the state right at the checkpoint. Signed-off-by: Adrian Reber <areber@redhat.com>
* | Use a struct to pass options to Checkpoint()Adrian Reber2018-11-20
|/ | | | | | | | | For upcoming changes to the Checkpoint() functions this commit switches checkpoint options from a boolean to a struct, so that additional options can be passed easily to Checkpoint() without changing the function parameters all the time. Signed-off-by: Adrian Reber <areber@redhat.com>
* Accurately update state if prepare() partially failsMatthew Heon2018-11-08
| | | | | | | | | We are seeing some issues where, when part of prepare() fails (originally noticed due to a bad static IP), the other half does not successfully clean up, and the state can be left in a bad place (not knowing about an active SHM mount for example). Signed-off-by: Matthew Heon <mheon@redhat.com>
* Merge pull request #1771 from baude/prepareOpenShift Merge Robot2018-11-07
|\ | | | | move defer'd function declaration ahead of prepare error return
| * move defer'd function declaration ahead of prepare error returnbaude2018-11-07
| | | | | | | | Signed-off-by: baude <bbaude@redhat.com>
* | rootless: mount /sys/fs/cgroup/systemd from the hostGiuseppe Scrivano2018-11-07
| | | | | | | | | | | | systemd requires /sys/fs/cgroup/systemd to be writeable. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* | rootless: don't bind mount /sys/fs/cgroup/systemd in systemd modeGiuseppe Scrivano2018-11-07
|/ | | | | | | it is not writeable by non-root users so there is no point in having access to it from a container. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* get user and group information using securejoin and runc's user librarybaude2018-10-29
| | | | | | | | | | | for the purposes of performance and security, we use securejoin to contstruct the root fs's path so that symlinks are what they appear to be and no pointing to something naughty. then instead of chrooting to parse /etc/passwd|/etc/group, we now use the runc user/group methods which saves us quite a bit of performance. Signed-off-by: baude <bbaude@redhat.com>
* Merge pull request #1699 from baude/rundOpenShift Merge Robot2018-10-25
|\ | | | | run performance improvements
| * run prepare in parallelbaude2018-10-25
| | | | | | | | | | | | | | run prepare() -- which consists of creating a network namespace and mounting the container image is now run in parallel. This saves 25-40ms. Signed-off-by: baude <bbaude@redhat.com>
* | Increase security and performance when looking up groupsbaude2018-10-25
|/ | | | | | | | | | We implement the securejoin method to make sure the paths to /etc/passwd and /etc/group are not symlinks to something naughty or outside the container image. And then instead of actually chrooting, we use the runc functions to get information about a user. The net result is increased security and a a performance gain from 41ms to 100us. Signed-off-by: baude <bbaude@redhat.com>
* Use the CRIU version check in checkpoint/restoreAdrian Reber2018-10-23
| | | | | | | | The newly introduced CRIU version check is now used to make sure checkpointing and restoring is only used if the CRIU version is new enough. Signed-off-by: Adrian Reber <areber@redhat.com>
* Fix CGroup paths used for systemd CGroup mountMatthew Heon2018-10-17
| | | | | | | We already have functions for retrieving the container's CGroup path, so use them instead of manually generating a path. Signed-off-by: Matthew Heon <matthew.heon@gmail.com>