summaryrefslogtreecommitdiff
path: root/libpod/container_internal_linux.go
Commit message (Collapse)AuthorAge
* We should not be mounting /run as noexec when run with --systemdDaniel J Walsh2020-09-02
| | | | | | | The system defaults /run to "exec" mode, and we default --read-only mounts on /run to "exec", so --systemd should follow suit. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Ensure rootless containers without a passwd can startMatthew Heon2020-08-31
| | | | | | | | | | | | | | | | We want to modify /etc/passwd to add an entry for the user in question, but at the same time we don't want to require the container provide a /etc/passwd (a container with a single, statically linked binary and nothing else is perfectly fine and should be allowed, for example). We could create the passwd file if it does not exist, but if the container doesn't provide one, it's probably better not to make one at all. Gate changes to /etc/passwd behind a stat() of the file in the container returning cleanly. Fixes #7515 Signed-off-by: Matthew Heon <mheon@redhat.com>
* vendor: update opencontainers/runtime-specGiuseppe Scrivano2020-08-21
| | | | Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* Don't limit the size on /run for systemd based containersDaniel J Walsh2020-08-18
| | | | | | | | | | | | | | We had a customer incident where they ran out of space on /run. If you don't specify size, it will be still limited to 50% or memory available in the cgroup the container is running in. If the cgroup is unlimited then the /run will be limited to 50% of the total memory on the system. Also /run is mounted on the host as exec, so no reason for us to mount it noexec. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Change /sys/fs/cgroup/systemd mount to rprivateMatthew Heon2020-08-12
| | | | | | | | I used the wrong propagation first time around because I forgot that rprivate is the default propagation. Oops. Switch to rprivate so we're using the default. Signed-off-by: Matthew Heon <mheon@redhat.com>
* Ensure correct propagation for cgroupsv1 systemd cgroupMatthew Heon2020-08-11
| | | | | | | | | | | | | | | | On cgroups v1 systems, we need to mount /sys/fs/cgroup/systemd into the container. We were doing this with no explicit mount propagation tag, which means that, under some circumstances, the shared mount propagation could be chosen - which, combined with the fact that we need a mount to mask /sys/fs/cgroup/systemd/release_agent in the container, means we would leak a never-ending set of mounts under /sys/fs/cgroup/systemd/ on container restart. Fortunately, the fix is very simple - hardcode mount propagation to something that won't leak. Signed-off-by: Matthew Heon <mheon@redhat.com>
* Ensure WORKDIR from images is createdMatthew Heon2020-08-03
| | | | | | | | | | | | | | A recent crun change stopped the creation of the container's working directory if it does not exist. This is arguably correct for user-specified directories, to protect against typos; it is definitely not correct for image WORKDIR, where the image author definitely intended for the directory to be used. This makes Podman create the working directory and chown it to container root, if it does not already exist, and only if it was specified by an image, not the user. Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* Merge pull request #6991 from mheon/change_passwd_ondiskOpenShift Merge Robot2020-07-29
|\ | | | | Make changes to /etc/passwd on disk for non-read only
| * Make changes to /etc/passwd on disk for non-read onlyMatthew Heon2020-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Bind-mounting /etc/passwd into the container is problematic becuase of how system utilities like `useradd` work. They want to make a copy and then rename to try to prevent breakage; this is, unfortunately, impossible when the file they want to rename is a bind mount. The current behavior is fine for read-only containers, though, because we expect useradd to fail in those cases. Instead of bind-mounting, we can edit /etc/passwd in the container's rootfs. This is kind of gross, because the change will show up in `podman diff` and similar tools, and will be included in images made by `podman commit`. However, it's a lot better than breaking important system tools. Fixes #6953 Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* | Switch all references to github.com/containers/libpod -> podmanDaniel J Walsh2020-07-28
| | | | | | | | Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* | Support default profile for apparmorDaniel J Walsh2020-07-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently you can not apply an ApparmorProfile if you specify --privileged. This patch will allow both to be specified simultaniosly. By default Apparmor should be disabled if the user specifies --privileged, but if the user specifies --security apparmor:PROFILE, with --privileged, we should do both. Added e2e run_apparmor_test.go Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* | Add --umask flag for create, runAshley Cui2020-07-21
| | | | | | | | | | | | | | | | --umask sets the umask inside the container Defaults to 0022 Co-authored-by: Daniel J Walsh <dwalsh@redhat.com> Signed-off-by: Ashley Cui <acui@redhat.com>
* | Add support for overlay volume mounts in podman.Qi Wang2020-07-20
|/ | | | | | | | Add support -v for overlay volume mounts in podman. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com> Signed-off-by: Qi Wang <qiwan@redhat.com>
* Preserve passwd on container restartMatthew Heon2020-07-15
| | | | | | | | | | | | | | | We added code to create a `/etc/passwd` file that we bind-mount into the container in some cases (most notably, `--userns=keep-id` containers). This, unfortunately, was not persistent, so user-added users would be dropped on container restart. Changing where we store the file should fix this. Further, we want to ensure that lookups of users in the container use the right /etc/passwd if we replaced it. There was already logic to do this, but it only worked for user-added mounts; it's easy enough to alter it to use our mounts as well. Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* Remove all instances of named return "err" from LibpodMatthew Heon2020-07-09
| | | | | | | | | | | | | | | | | This was inspired by https://github.com/cri-o/cri-o/pull/3934 and much of the logic for it is contained there. However, in brief, a named return called "err" can cause lots of code confusion and encourages using the wrong err variable in defer statements, which can make them work incorrectly. Using a separate name which is not used elsewhere makes it very clear what the defer should be doing. As part of this, remove a large number of named returns that were not used anywhere. Most of them were once needed, but are no longer necessary after previous refactors (but were accidentally retained). Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* Add username to /etc/passwd inside of container if --userns keep-idDaniel J Walsh2020-07-07
| | | | | | | | | | If I enter a continer with --userns keep-id, my UID will be present inside of the container, but most likely my user will not be defined. This patch will take information about the user and stick it into the container. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Merge pull request #6836 from ashley-cui/tzlibpodOpenShift Merge Robot2020-07-06
|\ | | | | Add --tz flag to create, run
| * Add --tz flag to create, runAshley Cui2020-07-02
| | | | | | | | | | | | | | --tz flag sets timezone inside container Can be set to IANA timezone as well as `local` to match host machine Signed-off-by: Ashley Cui <acui@redhat.com>
* | move go module to v2Valentin Rothberg2020-07-06
|/ | | | | | | | | | | | | | | With the advent of Podman 2.0.0 we crossed the magical barrier of go modules. While we were able to continue importing all packages inside of the project, the project could not be vendored anymore from the outside. Move the go module to new major version and change all imports to `github.com/containers/libpod/v2`. The renaming of the imports was done via `gomove` [1]. [1] https://github.com/KSubedi/gomove Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
* podman: add new cgroup mode splitGiuseppe Scrivano2020-06-25
| | | | | | | | | | | | | | | | | | | When running under systemd there is no need to create yet another cgroup for the container. With conmon-delegated the current cgroup will be split in two sub cgroups: - supervisor - container The supervisor cgroup will hold conmon and the podman process, while the container cgroup is used by the OCI runtime (using the cgroupfs backend). Closes: https://github.com/containers/libpod/issues/6400 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* Add container name to the /etc/hosts within the containerDaniel J Walsh2020-06-20
| | | | | | | | | | | | | | | | | | | | This will allow containers that connect to the network namespace be able to use the container name directly. For example you can do something like podman run -ti --name foobar fedora ping foobar While we can do this with hostname now, this seems more natural. Also if another container connects on the network to this container it can do podman run --network container:foobar fedora ping foobar And connect to the original container,without having to discover the name. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Turn on More lintersDaniel J Walsh2020-06-15
| | | | | | | | | - misspell - prealloc - unparam - nakedret Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* libpod: fix check for slirp4netns netnsGiuseppe Scrivano2020-06-11
| | | | | | | | | | fix the check for c.state.NetNS == nil. Its value is changed in the first code block, so the condition is always true in the second one and we end up running slirp4netns twice. Closes: https://github.com/containers/libpod/issues/6538 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* container: do not set hostname when joining utsGiuseppe Scrivano2020-06-10
| | | | | | | do not set the hostname when joining an UTS namespace, as it could be owned by a different userns. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* container: make resolv.conf and hosts accessible in usernsGiuseppe Scrivano2020-06-10
| | | | | | | | when running in a new userns, make sure the resolv.conf and hosts files bind mounted from another container are accessible to root in the userns. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* check --user range for rootless containersQi Wang2020-06-02
| | | | | | Check --user range if it's a uid for rootless containers. Returns error if it is out of the range. From https://github.com/containers/libpod/issues/6431#issuecomment-636124686 Signed-off-by: Qi Wang <qiwan@redhat.com>
* Fix mountpont in SecretMountsWithUIDGIDDaniel J Walsh2020-05-19
| | | | | | | In FIPS Mode we expect to work off of the Mountpath not the Rundir path. This is causing FIPS Mode checks to fail. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* libpod: set hostname from joined containerGiuseppe Scrivano2020-04-27
| | | | | | | when joining a UTS namespace, take the hostname from the destination container. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* Update podman to use containers.confDaniel J Walsh2020-04-20
| | | | | | | | Add more default options parsing Switch to using --time as opposed to --timeout to better match Docker. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* userns: support --userns=autoGiuseppe Scrivano2020-04-06
| | | | | | | automatically pick an empty range and create an user namespace for the container. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* Add support for containers.confDaniel J Walsh2020-03-27
| | | | | | | vendor in c/common config pkg for containers.conf Signed-off-by: Qi Wang qiwan@redhat.com Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Attempt manual removal of CNI IP allocations on refreshMatthew Heon2020-03-19
| | | | | | | | | | | | | | We previously attempted to work within CNI to do this, without success. So let's do it manually, instead. We know where the files should live, so we can remove them ourselves instead. This solves issues around sudden reboots where containers do not have time to fully tear themselves down, and leave IP address allocations which, for various reasons, are not stored in tmpfs and persist through reboot. Fixes #5433 Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* Ensure that exec sessions inherit supplemental groupsMatthew Heon2020-02-28
| | | | | | | | This corrects a regression from Podman 1.4.x where container exec sessions inherited supplemental groups from the container, iff the exec session did not specify a user. Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* Add network options to podman pod createMatthew Heon2020-02-19
| | | | | | | | | | | | | | | | | Enables most of the network-related functionality from `podman run` in `podman pod create`. Custom CNI networks can be specified, host networking is supported, DNS options can be configured. Also enables host networking in `podman play kube`. Fixes #2808 Fixes #3837 Fixes #4432 Fixes #4718 Fixes #4770 Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* make lint: enable gocriticValentin Rothberg2020-01-13
| | | | | | | `gocritic` is a powerful linter that helps in preventing certain kinds of errors as well as enforcing a coding style. Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
* Correctly export the root file-system changesAdrian Reber2019-12-09
| | | | | | | | | | | | | | | | When doing a checkpoint with --export the root file-system diff was not working as expected. Instead of getting the changes from the running container to the highest storage layer it got the changes from the highest layer to that parent's layer. For a one layer container this could mean that the complete root file-system is part of the checkpoint. With this commit this changes to use the same functionality as 'podman diff'. This actually enables to correctly diff the root file-system including tracking deleted files. This also removes the non-working helper functions from libpod/diff.go. Signed-off-by: Adrian Reber <areber@redhat.com>
* Allow chained network namespace containersMatthew Heon2019-12-03
| | | | | | | | | | | | | The code currently assumes that the container we delegate network namespace to will never further delegate to another container, so when looking up things like /etc/hosts and /etc/resolv.conf we won't pull the correct files from the chained dependency. The changes to resolve this are relatively simple - just need to keep looking until we find a container without NetNsCtr set. Fixes #4626 Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* Merge pull request #4493 from mheon/add_removing_stateOpenShift Merge Robot2019-12-02
|\ | | | | Add ContainerStateRemoving
| * Add ContainerStateRemovingMatthew Heon2019-11-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When Libpod removes a container, there is the possibility that removal will not fully succeed. The most notable problems are storage issues, where the container cannot be removed from c/storage. When this occurs, we were faced with a choice. We can keep the container in the state, appearing in `podman ps` and available for other API operations, but likely unable to do any of them as it's been partially removed. Or we can remove it very early and clean up after it's already gone. We have, until now, used the second approach. The problem that arises is intermittent problems removing storage. We end up removing a container, failing to remove its storage, and ending up with a container permanently stuck in c/storage that we can't remove with the normal Podman CLI, can't use the name of, and generally can't interact with. A notable cause is when Podman is hit by a SIGKILL midway through removal, which can consistently cause `podman rm` to fail to remove storage. We now add a new state for containers that are in the process of being removed, ContainerStateRemoving. We set this at the beginning of the removal process. It notifies Podman that the container cannot be used anymore, but preserves it in the DB until it is fully removed. This will allow Remove to be run on these containers again, which should successfully remove storage if it fails. Fixes #3906 Signed-off-by: Matthew Heon <mheon@redhat.com>
* | Disable checkpointing of containers started with --rmAdrian Reber2019-11-28
| | | | | | | | | | | | | | | | | | | | | | | | | | Trying to checkpoint a container started with --rm works, but it makes no sense as the container, including the checkpoint, will be deleted after writing the checkpoint. This commit inhibits checkpointing containers started with '--rm' unless '--export' is used. If the checkpoint is exported it can easily be restored from the exported checkpoint, even if '--rm' is used. To restore a container from a checkpoint it is even necessary to manually run 'podman rm' if the container is not started with '--rm'. Signed-off-by: Adrian Reber <areber@redhat.com>
* | container-restore: Fix restore with user namespaceRadostin Stoyanov2019-11-17
|/ | | | | | | | | | | | | | | | | | | | | | When restoring a container with user namespace, the user namespace is created by the OCI runtime, and the network namespace is created after the user namespace to ensure correct ownership. In this case PostConfigureNetNS will be set and the value of c.state.NetNS would be nil. Hence, the following error occurs: $ sudo podman run --name cr \ --uidmap 0:1000:500 \ -d docker.io/library/alpine \ /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done' $ sudo podman container checkpoint cr $ sudo podman container restore cr ... panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x13a5e3c] Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
* podman: add support for specifying MACJakub Filak2019-11-06
| | | | | | | | I basically copied and adapted the statements for setting IP. Closes #1136 Signed-off-by: Jakub Filak <jakub.filak@sap.com>
* Vendor in latest containers/buildahUrvashi Mohnani2019-11-01
| | | | | | | | Pull in changes to pkg/secrets/secrets.go that adds the logic to disable fips mode if a pod/container has a label set. Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
* add libpod/configValentin Rothberg2019-10-31
| | | | | | | | | | | | Refactor the `RuntimeConfig` along with related code from libpod into libpod/config. Note that this is a first step of consolidating code into more coherent packages to make the code more maintainable and less prone to regressions on the long runs. Some libpod definitions were moved to `libpod/define` to resolve circular dependencies. Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
* systemd: mask /sys/fs/cgroup/systemd/release_agentGiuseppe Scrivano2019-10-25
| | | | | | | | when running in systemd mode on cgroups v1, make sure the /sys/fs/cgroup/systemd/release_agent is masked otherwise the container is able to modify it and execute scripts on the host. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* When restoring containers, reset cgroup pathMatthew Heon2019-10-10
| | | | | | | | | | | | | | | | | | | | Previously, `podman checkport restore` with exported containers, when told to create a new container based on the exported checkpoint, would create a new container, with a new container ID, but not reset CGroup path - which contained the ID of the original container. If this was done multiple times, the result was two containers with the same cgroup paths. Operations on these containers would this have a chance of crossing over to affect the other one; the most notable was `podman rm` once it was changed to use the --all flag when stopping the container; all processes in the cgroup, including the ones in the other container, would be stopped. Reset cgroups on restore to ensure that the path matches the ID of the container actually being run. Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* Move OCI runtime implementation behind an interfaceMatthew Heon2019-10-10
| | | | | | | | | | | | For future work, we need multiple implementations of the OCI runtime, not just a Conmon-wrapped runtime matching the runc CLI. As part of this, do some refactoring on the interface for exec (move to a struct, not a massive list of arguments). Also, add 'all' support to Kill and Stop (supported by runc and used a bit internally for removing containers). Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* Merge pull request #4086 from mheon/cni_del_on_refreshOpenShift Merge Robot2019-09-25
|\ | | | | Force a CNI Delete on refreshing containers
| * Force a CNI Delete on refreshing containersMatthew Heon2019-09-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CNI expects that a DELETE be run before re-creating container networks. If a reboot occurs quickly enough that containers can't stop and clean up, that DELETE never happens, and Podman currently wipes the old network info and thinks the state has been entirely cleared. Unfortunately, that may not be the case on the CNI side. Some things - like IP address reservations - may not have been cleared. To solve this, manually re-run CNI Delete on refresh. If the container has already been deleted this seems harmless. If not, it should clear lingering state. Fixes: #3759 Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* | rootless: Rearrange setup of rootless containersGabi Beyer2019-09-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to run Podman with VM-based runtimes unprivileged, the network must be set up prior to the container creation. Therefore this commit modifies Podman to run rootless containers by: 1. create a network namespace 2. pass the netns persistent mount path to the slirp4netns to create the tap inferface 3. pass the netns path to the OCI spec, so the runtime can enter the netns Closes #2897 Signed-off-by: Gabi Beyer <gabrielle.n.beyer@intel.com>