summaryrefslogtreecommitdiff
path: root/libpod/container_internal_linux.go
Commit message (Collapse)AuthorAge
* Use host's resolv.conf if no network namespace enabledColin Walters2018-11-27
| | | | | | | | | | | | | | | | | | | | | | My host system runs Fedora Silverblue 29 and I have NetworkManager's `dns=dnsmasq` setting enabled, so my `/etc/resolv.conf` only has `127.0.0.1`. I also run my development podman containers with `--net=host` for various reasons. If we have a host network namespace, there's no reason not to just use the host's nameserver configuration either. This fixes e.g. accessing content on a VPN, and is also faster since the container is using cached DNS. I know this doesn't solve the bigger picture issue of localhost-DNS conflicting with bridged networking, but that's far more involved, probably requiring a DNS proxy in the container. This patch makes my workflow a lot nicer and was easy to write. Signed-off-by: Colin Walters <walters@verbum.org>
* Merge pull request #1850 from vrothberg/mount-propagationOpenShift Merge Robot2018-11-27
|\ | | | | set root propagation based on volume properties
| * set root propagation based on volume propertiesValentin Rothberg2018-11-26
| | | | | | | | | | | | | | | | | | | | | | | | Set the root propagation based on the properties of volumes and default mounts. To remain compatibility, follow the semantics of Docker. If a volume is shared, keep the root propagation shared which works for slave and private volumes too. For slave volumes, it can either be shared or rshared. Do not change the root propagation for private volumes and stick with the default. Fixes: #1834 Signed-off-by: Valentin Rothberg <vrothberg@suse.com>
* | Merge pull request #1734 from rhatdan/networkOpenShift Merge Robot2018-11-27
|\ \ | |/ |/| libpod should know if the network is disabled
| * libpod should know if the network is disabledDaniel J Walsh2018-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | /etc/resolv.conf and /etc/hosts should not be created and mounted when the network is disabled. We should not be calling the network setup and cleanup functions when it is disabled either. In doing this patch, I found that all of the bind mounts were particular to Linux along with the generate functions, so I moved them to container_internal_linux.go Since we are checking if we are using a network namespace, we need to check after the network namespaces has been created in the spec. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* | Added option to keep containers running after checkpointingAdrian Reber2018-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CRIU supports to leave processes running after checkpointing: -R|--leave-running leave tasks in running state after checkpoint runc also support to leave containers running after checkpointing: --leave-running leave the process running after checkpointing With this commit the support to leave a container running after checkpointing is brought to Podman: --leave-running, -R leave the container running after writing checkpoint to disk Now it is possible to checkpoint a container at some point in time without stopping the container. This can be used to rollback the container to an early state: $ podman run --tmpfs /tmp --name podman-criu-test -d docker://docker.io/yovfiatbeb/podman-criu-test $ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample 3 $ podman container checkpoint -R -l $ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample 4 $ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample 5 $ podman stop -l $ podman container restore -l $ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample 4 So after checkpointing the container kept running and was stopped after some time. Restoring this container will restore the state right at the checkpoint. Signed-off-by: Adrian Reber <areber@redhat.com>
* | Use a struct to pass options to Checkpoint()Adrian Reber2018-11-20
|/ | | | | | | | | For upcoming changes to the Checkpoint() functions this commit switches checkpoint options from a boolean to a struct, so that additional options can be passed easily to Checkpoint() without changing the function parameters all the time. Signed-off-by: Adrian Reber <areber@redhat.com>
* Accurately update state if prepare() partially failsMatthew Heon2018-11-08
| | | | | | | | | We are seeing some issues where, when part of prepare() fails (originally noticed due to a bad static IP), the other half does not successfully clean up, and the state can be left in a bad place (not knowing about an active SHM mount for example). Signed-off-by: Matthew Heon <mheon@redhat.com>
* Merge pull request #1771 from baude/prepareOpenShift Merge Robot2018-11-07
|\ | | | | move defer'd function declaration ahead of prepare error return
| * move defer'd function declaration ahead of prepare error returnbaude2018-11-07
| | | | | | | | Signed-off-by: baude <bbaude@redhat.com>
* | rootless: mount /sys/fs/cgroup/systemd from the hostGiuseppe Scrivano2018-11-07
| | | | | | | | | | | | systemd requires /sys/fs/cgroup/systemd to be writeable. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* | rootless: don't bind mount /sys/fs/cgroup/systemd in systemd modeGiuseppe Scrivano2018-11-07
|/ | | | | | | it is not writeable by non-root users so there is no point in having access to it from a container. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* get user and group information using securejoin and runc's user librarybaude2018-10-29
| | | | | | | | | | | for the purposes of performance and security, we use securejoin to contstruct the root fs's path so that symlinks are what they appear to be and no pointing to something naughty. then instead of chrooting to parse /etc/passwd|/etc/group, we now use the runc user/group methods which saves us quite a bit of performance. Signed-off-by: baude <bbaude@redhat.com>
* Merge pull request #1699 from baude/rundOpenShift Merge Robot2018-10-25
|\ | | | | run performance improvements
| * run prepare in parallelbaude2018-10-25
| | | | | | | | | | | | | | run prepare() -- which consists of creating a network namespace and mounting the container image is now run in parallel. This saves 25-40ms. Signed-off-by: baude <bbaude@redhat.com>
* | Increase security and performance when looking up groupsbaude2018-10-25
|/ | | | | | | | | | We implement the securejoin method to make sure the paths to /etc/passwd and /etc/group are not symlinks to something naughty or outside the container image. And then instead of actually chrooting, we use the runc functions to get information about a user. The net result is increased security and a a performance gain from 41ms to 100us. Signed-off-by: baude <bbaude@redhat.com>
* Use the CRIU version check in checkpoint/restoreAdrian Reber2018-10-23
| | | | | | | | The newly introduced CRIU version check is now used to make sure checkpointing and restoring is only used if the CRIU version is new enough. Signed-off-by: Adrian Reber <areber@redhat.com>
* Fix CGroup paths used for systemd CGroup mountMatthew Heon2018-10-17
| | | | | | | We already have functions for retrieving the container's CGroup path, so use them instead of manually generating a path. Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
* Mount proper cgroup for systemd to manage inside of the container.Daniel J Walsh2018-10-15
| | | | | | | | | | | | We are still requiring oci-systemd-hook to be installed in order to run systemd within a container. This patch properly mounts /sys/fs/cgroup/systemd/libpod_parent/libpod-UUID on /sys/fs/cgroup/systemd inside of container. Since we need the UUID of the container, we needed to move Systemd to be a config option of the container. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Add support to checkpoint/restore containersAdrian Reber2018-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | runc uses CRIU to support checkpoint and restore of containers. This brings an initial checkpoint/restore implementation to podman. None of the additional runc flags are yet supported and container migration optimization (pre-copy/post-copy) is also left for the future. The current status is that it is possible to checkpoint and restore a container. I am testing on RHEL-7.x and as the combination of RHEL-7 and CRIU has seccomp troubles I have to create the container without seccomp. With the following steps I am able to checkpoint and restore a container: # podman run --security-opt="seccomp=unconfined" -d registry.fedoraproject.org/f27/httpd # curl -I 10.22.0.78:8080 HTTP/1.1 403 Forbidden # <-- this is actually a good answer # podman container checkpoint <container> # curl -I 10.22.0.78:8080 curl: (7) Failed connect to 10.22.0.78:8080; No route to host # podman container restore <container> # curl -I 10.22.0.78:8080 HTTP/1.1 403 Forbidden I am using CRIU, runc and conmon from git. All required changes for checkpoint/restore support in podman have been merged in the corresponding projects. To have the same IP address in the restored container as before checkpointing, CNI is told which IP address to use. If the saved network configuration cannot be found during restore, the container is restored with a new IP address. For CRIU to restore established TCP connections the IP address of the network namespace used for restore needs to be the same. For TCP connections in the listening state the IP address can change. During restore only one network interface with one IP address is handled correctly. Support to restore containers with more advanced network configuration will be implemented later. v2: * comment typo * print debug messages during cleanup of restore files * use createContainer() instead of createOCIContainer() * introduce helper CheckpointPath() * do not try to restore a container that is paused * use existing helper functions for cleanup * restructure code flow for better readability * do not try to restore if checkpoint/inventory.img is missing * git add checkpoint.go restore.go v3: * move checkpoint/restore under 'podman container' v4: * incorporated changes from latest reviews Signed-off-by: Adrian Reber <areber@redhat.com>
* Disable problematic SELinux code causing runc issuesMatthew Heon2018-09-25
| | | | | | | Signed-off-by: Matthew Heon <matthew.heon@gmail.com> Closes: #1541 Approved by: baude
* Add --mount option for `create` & `run` commandDaniel J Walsh2018-09-21
| | | | | | | | Signed-off-by: Kunal Kushwaha <kushwaha_kunal_v7@lab.ntt.co.jp> Signed-off-by: Daniel J Walsh <dwalsh@redhat.com> Closes: #1524 Approved by: mheon
* Add new field to libpod to indicate whether or not to use labellingDaniel J Walsh2018-09-20
| | | | | | | | | | | | | | | Also update some missing fields libpod.conf obtions in man pages. Fix sort order of security options and add a note about disabling labeling. When a process requests a new label. libpod needs to reserve all labels to make sure that their are no conflicts. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com> Closes: #1406 Approved by: mheon
* Bind Mounts should be mounted read-only when in read-only modeDaniel J Walsh2018-09-20
| | | | | | | | | | We don't want to allow users to write to /etc/resolv.conf or /etc/hosts if in read only mode. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com> Closes: #1510 Approved by: TomSweeneyRedHat
* Fix Mount PropagationGiuseppe Scrivano2018-08-27
| | | | | | | | | Default mount propagation inside of containes should be private Signed-off-by: Daniel J Walsh <dwalsh@redhat.com> Closes: #1305 Approved by: mheon
* Fixing network ns segfaulthaircommander2018-08-23
| | | | | | | | | As well as small style corrections, update pod_top_test to use CreatePod, and move handling of adding a container to the pod's namespace from container_internal_linux to libpod/option. Signed-off-by: haircommander <pehunt@redhat.com> Closes: #1187 Approved by: mheon
* Change pause container to infra containerhaircommander2018-08-23
| | | | | | | Signed-off-by: haircommander <pehunt@redhat.com> Closes: #1187 Approved by: mheon
* Added option to share kernel namespaces in libpod and podmanhaircommander2018-08-23
| | | | | | | | | A pause container is added to the pod if the user opts in. The default pause image and command can be overridden. Pause containers are ignored in ps unless the -a option is present. Pod inspect and pod ps show shared namespaces and pause container. A pause container can't be removed with podman rm, and a pod can be removed if it only has a pause container. Signed-off-by: haircommander <pehunt@redhat.com> Closes: #1187 Approved by: mheon
* podman: fix --uts=hostGiuseppe Scrivano2018-08-17
| | | | | | | | | | | | | | | Do not set any hostname value in the OCI configuration when --uts=host is used and the user didn't specify any value. This prevents an error from the OCI runtime as it cannot set the hostname without a new UTS namespace. Differently, the HOSTNAME environment variable is always set. When --uts=host is used, HOSTNAME gets the value from the host. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #1280 Approved by: baude
* switch projectatomic to containersDaniel J Walsh2018-08-16
| | | | | | | | | | Need to get some small changes into libpod to pull back into buildah to complete buildah transition. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com> Closes: #1270 Approved by: mheon
* We need to sort mounts so that one mount does not over mount another.Daniel J Walsh2018-08-10
| | | | | | | | | | | | | | | | | Currently we add mounts from images, volumes and internal. We can accidently over mount an existing mount. This patch sorts the mounts to make sure a parent directory is always mounted before its content. Had to change the default propagation on image volume mounts from shared to private to stop mount points from leaking out of the container. Also switched from using some docker/docker/pkg to container/storage/pkg to remove some dependencies on Docker. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com> Closes: #1243 Approved by: mheon
* Support multiple networksbaude2018-07-12
| | | | | | | | | | | | | This is a refresh of Dan William's PR #974 with a rebase and proper vendoring of ocicni and containernetworking/cni. It adds the ability to define multiple networks as so: podman run --network=net1,net2,foobar ... Signed-off-by: baude <bbaude@redhat.com> Closes: #1082 Approved by: baude
* Add --volumes-from flag to podman run and createumohnani82018-07-09
| | | | | | | | | | podman now supports --volumes-from flag, which allows users to add all the volumes an existing container has to a new one. Signed-off-by: umohnani8 <umohnani@redhat.com> Closes: #931 Approved by: mheon
* Remove per-container CGroup parentsMatthew Heon2018-07-06
| | | | | | | | | | | | | | | | | | | Originally, it seemed like a good idea to place Conmon and the container it managed under a shared CGroup, so we could manage the two together. It's become increasingly clear that this is a potential performance sore point, gains us little practical benefit in managing Conmon, and adds extra steps to container cleanup that interfere with Conmon postrun hooks. Revert back to a shared CGroup for conmon processes under the CGroup parent. This will retain per-pod conmon CGroups as well if the pod is set to create a CGroup and act as CGroup parent for its containers. Signed-off-by: Matthew Heon <matthew.heon@gmail.com> Closes: #1051 Approved by: umohnani8
* more changes to compile darwinbaude2018-07-05
| | | | | | | | | | | | | | | | this should represent the last major changes to get darwin to **compile**. again, the purpose here is to get darwin to compile so that we can eventually implement a ci task that would protect against regressions for darwin compilation. i have left the manual darwin compilation largely static still and in fact now only interject (manually) two build tags to assist with the build. trevor king has great ideas on how to make this better and i will defer final implementation of those to him. Signed-off-by: baude <bbaude@redhat.com> Closes: #1047 Approved by: rhatdan
* changes to allow for darwin compilationbaude2018-06-29
Signed-off-by: baude <bbaude@redhat.com> Closes: #1015 Approved by: baude