summaryrefslogtreecommitdiff
path: root/libpod/container_internal.go
Commit message (Collapse)AuthorAge
* Update vendor or containers/common moving pkg/cgroups thereDaniel J Walsh2021-12-07
| | | | | | | [NO NEW TESTS NEEDED] This is just moving pkg/cgroups out so existing tests should be fine. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* rename libpod nettypes fieldsPaul Holzinger2021-11-16
| | | | | | | | | | Some field names are confusing. Change them so that they make more sense to the reader. Since these fields are only in the main branch we can safely rename them without worrying about backwards compatibility. Note we have to change the field names in netavark too. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* Added optional container restore statisticsAdrian Reber2021-11-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds the parameter '--print-stats' to 'podman container restore'. With '--print-stats' Podman will measure how long Podman itself, the OCI runtime and CRIU requires to restore a checkpoint and print out these information. CRIU already creates process restore statistics which are just read in addition to the added measurements. In contrast to just printing out the ID of the restored container, Podman will now print out JSON: # podman container restore --latest --print-stats { "podman_restore_duration": 305871, "container_statistics": [ { "Id": "47b02e1d474b5d5fe917825e91ac653efa757c91e5a81a368d771a78f6b5ed20", "runtime_restore_duration": 140614, "criu_statistics": { "forking_time": 5, "restore_time": 67672, "pages_restored": 14 } } ] } The output contains 'podman_restore_duration' which contains the number of microseconds Podman required to restore the checkpoint. The output also includes 'runtime_restore_duration' which is the time the runtime needed to restore that specific container. Each container also includes 'criu_statistics' which displays the timing information collected by CRIU. Signed-off-by: Adrian Reber <areber@redhat.com>
* libpod: create /etc/mtab safelyGiuseppe Scrivano2021-11-11
| | | | | | | | | | | make sure the /etc/mtab symlink is created inside the rootfs when /etc is a symlink. Closes: https://github.com/containers/podman/issues/12189 [NO NEW TESTS NEEDED] there is already a test case Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* Fix rootless networking with userns and portsPaul Holzinger2021-11-09
| | | | | | | | A rootless container created with a custom userns and forwarded ports did not work. I refactored the network setup to make the setup logic more clear. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* pod/container create: resolve conflicts of generated namesValentin Rothberg2021-11-08
| | | | | | | | | | | Address the TOCTOU when generating random names by having at most 10 attempts to assign a random name when creating a pod or container. [NO TESTS NEEDED] since I do not know a way to force a conflict with randomly generated names in a reasonable time frame. Fixes: #11735 Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
* libpod: deduplicate ports in dbPaul Holzinger2021-10-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The OCICNI port format has one big problem: It does not support ranges. So if a users forwards a range of 1k ports with podman run -p 1001-2000 we have to store each of the thousand ports individually as array element. This bloats the db and makes the JSON encoding and decoding much slower. In many places we already use a better port struct type which supports ranges, e.g. `pkg/specgen` or the new network interface. Because of this we have to do many runtime conversions between the two port formats. If everything uses the new format we can skip the runtime conversions. This commit adds logic to replace all occurrences of the old format with the new one. The database will automatically migrate the ports to new format when the container config is read for the first time after the update. The `ParsePortMapping` function is `pkg/specgen/generate` has been reworked to better work with the new format. The new logic is able to deduplicate the given ports. This is necessary the ensure we store them efficiently in the DB. The new code should also be more performant than the old one. To prove that the code is fast enough I added go benchmarks. Parsing 1 million ports took less than 0.5 seconds on my laptop. Benchmark normalize PortMappings in specgen: Please note that the 1 million ports are actually 20x 50k ranges because we cannot have bigger ranges than 65535 ports. ``` $ go test -bench=. -benchmem ./pkg/specgen/generate/ goos: linux goarch: amd64 pkg: github.com/containers/podman/v3/pkg/specgen/generate cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz BenchmarkParsePortMappingNoPorts-12 480821532 2.230 ns/op 0 B/op 0 allocs/op BenchmarkParsePortMapping1-12 38972 30183 ns/op 131584 B/op 9 allocs/op BenchmarkParsePortMapping100-12 18752 60688 ns/op 141088 B/op 315 allocs/op BenchmarkParsePortMapping1k-12 3104 331719 ns/op 223840 B/op 3018 allocs/op BenchmarkParsePortMapping10k-12 376 3122930 ns/op 1223650 B/op 30027 allocs/op BenchmarkParsePortMapping1m-12 3 390869926 ns/op 124593840 B/op 4000624 allocs/op BenchmarkParsePortMappingReverse100-12 18940 63414 ns/op 141088 B/op 315 allocs/op BenchmarkParsePortMappingReverse1k-12 3015 362500 ns/op 223841 B/op 3018 allocs/op BenchmarkParsePortMappingReverse10k-12 343 3318135 ns/op 1223650 B/op 30027 allocs/op BenchmarkParsePortMappingReverse1m-12 3 403392469 ns/op 124593840 B/op 4000624 allocs/op BenchmarkParsePortMappingRange1-12 37635 28756 ns/op 131584 B/op 9 allocs/op BenchmarkParsePortMappingRange100-12 39604 28935 ns/op 131584 B/op 9 allocs/op BenchmarkParsePortMappingRange1k-12 38384 29921 ns/op 131584 B/op 9 allocs/op BenchmarkParsePortMappingRange10k-12 29479 40381 ns/op 131584 B/op 9 allocs/op BenchmarkParsePortMappingRange1m-12 927 1279369 ns/op 143022 B/op 164 allocs/op PASS ok github.com/containers/podman/v3/pkg/specgen/generate 25.492s ``` Benchmark convert old port format to new one: ``` go test -bench=. -benchmem ./libpod/ goos: linux goarch: amd64 pkg: github.com/containers/podman/v3/libpod cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz Benchmark_ocicniPortsToNetTypesPortsNoPorts-12 663526126 1.663 ns/op 0 B/op 0 allocs/op Benchmark_ocicniPortsToNetTypesPorts1-12 7858082 141.9 ns/op 72 B/op 2 allocs/op Benchmark_ocicniPortsToNetTypesPorts10-12 2065347 571.0 ns/op 536 B/op 4 allocs/op Benchmark_ocicniPortsToNetTypesPorts100-12 138478 8641 ns/op 4216 B/op 4 allocs/op Benchmark_ocicniPortsToNetTypesPorts1k-12 9414 120964 ns/op 41080 B/op 4 allocs/op Benchmark_ocicniPortsToNetTypesPorts10k-12 781 1490526 ns/op 401528 B/op 4 allocs/op Benchmark_ocicniPortsToNetTypesPorts1m-12 4 250579010 ns/op 40001656 B/op 4 allocs/op PASS ok github.com/containers/podman/v3/libpod 11.727s ``` Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* Merge pull request #11956 from vrothberg/pauseOpenShift Merge Robot2021-10-27
|\ | | | | remove need to download pause image
| * overlay root fs: create mount on runtime dirValentin Rothberg2021-10-26
| | | | | | | | | | | | | | | | | | | | | | Make sure to create the mounts for containers with an overlay root FS in the runtime dir (e.g., /run/user/1000/...) to guarantee that we can actually overlay mount on the specific path which is not the case for the graph root. [NO NEW TESTS NEEDED] since it is not a user-facing change. Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
* | Allow 'container restore' with '--ipc host'Adrian Reber2021-10-26
|/ | | | | | | | | | | | | | | | | | Trying to restore a container that was started with '--ipc host' fails with: Error: error creating container storage: ProcessLabel and Mountlabel must either not be specified or both specified We already fixed this exact same error message for containers started with '--privileged'. The previous fix was to check if the to be restored container is a privileged container (c.config.Privileged). Unfortunately this does not work for containers started with '--ipc host'. This commit changes the check for a privileged container to check if both the ProcessLabel and the MountLabel is actually set and only then re-uses those labels. Signed-off-by: Adrian Reber <areber@redhat.com>
* Merge pull request #11991 from rhatdan/sizeOpenShift Merge Robot2021-10-22
|\ | | | | Allow API to specify size and inode quota
| * Allow API to specify size and inode quotaDaniel J Walsh2021-10-18
| | | | | | | | | | | | | | | | | | | | | | Fixes: https://github.com/containers/podman/issues/11016 [NO NEW TESTS NEEDED] We have no easy way to tests this in CI/CD systems. Requires quota to be setup on directories to work. Fixes: https://github.com/containers/podman/issues/11016 Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* | Merge pull request #11851 from cdoern/podRmOpenShift Merge Robot2021-10-20
|\ \ | | | | | | Pod Rm Infra Handling Improvements
| * | Pod Rm Infra Improvementscdoern2021-10-18
| | | | | | | | | | | | | | | | | | | | | | | | Made changes so that if the pod contains all exited containers and only infra is running, remove the pod. resolves #11713 Signed-off-by: cdoern <cdoern@redhat.com>
* | | libpod: change mountpoint ownership c.Root when using overlay on top of ↵Aditya Rajan2021-10-19
| |/ |/| | | | | | | | | | | | | | | | | external rootfs Allow chainging ownership of mountpoint created on top external overlay rootfs to support use-cases when custom --uidmap and --gidmap are specified. Signed-off-by: Aditya Rajan <arajan@redhat.com>
* | rootfs-overlay: fix overlaybase path for cleanupsAditya Rajan2021-10-18
|/ | | | | | | | Following commit ensures not dandling mounts are left behind when we are creating an overlay on top of external rootfs. Co-authored-by: Valentin Rothberg <rothberg@redhat.com> Signed-off-by: Aditya Rajan <arajan@redhat.com>
* libpod: do not call (*container).Spec()Valentin Rothberg2021-09-29
| | | | | | | | | | | | Access the container's spec field directly inside of libpod instead of calling Spec() which in turn creates expensive JSON deep copies. Accessing the field directly drops memory consumption of a simple podman run --rm busybox true from ~700kB to ~600kB. [NO TESTS NEEDED] Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
* libpod: do not call (*container).Config()Valentin Rothberg2021-09-28
| | | | | | | | | | | | Access the container's config field directly inside of libpod instead of calling `Config()` which in turn creates expensive JSON deep copies. Accessing the field directly drops memory consumption of a simple `podman run --rm busybox true` from 1245kB to 410kB. [NO TESTS NEEDED] Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
* standardize logrus messages to upper caseDaniel J Walsh2021-09-22
| | | | | | | | Remove ERROR: Error stutter from logrus messages also. [ NO TESTS NEEDED] This is just code cleanup. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Wire network interface into libpodPaul Holzinger2021-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make use of the new network interface in libpod. This commit contains several breaking changes: - podman network create only outputs the new network name and not file path. - podman network ls shows the network driver instead of the cni version and plugins. - podman network inspect outputs the new network struct and not the cni conflist. - The bindings and libpod api endpoints have been changed to use the new network structure. The container network status is stored in a new field in the state. The status should be received with the new `c.getNetworkStatus`. This will migrate the old status to the new format. Therefore old containers should contine to work correctly in all cases even when network connect/ disconnect is used. New features: - podman network reload keeps the ip and mac for more than one network. - podman container restore keeps the ip and mac for more than one network. - The network create compat endpoint can now use more than one ipam config. The man pages and the swagger doc are updated to reflect the latest changes. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* Merge pull request #11170 from flouthoc/support-rootfs-overlayOpenShift Merge Robot2021-09-14
|\ | | | | rootfs: Add support for rootfs-overlay.
| * rootfs: Add support for rootfs-overlay and bump to buildah v1.22.1-0.202108flouthoc2021-09-14
| | | | | | | | | | | | | | | | | | | | Allows users to specify a readonly rootfs with :O, in exchange podman will create a writable overlay. bump builah to v1.22.1-0.20210823173221-da2b428c56ce [NO TESTS NEEDED] Signed-off-by: flouthoc <flouthoc.git@gmail.com>
* | fix restart always with rootlessportPaul Holzinger2021-09-13
|/ | | | | | | | When a container is automatically restarted due its restart policy and the container uses rootless cni networking with ports forwarded we have to start a new rootlessport process since it exits with conmon. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* Add Checkpointed bool to InspectMatthew Heon2021-09-07
| | | | | | | | When inspecting a container, we now report whether the container was stopped by a `podman checkpoint` operation via a new bool in the State portion of inspected, `Checkpointed`. Signed-off-by: Matthew Heon <mheon@redhat.com>
* container: resolve workdir after all the mounts happen.flouthoc2021-08-30
| | | | | | | | There are use-cases where users would want to use overlay-mounts as workdir. For such cases workdir should be resolved after all the mounts are completed during the container init process. Signed-off-by: Aditya Rajan <arajan@redhat.com>
* InfraContainer Reworkcdoern2021-08-26
| | | | | | | | | | InfraContainer should go through the same creation process as regular containers. This change was from the cmd level down, involving new container CLI opts and specgen creating functions. What now happens is that both container and pod cli options are populated in cmd and used to create a podSpecgen and a containerSpecgen. The process then goes as follows FillOutSpecGen (infra) -> MapSpec (podOpts -> infraOpts) -> PodCreate -> MakePod -> createPodOptions -> NewPod -> CompleteSpec (infra) -> MakeContainer -> NewContainer -> newContainer -> AddInfra (to pod state) Signed-off-by: cdoern <cdoern@redhat.com>
* Add support for pod inside of user namespace.Daniel J Walsh2021-08-09
| | | | | | | | | | | | | Add the --userns flag to podman pod create and keep track of the userns setting that pod was created with so that all containers created within the pod will inherit that userns setting. Specifically we need to be able to launch a pod with --userns=keep-id Signed-off-by: Daniel J Walsh <dwalsh@redhat.com> Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
* Fix handling of user specified container labelsDaniel J Walsh2021-08-02
| | | | | | | | | | | Currently we override the SELinux labels specified by the user if the container is runing a kata container or systemd container. This PR fixes to use the label specified by the user. Fixes: https://github.com/containers/podman/issues/11100 Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Drop podman create --storage-opt container flagDaniel J Walsh2021-07-20
| | | | | | | | | | | | | The global flag will work in either location, and this flag just breaks users expectations, and is basically a noop. Also fix global storage-opt so that podman-remote can use it. [NO TESTS NEEDED] Since it would be difficult to test in ci/cd. Fixes: https://github.com/containers/podman/issues/10264 Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* [NO TESTS NEEDED] Create /etc/mtab with the correct ownershipUrvashi Mohnani2021-06-23
| | | | | | | | | | Create the /etc and /etc/mtab directories with the correct ownership based on what the UID and GID is for the container. This was causing issue when starting the infra container with userns as the /etc directory wasn't being created with the correct ownership. Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
* Fix permissions on initially created named volumesDaniel J Walsh2021-06-14
| | | | | | | | Permission of volume should match the directory it is being mounted on. Fixes: https://github.com/containers/podman/issues/10188 Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Merge pull request #10635 from adrianreber/2021-06-04-privilegedOpenShift Merge Robot2021-06-12
|\ | | | | Fix restoring of privileged containers
| * Fix restoring of privileged containersAdrian Reber2021-06-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Checkpointed containers started with --privileged fail during restore with: Error: error creating container storage: ProcessLabel and Mountlabel must either not be specified or both specified This commit fixes it by not setting the labels when restoring a privileged container. [NO TESTS NEEDED] Signed-off-by: Adrian Reber <areber@redhat.com>
* | Fix pre-checkpointingAdrian Reber2021-06-10
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unfortunately --pre-checkpointing never worked as intended and recent changes to runc have shown that it is broken. To create a pre-checkpoint CRIU expects the paths between the pre-checkpoints to be a relative path. If having a previous checkpoint it needs the be referenced like this: --prev-images-dir ../parent Unfortunately Podman was giving runc (and CRIU) an absolute path. Unfortunately, again, until March 2021 CRIU silently ignored if the path was not relative and switch back to normal checkpointing. This has been now fixed in CRIU and runc and running pre-checkpoint with the latest runc fails, because runc already sees that the path is absolute and returns an error. This commit fixes this by giving runc a relative path. This commit also fixes a second pre-checkpointing error which was just recently introduced. So summarizing: pre-checkpointing never worked correctly because CRIU ignored wrong parameters and recent changes broke it even more. Now both errors should be fixed. [NO TESTS NEEDED] Signed-off-by: Adrian Reber <areber@redhat.com> Signed-off-by: Adrian Reber <adrian@lisas.de>
* Merge pull request #10366 from ashley-cui/secretoptionsOpenShift Merge Robot2021-05-17
|\ | | | | Support uid,gid,mode options for secrets
| * Support uid,gid,mode options for secretsAshley Cui2021-05-17
| | | | | | | | | | | | | | Support UID, GID, Mode options for mount type secrets. Also, change default secret permissions to 444 so all users can read secret. Signed-off-by: Ashley Cui <acui@redhat.com>
* | Create the /etc/mtab file if does not existsDaniel J Walsh2021-05-15
|/ | | | | | | | | | | We should create the /etc/mtab->/proc/mountinfo link so that mount command will work within the container. Docker does this by default. Fixes: https://github.com/containers/podman/issues/10263 Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* fix restart always with slirp4netnsPaul Holzinger2021-05-11
| | | | | | | | | | | | When a container is automatically restarted due its restart policy and the container used the slirp4netns netmode, the slirp4netns process died. This caused the container to lose network connectivity. To fix this we have to start a new slirp4netns process. Fixes #8047 Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
* Merge pull request #10220 from giuseppe/rm-volatileOpenShift Merge Robot2021-05-05
|\ | | | | podman: set volatile storage flag for --rm containers
| * podman: set volatile storage flag for --rm containersGiuseppe Scrivano2021-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | volatile containers are a storage optimization that disables *sync() syscalls for the container rootfs. If a container is created with --rm, then automatically set the volatile storage flag as anyway the container won't persist after a reboot or machine crash. [NO TESTS NEEDED] Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* | migrate Podman to containers/common/libimageValentin Rothberg2021-05-05
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Migrate the Podman code base over to `common/libimage` which replaces `libpod/image` and a lot of glue code entirely. Note that I tried to leave bread crumbs for changed tests. Miscellaneous changes: * Some errors yield different messages which required to alter some tests. * I fixed some pre-existing issues in the code. Others were marked as `//TODO`s to prevent the PR from exploding. * The `NamesHistory` of an image is returned as is from the storage. Previously, we did some filtering which I think is undesirable. Instead we should return the data as stored in the storage. * Touched handlers use the ABI interfaces where possible. * Local image resolution: previously Podman would match "foo" on "myfoo". This behaviour has been changed and Podman will now only match on repository boundaries such that "foo" would match "my/foo" but not "myfoo". I consider the old behaviour to be a bug, at the very least an exotic corner case. * Futhermore, "foo:none" does *not* resolve to a local image "foo" without tag anymore. It's a hill I am (almost) willing to die on. * `image prune` prints the IDs of pruned images. Previously, in some cases, the names were printed instead. The API clearly states ID, so we should stick to it. * Compat endpoint image removal with _force_ deletes the entire not only the specified tag. Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
* Fixes from make codespellDaniel J Walsh2021-04-21
| | | | Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* use AttachSocketPath when removing conmon filesPeter Hunt2021-04-16
| | | | Signed-off-by: Peter Hunt <pehunt@redhat.com>
* Add rootless support for cni and --uidmapPaul Holzinger2021-04-01
| | | | | | This is supported with the new rootless cni logic. Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
* Do not leak libpod package into the remote clientPaul Holzinger2021-03-15
| | | | | | | | | | | | | | | | | | Some packages used by the remote client imported the libpod package. This is not wanted because it adds unnecessary bloat to the client and also causes problems with platform specific code(linux only), see #9710. The solution is to move the used functions/variables into extra packages which do not import libpod. This change shrinks the remote client size more than 6MB compared to the current master. [NO TESTS NEEDED] I have no idea how to test this properly but with #9710 the cross compile should fail. Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
* turn hidden --trace into a NOPValentin Rothberg2021-03-08
| | | | | | | | | | The --trace has helped in early stages analyze Podman code. However, it's contributing to dependency and binary bloat. The standard go tooling can also help in profiling, so let's turn `--trace` into a NOP. [NO TESTS NEEDED] Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
* Merge pull request #9624 from mheon/fix_9615OpenShift Merge Robot2021-03-05
|\ | | | | [NO TESTS NEEDED] Do not return from c.stop() before re-locking
| * Do not return from c.stop() before re-lockingMatthew Heon2021-03-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Unlocking an already unlocked lock is a panic. As such, we have to make sure that the deferred c.lock.Unlock() in c.StopWithTimeout() always runs on a locked container. There was a case in c.stop() where we could return an error after we unlock the container to stop it, but before we re-lock it - thus allowing for a double-unlock to occur. Fix the error return to not happen until after the lock has been re-acquired. Fixes #9615 Signed-off-by: Matthew Heon <mheon@redhat.com>
* | podman cp: support copying on tmpfs mountsValentin Rothberg2021-03-04
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Traditionally, the path resolution for containers has been resolved on the *host*; relative to the container's mount point or relative to specified bind mounts or volumes. While this works nicely for non-running containers, it poses a problem for running ones. In that case, certain kinds of mounts (e.g., tmpfs) will not resolve correctly. A tmpfs is held in memory and hence cannot be resolved relatively to the container's mount point. A copy operation will succeed but the data will not show up inside the container. To support these kinds of mounts, we need to join the *running* container's mount namespace (and PID namespace) when copying. Note that this change implies moving the copy and stat logic into `libpod` since we need to keep the container locked to avoid race conditions. The immediate benefit is that all logic is now inside `libpod`; the code isn't scattered anymore. Further note that Docker does not support copying to tmpfs mounts. Tests have been extended to cover *both* path resolutions for running and created containers. New tests have been added to exercise the tmpfs-mount case. For the record: Some tests could be improved by using `start -a` instead of a start-exec sequence. Unfortunately, `start -a` is flaky in the CI which forced me to use the more expensive start-exec option. Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
* Use functions and defines from checkpointctlAdrian Reber2021-03-02
| | | | | | | | No functional changes. [NO TESTS NEEDED] - only moving code around Signed-off-by: Adrian Reber <areber@redhat.com>