summaryrefslogtreecommitdiff
path: root/libpod
Commit message (Collapse)AuthorAge
* select network backend based on configPaul Holzinger2021-11-11
| | | | | | | You can change the network backendend in containers.conf supported values are "cni" and "netavark". Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* Fix RUST_LOG envar for netavarkPaul Holzinger2021-11-11
| | | | | | | | THe rust netlink library is very verbose. It contains way to much debug and trave logs. We can set `RUST_LOG=netavark=<level>` to make sure this log level only applies to netavark and not the libraries. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* netavark IPAM assignmentPaul Holzinger2021-11-11
| | | | | | | | | | | | | | | | Add a new boltdb to handle IPAM assignment. The db structure is the following: Each network has their own bucket with the network name as bucket key. Inside the network bucket there is an ID bucket which maps the container ID (key) to a json array of ip addresses (value). The network bucket also has a bucket for each subnet, the subnet is used as key. Inside the subnet bucket an ip is used as key and the container ID as value. The db should be stored on a tmpfs to ensure we always have a clean state after a reboot. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* netavark network interfacePaul Holzinger2021-11-11
| | | | | | | | | Implement a new network interface for netavark. For now only bridge networking is supported. The interface can create/list/inspect/remove networks. For setup and teardown netavark will be invoked. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* Make networking code reusablePaul Holzinger2021-11-11
| | | | | | | | | | To prevent code duplication when creating new network backends move reusable code into a separate internal package. This allows all network backends to use the same code as long as they implement the new NetUtil interface. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* network reload return error if we cannot reload portsPaul Holzinger2021-11-10
| | | | | | | As rootless we have to reload the port mappings. If it fails we should return an error instead of the warning. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* network reload without ports should not reload portsPaul Holzinger2021-11-10
| | | | | | | | | When run as rootless the podman network reload command tries to reload the rootlessport ports because the childIP could have changed. However if the containers has no ports we should skip this instead of printing a warning. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* Merge pull request #12227 from Luap99/net-setupOpenShift Merge Robot2021-11-09
|\ | | | | Fix rootless networking with userns and ports
| * Fix rootless networking with userns and portsPaul Holzinger2021-11-09
| | | | | | | | | | | | | | | | A rootless container created with a custom userns and forwarded ports did not work. I refactored the network setup to make the setup logic more clear. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* | Merge pull request #12195 from boaz0/closes_11998OpenShift Merge Robot2021-11-09
|\ \ | | | | | | podman-generate-kube - remove empty structs from YAML
| * | podman-generate-kube - remove empty structs from YAMLBoaz Shuster2021-11-07
| | | | | | | | | | | | | | | | | | [NO NEW TESTS NEEDED] Signed-off-by: Boaz Shuster <boaz.shuster.github@gmail.com>
* | | shm_lock: Handle ENOSPC better in AllocateSemaphoreIan Wienand2021-11-09
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When starting a container libpod/runtime_pod_linux.go:NewPod calls libpod/lock/lock.go:AllocateLock ends up in here. If you exceed num_locks, in response to a "podman run ..." you will see: Error: error allocating lock for new container: no space left on device As noted inline, this error is technically true as it is talking about the SHM area, but for anyone who has not dug into the source (i.e. me, before a few hours ago :) your initial thought is going to be that your disk is full. I spent quite a bit of time trying to diagnose what disk, partition, overlay, etc. was filling up before I realised this was actually due to leaking from failing containers. This overrides this case to give a more explicit message that hopefully puts people on the right track to fixing this faster. You will now see: $ ./bin/podman run --rm -it fedora bash Error: error allocating lock for new container: allocation failed; exceeded num_locks (20) [NO NEW TESTS NEEDED] (just changes an existing error message) Signed-off-by: Ian Wienand <iwienand@redhat.com>
* | pod/container create: resolve conflicts of generated namesValentin Rothberg2021-11-08
| | | | | | | | | | | | | | | | | | | | | | Address the TOCTOU when generating random names by having at most 10 attempts to assign a random name when creating a pod or container. [NO TESTS NEEDED] since I do not know a way to force a conflict with randomly generated names in a reasonable time frame. Fixes: #11735 Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
* | Merge pull request #12184 from adrianreber/2021-11-05-stats-dumpOpenShift Merge Robot2021-11-08
|\ \ | |/ |/| Add 'stats-dump' file to exported checkpoint
| * Add 'stats-dump' file to exported checkpointAdrian Reber2021-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There was the question about how long it takes to create a checkpoint. CRIU already provides some statistics about how long it takes to create a checkpoint and similar. With this change the file 'stats-dump' is included in the checkpoint archive and the tool checkpointctl can be used to display these statistics: ./checkpointctl show -t /tmp/cp.tar --print-stats Displaying container checkpoint data from /tmp/dump.tar [...] CRIU dump statistics +---------------+-------------+--------------+---------------+---------------+---------------+ | FREEZING TIME | FROZEN TIME | MEMDUMP TIME | MEMWRITE TIME | PAGES SCANNED | PAGES WRITTEN | +---------------+-------------+--------------+---------------+---------------+---------------+ | 105405 us | 1376964 us | 504399 us | 446571 us | 492153 | 88689 | +---------------+-------------+--------------+---------------+---------------+---------------+ Signed-off-by: Adrian Reber <areber@redhat.com>
* | Merge pull request #11890 from Luap99/portsOpenShift Merge Robot2021-11-06
|\ \ | | | | | | libpod: deduplicate ports in db
| * | libpod: deduplicate ports in dbPaul Holzinger2021-10-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The OCICNI port format has one big problem: It does not support ranges. So if a users forwards a range of 1k ports with podman run -p 1001-2000 we have to store each of the thousand ports individually as array element. This bloats the db and makes the JSON encoding and decoding much slower. In many places we already use a better port struct type which supports ranges, e.g. `pkg/specgen` or the new network interface. Because of this we have to do many runtime conversions between the two port formats. If everything uses the new format we can skip the runtime conversions. This commit adds logic to replace all occurrences of the old format with the new one. The database will automatically migrate the ports to new format when the container config is read for the first time after the update. The `ParsePortMapping` function is `pkg/specgen/generate` has been reworked to better work with the new format. The new logic is able to deduplicate the given ports. This is necessary the ensure we store them efficiently in the DB. The new code should also be more performant than the old one. To prove that the code is fast enough I added go benchmarks. Parsing 1 million ports took less than 0.5 seconds on my laptop. Benchmark normalize PortMappings in specgen: Please note that the 1 million ports are actually 20x 50k ranges because we cannot have bigger ranges than 65535 ports. ``` $ go test -bench=. -benchmem ./pkg/specgen/generate/ goos: linux goarch: amd64 pkg: github.com/containers/podman/v3/pkg/specgen/generate cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz BenchmarkParsePortMappingNoPorts-12 480821532 2.230 ns/op 0 B/op 0 allocs/op BenchmarkParsePortMapping1-12 38972 30183 ns/op 131584 B/op 9 allocs/op BenchmarkParsePortMapping100-12 18752 60688 ns/op 141088 B/op 315 allocs/op BenchmarkParsePortMapping1k-12 3104 331719 ns/op 223840 B/op 3018 allocs/op BenchmarkParsePortMapping10k-12 376 3122930 ns/op 1223650 B/op 30027 allocs/op BenchmarkParsePortMapping1m-12 3 390869926 ns/op 124593840 B/op 4000624 allocs/op BenchmarkParsePortMappingReverse100-12 18940 63414 ns/op 141088 B/op 315 allocs/op BenchmarkParsePortMappingReverse1k-12 3015 362500 ns/op 223841 B/op 3018 allocs/op BenchmarkParsePortMappingReverse10k-12 343 3318135 ns/op 1223650 B/op 30027 allocs/op BenchmarkParsePortMappingReverse1m-12 3 403392469 ns/op 124593840 B/op 4000624 allocs/op BenchmarkParsePortMappingRange1-12 37635 28756 ns/op 131584 B/op 9 allocs/op BenchmarkParsePortMappingRange100-12 39604 28935 ns/op 131584 B/op 9 allocs/op BenchmarkParsePortMappingRange1k-12 38384 29921 ns/op 131584 B/op 9 allocs/op BenchmarkParsePortMappingRange10k-12 29479 40381 ns/op 131584 B/op 9 allocs/op BenchmarkParsePortMappingRange1m-12 927 1279369 ns/op 143022 B/op 164 allocs/op PASS ok github.com/containers/podman/v3/pkg/specgen/generate 25.492s ``` Benchmark convert old port format to new one: ``` go test -bench=. -benchmem ./libpod/ goos: linux goarch: amd64 pkg: github.com/containers/podman/v3/libpod cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz Benchmark_ocicniPortsToNetTypesPortsNoPorts-12 663526126 1.663 ns/op 0 B/op 0 allocs/op Benchmark_ocicniPortsToNetTypesPorts1-12 7858082 141.9 ns/op 72 B/op 2 allocs/op Benchmark_ocicniPortsToNetTypesPorts10-12 2065347 571.0 ns/op 536 B/op 4 allocs/op Benchmark_ocicniPortsToNetTypesPorts100-12 138478 8641 ns/op 4216 B/op 4 allocs/op Benchmark_ocicniPortsToNetTypesPorts1k-12 9414 120964 ns/op 41080 B/op 4 allocs/op Benchmark_ocicniPortsToNetTypesPorts10k-12 781 1490526 ns/op 401528 B/op 4 allocs/op Benchmark_ocicniPortsToNetTypesPorts1m-12 4 250579010 ns/op 40001656 B/op 4 allocs/op PASS ok github.com/containers/podman/v3/libpod 11.727s ``` Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* | | Fix swagger definition for the new mac address typePaul Holzinger2021-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new mac address type broke the api docs. While we could successfully generate the swagger file it could not be viewed in a browser. The problem is that the swagger generation create two type definitions with the name `HardwareAddr` and this pointed back to itself. Thus the render process was stucked in an endless loop. To fix this manually rename the new type to MacAddress and overwrite the types to string because the json unmarshaller accepts the mac as string. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* | | rename rootless cni ns to rootless netnsPaul Holzinger2021-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we want to use the rootless cni ns also for netavark we should pick a more generic name. The name is now "rootless network namespace" or short "rootless netns". The rename might cause some issues after the update but when the all containers are restarted or the host is rebooted it should work correctly. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* | | mount full XDG_RUNTIME_DIR in rootless cni nsPaul Holzinger2021-11-05
| | | | | | | | | | | | | | | | | | | | | We should mount the full runtime directory into the namespace instead of just the netns dir. This allows more use cases. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* | | Fix rootless cni netns cleanup logicPaul Holzinger2021-11-05
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | The check if cleanup is needed reads all container and checks if there are running containers with bridge networking. If we do not find any we have to cleanup the ns. However there was a problem with this because the state is empty by default so the running check never worked. Fortunately the was a second check which relies on the CNI files so we still did cleanup anyway. With netavark I noticed that this check is broken because the CNI files were not present. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* | MAC address json unmarshal should allow stringsPaul Holzinger2021-11-03
| | | | | | | | | | | | | | | | | | Create a new mac address type which supports json marshal/unmarshal from and to string. This change is backwards compatible with the previous versions as the unmarshal method still accepts the old byte array or base64 encoded string. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* | Merge pull request #12127 from vrothberg/bz-2014149OpenShift Merge Robot2021-10-29
|\ \ | | | | | | volumes: be more tolerant and fix infinite loop
| * | volumes: be more tolerant and fix infinite loopValentin Rothberg2021-10-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make Podman more tolerant when parsing image volumes during container creation and further fix an infinite loop when checking them. Consider `VOLUME ['/etc/foo', '/etc/bar']` in a Containerfile. While it looks correct to the human eye, the single quotes are wrong and yield the two volumes to be `[/etc/foo,` and `/etc/bar]` in Podman and Docker. When running the container, it'll create a directory `bar]` in `/etc` and a directory `[` in `/` with two subdirectories `etc/foo,`. This behavior is surprising to me but how Docker behaves. We may improve on that in the future. Note that the correct way to syntax for volumes in a Containerfile is `VOLUME /A /B /C` or `VOLUME ["/A", "/B", "/C"]`; single quotes are not supported. This change restores this behavior without breaking container creation or ending up in an infinite loop. BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2014149 Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
* | | Merge pull request #12117 from ↵OpenShift Merge Robot2021-10-28
|\ \ \ | | | | | | | | | | | | | | | | adrianreber/2021-10-27-set-checkpointed-false-after-restore Set Checkpointed state to false after restore
| * | | Set Checkpointed state to false after restoreAdrian Reber2021-10-27
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | A restored container still had the state set to 'Checkpointed: true' which seems wrong if it running again. [NO NEW TESTS NEEDED] Signed-off-by: Adrian Reber <areber@redhat.com>
* | | Merge pull request #12126 from giuseppe/fix-race-warning-messageOpenShift Merge Robot2021-10-28
|\ \ \ | | | | | | | | runtime: change PID existence check
| * | | runtime: change PID existence checkGiuseppe Scrivano2021-10-28
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 6b3b0a17c625bdf71b0ec8b783b288886d8e48d7 introduced a check for the PID file before attempting to move the PID to a new scope. This is still vulnerable to TOCTOU race condition though, since the PID file or the PID can be removed/killed after the check was successful but before it was used. Closes: https://github.com/containers/podman/issues/12065 [NO NEW TESTS NEEDED] it fixes a CI flake Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* / | oci: rename sub-cgroup to runtime instead of supervisorGiuseppe Scrivano2021-10-28
|/ / | | | | | | | | | | | | | | | | | | | | | | | | we are having a hard time figuring out a failure in the CI: https://github.com/containers/podman/issues/11191 Rename the sub-cgroup created here, so we can be certain the error is caused by this part. [NO NEW TESTS NEEDED] we need this for the CI. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* | Merge pull request #12111 from giuseppe/fix-warning-move-pause-processOpenShift Merge Robot2021-10-27
|\ \ | |/ |/| runtime: check for pause pid existence
| * runtime: check for pause pid existenceGiuseppe Scrivano2021-10-27
| | | | | | | | | | | | | | | | | | check that the pause pid exists before trying to move it to a separate scope. Closes: https://github.com/containers/podman/issues/12065 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* | Merge pull request #11956 from vrothberg/pauseOpenShift Merge Robot2021-10-27
|\ \ | |/ |/| remove need to download pause image
| * pod create: remove need for pause imageValentin Rothberg2021-10-26
| | | | | | | | | | | | | | | | | | So far, the infra containers of pods required pulling down an image rendering pods not usable in disconnected environments. Instead, build an image locally which uses local pause binary. Fixes: #10354 Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
| * overlay root fs: create mount on runtime dirValentin Rothberg2021-10-26
| | | | | | | | | | | | | | | | | | | | | | Make sure to create the mounts for containers with an overlay root FS in the runtime dir (e.g., /run/user/1000/...) to guarantee that we can actually overlay mount on the specific path which is not the case for the graph root. [NO NEW TESTS NEEDED] since it is not a user-facing change. Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
* | Merge pull request #12098 from Luap99/slirp-dadOpenShift Merge Robot2021-10-26
|\ \ | | | | | | Slirp4netns with ipv6 set net.ipv6.conf.default.accept_dad=0
| * | Slirp4netns with ipv6 set net.ipv6.conf.default.accept_dad=0Paul Holzinger2021-10-26
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | Duplicate Address Detection slows the ipv6 setup down for 1-2 seconds. Since slirp4netns is run it is own namespace and not directly routed we can skip this to make the ipv6 address immediately available. We change the default to make sure the slirp tap interface gets the correct value assigned so DAD is disabled for it. Also make sure to change this value back to the original after slirp4netns is ready in case users rely on this sysctl. Fixes #11062 Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* | Merge pull request #12067 from hshiina/logs-journal-tailOpenShift Merge Robot2021-10-26
|\ \ | | | | | | Fix a few problems in 'podman logs --tail' with journald driver
| * | Fix a few problems in 'podman logs --tail' with journald driverHironori Shiina2021-10-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following problems regarding `logs --tail` with the journald log driver are fixed: - One more line than a specified value is displayed. - '--tail 0' displays all lines while the other log drivers displays nothing. - Partial lines are not considered. - If the journald events backend is used and a container has exited, nothing is displayed. Integration tests that should have detected the bugs are also fixed. The tests are executed with json-file log driver three times without this fix. Signed-off-by: Hironori Shiina <shiina.hironori@jp.fujitsu.com>
* | | Merge pull request #12088 from adrianreber/2021-10-25-fix-label-ipc-hostOpenShift Merge Robot2021-10-26
|\ \ \ | | | | | | | | Allow 'container restore' with '--ipc host'
| * | | Allow 'container restore' with '--ipc host'Adrian Reber2021-10-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Trying to restore a container that was started with '--ipc host' fails with: Error: error creating container storage: ProcessLabel and Mountlabel must either not be specified or both specified We already fixed this exact same error message for containers started with '--privileged'. The previous fix was to check if the to be restored container is a privileged container (c.config.Privileged). Unfortunately this does not work for containers started with '--ipc host'. This commit changes the check for a privileged container to check if both the ProcessLabel and the MountLabel is actually set and only then re-uses those labels. Signed-off-by: Adrian Reber <areber@redhat.com>
* | | | Document to not set K8S envars for CNIPaul Holzinger2021-10-26
| |_|/ |/| | | | | | | | | | | | | | | | | Setting these environment variables can cause issues with custom CNI plugins, see #12083. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* | | Update vendor github.com/opencontainers/runtime-toolsDaniel J Walsh2021-10-25
|/ / | | | | | | | | | | | | | | | | This will change mount of /dev within container to noexec, making containers slightly more secure. [NO NEW TESTS NEEDED] Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* | [NO NEW TESTS NEEDED] Fix off-by-one index comparision (reported by LGTM)Stefan Weil2021-10-25
| | | | | | | | | | | | | | | | LGTM alert: Off-by-one index comparison against length may lead to out-of-bounds read. Signed-off-by: Stefan Weil <sw@weilnetz.de>
* | Replace 'an user' => 'a user'Stefan Weil2021-10-24
| | | | | | | | Signed-off-by: Stefan Weil <sw@weilnetz.de>
* | Merge pull request #11991 from rhatdan/sizeOpenShift Merge Robot2021-10-22
|\ \ | | | | | | Allow API to specify size and inode quota
| * | Allow API to specify size and inode quotaDaniel J Walsh2021-10-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes: https://github.com/containers/podman/issues/11016 [NO NEW TESTS NEEDED] We have no easy way to tests this in CI/CD systems. Requires quota to be setup on directories to work. Fixes: https://github.com/containers/podman/issues/11016 Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* | | Merge pull request #12021 from rhatdan/kubeOpenShift Merge Robot2021-10-22
|\ \ \ | |_|/ |/| | Generate Kube should not print default structs
| * | Generate Kube should not print default structsDaniel J Walsh2021-10-19
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If podman uses Workdir="/" or the workdir specified in the image, it should not add it to the yaml. If Podman find environment variables in the image, they should not get added to the yaml. If the container or pod do not have changes to SELinux we should not print seLinuxOpt{} If the container or pod do not change any dns options the yaml should not have a dnsOption={} If the container is not privileged it should not have privileged=false in the yaml. Fixes: https://github.com/containers/podman/issues/11995 Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* | Remove infra ID from DB before removing containersMatthew Heon2021-10-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we interrupt pod removal between removing containers and removing the whole pod, the infra ID was still in the DB, and most pod operations would try to retrieve the infra container (and would this fail). Clear the infra ID from the DB just before we remove all containers to prevent this. Fixes #12034 [NO NEW TESTS NEEDED] This is a very narrow race and I have no idea how to repro it. Signed-off-by: Matthew Heon <mheon@redhat.com>
* | Merge pull request #11851 from cdoern/podRmOpenShift Merge Robot2021-10-20
|\ \ | | | | | | Pod Rm Infra Handling Improvements