aboutsummaryrefslogtreecommitdiff
path: root/pkg/rootlessport/rootlessport_linux.go
Commit message (Collapse)AuthorAge
* go fmt: use go 1.18 conditional-build syntaxValentin Rothberg2022-03-18
| | | | Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
* use libnetwork from c/commonPaul Holzinger2022-01-12
| | | | | | | | The libpod/network packages were moved to c/common so that buildah can use it as well. To prevent duplication use it in podman as well and remove it from here. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* libpod: deduplicate ports in dbPaul Holzinger2021-10-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The OCICNI port format has one big problem: It does not support ranges. So if a users forwards a range of 1k ports with podman run -p 1001-2000 we have to store each of the thousand ports individually as array element. This bloats the db and makes the JSON encoding and decoding much slower. In many places we already use a better port struct type which supports ranges, e.g. `pkg/specgen` or the new network interface. Because of this we have to do many runtime conversions between the two port formats. If everything uses the new format we can skip the runtime conversions. This commit adds logic to replace all occurrences of the old format with the new one. The database will automatically migrate the ports to new format when the container config is read for the first time after the update. The `ParsePortMapping` function is `pkg/specgen/generate` has been reworked to better work with the new format. The new logic is able to deduplicate the given ports. This is necessary the ensure we store them efficiently in the DB. The new code should also be more performant than the old one. To prove that the code is fast enough I added go benchmarks. Parsing 1 million ports took less than 0.5 seconds on my laptop. Benchmark normalize PortMappings in specgen: Please note that the 1 million ports are actually 20x 50k ranges because we cannot have bigger ranges than 65535 ports. ``` $ go test -bench=. -benchmem ./pkg/specgen/generate/ goos: linux goarch: amd64 pkg: github.com/containers/podman/v3/pkg/specgen/generate cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz BenchmarkParsePortMappingNoPorts-12 480821532 2.230 ns/op 0 B/op 0 allocs/op BenchmarkParsePortMapping1-12 38972 30183 ns/op 131584 B/op 9 allocs/op BenchmarkParsePortMapping100-12 18752 60688 ns/op 141088 B/op 315 allocs/op BenchmarkParsePortMapping1k-12 3104 331719 ns/op 223840 B/op 3018 allocs/op BenchmarkParsePortMapping10k-12 376 3122930 ns/op 1223650 B/op 30027 allocs/op BenchmarkParsePortMapping1m-12 3 390869926 ns/op 124593840 B/op 4000624 allocs/op BenchmarkParsePortMappingReverse100-12 18940 63414 ns/op 141088 B/op 315 allocs/op BenchmarkParsePortMappingReverse1k-12 3015 362500 ns/op 223841 B/op 3018 allocs/op BenchmarkParsePortMappingReverse10k-12 343 3318135 ns/op 1223650 B/op 30027 allocs/op BenchmarkParsePortMappingReverse1m-12 3 403392469 ns/op 124593840 B/op 4000624 allocs/op BenchmarkParsePortMappingRange1-12 37635 28756 ns/op 131584 B/op 9 allocs/op BenchmarkParsePortMappingRange100-12 39604 28935 ns/op 131584 B/op 9 allocs/op BenchmarkParsePortMappingRange1k-12 38384 29921 ns/op 131584 B/op 9 allocs/op BenchmarkParsePortMappingRange10k-12 29479 40381 ns/op 131584 B/op 9 allocs/op BenchmarkParsePortMappingRange1m-12 927 1279369 ns/op 143022 B/op 164 allocs/op PASS ok github.com/containers/podman/v3/pkg/specgen/generate 25.492s ``` Benchmark convert old port format to new one: ``` go test -bench=. -benchmem ./libpod/ goos: linux goarch: amd64 pkg: github.com/containers/podman/v3/libpod cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz Benchmark_ocicniPortsToNetTypesPortsNoPorts-12 663526126 1.663 ns/op 0 B/op 0 allocs/op Benchmark_ocicniPortsToNetTypesPorts1-12 7858082 141.9 ns/op 72 B/op 2 allocs/op Benchmark_ocicniPortsToNetTypesPorts10-12 2065347 571.0 ns/op 536 B/op 4 allocs/op Benchmark_ocicniPortsToNetTypesPorts100-12 138478 8641 ns/op 4216 B/op 4 allocs/op Benchmark_ocicniPortsToNetTypesPorts1k-12 9414 120964 ns/op 41080 B/op 4 allocs/op Benchmark_ocicniPortsToNetTypesPorts10k-12 781 1490526 ns/op 401528 B/op 4 allocs/op Benchmark_ocicniPortsToNetTypesPorts1m-12 4 250579010 ns/op 40001656 B/op 4 allocs/op PASS ok github.com/containers/podman/v3/libpod 11.727s ``` Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* rootlessport: reduce memory usage of the processPaul Holzinger2021-10-12
| | | | | | | | | | | | | | | | | | | | | | Don't use reexec for the rootlessport process, instead make it a separate binary to reduce the memory usage. The problem with reexec is that it will import all packages that podman uses and therefore loads a lot of stuff into the heap. The rootlessport process however only needs the rootlesskit library. The memory usage is a concern since the rootlessport process will spawn two process per container which has ports forwarded. The processes stay until the container dies. On my laptop the current reexec version uses 47800 KB RSS. The new separate binary only uses 4540 KB RSS. This is more than a 90% improvement. The Makefile has been updated to compile the new binary and install it to the libexec directory. Fixes #10790 [NO TESTS NEEDED] Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* standardize logrus messages to upper caseDaniel J Walsh2021-09-22
| | | | | | | | Remove ERROR: Error stutter from logrus messages also. [ NO TESTS NEEDED] This is just code cleanup. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Drop OCICNI dependencyPaul Holzinger2021-09-15
| | | | | | | | | | | We do not use the ocicni code anymore so let's get rid of it. Only the port struct is used but we can copy this into libpod network types so we can debloat the binary. The next step is to remove the OCICNI port mapping form the container config and use the better PortMapping struct everywhere. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* remove rootlessport socket to prevent EADDRINUSEPaul Holzinger2021-09-13
| | | | | | | | | When we restart a container via podman restart or restart policy the rootlessport process fails with `address already in use` because the socketfile still exists. This is a regression and was introduced in commit abdedc31a25e. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* rootlessport: allow socket paths with more than 108 charsPaul Holzinger2021-09-01
| | | | | | | | | | | | Creating the rootlessport socket can fail with `bind: invalid argument` when the socket path is longer than 108 chars. This is the case for users with a long runtime directory. Since the kernel does not allow to use socket paths with more then 108 chars use a workaround to open the socket path. [NO TESTS NEEDED] Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* fix rootlessport flakePaul Holzinger2021-08-18
| | | | | | | | | | | | | | | | | | | | | When the rootlessport process is started the stdout/stderr are attached to the podman process. However once everything is setup podman exits and when the rootlessport process tries to write to stdout it will fail with SIGPIPE. The code handles this signal and puts /dev/null to stdout and stderr but this is not robust. I do not understand the exact cause but sometimes the process is still killed by SIGPIPE. Either go lost the signal or the process got already killed before the goroutine could handle it. Instead of handling SIGPIPE just set /dev/null to stdout and stderr before podman exits. With this there should be no race and no way to run into SIGPIPE errors. [NO TESTS NEEDED] Fixes #11248 Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* fix rootless port forwarding with network dis-/connectPaul Holzinger2021-08-03
| | | | | | | | | | | | | | | | | | | | | | | | The rootlessport forwarder requires a child IP to be set. This must be a valid ip in the container network namespace. The problem is that after a network disconnect and connect the eth0 ip changed. Therefore the packages are dropped since the source ip does no longer exists in the netns. One solution is to set the child IP to 127.0.0.1, however this is a security problem. [1] To fix this we have to recreate the ports after network connect and disconnect. To make this work the rootlessport process exposes a socket where podman network connect/disconnect connect to and send to new child IP to rootlessport. The rootlessport process will remove all ports and recreate them with the new correct child IP. Also bump rootlesskit to v0.14.3 to fix a race with RemovePort(). Fixes #10052 [1] https://nvd.nist.gov/vuln/detail/CVE-2021-20199 Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* Enable whitespace linterPaul Holzinger2021-02-11
| | | | | | | | Use the whitespace linter and fix the reported problems. [NO TESTS NEEDED] Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
* rootlessport: set source IP to slirp4netns deviceGiuseppe Scrivano2021-01-22
| | | | | | | | | set the source IP to the slirp4netns address instead of 127.0.0.1 when using rootlesskit. Closes: https://github.com/containers/podman/issues/5138 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* rootlessport: use two different channelsGiuseppe Scrivano2020-04-29
| | | | | | | | | | | The same channel is written to by two different goroutines. Use a different channel for each of them so to avoid writing to a closed channel. Closes: https://github.com/containers/libpod/issues/6018 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* rootlessport: use x/sys/unix instead of syscallGiuseppe Scrivano2020-03-24
| | | | | | | | Dup2 is not defined on arm64 in the syscall package. Closes: https://github.com/containers/libpod/issues/5587 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* rootlessport: handle SIGPIPEGiuseppe Scrivano2020-03-19
| | | | | | | | | when a sigpipe is received the stdout/stderr pipe was closed, so reopen them with /dev/null. Closes: https://github.com/containers/libpod/issues/5541 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* rootlessport: detect rootless-child exitGiuseppe Scrivano2020-03-12
| | | | | | | otherwise the rootless parent process might wait indefinitely when the rootless-child process exits early. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* rootlessport: drop Pdeathsig in favor of KillGiuseppe Scrivano2020-02-12
| | | | | | | | | | | | | | | | | | there is a race condition where the child process is immediately killed: [pid 2576752] arch_prctl(0x3001 /* ARCH_??? */, 0x7ffdf612f170) = -1 EINVAL (Invalid argument) [pid 2576752] access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) [pid 2576752] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=2576742, si_uid=0} --- [pid 2576752] +++ killed by SIGTERM +++ this happens because the parent process here really means the "parent thread". Since there is no way of running it on the main thread, let's skip this functionality altogether and use kill(2). Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* rootlessport: fix potential hangGiuseppe Scrivano2020-02-12
| | | | | | | | | | | write to the error pipe only in case of an error. Otherwise we may end up in a race condition in the select statement below as the read from errChan happens before initComplete and the function returns immediately nil. Closes: https://github.com/containers/libpod/issues/5182 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* rootlessport: honor ctr.runtime.config.TmpDirAkihiro Suda2020-01-09
| | | | | | Previously, rootlessport was using /var/tmp as the tmp dir. Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
* rootlessport: remove state dir on exitAkihiro Suda2020-01-09
| | | | Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
* rootless: use RootlessKit port forwarderAkihiro Suda2020-01-08
RootlessKit port forwarder has a lot of advantages over the slirp4netns port forwarder: * Very high throughput. Benchmark result on Travis: socat: 5.2 Gbps, slirp4netns: 8.3 Gbps, RootlessKit: 27.3 Gbps (https://travis-ci.org/rootless-containers/rootlesskit/builds/597056377) * Connections from the host are treated as 127.0.0.1 rather than 10.0.2.2 in the namespace. No UDP issue (#4586) * No tcp_rmem issue (#4537) * Probably works with IPv6. Even if not, it is trivial to support IPv6. (#4311) * Easily extensible for future support of SCTP * Easily extensible for future support of `lxc-user-nic` SUID network RootlessKit port forwarder has been already adopted as the default port forwarder by Rootless Docker/Moby, and no issue has been reported AFAIK. As the port forwarder is imported as a Go package, no `rootlesskit` binary is required for Podman. Fix #4586 May-fix #4559 Fix #4537 May-fix #4311 See https://github.com/rootless-containers/rootlesskit/blob/v0.7.0/pkg/port/builtin/builtin.go Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>