summaryrefslogtreecommitdiff
path: root/libpod/networking_slirp4netns.go
Commit message (Collapse)AuthorAge
* network reload return error if we cannot reload portsPaul Holzinger2021-11-10
| | | | | | | As rootless we have to reload the port mappings. If it fails we should return an error instead of the warning. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* network reload without ports should not reload portsPaul Holzinger2021-11-10
| | | | | | | | | When run as rootless the podman network reload command tries to reload the rootlessport ports because the childIP could have changed. However if the containers has no ports we should skip this instead of printing a warning. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* Fix rootless networking with userns and portsPaul Holzinger2021-11-09
| | | | | | | | A rootless container created with a custom userns and forwarded ports did not work. I refactored the network setup to make the setup logic more clear. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* libpod: deduplicate ports in dbPaul Holzinger2021-10-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The OCICNI port format has one big problem: It does not support ranges. So if a users forwards a range of 1k ports with podman run -p 1001-2000 we have to store each of the thousand ports individually as array element. This bloats the db and makes the JSON encoding and decoding much slower. In many places we already use a better port struct type which supports ranges, e.g. `pkg/specgen` or the new network interface. Because of this we have to do many runtime conversions between the two port formats. If everything uses the new format we can skip the runtime conversions. This commit adds logic to replace all occurrences of the old format with the new one. The database will automatically migrate the ports to new format when the container config is read for the first time after the update. The `ParsePortMapping` function is `pkg/specgen/generate` has been reworked to better work with the new format. The new logic is able to deduplicate the given ports. This is necessary the ensure we store them efficiently in the DB. The new code should also be more performant than the old one. To prove that the code is fast enough I added go benchmarks. Parsing 1 million ports took less than 0.5 seconds on my laptop. Benchmark normalize PortMappings in specgen: Please note that the 1 million ports are actually 20x 50k ranges because we cannot have bigger ranges than 65535 ports. ``` $ go test -bench=. -benchmem ./pkg/specgen/generate/ goos: linux goarch: amd64 pkg: github.com/containers/podman/v3/pkg/specgen/generate cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz BenchmarkParsePortMappingNoPorts-12 480821532 2.230 ns/op 0 B/op 0 allocs/op BenchmarkParsePortMapping1-12 38972 30183 ns/op 131584 B/op 9 allocs/op BenchmarkParsePortMapping100-12 18752 60688 ns/op 141088 B/op 315 allocs/op BenchmarkParsePortMapping1k-12 3104 331719 ns/op 223840 B/op 3018 allocs/op BenchmarkParsePortMapping10k-12 376 3122930 ns/op 1223650 B/op 30027 allocs/op BenchmarkParsePortMapping1m-12 3 390869926 ns/op 124593840 B/op 4000624 allocs/op BenchmarkParsePortMappingReverse100-12 18940 63414 ns/op 141088 B/op 315 allocs/op BenchmarkParsePortMappingReverse1k-12 3015 362500 ns/op 223841 B/op 3018 allocs/op BenchmarkParsePortMappingReverse10k-12 343 3318135 ns/op 1223650 B/op 30027 allocs/op BenchmarkParsePortMappingReverse1m-12 3 403392469 ns/op 124593840 B/op 4000624 allocs/op BenchmarkParsePortMappingRange1-12 37635 28756 ns/op 131584 B/op 9 allocs/op BenchmarkParsePortMappingRange100-12 39604 28935 ns/op 131584 B/op 9 allocs/op BenchmarkParsePortMappingRange1k-12 38384 29921 ns/op 131584 B/op 9 allocs/op BenchmarkParsePortMappingRange10k-12 29479 40381 ns/op 131584 B/op 9 allocs/op BenchmarkParsePortMappingRange1m-12 927 1279369 ns/op 143022 B/op 164 allocs/op PASS ok github.com/containers/podman/v3/pkg/specgen/generate 25.492s ``` Benchmark convert old port format to new one: ``` go test -bench=. -benchmem ./libpod/ goos: linux goarch: amd64 pkg: github.com/containers/podman/v3/libpod cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz Benchmark_ocicniPortsToNetTypesPortsNoPorts-12 663526126 1.663 ns/op 0 B/op 0 allocs/op Benchmark_ocicniPortsToNetTypesPorts1-12 7858082 141.9 ns/op 72 B/op 2 allocs/op Benchmark_ocicniPortsToNetTypesPorts10-12 2065347 571.0 ns/op 536 B/op 4 allocs/op Benchmark_ocicniPortsToNetTypesPorts100-12 138478 8641 ns/op 4216 B/op 4 allocs/op Benchmark_ocicniPortsToNetTypesPorts1k-12 9414 120964 ns/op 41080 B/op 4 allocs/op Benchmark_ocicniPortsToNetTypesPorts10k-12 781 1490526 ns/op 401528 B/op 4 allocs/op Benchmark_ocicniPortsToNetTypesPorts1m-12 4 250579010 ns/op 40001656 B/op 4 allocs/op PASS ok github.com/containers/podman/v3/libpod 11.727s ``` Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* Slirp4netns with ipv6 set net.ipv6.conf.default.accept_dad=0Paul Holzinger2021-10-26
| | | | | | | | | | | | | | Duplicate Address Detection slows the ipv6 setup down for 1-2 seconds. Since slirp4netns is run it is own namespace and not directly routed we can skip this to make the ipv6 address immediately available. We change the default to make sure the slirp tap interface gets the correct value assigned so DAD is disabled for it. Also make sure to change this value back to the original after slirp4netns is ready in case users rely on this sysctl. Fixes #11062 Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* rootlessport: reduce memory usage of the processPaul Holzinger2021-10-12
| | | | | | | | | | | | | | | | | | | | | | Don't use reexec for the rootlessport process, instead make it a separate binary to reduce the memory usage. The problem with reexec is that it will import all packages that podman uses and therefore loads a lot of stuff into the heap. The rootlessport process however only needs the rootlesskit library. The memory usage is a concern since the rootlessport process will spawn two process per container which has ports forwarded. The processes stay until the container dies. On my laptop the current reexec version uses 47800 KB RSS. The new separate binary only uses 4540 KB RSS. This is more than a 90% improvement. The Makefile has been updated to compile the new binary and install it to the libexec directory. Fixes #10790 [NO TESTS NEEDED] Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* libpod: do not call (*container).Config()Valentin Rothberg2021-09-28
| | | | | | | | | | | | Access the container's config field directly inside of libpod instead of calling `Config()` which in turn creates expensive JSON deep copies. Accessing the field directly drops memory consumption of a simple `podman run --rm busybox true` from 1245kB to 410kB. [NO TESTS NEEDED] Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
* standardize logrus messages to upper caseDaniel J Walsh2021-09-22
| | | | | | | | Remove ERROR: Error stutter from logrus messages also. [ NO TESTS NEEDED] This is just code cleanup. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Wire network interface into libpodPaul Holzinger2021-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make use of the new network interface in libpod. This commit contains several breaking changes: - podman network create only outputs the new network name and not file path. - podman network ls shows the network driver instead of the cni version and plugins. - podman network inspect outputs the new network struct and not the cni conflist. - The bindings and libpod api endpoints have been changed to use the new network structure. The container network status is stored in a new field in the state. The status should be received with the new `c.getNetworkStatus`. This will migrate the old status to the new format. Therefore old containers should contine to work correctly in all cases even when network connect/ disconnect is used. New features: - podman network reload keeps the ip and mac for more than one network. - podman container restore keeps the ip and mac for more than one network. - The network create compat endpoint can now use more than one ipam config. The man pages and the swagger doc are updated to reflect the latest changes. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* rootlessport: allow socket paths with more than 108 charsPaul Holzinger2021-09-01
| | | | | | | | | | | | Creating the rootlessport socket can fail with `bind: invalid argument` when the socket path is longer than 108 chars. This is the case for users with a long runtime directory. Since the kernel does not allow to use socket paths with more then 108 chars use a workaround to open the socket path. [NO TESTS NEEDED] Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* fix rootless port forwarding with network dis-/connectPaul Holzinger2021-08-03
| | | | | | | | | | | | | | | | | | | | | | | | The rootlessport forwarder requires a child IP to be set. This must be a valid ip in the container network namespace. The problem is that after a network disconnect and connect the eth0 ip changed. Therefore the packages are dropped since the source ip does no longer exists in the netns. One solution is to set the child IP to 127.0.0.1, however this is a security problem. [1] To fix this we have to recreate the ports after network connect and disconnect. To make this work the rootlessport process exposes a socket where podman network connect/disconnect connect to and send to new child IP to rootlessport. The rootlessport process will remove all ports and recreate them with the new correct child IP. Also bump rootlesskit to v0.14.3 to fix a race with RemovePort(). Fixes #10052 [1] https://nvd.nist.gov/vuln/detail/CVE-2021-20199 Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* podman service reaperPaul Holzinger2021-07-02
| | | | | | | | | | | | | | | | | | | | | | | | Add a new service reaper package. Podman currently does not reap all child processes. The slirp4netns and rootlesskit processes are not reaped. The is not a problem for local podman since the podman process dies before the other processes and then init will reap them for us. However with podman system service it is possible that the podman process is still alive after slirp died. In this case podman has to reap it or the slirp process will be a zombie until the service is stopped. The service reaper will listen in an extra goroutine on SIGCHLD. Once it receives this signal it will try to reap all pids that were added with `AddPID()`. While I would like to just reap all children this is not possible because many parts of the code use `os/exec` with `cmd.Wait()`. If we reap before `cmd.Wait()` things can break, so reaping everything is not an option. [NO TESTS NEEDED] Fixes #9777 Signed-off-by: Paul Holzinger <pholzing@redhat.com>
* Add host.containers.internal entry into container's etc/hostsBaron Lenardson2021-05-17
| | | | | | | | | | | This change adds the entry `host.containers.internal` to the `/etc/hosts` file within a new containers filesystem. The ip address is determined by the containers networking configuration and points to the gateway address for the containers networking namespace. Closes #5651 Signed-off-by: Baron Lenardson <lenardson.baron@gmail.com>
* Fix rootlesskit port forwarder with custom slirp cidrPaul Holzinger2021-04-23
| | | | | | | | | | | | | The source ip for the rootlesskit port forwarder was hardcoded to the standard slirp4netns ip. This is incorrect since users can change the subnet used by slirp4netns with `--network slirp4netns:cidr=10.5.0.0/24`. The container interface ip is always the .100 in the subnet. Only when the rootlesskit port forwarder child ip matches the container interface ip the port forwarding will work. Fixes #9828 Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
* Move slirp4netns functions into an extra filePaul Holzinger2021-04-01
This should make maintenance easier. Signed-off-by: Paul Holzinger <paul.holzinger@web.de>