| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
| |
with https://github.com/opencontainers/runc/pull/1807 we moved the
systemd notify initialization from "create" to "start", so that the
OCI runtime doesn't hang while waiting on reading from the notify
socket. This means we also need to set the correct NOTIFY_SOCKET when
start'ing the container.
Closes: https://github.com/containers/libpod/issues/746
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
|
|\
| |
| | |
Add tcp-established to checkpoint/restore
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
CRIU can checkpoint and restore processes/containers with established
TCP connections if the correct option is specified. To implement
checkpoint and restore with support for established TCP connections with
Podman this commit adds the necessary options to runc during checkpoint
and also tells conmon during restore to use 'runc restore' with
'--tcp-established'.
For this Podman feature to work a corresponding conmon change is
required.
Example:
$ podman run --tmpfs /tmp --name podman-criu-test -d docker://docker.io/yovfiatbeb/podman-criu-test
$ nc `podman inspect -l | jq -r '.[0].NetworkSettings.IPAddress'` 8080
GET /examples/servlets/servlet/HelloWorldExample
Connection: keep-alive
1
GET /examples/servlets/servlet/HelloWorldExample
Connection: keep-alive
2
$ # Using HTTP keep-alive multiple requests are send to the server in the container
$ # Different terminal:
$ podman container checkpoint -l
criu failed: type NOTIFY errno 0
$ # Looking at the log file would show errors because of established TCP connections
$ podman container checkpoint -l --tcp-established
$ # This works now and after the restore the same connection as above can be used for requests
$ podman container restore -l --tcp-established
The restore would fail without '--tcp-established' as the checkpoint image
contains established TCP connections.
Signed-off-by: Adrian Reber <areber@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This is basically the same change as
ff47a4c2d5485fc49f937f3ce0c4e2fd6bdb1956 (Use a struct to pass options to Checkpoint())
just for the Restore() function. It is used to pass multiple restore
options to the API and down to conmon which is used to restore
containers. This is for the upcoming changes to support checkpointing
and restoring containers with '--tcp-established'.
Signed-off-by: Adrian Reber <areber@redhat.com>
|
|\ \
| | |
| | | |
rootless: add new netmode "slirp4netns"
|
| |/
| |
| |
| | |
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
|
|/
|
|
|
|
|
|
|
|
|
|
|
| |
The conmon exit command is running inside of a namespace where the
process is running with uid=0. When it launches again podman for the
cleanup, podman is not running in rootless mode as the uid=0.
Export some more env variables to tell podman we are in rootless
mode.
Closes: https://github.com/containers/libpod/issues/1859
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
|
|\
| |
| | |
exec: always make explicit the tty value
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
otherwise runc will take by default the value used for creating the
container. Setting it explicit overrides its default value and we
won't end up trying to use a terminal when not available.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1625876
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CRIU supports to leave processes running after checkpointing:
-R|--leave-running leave tasks in running state after checkpoint
runc also support to leave containers running after checkpointing:
--leave-running leave the process running after checkpointing
With this commit the support to leave a container running after
checkpointing is brought to Podman:
--leave-running, -R leave the container running after writing checkpoint to disk
Now it is possible to checkpoint a container at some point in time
without stopping the container. This can be used to rollback the
container to an early state:
$ podman run --tmpfs /tmp --name podman-criu-test -d docker://docker.io/yovfiatbeb/podman-criu-test
$ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample
3
$ podman container checkpoint -R -l
$ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample
4
$ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample
5
$ podman stop -l
$ podman container restore -l
$ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample
4
So after checkpointing the container kept running and was stopped after
some time. Restoring this container will restore the state right at the
checkpoint.
Signed-off-by: Adrian Reber <areber@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Redefining err by := operator within block makes this err variable block local.
Addressing lint:
libpod/oci.go:368:3:warning: ineffectual assignment to err (ineffassign)
Signed-off-by: Šimon Lukašík <slukasik@redhat.com>
|
|
|
|
| |
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
|
|
|
|
|
|
|
| |
Instead of running a full sync after starting a container to pick
up its PID, grab it from Conmon instead.
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
|
|
|
|
|
|
|
|
|
| |
When we scan a container in runc and see that it no longer
exists, we already set ContainerStatusExited to indicate that it
no longer exists in runc. Now, also set an exit code and exit
time, so PS output will make some sense.
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When syncing container state, we normally call out to runc to see
the container's status. This does have significant performance
implications, though, and we've seen issues with large amounts of
runc processes being spawned.
This patch attempts to use stat calls on the container exit file
created by Conmon instead to sync state. This massively decreases
the cost of calling updateContainer (it has gone from an
almost-unconditional fork/exec of runc to a single stat call that
can be avoided in most states).
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
|
|
|
|
|
|
|
|
|
| |
when reading the output from conmon using the JSON methods, it appears that
JSON marshalling is higher in pprof than it really is because the pipe is
"waiting" for a response. this gives us a clearer look at the real CPU/time
consumers.
Signed-off-by: baude <bbaude@redhat.com>
|
|
|
|
|
|
|
|
| |
I've seen a runc zombie process hanging around, it is caused by not
cleaning up the "$OCI status" process. Also adjust another location
that has the same issue.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
|
|
|
|
|
|
| |
This should allow us to share this code with buildah.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
|
|\
| |
| | |
podman: allow usage of gVisor as OCI runtime
|
| |
| |
| |
| |
| |
| |
| | |
read the OCI status from stdout, not the combined stdout+stderr
stream.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Fix an issue when using gVisor that couldn't start the container since
the XDG_RUNTIME_DIR env variable used for the "create" and "start"
commands is different. Set the environment variable for each command
so that the OCI runtime gets always the same value.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
runc uses CRIU to support checkpoint and restore of containers. This
brings an initial checkpoint/restore implementation to podman.
None of the additional runc flags are yet supported and container
migration optimization (pre-copy/post-copy) is also left for the future.
The current status is that it is possible to checkpoint and restore a
container. I am testing on RHEL-7.x and as the combination of RHEL-7 and
CRIU has seccomp troubles I have to create the container without
seccomp.
With the following steps I am able to checkpoint and restore a
container:
# podman run --security-opt="seccomp=unconfined" -d registry.fedoraproject.org/f27/httpd
# curl -I 10.22.0.78:8080
HTTP/1.1 403 Forbidden # <-- this is actually a good answer
# podman container checkpoint <container>
# curl -I 10.22.0.78:8080
curl: (7) Failed connect to 10.22.0.78:8080; No route to host
# podman container restore <container>
# curl -I 10.22.0.78:8080
HTTP/1.1 403 Forbidden
I am using CRIU, runc and conmon from git. All required changes for
checkpoint/restore support in podman have been merged in the
corresponding projects.
To have the same IP address in the restored container as before
checkpointing, CNI is told which IP address to use.
If the saved network configuration cannot be found during restore, the
container is restored with a new IP address.
For CRIU to restore established TCP connections the IP address of the
network namespace used for restore needs to be the same. For TCP
connections in the listening state the IP address can change.
During restore only one network interface with one IP address is handled
correctly. Support to restore containers with more advanced network
configuration will be implemented later.
v2:
* comment typo
* print debug messages during cleanup of restore files
* use createContainer() instead of createOCIContainer()
* introduce helper CheckpointPath()
* do not try to restore a container that is paused
* use existing helper functions for cleanup
* restructure code flow for better readability
* do not try to restore if checkpoint/inventory.img is missing
* git add checkpoint.go restore.go
v3:
* move checkpoint/restore under 'podman container'
v4:
* incorporated changes from latest reviews
Signed-off-by: Adrian Reber <areber@redhat.com>
|
|\
| |
| | |
Add ContainerStateExited and OCI delete() in cleanup()
|
| |
| |
| |
| | |
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
|
|/
|
|
|
|
|
|
|
|
|
|
|
| |
Execute the command as described by a container image. The value of the label is processed
into a command by:
1. Ensuring the first argument of the command is podman.
2. Substituting any variables with those defined by the environment or otherwise.
If no label exists in the container image, nothing is done.
podman container runlabel LABEL IMAGE extra_args
Signed-off-by: baude <bbaude@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
We've increased the default rlimits to allow Podman to hold many
ports open without hitting limits and crashing, but this doesn't
solve the amount of memory that holding open potentially
thousands of ports will use. Offer a switch to optionally disable
port reservation for performance- and memory-constrained use
cases.
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
since we have a way for joining an existing userns use it instead of
nsenter.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1371
Approved by: rhatdan
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The OCI runtime might use the cgroups to see what PIDs
are inside the container, but that doesn't work with rootless
containers.
Closes: https://github.com/containers/libpod/issues/1337
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1331
Approved by: rhatdan
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Manage the case where the main process of the container creates and
joins a new user namespace.
In this case we want to join only the first child in the new
hierarchy, which is the user namespace that was used to create the
container.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1331
Approved by: rhatdan
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We cannot re-exec into a new user namespace to gain privileges and
access an existing as the new namespace is not the owner of the
existing container.
"unshare" is used to join the user namespace of the target container.
The current implementation assumes that the main process of the
container didn't create a new user namespace.
Since in the setup phase we are not running with euid=0, we must skip
the setup for containers/storage.
Closes: https://github.com/containers/libpod/issues/1329
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1331
Approved by: rhatdan
|
|
|
|
|
|
|
|
|
|
| |
Need to get some small changes into libpod to pull back into buildah
to complete buildah transition.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Closes: #1270
Approved by: mheon
|
|
|
|
|
|
|
| |
Signed-off-by: Valentin Rothberg <vrothberg@suse.com>
Closes: #1242
Approved by: rhatdan
|
|
|
|
|
|
|
| |
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
Closes: #1232
Approved by: rhatdan
|
|
|
|
|
|
|
|
|
|
|
| |
slirp4netns is required to setup the network namespace:
https://github.com/rootless-containers/slirp4netns
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1156
Approved by: rhatdan
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Bind all the specified TCP and UDP ports so that another process
cannot reuse them. The fd of the listener is then leaked into conmon
so that the socket is kept busy until the container exits.
Closes: https://github.com/projectatomic/libpod/issues/210
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1100
Approved by: mheon
|
|
|
|
|
|
|
|
| |
Use this to supplement exit codes returned from containers, to
make sure we know when exit codes are invalid (as the container
has not yet exited)
Signed-off-by: Matthew Heon <mheon@redhat.com>
|
|
|
|
| |
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes some boilerplate from the libpod package, so we can focus
on container stuff there. And it gives us a tidy sub-package for
focusing on ctime extraction, so we can focus on unit testing and
portability of the extraction utility there.
For the unsupported implementation, I'm falling back to Go's ModTime
[1]. That's obviously not the creation time, but it's likely to be
closer than the uninitialized Time structure from cc6f0e85 (more
changes to compile darwin, 2018-07-04, #1047). Especially for our use
case in libpod/oci, where we're looking at write-once exit files.
The test is more complicated than I initially expected, because on
Linux filesystem timestamps come from a truncated clock without
interpolation [2] (and network filesystems can be completely decoupled
[3]). So even for local disks, creation times can be up to a jiffie
earlier than 'before'. This test ensures at least monotonicity by
creating two files and ensuring the reported creation time for the
second is greater than or equal to the reported creation time for the
first. It also checks that both creation times are within the window
from one second earlier than 'before' through 'after'. That should be
enough of a window for local disks, even if the kernel for those
systems has an abnormally large jiffie. It might be ok on network
filesystems, although it will not be very resilient to network clock
lagging behind the local system clock.
[1]: https://golang.org/pkg/os/#FileInfo
[2]: https://groups.google.com/d/msg/linux.kernel/mdeXx2TBYZA/_4eJEuJoAQAJ
Subject: Re: Apparent backward time travel in timestamps on file creation
Date: Thu, 30 Mar 2017 20:20:02 +0200
Message-ID: <tqMPU-1Sb-21@gated-at.bofh.it>
[3]: https://groups.google.com/d/msg/linux.kernel/mdeXx2TBYZA/cTKj4OBuAQAJ
Subject: Re: Apparent backward time travel in timestamps on file creation
Date: Thu, 30 Mar 2017 22:10:01 +0200
Message-ID: <tqOyl-36A-1@gated-at.bofh.it>
Signed-off-by: W. Trevor King <wking@tremily.us>
Closes: #1050
Approved by: mheon
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
this should represent the last major changes to get darwin to **compile**. again,
the purpose here is to get darwin to compile so that we can eventually implement a
ci task that would protect against regressions for darwin compilation.
i have left the manual darwin compilation largely static still and in fact now only
interject (manually) two build tags to assist with the build. trevor king has great
ideas on how to make this better and i will defer final implementation of those
to him.
Signed-off-by: baude <bbaude@redhat.com>
Closes: #1047
Approved by: rhatdan
|
|
|
|
|
|
|
| |
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1048
Approved by: mheon
|
|
|
|
|
|
|
| |
Signed-off-by: baude <bbaude@redhat.com>
Closes: #1015
Approved by: baude
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we run containers in detach mode, nothing cleans up the network stack or
the mount points. This patch will tell conmon to execute the cleanup code when
the container exits.
It can also be called to attempt to cleanup previously running containers.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Closes: #942
Approved by: mheon
|
|
|
|
|
|
|
|
|
|
|
| |
If the caller sets up the app to be in logrus.DebugLevel,
then we will add the --syslog flag to conmon to get all of the
messages.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Closes: #1014
Approved by: TomSweeneyRedHat
|
|
|
|
|
|
|
| |
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #936
Approved by: rhatdan
|
|
|
|
|
|
|
| |
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #871
Approved by: mheon
|
|
|
|
|
|
|
| |
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #871
Approved by: mheon
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have an issue where iptables command is being executed by podman
and attempted to run with a different label. This fix changes podman
to only change the label on the conmon command and then set the
SELinux interface back to the default.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Closes: #906
Approved by: giuseppe
|
|
|
|
|
|
|
|
|
|
| |
There was a new line at the end of does not exist
which was causing this to fail.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Closes: #863
Approved by: baude
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If SELinux is enabled, we are leaking in pipes into the container
owned by conmon. The container processes are not allowed to use
these pipes, if the calling process is fully ranged. By changing
the level of the conmon process to s0, this allows container processes
to use the pipes.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Closes: #854
Approved by: mheon
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of manually calling the individual functions that cleanup
uses to tear down a container's resources, just call the cleanup
function to make sure that cleanup only needs to happen in one
place.
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
Closes: #790
Approved by: rhatdan
|