| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
| |
For future work, we need multiple implementations of the OCI
runtime, not just a Conmon-wrapped runtime matching the runc CLI.
As part of this, do some refactoring on the interface for exec
(move to a struct, not a massive list of arguments). Also, add
'all' support to Kill and Stop (supported by runc and used a bit
internally for removing containers).
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
|
|
|
| |
when runc returns an error about not being v2 complient, catch the error
and logrus an actionable message for users.
Signed-off-by: baude <bbaude@redhat.com>
|
|
|
|
| |
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
|
|\
| |
| | |
Unconditionally remove conmon files before starting
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We've been seeing a lot of issues (ref: #4061, but there are
others) where Podman hiccups on trying to start a container,
because some temporary files have been retained and Conmon will
not overwrite them.
If we're calling start() we can safely assume that we really want
those files gone so the container starts without error, so invoke
the cleanup routine. It's relatively cheap (four file removes) so
it shouldn't hurt us that much.
Also contains a small simplification to the removeConmonFiles
logic - we don't need to stat-then-remove when ignoring ENOENT is
fine.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CNI expects that a DELETE be run before re-creating container
networks. If a reboot occurs quickly enough that containers can't
stop and clean up, that DELETE never happens, and Podman
currently wipes the old network info and thinks the state has
been entirely cleared. Unfortunately, that may not be the case on
the CNI side. Some things - like IP address reservations - may
not have been cleared.
To solve this, manually re-run CNI Delete on refresh. If the
container has already been deleted this seems harmless. If not,
it should clear lingering state.
Fixes: #3759
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We should not be throwing errors because the operation we wanted
to perform is already done. Now, it is definitely strange that a
container is actually unmounted, but shows as mounted in the DB -
if this reoccurs in a way where we can investigate, it's worth
tearing into.
Fixes #4033
Signed-off-by: Matthew Heon <mheon@redhat.com>
|
|
|
|
|
|
|
|
| |
If you are running a rootless container on cgroupV1
you can not pause the container. We need to report the proper error
if this happens.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
when executing a healthcheck, we were not cleaning up after exec's use
of a socket. we now remove the socket file and ignore if for reason it
does not exist.
Fixes: #3962
Signed-off-by: baude <bbaude@redhat.com>
|
|\
| |
| | |
Support running containers without CGroups
|
| |
| |
| |
| |
| |
| |
| | |
This is mostly used with Systemd, which really wants to manage
CGroups itself when managing containers via unit file.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|/
|
|
|
|
|
|
|
|
|
| |
Previously, we only did this for volumes created at the same time
as the container. However, this is not correct behavior - Docker
does so for all named volumes, even those made with
'podman volume create' and mounted into a container later.
Fixes #3945
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
|
|
|
|
|
|
|
| |
When we fail to remove a container's SHM, that's an error, and we
need to report it as such. This may be part of our lingering
storage woes.
Also, remove MNT_DETACH. It may be another cause of the storage
removal failures.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When volume options and the local volume driver are specified,
the volume is intended to be mounted using the 'mount' command.
Supported options will be used to volume the volume before the
first container using it starts, and unmount the volume after the
last container using it dies.
This should work for any local filesystem, though at present I've
only tested with tmpfs and btrfs.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
|
|
|
|
|
|
|
| |
Support generating systemd unit files for a pod. Podman generates one
unit file for the pod including the PID file for the infra container's
conmon process and one unit file for each container (excluding the infra
container).
Note that this change implies refactorings in the `pkg/systemdgen` API.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Old versions of conmon have a bug where they create the exit file before
closing open file descriptors causing a race condition when restarting
containers with open ports since we cannot bind the ports as they're not
yet closed by conmon.
Killing the old conmon PID is ~okay since it forces the FDs of old
conmons to be closed, while it's a NOP for newer versions which should
have exited already.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When forcibly removing a container, we are initiating an explicit
stop of the container, which is not reflected in 'podman events'.
Swap to using our standard 'stop()' function instead of a custom
one for force-remove, and move the event into the internal stop
function (so internal calls also register it).
This does add one more database save() to `podman remove`. This
should not be a terribly serious performance hit, and does have
the desirable side effect of making things generally safer.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the exit file
If the container exit code needs to be retained, it cannot be retained
in tmpfs, because libpod runs in a memcg itself so it can't leave
traces with a daemon-less design.
This wasn't a memleak detectable by kmemleak for example. The kernel
never lost track of the memory and there was no erroneous refcounting
either. The reference count dependencies however are not easy to track
because when a refcount is increased, there's no way to tell who's
still holding the reference. In this case it was a single page of
tmpfs pagecache holding a refcount that kept pinned a whole hierarchy
of dying memcg, slab kmem, cgropups, unrechable kernfs nodes and the
respective dentries and inodes. Such a problem wouldn't happen if the
exit file was stored in a regular filesystem because the pagecache
could be reclaimed in such case under memory pressure. The tmpfs page
can be swapped out, but that's not enough to release the memcg with
CONFIG_MEMCG_SWAP_ENABLED=y.
No amount of more aggressive kernel slab shrinking could have solved
this. Not even assigning slab kmem of dying cgroups to alive cgroup
would fully solve this. The only way to free the memory of a dying
cgroup when a struct page still references it, would be to loop over
all "struct page" in the kernel to find which one is associated with
the dying cgroup which is a O(N) operation (where N is the number of
pages and can reach billions). Linking all the tmpfs pages to the
memcg would cost less during memcg offlining, but it would waste lots
of memory and CPU globally. So this can't be optimized in the kernel.
A cronjob running this command can act as workaround and will allow
all slab cache to be released, not just the single tmpfs pages.
rm -f /run/libpod/exits/*
This patch solved the memleak with a reproducer, booting with
cgroup.memory=nokmem and with selinux disabled. The reason memcg kmem
and selinux were disabled for testing of this fix, is because kmem
greatly decreases the kernel effectiveness in reusing partial slab
objects. cgroup.memory=nokmem is strongly recommended at least for
workstation usage. selinux needs to be further analyzed because it
causes further slab allocations.
The upstream podman commit used for testing is
1fe2965e4f672674f7b66648e9973a0ed5434bb4 (v1.4.4).
The upstream kernel commit used for testing is
f16fea666898dbdd7812ce94068c76da3e3fcf1e (v5.2-rc6).
Reported-by: Michele Baldessari <michele@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
<Applied with small tweaks to comments>
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|\
| |
| |
| |
| | |
wking/fatal-requested-hook-directory-does-not-exist
libpod/container_internal: Make all errors loading explicitly configured hook dirs fatal
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
hook dirs fatal
Remove this IsNotExist out which was added along with the rest of this
block in f6a2b6bf2b (hooks: Add pre-create hooks for runtime-config
manipulation, 2018-11-19, #1830). Besides the obvious "hook directory
does not exist", it was swallowing the less-obvious "hook command does
not exist". And either way, folks are likely going to want non-zero
podman exits when we fail to load a hook directory they explicitly
pointed us towards.
Signed-off-by: W. Trevor King <wking@tremily.us>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This includes:
Implement exec -i and fix some typos in description of -i docs
pass failed runtime status to caller
Add resize handling for a terminal connection
Customize exec systemd-cgroup slice
fix healthcheck
fix top
add --detach-keys
Implement podman-remote exec (jhonce)
* Cleanup some orphaned code (jhonce)
adapt remote exec for conmon exec (pehunt)
Fix healthcheck and exec to match docs
Introduce two new OCIRuntime errors to more comprehensively describe situations in which the runtime can error
Use these different errors in branching for exit code in healthcheck and exec
Set conmon to use new api version
Signed-off-by: Jhon Honce <jhonce@redhat.com>
Signed-off-by: Peter Hunt <pehunt@redhat.com>
|
| |
| |
| |
| |
| |
| |
| | |
this is the third round of preparing to use the golangci-lint on our
code base.
Signed-off-by: baude <bbaude@redhat.com>
|
| |
| |
| |
| |
| |
| | |
clean up and prepare to migrate to the golangci-linter
Signed-off-by: baude <bbaude@redhat.com>
|
|\ \
| | |
| | | |
Set correct SELinux label on restored containers
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Restoring a container from a checkpoint archive creates a complete
new root file-system. This file-system needs to have the correct SELinux
label or most things in that restored container will fail. Running
processes are not as problematic as newly exec()'d process (internally
or via 'podman exec').
This patch tells the storage setup which label should be used to mount
the container's root file-system.
Signed-off-by: Adrian Reber <areber@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | | |
clean up code identified as problematic by golands inspection
Signed-off-by: baude <bbaude@redhat.com>
|
|\ \ \
| | | |
| | | | |
generate kube with volumes
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Specifically, we were needlessly doing a double lookup to find which config mounts were user volumes. Improve this by refactoring a bit of code from inspect
Signed-off-by: Peter Hunt <pehunt@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
the results of a code cleanup performed by the goland IDE.
Signed-off-by: baude <bbaude@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Matches the behavior of Docker.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|/ / /
| | |
| | |
| | | |
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
| | |
| | |
| | |
| | |
| | |
| | | |
this is phase 2 for the removal of libpod from main.
Signed-off-by: baude <bbaude@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | | |
use the new implementation for dealing with cgroups.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
|
|/ /
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
the compilation demands of having libpod in main is a burden for the
remote client compilations. to combat this, we should move the use of
libpod structs, vars, constants, and functions into the adapter code
where it will only be compiled by the local client.
this should result in cleaner code organization and smaller binaries. it
should also help if we ever need to compile the remote client on
non-Linux operating systems natively (not cross-compiled).
Signed-off-by: baude <bbaude@redhat.com>
|
| |
| |
| |
| | |
Signed-off-by: Matthew Heon <mheon@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Allow Podman containers to request to use a specific OCI runtime
if multiple runtimes are configured. This is the first step to
properly supporting containers in a multi-runtime environment.
The biggest changes are that all OCI runtimes are now initialized
when Podman creates its runtime, and containers now use the
runtime requested in their configuration (instead of always the
default runtime).
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The storage driver and the storage options in storage.conf should
match, but if you change the storage driver via the command line
then we need to nil out the default storage options from storage.conf.
If the user wants to change the storage driver and use storage options,
they need to specify them on the command line.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This commit adds an option to the checkpoint command to export a
checkpoint into a tar.gz file as well as importing a checkpoint tar.gz
file during restore. With all checkpoint artifacts in one file it is
possible to easily transfer a checkpoint and thus enabling container
migration in Podman. With the following steps it is possible to migrate
a running container from one system (source) to another (destination).
Source system:
* podman container checkpoint -l -e /tmp/checkpoint.tar.gz
* scp /tmp/checkpoint.tar.gz destination:/tmp
Destination system:
* podman pull 'container-image-as-on-source-system'
* podman container restore -i /tmp/checkpoint.tar.gz
The exported tar.gz file contains the checkpoint image as created by
CRIU and a few additional JSON files describing the state of the
checkpointed container.
Now the container is running on the destination system with the same
state just as during checkpointing. If the container is kept running
on the source system with the checkpoint flag '-R', the result will be
that the same container is running on two different hosts.
Signed-off-by: Adrian Reber <areber@redhat.com>
|
|/
|
|
|
|
|
|
| |
This adds a couple of function in structure members needed in the next
commit to make container migration actually work. This just splits of
the function which are not modifying existing code.
Signed-off-by: Adrian Reber <areber@redhat.com>
|
|
|
|
|
|
|
| |
replace two usage of kwait.ExponentialBackoff in favor of WaitForFile
that uses inotify when possible.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
|
|
|
|
|
|
|
| |
Instead of rewriting the logic, reuse the standard logic we use
for removing containers, which is much better tested.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After a reboot, when we refresh Podman's state, we retrieved the
lock from the fresh SHM instance, but we did not mark it as
allocated to prevent it being handed out to other containers and
pods.
Provide a method for marking locks as in-use, and use it when we
refresh Podman state after a reboot.
Fixes #2900
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
| |
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
| |
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
| |
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
| |
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
| |
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
|
|
|
|
|
| |
The on-failure restart option supports restarting only a given
number of times. To do this, we need one additional field in the
DB to track restart count (which conveniently fills a field in
Inspect we weren't populating), plus some plumbing logic.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
|
|
|
| |
This initial version does not support restart count, but it works
as advertised otherwise.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
|
|
|
|
| |
This field indicates that a container was explciitly stopped by
an API call, and did not exit naturally. It's used when
implementing restart policy for containers.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|