| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Our previous flow was to perform a hijack before passing a
connection into Libpod, and then Libpod would attach to the
container's attach socket and begin forwarding traffic.
A problem emerges: we write the attach header as soon as the
attach complete. As soon as we write the header, the client
assumes that all is ready, and sends a Start request. This Start
may be processed *before* we successfully finish attaching,
causing us to lose output.
The solution is to handle hijacking inside Libpod. Unfortunately,
this requires a downright extensive refactor of the Attach and
HTTP Exec StartAndAttach code. I think the result is an
improvement in some places (a lot more errors will be handled
with a proper HTTP error code, before the hijack occurs) but
other parts, like the relocation of printing container logs, are
just *bad*. Still, we need this fixed now to get CI back into
good shape...
Fixes #7195
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
|
|
|
| |
Test flakes mentioned in #6987 might be caused by uncorrect closing of file descriptor.
Fix the code to close file descriptors for podman run since it may close those used by other processes.
Signed-off-by: Qi Wang <qiwan@redhat.com>
|
|
|
|
| |
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was inspired by https://github.com/cri-o/cri-o/pull/3934 and
much of the logic for it is contained there. However, in brief,
a named return called "err" can cause lots of code confusion and
encourages using the wrong err variable in defer statements,
which can make them work incorrectly. Using a separate name which
is not used elsewhere makes it very clear what the defer should
be doing.
As part of this, remove a large number of named returns that were
not used anywhere. Most of them were once needed, but are no
longer necessary after previous refactors (but were accidentally
retained).
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
--sdnotify container|conmon|ignore
With "conmon", we send the MAINPID, and clear the NOTIFY_SOCKET so the OCI
runtime doesn't pass it into the container. We also advertise "ready" when the
OCI runtime finishes to advertise the service as ready.
With "container", we send the MAINPID, and leave the NOTIFY_SOCKET so the OCI
runtime passes it into the container for initialization, and let the container advertise further metadata.
This is the default, which is closest to the behavior podman has done in the past.
The "ignore" option removes NOTIFY_SOCKET from the environment, so neither podman nor
any child processes will talk to systemd.
This removes the need for hardcoded CID and PID files in the command line, and
the PIDFile directive, as the pid is advertised directly through sd-notify.
Signed-off-by: Joseph Gooch <mrwizard@dok.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the advent of Podman 2.0.0 we crossed the magical barrier of go
modules. While we were able to continue importing all packages inside
of the project, the project could not be vendored anymore from the
outside.
Move the go module to new major version and change all imports to
`github.com/containers/libpod/v2`. The renaming of the imports
was done via `gomove` [1].
[1] https://github.com/KSubedi/gomove
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When running under systemd there is no need to create yet another
cgroup for the container.
With conmon-delegated the current cgroup will be split in two sub
cgroups:
- supervisor
- container
The supervisor cgroup will hold conmon and the podman process, while
the container cgroup is used by the OCI runtime (using the cgroupfs
backend).
Closes: https://github.com/containers/libpod/issues/6400
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
|
|
|
|
|
|
| |
Add --preservefds to podman run. close https://github.com/containers/libpod/issues/6458
Signed-off-by: Qi Wang <qiwan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the container uses journald logging, we don't want to
automatically use the same driver for its exec sessions. If we do
we will pollute the journal (particularly in the case of
healthchecks) with large amounts of undesired logs. Instead,
force exec sessions logs to file for now; we can add a log-driver
flag later (we'll probably want to add a `podman logs` command
that reads exec session logs at the same time).
As part of this, add support for the new 'none' logs driver in
Conmon. It will be the default log driver for exec sessions, and
can be optionally selected for containers.
Great thanks to Joe Gooch (mrwizard@dok.org) for adding support
to Conmon for a null log driver, and wiring it in here.
Fixes #6555
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This came out of a conversation with Valentin about
systemd-managed Podman. He discovered that unit files did not
properly handle cases where Conmon was dead - the ExecStopPost
`podman rm --force` line was not actually removing the container,
but interestingly, adding a `podman cleanup --rm` line would
remove it. Both of these commands do the same thing (minus the
`podman cleanup --rm` command not force-removing running
containers).
Without a running Conmon instance, the container process is still
running (assuming you killed Conmon with SIGKILL and it had no
chance to kill the container it managed), but you can still kill
the container itself with `podman stop` - Conmon is not involved,
only the OCI Runtime. (`podman rm --force` and `podman stop` use
the same code to kill the container). The problem comes when we
want to get the container's exit code - we expect Conmon to make
us an exit file, which it's obviously not going to do, being
dead. The first `podman rm` would fail because of this, but
importantly, it would (after failing to retrieve the exit code
correctly) set container status to Exited, so that the second
`podman cleanup` process would succeed.
To make sure the first `podman rm --force` succeeds, we need to
catch the case where Conmon is already dead, and instead of
waiting for an exit file that will never come, immediately set
the Stopped state and remove an error that can be caught and
handled.
Signed-off-by: Matthew Heon <mheon@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
These are required for detached exec, where they will be used to
clean up and remove exec sessions when they exit.
As part of this, move all Exec related functionality for the
Conmon OCI runtime into a separate file; the existing one was
around 2000 lines.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
|
|
|
|
|
|
| |
specifying `-n=ctr-name` tells conmon to log CONTAINER_NAME=name if the log driver is journald
add this, and a test!
also, refactor the args slice creation to not append() unnecessarily.
Signed-off-by: Peter Hunt <pehunt@redhat.com>
|
|
|
|
| |
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
|
|
|
|
|
|
|
| |
During the initial workup of HTTP exec, I duplicated most of the
existing exec handling code so I could work on it without
breaking normal exec (and compare what I was doing to the nroaml
version). Now that it's done and working, we can switch over to
the refactored version and ditch the original, removing a lot of
duplicated code.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
|
|
| |
This makes the endpoint (mostly) functional.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is heavily based off the existing exec implementation, but
does not presently share code with it, to try and ensure we don't
break anything.
Still to do:
- Add code sharing with existing exec implementation
- Wire in the frontend (exec HTTP endpoint)
- Move all exec-related code in oci_conmon_linux.go into a new
file
- Investigate code sharing between HTTP attach and HTTP exec.
Signed-off-by: Matthew Heon <mheon@redhat.com>
|
|
|
|
|
|
|
|
| |
* Add ErrLostSync to report lost of sync when de-mux'ing stream
* Add logus.SetLevel(logrus.DebugLevel) when `go test -v` given
* Add context to debugging messages
Signed-off-by: Jhon Honce <jhonce@redhat.com>
|
|
|
|
| |
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
|
|
|
|
|
|
| |
rid ourseleves of libpod references in v2 client
Signed-off-by: Brent Baude <bbaude@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to better support kata containers and systemd containers
container-selinux has added new types. Podman should execute the
container with an SELinux process label to match the container type.
Traditional Container process : container_t
KVM Container Process: containre_kvm_t
PID 1 Init process: container_init_t
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A few major fixes here:
- Support for attaching to Configured containers, to match Docker
behavior.
- Support for stream parameter has been improved (we now properly
handle cases where it is not set).
- Initial support for logs parameter has been added.
- Setting attach streams when the container has a terminal is now
supported.
- Errors are properly reported once the hijack has begun.
Signed-off-by: Matthew Heon <mheon@redhat.com>
|
|
|
|
|
|
| |
the current implementation of info, while typed, is very loosely done so. we need stronger types for our apiv2 implmentation and bindings.
Signed-off-by: Brent Baude <bbaude@redhat.com>
|
|\
| |
| | |
Prepare for crun checkpoint support
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
For (almost) all commands which podman passes on to a OCI runtime
XDG_RUNTIME_DIR is set to the same value. This does not happen for the
checkpoint command.
Using crun to checkpoint a container without this change will lead to
crun using XDG_RUNTIME_DIR of the currently logged in user and so it
will not find the container Podman wants to checkpoint.
This bascially just copies a few lines from on of the other commands to
handle 'checkpoint' as all the other commands.
Thanks to Giuseppe for helping me with this.
For 'restore' it is not needed as restore goes through conmon and for
calling conmon Podman already configures XDG_RUNTIME_DIR correctly.
Signed-off-by: Adrian Reber <areber@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Podman was checking if the runtime support checkpointing by running
'runtime checkpoint -h'. That works for runc.
crun, however, does not use '-h, --help' for help output but, '-?,
--help'.
This commit switches both checkpoint support detection from
'runtime checkpoint -h'
to
'runtime checkpoint --help'.
Podman can now correctly detect if 'crun' also support checkpointing.
Signed-off-by: Adrian Reber <areber@redhat.com>
|
|/
|
|
| |
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
|
|
|
|
|
|
|
|
| |
if the control path file is deleted, libpod hangs waiting for a reader
to open it. Attempt to open it as non blocking until it returns an
error different than EINTR or EAGAIN.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
|
|
|
|
|
|
|
| |
vendor in c/common config pkg for containers.conf
Signed-off-by: Qi Wang qiwan@redhat.com
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We previously tried to send resize events only after the exec
session successfully started, which makes sense (we might drop an
event or two that came in before the exec session started
otherwise). However, the start function blocks, so waiting
actually means we send no resize events at all, which is
obviously worse than losing a few.. Sending resizes before attach
starts seems to work fine in my testing, so let's do that until we
get bug reports that it doesn't work.
Fixes #5584
Signed-off-by: Matthew Heon <mheon@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As part of the rework of exec sessions, we need to address them
independently of containers. In the new API, we need to be able
to fetch them by their ID, regardless of what container they are
associated with. Unfortunately, our existing exec sessions are
tied to individual containers; there's no way to tell what
container a session belongs to and retrieve it without getting
every exec session for every container.
This adds a pointer to the container an exec session is
associated with to the database. The sessions themselves are
still stored in the container.
Exec-related APIs have been restructured to work with the new
database representation. The originally monolithic API has been
split into a number of smaller calls to allow more fine-grained
control of lifecycle. Support for legacy exec sessions has been
retained, but in a deprecated fashion; we should remove this in
a few releases.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
|
|
|
| |
conmon forks itself, so make sure we reap the first process and not
leave a zombie process.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
|
|
|
|
|
|
|
|
| |
Update the outdated systemd and dbus dependencies which are now provided
as go modules. This will further tighten our dependencies and releases
and pave the way for the upcoming auto-update feature.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
|
|
|
|
|
|
|
|
|
| |
This reverts commit 4b72f9e4013411208751df2a92ab9f322d4da5b2.
Continues what began with revert of
d3d97a25e8c87cf741b2e24ac01ef84962137106 in previous commit.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
|
|
|
|
|
| |
This reverts commit d3d97a25e8c87cf741b2e24ac01ef84962137106.
This does not resolve the issues we expected it would, and has
some unexpected side effects with the upcoming exec rework.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
|
|
|
|
| |
Before, we were using -1 as a bogus value in podman to signify something went wrong when reading from a conmon pipe. However, conmon uses negative values to indicate the runtime failed, and return the runtime's exit code.
instead, we should use a bogus value that is actually bogus. Define that value in the define package as MinInt32 (-1<< 31 - 1), which is outside of the range of possible pids (-1 << 31)
Signed-off-by: Peter Hunt <pehunt@redhat.com>
|
|
|
|
|
|
|
|
| |
Before, we were getting the exit code from the file, in which we waited an arbitrary amount of time (5 seconds) for the file, and segfaulted if we didn't find it. instead, we should be a bit more certain conmon has sent the exit code. Luckily, it sends the exit code along the sync pipe fd, so we can read it from there
Adapt the ExecContainer interface to pass along a channel to get the pid and exit code from conmon, to be able to read both from the pipe
Signed-off-by: Peter Hunt <pehunt@redhat.com>
|
|
|
|
|
|
|
|
| |
This corrects a regression from Podman 1.4.x where container exec
sessions inherited supplemental groups from the container, iff
the exec session did not specify a user.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
|
|
|
|
|
|
|
|
| |
when using -d and port mapping, make sure the correct fd is injected
into conmon.
Move the pipe creation earlier as the fd must be known at the time we
create the container through conmon.
Closes: https://github.com/containers/libpod/issues/5167
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
|
|\
| |
| | |
oci_conmon: do not create a cgroup under systemd
|
| |
| |
| |
| |
| |
| |
| |
| | |
Detect whether we are running under systemd (if the INVOCATION_ID is
set). If Podman is running under a systemd service, we do not need to
create a cgroup for conmon.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
it allows to disable cgroups creation only for the conmon process.
A new cgroup is created for the container payload.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new APIv2 branch provides an HTTP-based remote API to Podman.
The requirements of this are, unfortunately, incompatible with
the existing Attach API. For non-terminal attach, we need append
a header to what was copied from the container, to multiplex
STDOUT and STDERR; to do this with the old API, we'd need to copy
into an intermediate buffer first, to handle the headers.
To avoid this, provide a new API to handle all aspects of
terminal and non-terminal attach, including closing the hijacked
HTTP connection. This might be a bit too specific, but for now,
it seems to be the simplest approach.
At the same time, add a Resize endpoint. This needs to be a
separate endpoint, so our existing channel approach does not work
here.
I wanted to rework the rest of attach at the same time (some
parts of it, particularly how we start the Attach session and how
we do resizing, are (in my opinion) handled much better here.
That may still be on the table, but I wanted to avoid breaking
existing APIs in this already massive change.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
do not change the permissions mask for the rundir and the tmpdir when
running a container with a user namespace and the current user is
mapped inside the user namespace.
The change was introduced with
849548ffb8e958e901317eceffdcc2d918cafd8d, that dropped the
intermediate mount namespace in favor of allowing root into the user
namespace to access these directories.
Closes: https://github.com/containers/libpod/issues/4846
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
|
|
|
|
|
|
|
| |
`gocritic` is a powerful linter that helps in preventing certain kinds
of errors as well as enforcing a coding style.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
|
|\
| |
| | |
log: support --log-opt tag=
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
support a custom tag to add to each log for the container.
It is currently supported only by the journald backend.
Closes: https://github.com/containers/libpod/issues/3653
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
|
|\ \
| |/
|/| |
exec: fix pipes
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In a largely anticlimatic solution to the saga of piped input from conmon, we come to this solution.
When we pass the Stdin stream to the exec.Command structure, it's immediately consumed and lost, instead of being consumed through CopyDetachable().
When we don't pass -i in, conmon is not told to create a masterfd_stdin, and won't pass anything to the container.
With both, we can do
echo hi | podman exec -til cat
and get the expected hi
Signed-off-by: Peter Hunt <pehunt@redhat.com>
|
| |
| |
| |
| | |
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RootlessKit port forwarder has a lot of advantages over the slirp4netns port forwarder:
* Very high throughput.
Benchmark result on Travis: socat: 5.2 Gbps, slirp4netns: 8.3 Gbps, RootlessKit: 27.3 Gbps
(https://travis-ci.org/rootless-containers/rootlesskit/builds/597056377)
* Connections from the host are treated as 127.0.0.1 rather than 10.0.2.2 in the namespace.
No UDP issue (#4586)
* No tcp_rmem issue (#4537)
* Probably works with IPv6. Even if not, it is trivial to support IPv6. (#4311)
* Easily extensible for future support of SCTP
* Easily extensible for future support of `lxc-user-nic` SUID network
RootlessKit port forwarder has been already adopted as the default port forwarder by Rootless Docker/Moby,
and no issue has been reported AFAIK.
As the port forwarder is imported as a Go package, no `rootlesskit` binary is required for Podman.
Fix #4586
May-fix #4559
Fix #4537
May-fix #4311
See https://github.com/rootless-containers/rootlesskit/blob/v0.7.0/pkg/port/builtin/builtin.go
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
|