| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Support running `podman play kube` in systemd by exploiting the
previously added "service containers". During `play kube`, a service
container is started before all the pods and containers, and is stopped
last. The service container communicates its conmon PID via sdnotify.
Add a new systemd template to dispatch such k8s workloads. The argument
of the template is the path to the k8s file. Note that the path must be
escaped for systemd not to bark:
Let's assume we have a `top.yaml` file in the home directory:
```
$ escaped=$(systemd-escape ~/top.yaml)
$ systemctl --user start podman-play-kube@$escaped.service
```
Closes: https://issues.redhat.com/browse/RUN-1287
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Send the main PID only once. Previously, `(*Container).start()` and
the conmon handler sent them ~simultaneously and went into a race.
I noticed the issue while debugging a WIP PR.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
systemd sometimes spits out lines in the wrong order. Deal with it.
This fixes an infrequent flake that I haven't filed because I
didn't understand it well enough. (Hence, this reduces BUGS
but does not reduce BUG COUNT. Sorry!)
Signed-off-by: Ed Santiago <santiago@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Cleanup: the final 'play' test wasn't cleaning up after itself,
leading to angry warning messages when rerunning tests (in
my environment; never in CI)
Debug: I'm seeing a lot of "Could not parse READY=1 as MAINPID=nnn"
flakes in the sdnotify:container test (nine in the past month). Add
debug traces to help diagnose in future flakes.
Signed-off-by: Ed Santiago <santiago@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This leverages conmon's ability to proxy the SD-NOTIFY socket.
This prevents locking caused by OCI runtime blocking, waiting for
SD-NOTIFY messages, and instead passes the messages directly up
to the host.
NOTE: Also re-enable the auto-update tests which has been disabled due
to flakiness. With this change, Podman properly integrates into
systemd.
Fixes: #7316
Signed-off-by: Joseph Gooch <mrwizard@dok.org>
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some CI systems set $OCI_RUNTIME as a way to override the
default crun. Integration (e2e) tests honor this, but system
tests were not aware of the convention; this means we haven't
been testing system tests with runc, which means RHEL gating
tests are now failing.
The proper solution would be to edit containers.conf on CI
systems. Sorry, that would involve too much CI-VM work.
Instead, this PR detects $OCI_RUNTIME and creates a dummy
containers.conf file using that runtime.
Add: various skips for tests that don't work with runc.
Refactor: add a helper function so we don't need to do
the complicated 'podman info blah blah .OCIRuntime.blah'
thing in many places.
BUG: we leave a tmp file behind on exit.
Signed-off-by: Ed Santiago <santiago@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
podman gating tests are hanging in the new Fedora CI setup;
long and tedious investigation suggests that 'socat' processes
are being left unkilled, which then causes BATS to hang when
it (presumably) runs a final 'wait' in its end cleanup.
The two principal changes are to exec socat in a subshell
with fd3 closed, and to pkill its child processes before
killing the process itself. I don't know if both are needed.
The pkill definitely is; the exec may just be superstition.
Since I've wasted more than a day of PTO time on this, I'm
okay with a little superstition. What I do know is that with
these two changes, my reproducer fails to reproduce in over
one hour of trying (normally it fails within 5 minutes).
AND, update: only rawhide (f35) leaves stray socat processes
behind. f33 and ubuntu do not, so 'pkill -P' fails.
I really have no idea what's going on.
Signed-off-by: Ed Santiago <santiago@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- run test: minor cleanup to .containerenv test. Basically,
make it do only two podman-runs (they're expensive) and
tighten up the results checks
- ps test: add ps -a --storage. Requires small tweak to
run_podman helper, so we can have "timeout" be an expected
result
- sdnotify test: workaround for #8718 (seeing MAINPID=xxx as
last output line instead of READY=1). As found by the
newly-added debugging echos, what we are seeing is:
MAINPID=103530
READY=1
MAINPID=103530
It's not supposed to be that way; it's supposed to be just
the first two. But when faced with reality, we must bend
to accommodate it, so let's accept READY=1 anywhere in
the output stream, not just as the last line.
Signed-off-by: Ed Santiago <santiago@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- apiv2 - the 'ten /info requests' test is flaking often,
taking ~8 seconds (our limit is 7, up from 5 a few weeks
ago). Brent suggested that the first /info call might be
expensive, because it needs to access storage. So, let's
prime it by running one /info outside the timing loop.
And, because even that continues to fail, bump it up
to 10 seconds and file #8076 to track the slowdown.
- toolbox test - WaitForReady() has timed out, even on one
occasion causing a run failure because it failed 3 times.
Solution: bump up timeout from 2s to 5s. Not really great,
but CI systems are underpowered, and it's not unreasonable
that 2s might be too low.
- sdnotify test - add a 'podman wait' between stop & rm.
This may prevent a "cannot rm container as it is running"
race condition.
While working on this, Brent and I noticed a few ways that
test-apiv2 logging can be improved:
- test name: when request is POST, display the jsonified
parameters, not the original input ones. This should
make it much easier to reproduce failures.
- use curl's "--write-out" option to capture http code,
content type, and request time. We were getting the
first two via grep from logged headers; this is cleaner.
And there was no other way to get timing. We now include
the timing as X-Response-Time in the log file.
- abort on *any* curl error, not just 7 (cannot connect).
Any error at all from curl is bad news.
Signed-off-by: Ed Santiago <santiago@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- run --userns=keep-id: confirm that $HOME gets set (#8013)
- inspect: confirm that JSON output is a sane number of
lines (10 or more), not an unreadable one-liner (#8011
and #8021). Do so with image, pod, network, volume
because the code paths might be different.
- cgroups: confirm that 'run' preserves cgroup manager (#7970)
- sdnotify: reenable tests, and hope CI doesn't hang. This
test was disabled on August 18 because CI jobs were hanging
and timing out. My suspicion was that it was #7316, which
in turn seems to have hinged on conmon #182. The latter
was merged on Sep 16, so let's cross our fingers and see
what happens.
Also: remove inaccurate warning from a networking test.
And, wow, fix is_cgroupsv2(), it has never actually worked.
Signed-off-by: Ed Santiago <santiago@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CI and system tests currently pull some images from docker.io.
Eliminate that, by:
- building a custom image containing much of what we need
for testing; and
- copying other needed images to quay.io
(Reason: effective 2020-11-01 docker.io will limit the
number of image pulls).
The principal change is to create a new quay.io/libpod/testimage,
using the new test/system/build-testimage script, instead of
relying on quay.io/libpod/alpine_labels. We also switch to
using a hardcoded :YYYYMMDD tag, instead of :latest, in an
attempt to futureproof our CI. This image includes 'httpd'
from busybox-extras, which we use in our networking test
(previously we had to pull and run busybox from docker.io).
The testimage can and should be extended as needed for future
tests, e.g. adding test file content or other useful tools.
For the '--pull' tests which require actually pulling from
the registry, I've created an image with the same name but
tagged :00000000 so it will never be pulled by default.
Since this image is only used minimally, it's just busybox.
Unfortunately there remain two cases we cannot solve in
this tiny alpine-based image:
1) docker registry
2) systemd
For those, I've (manually) run:
podman pull [ docker.io/library/registry:2.7 | registry.fedoraproject.org/fedora:31 ]
podman tag !$ quay.io/...
podman push !$
...and amended the calling tests accordingly.
I've tried to make the the smallest reasonable diff, not the
smallest possible one. I hope it's a reasonable tradeoff.
Signed-off-by: Ed Santiago <santiago@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
info, images, run, networking tests: remove some skip_if_remote()s
that were added in the varlink days. All of these tests now seem
to work with APIv2.
help test: check that first output line from 'podman --help'
is the program description (regression check for #7273).
load test: clean up stray images, rewrite test to make it conform
to existing convention. In the process, discover and file #7337
exec test (and networking): file #7360, and add FIXME comment
to skip()s suggesting evaluating those tests once that is fixed.
pod test: now that #6328 is fixed, use 'podman pod inspect --format'
instead of relying on jq
Various other tests: add an explanation of why test is disabled
so we can more easily distinguish "this will never be meaningful
under remote" vs "hey, doesn't work for now, but maybe someday".
Signed-off-by: Ed Santiago <santiago@redhat.com>
|
|
|
|
|
|
|
|
| |
Some CI tests are hanging, timing out in 60 or 120 minutes.
I wonder if it's #7316, the bug where all podman commands
hang forever if NOTIFY_SOCKET is set?
Signed-off-by: Ed Santiago <santiago@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Oops. PR #6693 (sdnotify) added tests, but they were disabled
due to broken crun on f31. I tried for three weeks to get a
magic CI:IMG PR to update crun on the CI VMs ... but in that
time I forgot to actually enable those new tests.
This PR removes a 'skip', replacing it with a check that systemd
is running plus one more to make sure our runtime is crun. It
looks like sdnotify just doesn't work on Ubuntu (it hangs), and
my guess is that it's a crun/runc issue.
I also changed the test image from fedora:latest to :31, because,
sigh, fedora:latest removed the systemd-notify tool.
WARNING WARNING WARNING: the symptom of a missing systemd-notify
is that podman will hang forever, not even stopped by the timeout
command in podman_run! (Filed: #7316). This means that if the
sdnotify-in-container test ever fails, the symptom will be that
Cirrus itself will time out (2 hours?). This is horrible. I
don't know what to do about it other than push for a fix for 7316.
Signed-off-by: Ed Santiago <santiago@redhat.com>
|
|
Signed-off-by: Ed Santiago <santiago@redhat.com>
|