SP metadata cache keeps growing
Peter Schober
peter.schober at univie.ac.at
Thu Feb 27 14:21:43 EST 2020
* Cantor, Scott <cantor.2 at osu.edu> [2020-02-26 22:43]:
> My guess is systemd is actively killing the process perhaps? The old
> init.d wait didn't really have that effect.
Yup, that's what's happening. On a clean CentOS 8 test system:
# systemctl start shibd
Job for shibd.service failed because a timeout was exceeded.
See "systemctl status shibd.service" and "journalctl -xe" for details.
# ps aux | fgrep shibd
shibd 2136 0.0 88.0 1227988 461504 ? Rsl 18:15 5:37 /usr/sbin/shibd -f -F
# systemctl status shibd
* shibd.service - Shibboleth Service Provider Daemon
Loaded: loaded (/usr/lib/systemd/system/shibd.service; disabled; vendor preset: disabled)
Active: deactivating (stop-sigterm) (Result: timeout)
Docs: https://wiki.shibboleth.net/confluence/display/SP3/Home
Main PID: 2136 (shibd)
Tasks: 4 (limit: 26213)
Memory: 450.9M
CGroup: /system.slice/shibd.service
└─2136 /usr/sbin/shibd -f -F
Feb 27 18:15:29 centos8lxc systemd[1]: shibd.service: Service RestartSec=30s expired, scheduling restart.
Feb 27 18:15:29 centos8lxc systemd[1]: shibd.service: Scheduled restart job, restart counter is at 2.
Feb 27 18:15:29 centos8lxc systemd[1]: Stopped Shibboleth Service Provider Daemon.
Feb 27 18:15:29 centos8lxc systemd[1]: Starting Shibboleth Service Provider Daemon...
Feb 27 18:20:29 centos8lxc systemd[1]: shibd.service: Start operation timed out. Terminating.
Note the "Terminating" in the last log line and "restart counter" in
the second. Not sure that the maximum number of restart attempts is
but I stopped my tests at 6...
# systemctl show shibd | fgrep -i restarts
NRestarts=6
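(Not sure there even is a hard maximum here: if I read the systemd
defaults right, the rate limiting from StartLimitIntervalSec= and
StartLimitBurst= never kicks in, because with RestartSec=30s plus the
5-minute start timeout the attempts are spaced far beyond the default
10-second interval. The effective values can be checked with:

# systemctl show shibd -p StartLimitBurst -p StartLimitIntervalUSec

which would explain why this keeps cycling indefinitely.)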
Even though it can sometimes seem to be done/killed for good ...
# systemctl status shibd
* shibd.service - Shibboleth Service Provider Daemon
Loaded: loaded (/usr/lib/systemd/system/shibd.service; disabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: timeout) since Thu 2020-02-27 18:28:00 UTC; 5s ago
Docs: https://wiki.shibboleth.net/confluence/display/SP3/Home
Process: 2154 ExecStart=/usr/sbin/shibd -f -F (code=killed, signal=KILL)
Main PID: 2154 (code=killed, signal=KILL)
Feb 27 18:28:00 centos8lxc systemd[1]: shibd.service: Main process exited, code=killed, status=9/KILL
Feb 27 18:28:00 centos8lxc systemd[1]: shibd.service: Failed with result 'timeout'.
Feb 27 18:28:00 centos8lxc systemd[1]: Failed to start Shibboleth Service Provider Daemon.
This can seemingly go on forever, ultimately filling the file system
with identical copies of the cached metadata.
# ls -l /var/cache/shibboleth/
-rw-r--r-- 1 shibd shibd 54404390 Feb 27 18:28 aconet-metadata.xml.3ac6
-rw-r--r-- 1 shibd shibd 54404390 Feb 27 18:09 aconet-metadata.xml.5711
-rw-r--r-- 1 shibd shibd 54404390 Feb 27 18:22 aconet-metadata.xml.6cb5
-rw-r--r-- 1 shibd shibd 54404390 Feb 27 18:15 aconet-metadata.xml.71f2
-rw-r--r-- 1 shibd shibd 54404390 Feb 27 18:02 aconet-metadata.xml.a620
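(My assumption -- not verified -- is that these randomly-suffixed
files are the temporary copies shibd writes before renaming them into
place, and that it never gets that far before being killed, so they
just pile up. If so they should be safe to remove once shibd has come
up cleanly, e.g.:

# rm /var/cache/shibboleth/aconet-metadata.xml.????

but again, that's only my reading of the file names above.)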
As for requiring a manual override by the deployer:
I note that shibd uses systemd's notify protocol:
# systemctl show shibd | fgrep -i type
Type=notify
And that the systemd.service docs[1] mention that services can use that
channel to extend the timeout as long as needed by re-sending messages:
> If a service of Type=notify sends "EXTEND_TIMEOUT_USEC=…", this may
> cause the start time to be extended beyond TimeoutStartSec=. The first
> receipt of this message must occur before TimeoutStartSec= is
> exceeded, and once the start time has extended beyond TimeoutStartSec=,
> the service manager will allow the service to continue to start,
> provided the service repeats "EXTEND_TIMEOUT_USEC=…" within the
> interval specified until the service startup status is finished by
> "READY=1". (see sd_notify(3)).
Is that something you'd consider adding while Xerces takes its time?
In the meantime I've amended the systemd wiki page a bit:
https://wiki.shibboleth.net/confluence/display/SP3/LinuxSystemd
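(For anyone hitting this in the meantime, the manual workaround is a
drop-in raising the start timeout, along these lines -- the value is
just an example and needs sizing to your metadata:

# systemctl edit shibd
[Service]
TimeoutStartSec=1800

followed by "systemctl restart shibd".)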
Thanks,
-peter
[1] https://www.freedesktop.org/software/systemd/man/systemd.service.html#TimeoutStartSec=