Oddness with AWS hosted IdP(s)

Norman Bodnar bodnarn at gmail.com
Thu Jul 19 10:04:39 EDT 2018

We had a weird and frustrating experience last night, 7/18/2018 from
roughly 9:00 PM EST and on into 12:45 AM...

Here's the high-level:
2 Amazon instances of our Shibboleth IdP (one is literally a clone which we
gave a different elastic ip to - it's our test IdP).
IdP 3.3.2 running on Jetty
Been running fine for a year and a half.

Decided to run updates on OS.
We first took AWS snapshots of both servers before updates.
During business hours, I updated the "test" IdP VM. Updates completed
Rebooted. Tested. All is well. The exact procedure should be fine to do on
Production IdP after hours.

Ran updates on Production IdP at 8 PM. Success.
Restarted Production IdP at 9 PM. Failure. shibbolth-idp not starting. Kept
getting "Negative delay" message (see below).

That was odd, but it got worse.

Fell back to the non-updated Production IdP snapshot. Same thing, "Negative
Tried yet another older snapshot taken back when server was built.
"Negative delay"

Switched Elastic IP of Production to the cloned test IdP since
Shibboleth-idp service was still running. That didn't take. Had to restart
service. You guessed it, "Negative delay"
So now we were dead in the water.

Through all this, we were looking at logs. The frustrating part was,
nothing useful was being logged. idp-process.log was not being written to
through this entire ordeal (when restarting service). That was definitely
not "normal."

Our clone had some debugging options turned on and yet idp-process.log on
that server was also not budging.

And then, to top off the night, at around 1:05 AM, ALL of our AWS-based
IdPs suddenly simply "worked" - after restarting shibboleth-idp service.
Logs were being written to as well.

They all worked, production, clone, older snapshots (under other IPs so we
could compare them all).

Where am I going with this?
1) Any one else experience any issues last night (AWS-based IdPs)? Yes, we
created a ticket with them, nothing found yet.
2) Could it have been bad metadata from someone? We are federated with
Incommon and have about 6 other SPs working successfully.
3) With none of the normal shibboleth logs producing any output, what other
areas/logs could I troubleshoot?
4) "Negative delay"??

The thing is, we obviously had connectivity. It could be (probably is) an
AWS thing. The other external variables would be the incoming metadata. It
surely seems to be an external influence.

I'm digging into the issue now with a short night of sleep behind me.

[root at AWSanubis xxxxxxx]# service shibboleth-idp status
Redirecting to /bin/systemctl status shibboleth-idp.service
 shibboleth-idp.service - Shibboleth Identity Provider
   Loaded: loaded (/etc/systemd/system/shibboleth-idp.service; enabled;
vendor preset: disabled)
   Active: inactive (dead) since Wed 2018-07-18 23:47:01 EDT; 14min ago
  Process: 2166 ExecStart=/usr/bin/java -jar /opt/jetty/start.jar
jetty.home=/opt/jetty jetty.base=/opt/shibboleth-idp/jetty-base
(code=exited, status=0/SUCCESS)
 Main PID: 2166 (code=exited, status=0/SUCCESS)

Jul 18 23:47:01 AWSanubis java[2166]: at
Jul 18 23:47:01 AWSanubis java[2166]: at
Jul 18 23:47:01 AWSanubis java[2166]: at
java.security.AccessController.doPrivileged(Native Method)
Jul 18 23:47:01 AWSanubis java[2166]: at
Jul 18 23:47:01 AWSanubis java[2166]: Caused by:
java.lang.IllegalArgumentException: Negative delay.
Jul 18 23:47:01 AWSanubis java[2166]: at
Jul 18 23:47:01 AWSanubis java[2166]: at
Jul 18 23:47:01 AWSanubis java[2166]: at
Jul 18 23:47:01 AWSanubis java[2166]: at
Jul 18 23:47:01 AWSanubis java[2166]: ... 15 more
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://shibboleth.net/pipermail/users/attachments/20180719/a1f63253/attachment.html>

More information about the users mailing list