A short study about Shibboleth performances.

Dan McLaughlin dmclaughlin at tech-consortium.com
Wed Aug 28 12:19:10 EDT 2019


Hey Scott,

So I have a real-world use case that seems to trigger this issue.  It
caused us significant headaches, with the SP crashing repeatedly,
until I found this email thread.

We recently had to move away from SSL offloading on our load balancer
for security reasons, which meant moving from Layer 7 to Layer 4 load
balancing so that we can terminate SSL at Apache HTTPD.  The upside is
that we have more control over the SSL configuration; the downside is
that we can no longer use cookies to maintain stickiness to a specific
backend webserver.  Instead we have to use ip-netmask stickiness, which
defaults to /32 (255.255.255.255).  So each IP that comes in gets
mapped to a specific server, which isn't great for the cases where NAT
proxies are used and all the traffic from a specific company/agency
comes from a few IPs.

We have 3 large agencies that use our application.  Each has tens of
thousands of employees and two to three outbound IP addresses.  At
most we've seen upwards of 500 concurrent sessions from each of the
agencies.  All three of these agencies load balance all of their
outbound connections to the Internet through a farm of gateway
routers.  In all three cases one of the gateways has a faster Internet
connection and is configured as the primary gateway, and only if that
gateway has too many connections will it start moving connections
over to the other gateways, which it seems to do on a FIFO basis.  We
have our StorageService and SessionCache configured as follows:

    <StorageService type="Memory" id="mem" cleanupInterval="900"/>
    <DataSealer type="Versioned" path="sealer.keys" />
    <SessionCache type="StorageService" StorageService="mem"
                  cacheAssertions="false"
                  cacheAllowance="1800" inprocTimeout="900"
                  cleanupInterval="900"
                  persistedAttributes="id email givenname uid lastname phonenumber division"/>

Here's where things go south.  When 8-9AM or 1-2PM rolls around each
day we see a large number of users log in from each of these three
agencies, and quite often we will see, from one or more agencies, a
large number of sessions (hundreds) all of a sudden change source IPs
because their primary gateway has exhausted all its connections and
they are moved over to one of the backup gateways.  When this happens
the load balancer on our side sees hundreds of connections come from a
new IP address that it hasn't seen before, and they all get redirected
to another webserver.  We have consistentAddress="true", so all the
users are asked to log in again, and when this happens the SP crashes,
the Windows service manager restarts it, and then it happens again,
and again, until the morning rush of logins subsides.  Looking at the
logs just before the SP crashes, we see hundreds of log entries
similar to...

...
2019-06-27 09:28:20 INFO Shibboleth.SessionCache [3] [default]:
removed session (_6e7018a9cdf118e5e16912d139b23720)
2019-06-27 09:28:20 WARN Shibboleth.SessionCache [3] [default]:
duplicate insertion of revocation for session
(_6e7018a9cdf118e5e16912d139b23720)
2019-06-27 09:28:20 INFO Shibboleth.SessionCache [13] [default]:
removed session (_c8b097c1bdbcadbc597cf62f46485e64)
2019-06-27 09:28:20 INFO Shibboleth.SessionCache [12] [default]:
removed session (_1f48b292ae1ca4905073a7e47e8c645e)
2019-06-27 09:28:20 INFO Shibboleth.SessionCache [1] [default]:
removed session (_a62ac79e332c3e4abdd2499cdd245b75)
2019-06-27 09:28:20 WARN Shibboleth.SessionCache [1] [default]:
duplicate insertion of revocation for session
(_a62ac79e332c3e4abdd2499cdd245b75)
2019-06-27 09:28:20 INFO Shibboleth.SessionCache [17] [default]:
removed session (_e4f498fbc05385df75886d1c877bbd0b)
2019-06-27 09:28:20 INFO Shibboleth.SessionCache [7] [default]:
removed session (_07f974c657164d2e79746bdd6ee4df6e)
2019-06-27 09:28:20 WARN Shibboleth.SessionCache [7] [default]:
duplicate insertion of revocation for session
(_07f974c657164d2e79746bdd6ee4df6e)
2019-06-27 09:28:20 INFO Shibboleth.SessionCache [3] [default]:
removed session (_07f974c657164d2e79746bdd6ee4df6e)
...

After finding this email thread I decided to try setting
maintainReverseIndex="false", and the SP hasn't crashed once since,
whereas before it was crashing at least 3-4 times during the morning
and after-lunch rushes.
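
For reference, a sketch of that change, assuming the attribute is
added to the same SessionCache element shown above and every other
setting is left as it was:

    <!-- Assumed placement: maintainReverseIndex on the existing
         SessionCache element; all other attributes unchanged. -->
    <SessionCache type="StorageService" StorageService="mem"
                  cacheAssertions="false"
                  cacheAllowance="1800" inprocTimeout="900"
                  cleanupInterval="900"
                  maintainReverseIndex="false"
                  persistedAttributes="id email givenname uid lastname phonenumber division"/>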


Regards,

Dan

On Thu, Jun 9, 2016 at 9:02 AM Cantor, Scott <cantor.2 at osu.edu> wrote:
>
> > Regardless of the potential platform, we found a few configuration
> > optimizations that you might find interesting.
>
> I'm moderately surprised V3 was that much faster, if at all.
>
> FWIW, the maintainReverseIndex setting in the SP really only impacts artificial performance loads, and there's a similar setting in the IdP that can cause issues. You don't get that many sessions generated with a single NameID under normal usage and it takes thousands of them to degrade performance. It absolutely kills load tests though.
>
> -- Scott

