General Guidance on IdP Environment Sizing
paul.fardy at utoronto.ca
Fri Sep 28 13:09:35 EDT 2018
On Thursday, September 27, 2018 1:58 PM, Scott Cantor wrote:
> If the issue is LDAP performance then the sizing in question would be on that side, not the IdP. The IdP spends most of its time signing things, it's incredibly CPU bound.
Could you expand on this? This isn't our experience. Our IdP spends most of its time waiting on LDAP and Kerberos queries. CPU utilization is low: consistently under 10% on a 4-CPU VM, with occasional peaks of 20%. We added CPUs for the load, but we can't determine whether there was any benefit.
Also, I'm relying mostly on Apache's service time, which isn't in the default LogFormat.
> The time taken to serve the request, in seconds.
This has helped find SLOW responses, though it cannot help tune for fast responses: the granularity is whole seconds. It would be great if we could log LDAP latency in the IdP logs.
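For finer granularity, Apache's mod_log_config also offers %D, which logs the time taken to serve the request in microseconds (vs. %T in seconds). A minimal sketch; the format name "timed" and field layout are arbitrary:

```apache
# %D = time to serve the request, in microseconds
LogFormat "%h %l %u %t \"%r\" %>s %b %Dus" timed
CustomLog logs/access_log timed
```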
We increased the CPUs, but my goal was to see whether that would increase parallelism. I don't know the JVM well. How can we tell whether the JVM is slow to serve requests because threads are waiting for a context switch into or out of an LDAP wait state? We did gain from addressing the IdP's LDAP config (ldap.properties):
> idp.pool.LDAP.minSize = 40
> idp.pool.LDAP.maxSize = 100
Initially, we weren't even re-using LDAP connections, so every authn and every attribute query included a connect, bind, search, and close.
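Beyond pool size, there are related pooling knobs in ldap.properties worth reviewing; a sketch with the IdP 3 property names (the values below are illustrative, not a recommendation — check the commented defaults in your own file):

```properties
# Validate pooled connections periodically rather than on every checkout
idp.pool.LDAP.validateOnCheckout = false
idp.pool.LDAP.validatePeriodically = true
# Periods below are in seconds
idp.pool.LDAP.validatePeriod = 300
idp.pool.LDAP.prunePeriod = 300
idp.pool.LDAP.idleTime = 600
```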
I think the IdP software's internal constraint would be JVM multi-threading. Could context switching be slow? Can we determine if or when it's overloaded? I increased CPUs, but that's really speculating that the JVM needs help. If it's mostly waiting, it doesn't need more hardware; our system isn't using it. I just wonder if it might need more.
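One way to answer this without guessing is to look at thread states directly. A sketch, assuming you can take a jstack dump of the Tomcat JVM (path and PID discovery are up to you):

```shell
# Tally thread states from a jstack dump to see whether the JVM is
# CPU-bound (many RUNNABLE) or mostly parked waiting on LDAP/Kerberos I/O.
# $1 is a dump file produced by e.g.:  jstack <tomcat-pid> > /tmp/threads.txt
summarize_states() {
  grep -o 'java.lang.Thread.State: [A-Z_]*' "$1" | sort | uniq -c | sort -rn
}
```

A pool that is mostly waiting shows predominantly WAITING/TIMED_WAITING threads; a genuinely CPU-bound IdP shows many RUNNABLE ones.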
Sizing memory is also an issue. We increased our RAM before a student-registration load day and we tuned Tomcat: we increased the thread pool, which immediately increased memory usage. But was that usage or wastage?
We did some testing on our QA system with tomcat's setenv.sh:
> CATALINA_OPTS="-Didp.home=/local/shibboleth-idp -Xmx2048m -XX:+UseG1GC -Xss$((256+128))k"
That sets the stack size to 384K. The thread pool is created with a lot of pre-allocated RAM, and I think 384K is probably too large. Has anyone tested stack size? I think the default is 512K, but I don't have a reference for that.
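A back-of-envelope bound on thread-stack memory is maxThreads times -Xss (the numbers below are illustrative, not our production values; note stacks are reserved virtual memory and only committed as used):

```shell
# Rough upper bound on thread-stack memory = Tomcat maxThreads * -Xss
THREADS=200
XSS_KB=384
echo "$(( THREADS * XSS_KB / 1024 )) MB"   # prints "75 MB"
```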
> I do an LDAP lookup per login, though primary authn is Kerberos protocol (much faster than LDAP), but with 200-400,000 logins per day I just have two servers live and could easily handle the load on one (physical) box.
We found our bottlenecks in LDAP latency. One box runs well for us.
Paul Fardy, Shib Admin, Info Security, ITS
University of Toronto