httpclient sockets stuck in CLOSE_WAIT

Paul B. Henson henson at cpp.edu
Tue Jun 30 19:49:53 UTC 2020


We recently ran into an issue where CAS proxy authentication was failing. The IdP was not making the callback with the PGT to the client, and the following showed up in the logs:

2020-06-29 13:30:34,779 - 2620:df:8000:ff10:0:1:247:16/86C949E906E495B3A7742481EB67A7E9 - INFO [net.shibboleth.idp.cas.flow.impl.ValidateProxyCallbackAction:139] - Proxy authentication failed for https://www.idm.unx.cpp.edu/cas_pgt: java.security.GeneralSecurityException: IO error

It looks kind of like this issue:

	https://issues.shibboleth.net/jira/browse/IDP-1483

However, the sockets in CLOSE_WAIT never go away. This is a list from my test system yesterday:

java       9768   tomcat   83u  IPv6 6247871      0t0  TCP 134.71.246.213:58478->134.71.247.16:https (CLOSE_WAIT)
java       9768   tomcat  254u  IPv6 6245017      0t0  TCP 134.71.246.213:58460->134.71.247.16:https (CLOSE_WAIT)
java       9768   tomcat  281u  IPv6 6227710      0t0  TCP 134.71.246.213:58416->134.71.247.16:https (CLOSE_WAIT)
java       9768   tomcat  321u  IPv6 6242010      0t0  TCP 134.71.246.213:58476->134.71.247.16:https (CLOSE_WAIT)
java       9768   tomcat  328u  IPv6 6249798      0t0  TCP 134.71.246.213:58494->134.71.247.16:https (CLOSE_WAIT)

And this is today, about 24 hours later:

java       9768   tomcat   73u  IPv6 6421411      0t0  TCP 134.71.246.213:60386->134.71.247.16:https (CLOSE_WAIT)
java       9768   tomcat   83u  IPv6 6247871      0t0  TCP 134.71.246.213:58478->134.71.247.16:https (CLOSE_WAIT)
java       9768   tomcat  254u  IPv6 6245017      0t0  TCP 134.71.246.213:58460->134.71.247.16:https (CLOSE_WAIT)
java       9768   tomcat  281u  IPv6 6227710      0t0  TCP 134.71.246.213:58416->134.71.247.16:https (CLOSE_WAIT)
java       9768   tomcat  321u  IPv6 6242010      0t0  TCP 134.71.246.213:58476->134.71.247.16:https (CLOSE_WAIT)
java       9768   tomcat  328u  IPv6 6249798      0t0  TCP 134.71.246.213:58494->134.71.247.16:https (CLOSE_WAIT)
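
For anyone who wants to check the same thing on their own system, here is a rough Python sketch that compares two saved lsof listings and reports which CLOSE_WAIT sockets survived from one to the next. The function name parse_lsof and the choice of (FD, DEVICE, NAME) as the identity of a socket are my own assumptions for illustration; the sample data is just the two listings above.

```python
def parse_lsof(text):
    """Map each CLOSE_WAIT socket's (FD, DEVICE, NAME) to its raw lsof line."""
    sockets = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # Columns: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME (STATE)
        if len(fields) >= 10 and fields[-1] == "(CLOSE_WAIT)":
            key = (fields[3], fields[5], fields[8])
            sockets[key] = line.strip()
    return sockets


yesterday = parse_lsof("""
java 9768 tomcat  83u IPv6 6247871 0t0 TCP 134.71.246.213:58478->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 254u IPv6 6245017 0t0 TCP 134.71.246.213:58460->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 281u IPv6 6227710 0t0 TCP 134.71.246.213:58416->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 321u IPv6 6242010 0t0 TCP 134.71.246.213:58476->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 328u IPv6 6249798 0t0 TCP 134.71.246.213:58494->134.71.247.16:https (CLOSE_WAIT)
""")

today = parse_lsof("""
java 9768 tomcat  73u IPv6 6421411 0t0 TCP 134.71.246.213:60386->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat  83u IPv6 6247871 0t0 TCP 134.71.246.213:58478->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 254u IPv6 6245017 0t0 TCP 134.71.246.213:58460->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 281u IPv6 6227710 0t0 TCP 134.71.246.213:58416->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 321u IPv6 6242010 0t0 TCP 134.71.246.213:58476->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 328u IPv6 6249798 0t0 TCP 134.71.246.213:58494->134.71.247.16:https (CLOSE_WAIT)
""")

# Sockets present in both listings have been stuck for the whole interval.
persistent = sorted(set(yesterday) & set(today))
# Sockets only in the later listing were leaked since the first one was taken.
new = sorted(set(today) - set(yesterday))

print(f"{len(persistent)} sockets persisted, {len(new)} new")
for fd, device, name in new:
    print("new:", fd, device, name)
```

All five sockets from the first listing are still there in the second, with the same file descriptors and device numbers, plus one new one on local port 60386.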

I don't really understand the analysis in the IdP ticket. It seems to be blaming the network for this issue? That doesn't sound correct; a socket stuck in CLOSE_WAIT is the fault of the application, not the network:

https://doc.nuxeo.com/blog/using-httpclient-properly-avoid-closewait-tcp-connections/

https://stackoverflow.com/questions/5636316/troubleshooting-connections-stuck-in-close-wait-status

Basically, the web server on the CAS client side closes the connection with a FIN, but the HTTP client on the IdP never closes its side, leaving the connection in a half-closed state.
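
To illustrate the mechanism, here is a minimal sketch using plain Python sockets on loopback. It is not the IdP's HttpClient code, just the generic TCP behavior as I understand it, and the function name demo_half_closed is made up for the example. The "server" closes first, the client sees end-of-stream, and the client socket stays in CLOSE_WAIT until the application itself calls close().

```python
import socket


def demo_half_closed():
    # Listener on loopback; port 0 lets the OS choose a free port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    client.settimeout(5)  # avoid hanging forever if something goes wrong
    server_side, _ = listener.accept()

    # The "web server" closes first, which sends a FIN to the client.
    server_side.close()
    listener.close()

    # The client sees the FIN as end-of-stream: recv() returns b"".
    # From this point until client.close(), the kernel reports the
    # client socket as CLOSE_WAIT.
    saw_eof = client.recv(1024) == b""

    # Only the application can finish the shutdown. If this call is
    # skipped and the object is kept alive (for example in a connection
    # pool), the socket sits in CLOSE_WAIT indefinitely.
    client.close()
    return saw_eof, client.fileno()


saw_eof, fd_after_close = demo_half_closed()
print("peer closed:", saw_eof, "fd after close:", fd_after_close)
```

The point is that nothing on the network can move a socket out of CLOSE_WAIT; the kernel is waiting on the local process.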

We have not updated the IdP version recently, but we did recently upgrade Tomcat, Java, and the kernel. I'm going to start swapping versions around and see if I can isolate what might have caused this to pop up. For now, on our production systems, I took the advice in the ticket and switched to a dedicated CAS proxy HTTP client with a limit of 500, which won't fix the issue but will at least make it last longer between breakages <sigh>.

Thanks...

--
Paul B. Henson  |  (909) 979-6361  |  http://www.cpp.edu/~henson/
Operating Systems and Network Analyst  |  henson at cpp.edu
California State Polytechnic University  |  Pomona CA 91768


