httpclient sockets stuck in CLOSE_WAIT
Paul B. Henson
henson at cpp.edu
Tue Jun 30 19:49:53 UTC 2020
We recently ran into an issue where CAS proxy authentication was failing. The IdP was not making the callback with the PGT to the client, and the following showed up in the logs:
2020-06-29 13:30:34,779 - 2620:df:8000:ff10:0:1:247:16/86C949E906E495B3A7742481EB67A7E9 - INFO [net.shibboleth.idp.cas.flow.impl.ValidateProxyCallbackAction:139] - Proxy authentication failed for https://www.idm.unx.cpp.edu/cas_pgt: java.security.GeneralSecurityException: IO error
It looks kind of like this issue:
https://issues.shibboleth.net/jira/browse/IDP-1483
However, the sockets in CLOSE_WAIT never go away. This is a list from my test system yesterday:
java 9768 tomcat 83u IPv6 6247871 0t0 TCP 134.71.246.213:58478->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 254u IPv6 6245017 0t0 TCP 134.71.246.213:58460->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 281u IPv6 6227710 0t0 TCP 134.71.246.213:58416->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 321u IPv6 6242010 0t0 TCP 134.71.246.213:58476->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 328u IPv6 6249798 0t0 TCP 134.71.246.213:58494->134.71.247.16:https (CLOSE_WAIT)
And this is today, about 24 hours later:
java 9768 tomcat 73u IPv6 6421411 0t0 TCP 134.71.246.213:60386->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 83u IPv6 6247871 0t0 TCP 134.71.246.213:58478->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 254u IPv6 6245017 0t0 TCP 134.71.246.213:58460->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 281u IPv6 6227710 0t0 TCP 134.71.246.213:58416->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 321u IPv6 6242010 0t0 TCP 134.71.246.213:58476->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 328u IPv6 6249798 0t0 TCP 134.71.246.213:58494->134.71.247.16:https (CLOSE_WAIT)
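For anyone who wants to compare two such listings mechanically, here is a rough sketch in Python. It assumes the ten-column lsof layout shown above and keys each socket on PID, file descriptor, and the device number in the sixth column. The names parse_close_wait, YESTERDAY, and TODAY are made up for illustration. A socket that shows up under the same key in both snapshots has sat in CLOSE_WAIT the whole time.

```python
# Sketch only: compare two lsof snapshots and report which CLOSE_WAIT
# sockets survived between them. Helper and variable names are hypothetical.

YESTERDAY = """\
java 9768 tomcat 83u IPv6 6247871 0t0 TCP 134.71.246.213:58478->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 254u IPv6 6245017 0t0 TCP 134.71.246.213:58460->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 281u IPv6 6227710 0t0 TCP 134.71.246.213:58416->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 321u IPv6 6242010 0t0 TCP 134.71.246.213:58476->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 328u IPv6 6249798 0t0 TCP 134.71.246.213:58494->134.71.247.16:https (CLOSE_WAIT)
"""

TODAY = """\
java 9768 tomcat 73u IPv6 6421411 0t0 TCP 134.71.246.213:60386->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 83u IPv6 6247871 0t0 TCP 134.71.246.213:58478->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 254u IPv6 6245017 0t0 TCP 134.71.246.213:58460->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 281u IPv6 6227710 0t0 TCP 134.71.246.213:58416->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 321u IPv6 6242010 0t0 TCP 134.71.246.213:58476->134.71.247.16:https (CLOSE_WAIT)
java 9768 tomcat 328u IPv6 6249798 0t0 TCP 134.71.246.213:58494->134.71.247.16:https (CLOSE_WAIT)
"""


def parse_close_wait(text):
    """Map (pid, fd, device) -> connection string for each CLOSE_WAIT row."""
    sockets = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed columns:
        # COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME (STATE)
        if len(fields) != 10 or fields[-1] != "(CLOSE_WAIT)":
            continue
        pid, fd, device, name = fields[1], fields[3], fields[5], fields[8]
        sockets[(pid, fd, device)] = name
    return sockets


yesterday = parse_close_wait(YESTERDAY)
today = parse_close_wait(TODAY)

# Present in both snapshots: stuck for the whole interval.
persisted = sorted(set(yesterday) & set(today))
# Present only in the later snapshot: leaked since the first one was taken.
new = sorted(set(today) - set(yesterday))

print(len(persisted), "sockets persisted;", len(new), "new")
for key in new:
    print("new:", key, today[key])
```

With the two listings above, all five of yesterday's sockets are still present and one more has been added.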
I don't really understand the analysis in the IdP ticket. It seems to be blaming the network for this issue? That doesn't sound correct; a socket stuck in CLOSE_WAIT is the fault of the application, not the network:
https://doc.nuxeo.com/blog/using-httpclient-properly-avoid-closewait-tcp-connections/
https://stackoverflow.com/questions/5636316/troubleshooting-connections-stuck-in-close-wait-status
Basically, the web server on the CAS client side closes the connection with a FIN, but the HTTP client on the IdP never closes its side, leaving the connection in a half-closed state.
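To illustrate the mechanism outside of the IdP, here is a minimal sketch in Python using only the standard socket module over loopback. It is not the IdP's code, and the function names are made up. The server side closes first, which sends the FIN; the client sees that as an empty read. From that point until the client calls close(), the kernel holds the client's socket in CLOSE_WAIT, and nothing on the network can move it out of that state.

```python
# Sketch only: reproduce the sequence that leads to CLOSE_WAIT.
# serve_once and run_demo are hypothetical names for this illustration.
import socket
import threading


def serve_once(listener):
    """Accept one connection, send a payload, then close (sends FIN)."""
    conn, _ = listener.accept()
    conn.sendall(b"hello")
    conn.close()


def run_demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()

    client = socket.create_connection(listener.getsockname())
    chunks = []
    while True:
        chunk = client.recv(1024)
        if not chunk:
            # Empty read: the peer's FIN has arrived. The client socket is
            # now in CLOSE_WAIT and stays there until close() is called.
            break
        chunks.append(chunk)

    fd_while_open = client.fileno()
    # An application that skips this call leaks the socket in CLOSE_WAIT.
    client.close()
    fd_after_close = client.fileno()

    server.join()
    listener.close()
    return b"".join(chunks), fd_while_open, fd_after_close


payload, fd_while_open, fd_after_close = run_demo()
```

Running lsof against the process between the empty read and the close() call would show the client socket in CLOSE_WAIT, which is the window the sockets listed above appear to be stuck in.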
We have not updated the IdP version recently, but we did recently upgrade Tomcat, Java, and the kernel. I'm going to start swapping versions around to see if I can isolate what might have caused this to pop up. For now, on our production systems, I took the advice in the issue and switched to a dedicated CAS proxy HTTP client with a limit of 500, which won't fix the issue but will at least make it last longer between breakages <sigh>.
Thanks...
--
Paul B. Henson | (909) 979-6361 | http://www.cpp.edu/~henson/
Operating Systems and Network Analyst | henson at cpp.edu
California State Polytechnic University | Pomona CA 91768