Metadata Typo Causes Integration Headaches

Marvin Addison serac at
Tue Sep 18 07:55:37 EDT 2018

I wanted to share a few lessons from one of the most difficult SAML
integrations we've done to date. The difficulty was ultimately caused
by an incorrect XML namespace in a metadata file. I'll start with the
summary of what we learned:

1. Don't maintain metadata by hand. I believe that's somewhere in the
InCommon best practices documents, but it's a best practice
regardless. We've had several minor issues over the years, but this
most recent one was a huge support black hole and I think we've
finally learned the lesson the hard way.
2. Metadata-based credential resolution is complicated by filtering
that can reduce the effective key set to less than what's explicitly
defined in metadata XML files.
3. IdP logging in the credential resolution process could be improved.

I'll share the full story because some of the details underscore the
points above. We had a SAML client using mod_auth_mellon/Lasso that
was sending an authentication request over HTTP-Redirect with a
detached signature. The signature failed to verify at the IdP, which
is a failure that happens fairly early in the authentication pipeline.
We took a fair bit of time to rule out the most common cause of
signature failures: a simple keypair mismatch between SP and IdP. After
having ruled out keypair mismatch, we started looking harder at the
client, which I was unfamiliar with, under the premise that there was
a bug somewhere. We went so far as to manually verify the signature
generated by the client and determined that it was correct. As we
gained confidence that the client was sending a proper request with
correct credentials, we quickly ran out of options for further
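
For anyone who wants to repeat a manual check like this, the following
is a minimal Python sketch of the first step only: rebuilding the bytes
that an HTTP-Redirect signature covers. It is not necessarily the
procedure used in this case. The function names are hypothetical, and
the sketch assumes the SAML 2.0 HTTP-Redirect rule that the signature is
computed over SAMLRequest (or SAMLResponse), RelayState if present, and
SigAlg, in that order, using the URL-encoded values as sent. The actual
cryptographic check needs a crypto library and the SP's public key and
is not shown.

```python
import base64
from urllib.parse import unquote


def _raw_params(raw_query):
    """Split a query string without decoding, so values stay as sent on the wire."""
    params = {}
    for pair in raw_query.split("&"):
        name, _, value = pair.partition("=")
        params[name] = value
    return params


def signed_octets(raw_query):
    """Rebuild the byte string covered by a detached HTTP-Redirect signature."""
    params = _raw_params(raw_query)
    message = "SAMLRequest" if "SAMLRequest" in params else "SAMLResponse"
    # The order is fixed regardless of the order in the URL.
    parts = [f"{message}={params[message]}"]
    if "RelayState" in params:
        parts.append(f"RelayState={params['RelayState']}")
    parts.append(f"SigAlg={params['SigAlg']}")
    return "&".join(parts).encode("ascii")


def signature_bytes(raw_query):
    """Return the raw signature value: URL-decode, then base64-decode."""
    return base64.b64decode(unquote(_raw_params(raw_query)["Signature"]))
```

The values are deliberately left URL-encoded: decoding and re-encoding
them can change the bytes (for example, the case of percent escapes)
and make a valid signature look invalid.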

We turned up some IdP logging categories early on to help confirm that
we could load a credential from metadata. A full sample excerpt is
available for review [1] for the curious. The following entry in
particular confirmed our metadata entry was being loaded for the SP:

2018-09-17 12:36:58,072 DEBUG
org.opensaml.saml.metadata.resolver.impl.AbstractBatchMetadataResolver:162 Metadata Resolver FilesystemMetadataResolver VTMetadata:
Resolved 1 candidates via EntityIdCriterion: EntityIdCriterion

After I saw that initially, I was satisfied that metadata resolution
was working as intended. That turned out to be a big mistake that cost
us a lot of time. The breakthrough came when we revisited the logs
after running out of leads with the vendor and noticed that, no
matter what logging categories we turned up, we never saw any log
entries for the actual signature verification attempt. Code review
suggested that could be caused by a failure to find any trusted
certificates. We finally realized that the resolved metadata
entry was being further filtered:

2018-09-17 12:36:58,073 DEBUG
org.opensaml.saml.metadata.resolver.impl.PredicateRoleDescriptorResolver:376 Attempting to filter candidate RoleDescriptors via resolved
2018-09-17 12:36:58,073 DEBUG
org.opensaml.saml.metadata.resolver.impl.PredicateRoleDescriptorResolver:398 After predicate filtering 1 RoleDescriptors remain

There's a similar process for applying role criteria, which includes
key usage and key algorithm selectors. Unfortunately there's no
equivalent log output like the one above that indicates how many
elements remain after criteria selection, but we could infer that
something about the key criteria was likely preventing our key from
being resolved. That led us to review the metadata, where we noticed an
incorrect XML namespace on the KeyInfo element and children;
correcting that problem fixed everything.
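
A quick way to catch this class of error outside the IdP is to compare
the certificates a human sees in the file with the ones a
namespace-aware lookup finds. The Python sketch below is only an
approximation of that idea and not the IdP's actual resolution logic.
The function names are hypothetical, and it assumes the standard SAML
metadata and XML Signature namespaces and treats a KeyDescriptor with
no "use" attribute as usable for signing.

```python
import xml.etree.ElementTree as ET
from typing import List

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"
DS_NS = "http://www.w3.org/2000/09/xmldsig#"


def declared_certificate_count(metadata_xml: str) -> int:
    """Count X509Certificate elements by local name, ignoring namespaces.

    This approximates what someone sees when reading the file by eye.
    """
    root = ET.fromstring(metadata_xml)
    return sum(
        1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "X509Certificate"
    )


def signing_certificates(metadata_xml: str, entity_id: str) -> List[str]:
    """Return signing certificates found with a strict, namespace-aware lookup."""
    root = ET.fromstring(metadata_xml)
    cert_path = (
        f"{{{DS_NS}}}KeyInfo/{{{DS_NS}}}X509Data/{{{DS_NS}}}X509Certificate"
    )
    certs = []
    # iter() includes the root, so this works for a single EntityDescriptor
    # as well as for an EntitiesDescriptor aggregate.
    for entity in root.iter(f"{{{MD_NS}}}EntityDescriptor"):
        if entity.get("entityID") != entity_id:
            continue
        for key_descriptor in entity.iter(f"{{{MD_NS}}}KeyDescriptor"):
            # Skip keys that are declared for encryption only.
            if key_descriptor.get("use") not in (None, "signing"):
                continue
            for cert in key_descriptor.findall(cert_path):
                # Strip line breaks and indentation from the base64 text.
                certs.append("".join((cert.text or "").split()))
    return certs
```

If declared_certificate_count() reports a certificate but
signing_certificates() returns an empty list for the SP's entityID,
the KeyInfo namespace or the "use" attribute is a good place to look.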

It would have been immensely helpful for the trust engine to log the
number of resolved trusted certificates at DEBUG prior to attempting
signature validation; summary logging after applying criteria also
would have helped. I intend to file a Jira issue for those logging
improvements because I feel strongly that they would be generally

Marvin at Virginia Tech

