SAML and OAuth/OIDC Mythconceptions
I Noticed in a Class...
I was auditing that some folks started throwing around claims that SAML is more secure than OpenID Connect (OIDC), or that OIDC is better for mobile devices because SAML is too complex, and other stuff like that. Nobody should be a SAML or OAuth bigot, and what I learned from this is that folks have a lot of misconceptions about how the technology works!
SAML wasn't the first to use the basic flow that follows. It goes back at least as far as Kerberos in MIT's Project Athena, which is the fundamental authentication behind Active Directory and others. The flow decouples authentication from the client program, so the program learns about the user without having to handle the user's credentials:
The Flow
Initiate a request
So if a user visits an application, the application knows who the user is if they already have a session. Usually that means some sort of session cookie in the browser. If they are not in a session, the application sends the user to an identity provider (IdP) to get the information about the user. Sending the user is done with an HTTP redirect, typically an HTTP 302 response that sends the browser to the IdP. This is the first really cool part! Traditional applications manage their own user identities and have to collect credentials from the user to authenticate them. Then if you start to add other factors for better assurance, like using Short Message Service (SMS) to deliver a one-time passcode (OTP), you have to build all of that into the application.
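Here is a minimal sketch of that decision in Python, just to make it concrete. The IdP URL, the `return_to` parameter name, the `session_id` cookie name, and the in-memory session table are all made up for illustration; a real SAML or OIDC request carries protocol-specific parameters instead.

```python
from urllib.parse import urlencode

# Hypothetical values, for illustration only.
IDP_LOGIN_URL = "https://idp.example.com/login"
SESSIONS = {"abc123": "alice"}  # session id -> user; stands in for a real session store


def handle_request(cookies, requested_url):
    """Return (status, headers, body) for a request to the application."""
    user = SESSIONS.get(cookies.get("session_id"))
    if user is not None:
        # Existing session: the application already knows who this is.
        return 200, {}, f"Hello, {user}"
    # No session: redirect the browser to the IdP (302 is the usual redirect status).
    query = urlencode({"return_to": requested_url})
    return 302, {"Location": f"{IDP_LOGIN_URL}?{query}"}, ""
```

Notice the application never sees a password. All it does is check for a session and, failing that, point the browser somewhere else.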
Share authentication and enable single sign-on
When the application trusts an IdP to handle the authentication, we can strip all of that program code out and share a single implementation, the IdP! Additional assurance like an SMS OTP can be added in one place, at the IdP. Once the IdP authenticates the user, the general idea is that it packages up the information and sends it piggyback on a browser redirect back to the application. So when the user lands at the application on a well-known URL, the application accepts the user information and initiates a session. Here is the second cool part: when multiple applications trust the same IdP, the user does not have to reauthenticate when any subsequent application sends them there. They already have a session with the IdP! The user only has to authenticate once for all the applications, so now they have single sign-on, SSO!
Trust the IdP
The application trusts the identity information the IdP sends because the IdP digitally signs it. What that means is the IdP takes the data and hashes it. A hashing algorithm does two things: it produces a small result from a large amount of data, and it always produces the same result from the same input. A cryptographic hash is also built so that it is practically impossible to find different data that produces the same result.
The IdP encrypts the hash using a private key that only it knows, from a private/public pair of keys. The encrypted hash is the signature. The signature can be decrypted with the corresponding public key, which everyone in the world may have access to.
If the application can decrypt the signature with the public key, then it must have been signed by the trusted IdP so the source is good. If the hash (the decrypted signature) matches a new hash of the data, then the data has not been tampered with.
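To see the hash-then-sign idea in action, here is a toy sketch using textbook RSA with Python's built-in big integers. Everything about the key is an assumption made for illustration: the key pair is built from two well-known Mersenne primes, there is no padding, and the function names are mine. Real IdPs use a vetted crypto library with much larger keys and a proper signature scheme.

```python
import hashlib

# Toy key pair from two well-known Mersenne primes. Far too weak for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest(data: bytes) -> int:
    """Hash the data and reduce it so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """IdP side: 'encrypt' the hash with the private key."""
    return pow(digest(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Application side: 'decrypt' with the public key and compare hashes."""
    return pow(signature, E, N) == digest(data)
```

Signing `b"user=alice"` and then calling `verify` on the same bytes succeeds, while verifying the same signature against `b"user=mallory"` fails, which is the tamper check described above.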
SAML Security Issues and Solutions
I've seen folks claim that SAML is secure because it sends the identity data, a.k.a. the SAML assertion, directly to the application, a.k.a. the "service provider" or SP. That is not true. SAML sends the assertion piggyback on the browser redirect. There are two problems here:
- SAML came about before encrypted connections (HTTPS) were popular, so if plain HTTP is used the data can be intercepted in transit.
- The browser may cache the information, and it could be found (compromised) by malware on the user's computer.
Maybe neither is a problem if there is no sensitive information?
If information security is an issue, SAML offers encryption of the assertion. To do this the IdP needs a public key from the SP to encrypt the data with. Then only the private key the SP keeps secret can decrypt the data, so it is secure until it arrives at the SP. Even if the encrypted data is captured or cached along the way, it is going to be very difficult to read.
Pros
- Authentication takes place at the IdP, and SSO is possible.
- Encryption may be used to protect the data in transit, but you have to configure it.
Cons
- SAML creates the assertion as an XML document so it is harder to work with.
- Encrypting the assertion and decrypting it is a burden.
- There is an assumption that the assertion consumer service (ACS) URL where the IdP sends the assertion is really the service provider and hasn't been hijacked. Encrypting the assertion is the only way to know that a hijacker cannot read the identity information.
Open Authorization and OpenID Connect: Security Issues and Solutions
First of all, while OpenID Connect (OIDC) is a full-blown standard, fundamentally all it does is add an ID token definition to Open Authorization (OAuth).
OAuth was designed to protect web services, a.k.a. resource servers and also a.k.a. APIs. Its job is to provide an access token with scopes which tell an API what the client program is authorized to do. The access token is typically digitally signed to prove where it came from and that it has not been tampered with.
The client program sends the user to an authorization server, a.k.a. an IdP, using a browser redirect just like in SAML, asking for an access token. The user must authenticate (if not already authenticated) and the authorization server issues (or not) an appropriate access token.
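As a rough sketch, the redirect URL the client program builds might look like the following. The parameter names (`response_type`, `client_id`, `redirect_uri`, `scope`, `state`) come from the OAuth 2.0 spec; the endpoint URL and the function name are made up, and `state` is something the text above does not cover (it lets the client match the response to its own request). I use `response_type=code` here, which asks for the authorization code flow described further down.

```python
import secrets
from urllib.parse import urlencode

# Hypothetical authorization endpoint; a real one comes from your authorization server.
AUTHORIZE_URL = "https://auth.example.com/authorize"


def build_authorization_url(client_id, redirect_uri, scope):
    """Return (url, state) for the redirect that starts the flow."""
    state = secrets.token_urlsafe(16)  # random value the client checks on the way back
    params = {
        "response_type": "code",   # ask for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,            # space-separated scopes
        "state": state,
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}", state
```

The client sends the browser to the returned URL with a redirect and remembers `state` so it can compare it when the user comes back.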
One of the original OAuth flows was the implicit flow, where the access token piggybacks on a browser redirect back to the client program (the application). By the time OAuth came about, SSL/TLS was much more prevalent and it was thought that HTTPS was sufficient to protect the data in transit. Unfortunately the clear-text data after HTTPS decryption may still be cached by the browser along the way, so that could be a problem! And there is an assumption that the redirect URL where the token lands at the client program hasn't been hijacked (the same problem unencrypted SAML has).
OAuth solved both of these problems with the authorization code flow and the authorization code flow with PKCE (Proof Key for Code Exchange). Implicit flow is dead, long live authorization code flow. In both of these flows the token does not travel through the browser, only a one-time authorization code intended for the client program. On its own the authorization code is useless to anything else that sees it. In authorization code flow the client program calls the authorization server directly using HTTPS with its client ID, a client secret, and the code. The authorization server verifies all three pieces and returns the token directly to the client program without passing through the browser. Hijacking the redirect URL gains nothing, because the code cannot be exchanged for a token without the secret.
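Here is a hedged sketch of that back-channel call. It only builds the request and does not send it. The form field names are the ones the OAuth 2.0 spec defines for this exchange; the token URL and function name are hypothetical, and many servers also accept (or prefer) the client ID and secret in an HTTP Basic header instead of the body.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical token endpoint; a real one comes from your authorization server.
TOKEN_URL = "https://auth.example.com/token"


def build_token_request(code, client_id, client_secret, redirect_uri):
    """Build the server-to-server POST that trades the code for a token."""
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,                    # the one-time code from the redirect
        "redirect_uri": redirect_uri,    # must match the original request
        "client_id": client_id,
        "client_secret": client_secret,  # never leaves the server side
    }).encode("ascii")
    return Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would actually send it. The point is that this call goes straight from the client program to the authorization server, so the browser never sees the secret or the token.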
Native applications on mobile devices or personal computers and single-page applications running on a personal computer present a problem because they cannot keep a secret. We have no idea what kind of malware could be on these machines.
Unfortunately it may be even easier for malware to hijack the local URL in both native and single-page applications. In that case, it would get the authorization code and could simply ask for the token (remember, no secrets here).
PKCE fixes that by proving the client program that made the original authentication request is the same application making the token request. How it does it is incredibly simple:
- For the original request the client program randomly creates a code verifier string, a.k.a. "v".
- It hashes v, creating a code challenge, a.k.a. "h".
- It sends h along with the original request.
- After it gets the authorization code, it sends v with the token request.
- At the authorization server if hashing v matches h then it must be the same client program making the token request, because malware would not have the correct v.
So now we know the token request comes from the original client that made the request, which makes hijacking the redirect URL pointless. Good to go!
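The PKCE steps above can be sketched in a few lines of Python. This follows the S256 method from the PKCE spec (SHA-256, then base64url without padding); the function names are mine, and the server-side check is a stand-in for whatever your authorization server actually does.

```python
import base64
import hashlib
import secrets


def make_code_verifier() -> str:
    """Client: create the random verifier 'v' (43 URL-safe characters)."""
    return secrets.token_urlsafe(32)


def make_code_challenge(verifier: str) -> str:
    """Client: hash v into the challenge 'h' (base64url, padding stripped)."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def server_accepts(verifier: str, stored_challenge: str) -> bool:
    """Server: re-hash the v from the token request and compare with the stored h."""
    return secrets.compare_digest(make_code_challenge(verifier), stored_challenge)
```

The client sends `h` with the original request and `v` with the token request. Malware that steals only the authorization code has neither, and it cannot work backward from `h` to `v`.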
The OAuth access token carries the user id, but no other information about the user. OIDC simply allows the client to ask for an ID token as well, which carries more information about the user like their address, telephone number, etc. An ID token still carries less information than a SAML assertion can, so OIDC also specifies the authorization server must provide a /userinfo endpoint where the client program can go to get the full information about a user.
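To show what "carries information about the user" looks like, here is a small sketch that reads the claims out of an ID token. An ID token is a JWT: three base64url segments separated by dots, with the claims in the middle one. The sample token below is made up and unsigned, the helper names are mine, and the sketch deliberately skips signature verification, which a real client must do before trusting any of it.

```python
import base64
import json


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    padding = "=" * (-len(segment) % 4)  # restore the padding JWTs strip off
    return base64.urlsafe_b64decode(segment + padding)


def read_claims(jwt: str) -> dict:
    """Return the payload claims WITHOUT verifying the signature."""
    header, payload, signature = jwt.split(".")
    return json.loads(b64url_decode(payload))


# A made-up, unsigned token, just to have something to decode.
sample_claims = {"sub": "248289761001", "name": "Jane Doe", "phone_number": "+1 555 0100"}
sample_token = ".".join([
    b64url_encode(json.dumps({"alg": "none"}).encode()),
    b64url_encode(json.dumps(sample_claims).encode()),
    "",  # empty signature segment
])
```

Calling `read_claims(sample_token)` gives back the same dictionary as `sample_claims`. Anything the ID token does not include is what the /userinfo endpoint is for.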
Pros
- ID tokens, and in many implementations access tokens, are JSON Web Tokens, a.k.a. JWTs (pronounced "jots"), which are created with JavaScript Object Notation (JSON) and require a lot less processing than XML.
- The auth-code and auth-code with PKCE flows are just as secure as SAML encryption because hijacking the redirect URL does not yield the token.
- SAML does not support access tokens, so OAuth needs to be leveraged to protect APIs. Some authentication servers, e.g. Okta, support linking OAuth to a SAML authentication.
Cons
- Folks still use implicit flow, and it still is not secure.
- SAML sends everything in the assertion. OIDC applications get a simple ID token and need to make a separate call to the /userinfo endpoint at the authorization server to pick up the full user information. That is your next project!
Wrap-up
Well, that's all I have to say. I tried to stay out of the weeds and just talk about what you need to know. I hope this little trip high over the world of SAML, OAuth, and OIDC is helpful to you, especially when it comes to throwing out the bad information a lot of folks seem to have!