March 3, 2005
Curse you, OpenSSL error stack
The passive network capture system I've been working on has two features that have an interesting interaction:
- It decrypts SSL/TLS transactions that it captures.
- It delivers the captured data via SSL/TLS.
We had a report that the SSL delivery connection was failing with the following error:
error:0407106B:rsa routines:RSA_padding_check_PKCS1_type_2:block type is not 02
error:04065072:rsa routines:RSA_EAY_PRIVATE_DECRYPT:padding check failed
This doesn't make any sense, though, because the capture system acts as an SSL client and doesn't do any RSA decryption. Also, it only happens when we're decrypting SSL. If we're just capturing HTTP data, or you don't have the SSL keys, then the system works perfectly.
It should be clear at this point that we're getting some kind of error bleedthrough from the SSL decryption, but how? We need one more piece of information to work it out: it only happens when the delivery socket is in non-blocking mode. If it's in blocking mode, everything works great.
What's happening is a result of the way that OpenSSL handles errors.
It maintains a per-thread (static in our case) error stack. When you
call SSL_get_error(ssl, r), where r is the return value of the
preceding I/O call, it combines the information from ssl, r, and the
error stack to decide what to return. Now, here's the important point:
the error stack isn't cleared automatically on the call to SSL_write().
So, here's the sequence of events:
1. We call RSA_private_decrypt() to decrypt the connection.
2. The RSA_private_decrypt() fails, populating the error stack.
3. Sometime later we call SSL_write() to deliver the data.
4. SSL_write() encounters a blocking condition. This:
   - sets errno to EAGAIN (35)
   - returns -1
   - leaves the error stack untouched.
5. When we call SSL_get_error(), we get the error from (2) because that's what's on the error stack.
6. Since we're getting a totally unexpected error, we do the conservative thing and abort the connection.
This doesn't happen in blocking mode because SSL_write() never returns an error in step 4 (unless something went really wrong internally).
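To make the bleedthrough concrete, here is a toy model of the sequence in plain C. It does not call OpenSSL at all: every toy_ name is made up for illustration, and the rule that a non-empty queue takes priority over errno is an assumption based on the behaviour described above, not a copy of the library's code.

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for OpenSSL's error stack. None of these names are real
 * OpenSSL API; they only model the behaviour described in the post. */
enum { TOY_QUEUE_MAX = 16 };
static unsigned long toy_queue[TOY_QUEUE_MAX]; /* static, like our single-threaded case */
static size_t toy_queue_len = 0;

enum toy_result { TOY_WANT_WRITE, TOY_ERROR_SSL, TOY_ERROR_SYSCALL };

void toy_push_error(unsigned long code) {
    if (toy_queue_len < TOY_QUEUE_MAX)
        toy_queue[toy_queue_len++] = code;
}

/* Steps 1-2: a decrypt that fails and leaves two entries behind. */
int toy_private_decrypt(void) {
    toy_push_error(0x0407106BUL); /* block type is not 02 */
    toy_push_error(0x04065072UL); /* padding check failed */
    return -1;
}

/* Steps 3-4: a non-blocking write that would block. It sets errno and
 * returns -1, but never touches the error queue. */
int toy_write(void) {
    errno = EAGAIN;
    return -1;
}

/* Step 5: classification consults the queue first (assumed ordering). */
enum toy_result toy_get_error(int ret) {
    (void)ret;
    if (toy_queue_len > 0)
        return TOY_ERROR_SSL;     /* the stale entry wins */
    if (errno == EAGAIN)
        return TOY_WANT_WRITE;
    return TOY_ERROR_SYSCALL;
}

/* Runs the whole sequence from an empty queue. If decrypt_first is
 * nonzero, the failed decrypt happens before the write. */
enum toy_result toy_deliver(int decrypt_first) {
    toy_queue_len = 0;
    if (decrypt_first)
        (void)toy_private_decrypt();
    int r = toy_write();
    return toy_get_error(r);
}
```

In this model toy_deliver(0) reports TOY_WANT_WRITE, while toy_deliver(1) reports TOY_ERROR_SSL even though the write itself did nothing worse than block.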
This problem doesn't occur normally for two reasons. First,
generally when you encounter
an OpenSSL error you call ERR_get_error() to find out
what went wrong. ERR_get_error() clears the error
stack as a side effect. We didn't bother to call it in the RSA decryption
code in step (1) because we know what went wrong—the encryption
block is badly formatted somehow—and there's nothing to do about it.
Second, when something goes wrong in an SSL connection, you typically
just throw the connection away and when you create a new
connection SSL_connect() clears the error stack
as a side effect.
There's a simple one line fix: call ERR_get_error()
in step 1 to collect the error and clear the error stack.
As a belt-and-suspenders move, we also clear the error
stack before SSL_write() by calling ERR_clear_error(), just in case there's some other place we've forgotten to collect
the error.
Isn't programming fun?
Posted by ekr at March 3, 2005 7:21 AM
Comments
Note that ERR_get_error does not clear the entire error stack, it just pops the top item off the stack. So to safely use OpenSSL you must always call ERR_clear_error() after any function fails when you've dealt with the error condition.
There was a nifty bug in mod_ssl caused by the same issue a while back. Global state sucks.
Posted by: Joe Orton at March 3, 2005 1:14 PM
OpenSSL is a very poorly designed API. Eric did it ad hoc, it accreted more cruft, and now it's a shambles. IMHO, of course.
Posted by: Mordy at March 4, 2005 11:20 AM