Re: [Python-Dev] frozenset C API?
On 9/5/07, Bill Janssen
It's actually easier to do all or nothing. I'm tempted to just report 'critical' extensions.
Simpler to provide them all, though I should note that the purpose of the information provided here is mainly for authorization/accounting purposes, not for "other" use of the certificate. If that's desired, they should pull the binary form of the certificate (there's an interface for that), and use M2Crypto or PyOpenSSL to decode it in general. This certificate has already been validated; the issue is how to get critical information to the app so it can make authorization decisions (like subjectAltName when the subject field is empty). Reporting non-critical extensions like "extended key usage" is nifty, but seems pointless.
RFC 2818:

"""If a subjectAltName extension of type dNSName is present, that MUST be used as the identity. Otherwise, the (most specific) Common Name field in the Subject field of the certificate MUST be used. Although the use of the Common Name is existing practice, it is deprecated and Certification Authorities are encouraged to use the dNSName instead."""

This is from an explanation of how to do hostname verification when doing HTTPS requests. HTTPS clients MUST do this in order to be compliant. Is an HTTPS client not in your list of use cases?

"""In general, HTTP/TLS requests are generated by dereferencing a URI. As a consequence, the hostname for the server is known to the client. If the hostname is available, the client MUST check it against the server's identity as presented in the server's Certificate message, in order to prevent man-in-the-middle attacks."""

I really don't understand why you would not expose all data in the certificate. It seems totally obvious. The data is there for a reason. I want the subjectAltName. Probably other people want other stuff. Why cripple it? Please include it all.

--
Christopher Armstrong
International Man of Twistery
http://radix.twistedmatrix.com/
http://twistedmatrix.com/
http://canonical.com/
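The identity rule quoted from RFC 2818 can be sketched in a few lines of Python. This is only an illustration, not part of any proposal in this thread: it assumes the certificate has already been decoded into a dict shaped like the one `ssl.SSLSocket.getpeercert()` returns in current Python versions, the function name `rfc2818_identities` is hypothetical, and "most specific Common Name" is taken here to mean the last commonName listed in the subject.

```python
def rfc2818_identities(cert):
    """Return the names a client should compare the hostname against.

    `cert` is assumed to be a dict shaped like ssl.SSLSocket.getpeercert()
    output: "subjectAltName" is a tuple of (kind, value) pairs and "subject"
    is a tuple of RDNs, each RDN being a tuple of (key, value) pairs.
    """
    # dNSName entries take priority whenever at least one is present.
    dns_names = [value
                 for kind, value in cert.get("subjectAltName", ())
                 if kind == "DNS"]
    if dns_names:
        return dns_names

    # Fallback (deprecated by the RFC): the Common Name of the subject.
    common_names = [value
                    for rdn in cert.get("subject", ())
                    for key, value in rdn
                    if key == "commonName"]
    # Assumption: the last commonName listed is the "most specific" one.
    return common_names[-1:]


with_san = {
    "subject": ((("commonName", "example.org"),),),
    "subjectAltName": (("DNS", "www.example.org"), ("DNS", "example.org")),
}
without_san = {"subject": ((("commonName", "example.org"),),)}

print(rfc2818_identities(with_san))
print(rfc2818_identities(without_san))
```

An empty result means the certificate carries no usable identity, which a client would treat as a verification failure.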
I really don't understand why you would not expose all data in the certificate.
You mean, providing the entire certificate as a blob? That is planned (although perhaps not implemented). Or do you mean "expose all data in a structured manner". BECAUSE IT'S NOT POSSIBLE. Sorry for shouting, but people don't ever get the notion of "extension".
It seems totally obvious. The data is there for a reason. I want the subjectAltName. Probably other people want other stuff. Why cripple it? Please include it all.
That's not possible. You can get the whole thing as a blob, and then you have to decode it yourself if something you want is not decoded. Regards, Martin
On 05:03 pm, martin@v.loewis.de wrote:
I really don't understand why you would not expose all data in the certificate.
You mean, providing the entire certificate as a blob? That is planned (although perhaps not implemented).
Or do you mean "expose all data in a structured manner". BECAUSE IT'S NOT POSSIBLE. Sorry for shouting, but people don't ever get the notion of "extension".
"structure" is a relative term. A typical way to deal with extensions unknown to the implementation is to provide ways to deal with the *extension-specific* parts of the data in question, c.f. http://java.sun.com/j2se/1.4.2/docs/api/java/security/cert/X509Extension.htm... Exposing the entire certificate object as a blob so that some *other* library could parse it *again* seems like just giving up. However, as to the specific issue of subjectAltName which Chris first mentioned: if HTTPS isn't an important specification to take into account while designing an SSL layer for Python, then I can't imagine what is. subjectAltName should be directly supported regardless of how it deals with unknown extensions.
It seems totally obvious. The data is there for a reason. I want the subjectAltName. Probably other people want other stuff. Why cripple it? Please include it all.
That's not possible. You can get the whole thing as a blob, and then you have to decode it yourself if something you want is not decoded.
Something very much like that is certainly possible, and has been done in numerous other places (including the Java implementation linked above). Providing a semantically rich interface to every possible X509 extension is of course ridiculous, but I don't think that's what anyone is actually proposing here.
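To make the idea concrete, here is a rough Python sketch of an interface in the spirit of Java's X509Extension: extensions are looked up by OID and handed back as raw bytes, with the critical flag preserved. Every name in it (`RawExtension`, `ExtensionView`, the method names) is hypothetical and is not taken from any existing Python SSL binding; the byte strings in the usage example are placeholders rather than a real certificate's contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawExtension:
    """One certificate extension, left undecoded."""
    oid: str          # dotted OID, e.g. "2.5.29.17" for subjectAltName
    critical: bool    # the extension's critical flag
    value: bytes      # DER bytes of the extension value, not interpreted


class ExtensionView:
    """Hypothetical read-only lookup over a certificate's extensions."""

    def __init__(self, extensions):
        self._by_oid = {ext.oid: ext for ext in extensions}

    def critical_oids(self):
        return {oid for oid, ext in self._by_oid.items() if ext.critical}

    def non_critical_oids(self):
        return {oid for oid, ext in self._by_oid.items() if not ext.critical}

    def get_value(self, oid):
        """Return the raw bytes for `oid`, or None if it is absent."""
        ext = self._by_oid.get(oid)
        return ext.value if ext is not None else None


# Placeholder data: 2.5.29.17 is subjectAltName, 2.5.29.37 is extKeyUsage.
view = ExtensionView([
    RawExtension("2.5.29.17", True, b"\x30\x00"),
    RawExtension("2.5.29.37", False, b"\x30\x00"),
])
print(view.critical_oids())
print(view.get_value("2.5.29.17"))
```

With something like this, the module does not need to understand an extension in order to hand its bytes to code that does.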
On 9/6/07, "Martin v. Löwis"
You mean, providing the entire certificate as a blob? That is planned (although perhaps not implemented).
Or do you mean "expose all data in a structured manner". BECAUSE IT'S NOT POSSIBLE. Sorry for shouting, but people don't ever get the notion of "extension".
It seems totally obvious. The data is there for a reason. I want the subjectAltName. Probably other people want other stuff. Why cripple it? Please include it all.
That's not possible. You can get the whole thing as a blob, and then you have to decode it yourself if something you want is not decoded.
Sorry, I guess I thought it was obvious. Please let me get at the bytes of just the unknown-to-ssl-module extension without forcing me to write an entire general ASN.1 certificate parser or use another (incomplete) one. Many extensions have simple data in them that is trivial to parse alone. -- Christopher Armstrong International Man of Twistery http://radix.twistedmatrix.com/ http://twistedmatrix.com/ http://canonical.com/
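As an illustration of how little parsing some extensions need once their bytes are available on their own, here is a minimal sketch that pulls dNSName entries out of a DER-encoded SubjectAltName value. It is not a general ASN.1 parser: it assumes single-byte tags, well-formed input, and that the bytes passed in are the extension value itself (a SEQUENCE of GeneralName). The function names are made up for this example.

```python
def read_tlv(data, offset=0):
    """Read one DER tag-length-value item.

    Returns (tag, value, next_offset). Only single-byte tags are handled.
    """
    tag = data[offset]
    length = data[offset + 1]
    offset += 2
    if length & 0x80:
        # Long form: the low 7 bits give the number of length bytes.
        n = length & 0x7F
        length = int.from_bytes(data[offset:offset + n], "big")
        offset += n
    return tag, data[offset:offset + length], offset + length


def dns_names_from_san(ext_value):
    """Extract dNSName strings from a DER SubjectAltName value."""
    tag, body, _ = read_tlv(ext_value)
    if tag != 0x30:  # SEQUENCE OF GeneralName
        raise ValueError("expected a SEQUENCE")
    names = []
    pos = 0
    while pos < len(body):
        tag, value, pos = read_tlv(body, pos)
        if tag == 0x82:  # context tag [2], IMPLICIT IA5String: dNSName
            names.append(value.decode("ascii"))
        # Other GeneralName choices (IP address, URI, ...) are skipped.
    return names


# Hand-built value: one dNSName plus one iPAddress ([7]) entry.
san = b"\x30\x13\x82\x0bexample.com\x87\x04\x7f\x00\x00\x01"
print(dns_names_from_san(san))
```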
RFC 2818
"""If a subjectAltName extension of type dNSName is present, that MUST be used as the identity. Otherwise, the (most specific) Common Name field in the Subject field of the certificate MUST be used. Although the use of the Common Name is existing practice, it is deprecated and Certification Authorities are encouraged to use the dNSName instead. """
Yes, subjectAltName is a big one. But I think it may be the only extension I'll expose. The issue is that I don't see a generic way of mapping extension X into Python data structure Y; each one needs to be handled specially. If you can see a way around this, please speak up!
I really don't understand why you would not expose all data in the certificate. It seems totally obvious. The data is there for a reason. I want the subjectAltName. Probably other people want other stuff. Why cripple it? Please include it all.
I intend to "include it all", by giving you a way to pull the full DER form of the certificate into Python. But a number of fields in the certificate have nothing to do with authorization, like the signature, which has already been used for validation. So I don't intend to try to convert them into Python-friendly forms. Applications which want to use that information already need to have a more powerful library, like M2Crypto or PyOpenSSL, available; they can simply work with the DER form of the certificate. Bill
On 05:15 pm, janssen@parc.com wrote:
RFC 2818
"""If a subjectAltName extension of type dNSName is present, that MUST be used as the identity. Otherwise, the (most specific) Common Name field in the Subject field of the certificate MUST be used. Although the use of the Common Name is existing practice, it is deprecated and Certification Authorities are encouraged to use the dNSName instead. """
Yes, subjectAltName is a big one. But I think it may be the only extension I'll expose. The issue is that I don't see a generic way of mapping extension X into Python data structure Y; each one needs to be handled specially. If you can see a way around this, please speak up!
Well, I can't speak for Chris, but that will certainly make *me* happier :).
I intend to "include it all", by giving you a way to pull the full DER form of the certificate into Python. But a number of fields in the certificate have nothing to do with authorization, like the signature, which has already been used for validation. So I don't intend to try to convert them into Python-friendly forms. Applications which want to use that information already need to have a more powerful library, like M2Crypto or PyOpenSSL, available; they can simply work with the DER form of the certificate.
When you say "the full DER form", are you simply referring to the full blob, or a broken-down representation by key and by extension? This begs the question: M2Crypto and PyOpenSSL already do what you're proposing to do, as far as I can tell, and are, as you say, "more powerful". There are issues with each (and issues with the GNU TLS bindings too, which I notice you didn't mention...) Speaking of issues, PyOpenSSL, for example, does not expose subjectAltName :). This has been a long thread, so I may have missed posts where this was already discussed, but even if I'm repeating this, I think it deserves to be beaten to death. *Why* are you trying to bring the number of (potentially buggy, incomplete) Python SSL bindings to 4, rather than adopting one of the existing ones and implementing a simple wrapper on top of it? PyOpenSSL, in particular, is both a popular de-facto standard *and* almost completely unmaintained; python's standard library could absorb/improve it with little fuss.
When you say "the full DER form", are you simply referring to the full blob, or a broken-down representation by key and by extension?
The full blob.
This begs the question: M2Crypto and PyOpenSSL already do what you're proposing to do, as far as I can tell, and are, as you say, "more powerful".
I'm trying to give the application the ability to do some level of authorization without requiring either of those packages. Like being able to tell who's on the other side of the connection :-). Right now, I think the right fields to expose are "subject" (I see little point to exposing "issuer"), "notAfter" (you're always guaranteed to be after "notBefore", or the cert wouldn't validate, so I see little point to exposing that, but "notAfter" can be used after the connection has been established), subjectAltName if present, and perhaps the certificate's serial number. I don't see how the other fields in the cert can be profitably used. Anything else you want, you can pull over the DER blob and look into it.
PyOpenSSL, in particular, is both a popular de-facto standard *and* almost completely unmaintained; python's standard library could absorb/improve it with little fuss.
Good idea, go for it! A full wrapper for OpenSSL is beyond the scope of my ambition; I'm simply trying to add a simple fix to what's already in the standard library. Bill
Sorry for the late response. As always, I have a lot of other stuff going on at the moment, but I'm very interested in this subject. On 6 Sep, 06:15 pm, janssen@parc.com wrote:
PyOpenSSL, in particular, is both a popular de-facto standard *and* almost completely unmaintained; python's standard library could absorb/improve it with little fuss.
Good idea, go for it! A full wrapper for OpenSSL is beyond the scope of my ambition; I'm simply trying to add a simple fix to what's already in the standard library.
I guess I'd like to know two things.

One, what *is* the scope of your ambition? I feel silly for asking, because I am pretty sure that somewhere in the beginning of this thread I missed either a proposal, a PEP reference, or a ticket number, but I've poked around a little and I can't seem to find it. Can you provide a reference, or describe what it is you're trying to do?

Two, what's the scope of "the" plans for the SSL module in general for Python? I think I misinterpreted several things that you said as "the plan" rather than your own personal requirements: but if in reality, I can "go for it", I'd really like to help make the stdlib SSL module to be a really good, full-featured OpenSSL implementation for Python so we can have it elsewhere. (If I recall correctly you mentioned you'd like to use it with earlier Python versions as well...?)

Many of the things that you recommend using another SSL library for, like pulling out arbitrary extensions, are incredibly unwieldy or flat-out broken in these libraries. It's not that I mind going to a different source for this functionality; it's that in many cases, there *isn't* another source :). I think I might have said this already, but subjectAltName, for example, isn't exposed in any way by PyOpenSSL.

I didn't particularly want to start my own brand-new SSL wrapper project, and contributing to the actively-maintained stdlib implementation is a lot more appealing than forking the moribund PyOpenSSL. However, even with lots of help on the maintenance, converting the current SSL module into a complete SSL library is a lot of work. Here are the questions that I'd like answers to before starting to think seriously about it:

* Is this idea even congruent with the overall goals of other developers interested in SSL for Python? If not, I'm obviously barking up the wrong tree.
* Would it be possible to distribute as a separate library? (I think I remember Bill saying something about that already...)
* When would such work have to be completed by to fit into the 2.6 release? (I just want a rough estimate, here.)
* Should someone - and I guess by someone I mean me - write up a PEP describing this?

My own design for an SSL wrapper - although this is simply a Python layer around PyOpenSSL - is here: http://twistedmatrix.com/trac/browser/trunk/twisted/internet/_sslverify.py

This isn't really complete - in particular, the documentation is lacking, and it can't implement the stuff PyOpenSSL is missing - but I definitely like the idea of having objects for DNs, certificates, CRs, keys, key pairs, and the ubiquitous certificate-plus-matching-private-key-in-one-file that you need to run an HTTPS server :). If I am going to write a PEP, it will look a lot like that file.

_sslverify was originally designed for a system that does lots of automatic signing, so I am particularly interested in it being easy to implement a method like PrivateCertificate.signCertificateRequest - it's always such a pain to get all the calls for signing a CR in any given library *just so*.
This begs the question: M2Crypto and PyOpenSSL already do what you're proposing to do, as far as I can tell, and are, as you say, "more powerful".
To clarify my point here, when I say that they "already do" what you're doing, what I mean is, they already wrap SSL, and you are trying to wrap SSL :).
I'm trying to give the application the ability to do some level of authorization without requiring either of those packages.
I'd say "why wouldn't you want to require either of those packages?" but actually, I know why you wouldn't want to, and it's that they're bad. So, given that we don't want to require them, wouldn't it be nice if we didn't need to require them at all? :).
Like being able to tell who's on the other side of the connection :-). Right now, I think the right fields to expose are
I don't quite understand what you mean by "right" fields. Right fields for what use case? This definitely isn't "right" for what I want to use SSL for.
"subject" (I see little point to exposing "issuer"),
This is a good example of what I mean. For HTTPS, the relationship between the subject and the issuer is moot, but in my own projects, the relationship is very interesting. Specifically, properties of the issuer define what properties the subject may have, in the verification scheme for Vertex ( http://divmod.org/trac/wiki/DivmodVertex ). (On the other hand, Vertex requires STARTTLS, so it itself can't be an *actual* use-case for this SSL library until it also starts supporting mid-connection TLS startup.) I can understand that you might not have use-cases for exposing these features, but your phrasing suggests that it would be a bad idea to expose them, not just that it's too much work. Am I misinterpreting? Are you just saying it isn't worth the work at this point?
"notAfter" (you're always guaranteed to be after "notBefore", or the cert wouldn't validate, so I see little point to exposing that, but "notAfter" can be used after the connection has been established),
Wouldn't it be nice to know *why* the cert didn't validate? To provide the user with a message including the notBefore date, in case their clock is set wrong or something?
I don't see how the other fields in the cert can be profitably used.
The entire idea of "extensions" is pretty direct about the fact that the original implementor need not understand their profitable use :).
When you say "the full DER form", are you simply referring to the full blob, or a broken-down representation by key and by extension?
The full blob.
Obviously, I think the broken-down representation would be nicer :). I know I'll have to wrangle with a bit of ASN.1 if I want to get anything useful out of most extensions, but if it's just the extension data there are a lot of cases where I think I could fake it. Re-parsing the whole DER is going to require a real, full-on ASN.1 library.
One, what *is* the scope of your ambition? I feel silly for asking, because I am pretty sure that somewhere in the beginning of this thread I missed either a proposal, a PEP reference, or a ticket number, but I've poked around a little and I can't seem to find it. Can you provide a reference, or describe what it is you're trying to do?
Sorry about that. We kind of did this on the fly at the Python sprint. I was trying to fix two problems: One, that the current socket.ssl support didn't validate certificates, and two, that you couldn't do server-side SSL with it. I'm only interested in that aspect, and in the simplest possible solution to those problems. I don't want to provide user validation callbacks, or arbitrary certificate decoding, or general-purpose crypto, or support for building automatic CA systems, or wrapping most of that great grab-bag of useful stuff called OpenSSL. Just fix the core issues with socket.ssl. Along the way I've found a nasty little threading/malloc bug in the existing code, and fixed that. I've added real documentation for the existing functionality. I've gone around with you and Martin, mainly, on what information to expose from the validated certificate, to support authorization and accounting (the answer so far: "notAfter", "subject", and "subjectAltName", if it's there).
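For illustration, the kind of post-connection check that "notAfter" makes possible might look like the sketch below. It assumes the decoded certificate is a dict with a "notAfter" string in the format that `ssl.cert_time_to_seconds()` accepts in current Python versions; the helper name `cert_expired` and the sample certificate dict are made up for this example.

```python
import ssl
import time


def cert_expired(cert, now=None):
    """Return True if `now` is past the certificate's notAfter time.

    `cert` is assumed to be a dict with a "notAfter" entry such as
    "Jan  5 09:34:43 2018 GMT". `now` is seconds since the epoch and
    defaults to the current time.
    """
    if now is None:
        now = time.time()
    return now > ssl.cert_time_to_seconds(cert["notAfter"])


# Sample data only; a real dict would come from the validated peer cert.
sample = {"notAfter": "Jan  5 09:34:43 2018 GMT"}
print(cert_expired(sample))
```

A long-lived connection could call this periodically and drop the peer once its certificate lapses.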
I'd really like to help make the stdlib SSL module to be a really good, full-featured OpenSSL implementation for Python so we can have it elsewhere.
Well, remember, it's just a socket-layer wrapper for TLS, it's not an "OpenSSL implementation", by which I suppose you mean a full wrapper for OpenSSL, much like PyOpenSSL is supposed to be. For that purpose, doesn't it make more sense to extend/fix PyOpenSSL, rather than try to grow the deliberately limited-purpose socket.ssl support into another version of that? Can't it be revived, if it is in fact moribund?
* Would it be possible to distribute as a separate library? (I think I remember Bill saying something about that already...)
Just to be clear that what you seem to want to work on and what I'm working on seem to be two different things... I plan to build a back-port of the improved socket.ssl support as a standalone package for 2.3 (because I need to use it on OS X 10.4).
I'd say "why wouldn't you want to require either of those packages?" but actually, I know why you wouldn't want to, and it's that they're bad.
It's that they are too big and complicated to easily see how to fix. But that seems to be a side-effect of trying to wrap all of OpenSSL, which is a big, evolving project.
Wouldn't it be nice to know *why* the cert didn't validate? To provide
Yes, so I've put in a bit of work making sure the OpenSSL errors are properly relayed back to the Python application.
The entire idea of "extensions" is pretty direct about the fact that the original implementor need not understand their profitable use :).
Not really. Each extension is proposed, debated, and approved before it's added to the spec for extensions. My idea is that as support for various extensions appear in OpenSSL, we can evaluate them and see if they are worth supporting in Python.
Specifically, properties of the issuer define what properties the subject may have, in the verification scheme for Vertex ( http://divmod.org/trac/wiki/DivmodVertex )
I didn't see a write-up of your scheme at that URL; can you point me to a particular page in the Wiki which describes the use case? I should point out that we're (actually, Greg Smith) also wrapping another chunk of the OpenSSL library for hashing. And last week I suggested that we might wrap yet another chunk for doing cryptography. This chunk-by-chunk approach might be a good way to go. If a chunk that did general X509 certificate munging did appear, I'd be happy to change the SSL support to use it. Bill
By the way, if you're offering to help with this, there are a couple of things I could use some help with. I scratched my head a bit about how to turn the "othername" possibility of a subjectAltName into a Python data structure, using the OpenSSL C code, and finally gave up. If you could provide a C function that does that, I'd be very grateful. And there's a similar issue with the "permanent identifier" defined in RFC 4043. I don't see how to iterate over an ASN1 sequence using the OpenSSL C code -- if you can figure out how to do that and provide a C function to translate that field in a certificate into a Python data structure, it would also be a great help. Bill
By the way, I think the hostname matching provisions of 2818 (which is, after all, only an informational RFC, not a standard) are poorly thought out. Many machines have more hostnames than you can shake a stick at, and often provide certs with the wrong hostname in them (usually because they have no way to determine what the *right* hostname is, from inside that machine). Bill
On Thu, Sep 06, 2007, Bill Janssen wrote:
By the way, I think the hostname matching provisions of 2818 (which is, after all, only an informational RFC, not a standard) are poorly thought out. Many machines have more hostnames than you can shake a stick at, and often provide certs with the wrong hostname in them (usually because they have no way to determine what the *right* hostname is, from inside that machine).
...which is why you pretty much need to have a canonical hostname mapped to each IP you're using on a machine. Basically, you need to map the hostname you intend to use to an IP, then do reverse-DNS to find out whether the hostname is in fact the canonical hostname. If not, you're using the wrong hostname on your cert. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "Many customs in this life persist because they ease friction and promote productivity as a result of universal agreement, and whether they are precisely the optimal choices is much less important." --Henry Spencer http://www.lysator.liu.se/c/ten-commandments.html
By the way, I think the hostname matching provisions of 2818 (which is, after all, only an informational RFC, not a standard) are poorly thought out. Many machines have more hostnames than you can shake a stick at, and often provide certs with the wrong hostname in them (usually because they have no way to determine what the *right* hostname is, from inside that machine).
...which is why you pretty much need to have a canonical hostname mapped to each IP you're using on a machine. Basically, you need to map the hostname you intend to use to an IP, then do reverse-DNS to find out whether the hostname is in fact the canonical hostname. If not, you're using the wrong hostname on your cert.
Yep. The problem is having a particular service know which certificate it should choose to use, and also to know when the network connectivity has changed. Usually, server ports are bound to wildcard IP addresses, so that they can still be reached even if the network connectivity changes (particularly true for servers running on laptops, or the Python server I'm running on my iPhone). The server has no way of knowing which IP address the client knows it as, and no way of knowing which of its multiple certificates to present, so that the name in the cert will match the name the client thought it was using. Or am I wrong? Is there some interface in the socket API which gives this information? Bill
On Wed, Sep 12, 2007, Bill Janssen wrote:
By the way, I think the hostname matching provisions of 2818 (which is, after all, only an informational RFC, not a standard) are poorly thought out. Many machines have more hostnames than you can shake a stick at, and often provide certs with the wrong hostname in them (usually because they have no way to determine what the *right* hostname is, from inside that machine).
...which is why you pretty much need to have a canonical hostname mapped to each IP you're using on a machine. Basically, you need to map the hostname you intend to use to an IP, then do reverse-DNS to find out whether the hostname is in fact the canonical hostname. If not, you're using the wrong hostname on your cert.
Yep. The problem is having a particular service know which certificate it should choose to use, and also to know when the network connectivity has changed. Usually, server ports are bound to wildcard IP addresses, so that they can still be reached even if the network connectivity changes (particularly true for servers running on laptops, or the Python server I'm running on my iPhone). The server has no way of knowing which IP address the client knows it as, and no way of knowing which of its multiple certificates to present, so that the name in the cert will match the name the client thought it was using.
My understanding is that the client tells the server which hostname it wants to use; the server should then pass down that information. That's how virtual hosting works in the first place. The only difference with SSL is that the hostname must have a unique IP address, so that when the client does a reverse DNS to validate the IP address presented by the SSL certificate, it all comes together correctly. There are, of course, wildcard certs; I don't understand how those work. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "Many customs in this life persist because they ease friction and promote productivity as a result of universal agreement, and whether they are precisely the optimal choices is much less important." --Henry Spencer http://www.lysator.liu.se/c/ten-commandments.html
My understanding is that the client tells the server which hostname it wants to use; the server should then pass down that information. That's how virtual hosting works in the first place. The only difference with SSL is that the hostname must have a unique IP address, so that when the client does a reverse DNS to validate the IP address presented by the SSL certificate, it all comes together correctly.
Unfortunately, it does not quite work that way. The client tells the server what hostname to use only *after* the SSL connection has been established, and certificates being exchanged (in the Host: header). So the Host: header cannot be used for selecting what certificate to present to the client. *That* is the reason why people typically assume they have to have different IP addresses for different SSL hosts: certificate selection must be done based on IP address (which is already known before the SSL handshaking starts). There is no need for the client to do a reverse name lookup, and indeed, the client should *not* do a reverse DNS lookup to check the server's identity. Instead, it should check the host name it wants to talk to against the certificate. However, there is an alternative to using multiple IP addresses: one could also use multiple "subject alternative names", and create a certificate that lists them all.
There are, of course, wildcard certs; I don't understand how those work.
The same way: the client does *not* perform a reverse name lookup. Instead, it just matches the hostname against the name in the certificate; if the certificate is for *.python.org (say) and the client wants to talk to pypi.python.org, it matches, and hostname verification passes. It would also pass if the client wanted to talk to cheeseshop.python.org, or wiki.python.org (which all have the same IP address). Regards, Martin
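A minimal sketch of that matching step, for illustration only. It assumes the wildcard may appear only as the entire leftmost label and stands for exactly one label; real clients differ in the details, and the function name is made up for this example. No DNS lookup of any kind is involved.

```python
def hostname_matches(pattern, hostname):
    """Compare a certificate name (possibly "*.domain") with a hostname."""
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")

    # Assumption: "*" covers exactly one label, so label counts must agree.
    if len(pattern_labels) != len(host_labels):
        return False

    # Everything to the right of the first label must match exactly.
    if pattern_labels[1:] != host_labels[1:]:
        return False

    first = pattern_labels[0]
    if first == "*":
        return host_labels[0] != ""
    return first == host_labels[0]


for host in ("pypi.python.org", "wiki.python.org", "python.org"):
    print(host, hostname_matches("*.python.org", host))
```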
However, there is an alternative to using multiple IP addresses: one could also use multiple "subject alternative names", and create a certificate that lists them all.
Unfortunately, much of the client code that does the hostname verification is wrapped up in gullible Web browsers or Java HTTPS libraries that swallowed RFC 2818 whole, and not easily accessible by applications. Does any of it recognize and accept "subject alternative name"? It's possible to at least override the default Java client-side hostname verification handling in a new application. And Python is lucky; because there was no client-side hostname verification possible, RFC 2818 hasn't been plastered into the Python standard library :-). Bill
However, there is an alternative to using multiple IP addresses: one could also use multiple "subject alternative names", and create a certificate that lists them all.
Unfortunately, much of the client code that does the hostname verification is wrapped up in gullible Web browsers or Java HTTPS libraries that swallowed RFC 2818 whole, and not easily accessible by applications. Does any of it recognize and accept "subject alternative name"?
Works fine with Firefox and MSIE. Regards, Martin
participants (5):
- "Martin v. Löwis"
- Aahz
- Bill Janssen
- Christopher Armstrong
- glyph@divmod.com