From barry at python.org  Thu Nov 6 18:14:16 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 6 Nov 2008 12:14:16 -0500
Subject: [Email-SIG] [Python-3000] email libraries: use byte or unicode strings?
In-Reply-To: <4912DE77.5040209@gmail.com>
References: <200810281612.54570.victor.stinner@haypocalc.com>
    <20081029091259.7153ec82@resist.wooz.org>
    <20081030221726.0A0636007DF@longblack.object-craft.com.au>
    <2B7A0223-2FFE-416F-8AE1-7082CA2453AB@python.org>
    <4912DE77.5040209@gmail.com>
Message-ID:

On Nov 6, 2008, at 7:09 AM, Nick Coghlan wrote:

> So here's a question (speaking as someone that has never had to go near
> the email module, and is unlikely to do so anytime soon): is this
> something that should hold up the release of Python 3.0?

Not if you're like Guido and want to get 3.0 out this year. ;)

> As I see it, there are 3 options:
> 1. Hold up 3.0 until you get an API for the email package that handles
> Unicode vs bytes issues gracefully
> 2. Drop the email package entirely from 3.0, iterate on a 3.0 version of
> it on PyPI for a while, then add the cleaned up version in 3.1
> 3. Keep the current version (issues and all) in 3.0, with fairly strong
> warnings that the API may change in 3.1

At this point I think our only option is essentially 3: keep what we have, warts and all.

When the precursor to the email package was being developed (at that time called mimelib), it was initially done as a separate package and only folded into the core when it was stable and fairly widely used. For email-ng (or whatever we call it) we should follow the same guidelines. Eventually email-ng will be folded back into the core and will replace the current email package.
-Barry

From barry at python.org  Thu Nov 6 18:17:31 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 6 Nov 2008 12:17:31 -0500
Subject: [Email-SIG] [Python-3000] email libraries: use byte or unicode strings?
In-Reply-To: <4912E192.80105@gmail.com>
References: <200810281612.54570.victor.stinner@haypocalc.com>
    <20081029091259.7153ec82@resist.wooz.org>
    <20081030221726.0A0636007DF@longblack.object-craft.com.au>
    <2B7A0223-2FFE-416F-8AE1-7082CA2453AB@python.org>
    <491221F3.4040304@g.nevcal.com>
    <4912E192.80105@gmail.com>
Message-ID:

On Nov 6, 2008, at 7:22 AM, Nick Coghlan wrote:

> Glenn Linderman wrote:
>> Even 8-bit binary can be translated into a
>> sequence of Unicode codepoints with the same numeric value, for
>> example.
>
> No, no, no, no. Using latin-1 to tunnel binary data through Unicode just
> gets us straight back into the "is it text or bytes?" hell that is the
> 8-bit string in 2.x. It defeats the entire point of making the break
> between str and bytes in 3.0 in the first place.

And I'll note that this is essentially how the email package in 3.0 cheats its way into some modicum of usability. It is teh suck, but it works (defined as "passes the tests" ;).

> If something is potentially arbitrary binary data, we need to treat it
> that way and use bytes. People are just going to have to get over their
> aesthetic objections to the leading b on their bytes literals.
> Heck, be happy you don't have to write bytes(map(ord, 'literal')) as was
> the case in the early stages of 3.0 :)
>
> Providing a Unicode based text API over the top for the cases where
> handling malformed data isn't necessary may be convenient and a good
> idea, but it shouldn't be the only API (3.0 is already guilty of that in
> a few places - we shouldn't be adding more).

Right, and really it's a deeper issue. We're really only concerned with bytes vs. unicodes in headers. When talking about payloads, we get into a much richer type hierarchy, with images, audio, byte streams, etc., etc. Message.get_payload(decode=True) doesn't know anything about that stuff, but it could.

-Barry

From barry at python.org  Thu Nov 6 18:23:03 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 6 Nov 2008 12:23:03 -0500
Subject: [Email-SIG] [Python-3000] email libraries: use byte or unicode strings?
In-Reply-To: <491288FC.8090805@g.nevcal.com>
References: <200810281612.54570.victor.stinner@haypocalc.com>
    <20081029091259.7153ec82@resist.wooz.org>
    <20081030221726.0A0636007DF@longblack.object-craft.com.au>
    <2B7A0223-2FFE-416F-8AE1-7082CA2453AB@python.org>
    <491221F3.4040304@g.nevcal.com>
    <20081105225947.50E885AC03F@longblack.object-craft.com.au>
    <49122ECB.5010205@g.nevcal.com>
    <87tzalsiz6.fsf@uwakimon.sk.tsukuba.ac.jp>
    <491288FC.8090805@g.nevcal.com>
Message-ID: <38AB5885-C61D-4D8E-A3B8-DEBAA0063BAF@python.org>

On Nov 6, 2008, at 1:04 AM, Glenn Linderman wrote:

> So I would hope that the users of such Betas would quickly discover
> that they were producing garbage, report it to M$, and go back to
> using a release version with only the usual expectation of bugs,
> inconsistencies, standards violations, and security exploits, but
> not expect that Beta software is, or should be, fully compatible
> with other applications that handle proper email.

It's a nice thought, but it's completely impossible for real-world applications to ignore broken messages. "Be lenient in what you accept and strict in what you produce" is the only way you can operate, and the email package has a very strong design goal toward that tenet.

> Did Python's 2.x mail library handle the data that you describe?
> Did anyone seriously expect it to? Did Mozilla clients handle it?
> Can you provide a list of email clients that handled it gracefully,
> other than the same Outhouse Excess client that produced it? And if
> not, why would you expect Python's 3.0 mail library to handle it?

Yes, Python 2.x's email package handles broken messages, and email-ng must too. "Handling it" means:

1) never throw an exception
2) record defects in a usable way for upstream consumers of the message to handle

It currently also means:

3) ignore idempotency for defective messages.
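The first two rules above ("never throw an exception" and "record defects") can be seen in how today's standard-library email package treats a malformed message. This is a minimal Python 3 sketch, not email-ng and not the 2008 code: the `email.policy` objects it uses were added in Python 3.3, after this thread, the sample message is made up, and `parse_leniently` / `parse_strictly` are hypothetical helper names. The same bytes are parsed twice, once with the default policy (the defect lands in `msg.defects`) and once with a clone whose `raise_on_defect=True` turns the first defect into an exception.

```python
import email
import email.errors
from email import policy
from email.message import Message

# Hypothetical malformed message: it claims to be multipart/mixed but
# omits the boundary parameter that multipart content types require.
RAW = (
    b"From: sender@example.com\r\n"
    b"To: recipient@example.com\r\n"
    b"Subject: broken\r\n"
    b"Content-Type: multipart/mixed\r\n"
    b"\r\n"
    b"This body has no MIME boundaries at all.\r\n"
)


def parse_leniently(raw: bytes) -> Message:
    """Parse with the default policy: defects are recorded, not raised."""
    return email.message_from_bytes(raw, policy=policy.default)


def parse_strictly(raw: bytes) -> Message:
    """Same parse, but the cloned policy raises on the first defect."""
    strict = policy.default.clone(raise_on_defect=True)
    return email.message_from_bytes(raw, policy=strict)


msg = parse_leniently(RAW)
# Rule 1: no exception was raised. Rule 2: the problem is recorded on
# the message object where downstream code can inspect it.
for defect in msg.defects:
    print(type(defect).__name__)

# Defect classes derive from email.errors.MessageDefect, so a strict
# caller can opt into failing fast instead.
try:
    parse_strictly(RAW)
except email.errors.MessageDefect as exc:
    print("strict parse raised", type(exc).__name__)
```

Keeping leniency as the default and strictness as an opt-in matches the "be lenient in what you accept" stance: a mail-processing application keeps running, and only callers that ask for it see the failure.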
-Barry

From barry at python.org  Thu Nov 6 19:47:41 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 6 Nov 2008 13:47:41 -0500
Subject: [Email-SIG] [Python-3000] email libraries: use byte or unicode strings?
In-Reply-To:
References: <200810281612.54570.victor.stinner@haypocalc.com>
    <20081029091259.7153ec82@resist.wooz.org>
    <20081030221726.0A0636007DF@longblack.object-craft.com.au>
    <2B7A0223-2FFE-416F-8AE1-7082CA2453AB@python.org>
    <4912DE77.5040209@gmail.com>
    <7009144E-1A85-462D-8BDB-D29A96238652@fuhm.net>
Message-ID: <298E4E1A-6D03-4CF0-B79A-13608404831B@python.org>

On Nov 6, 2008, at 1:15 PM, Guido van Rossum wrote:

>> But if that's not the case, wouldn't it make more sense to keep
>> email out of the initial 3.0 release, rather than put a half-broken
>> version in with special "we can totally change the API for the next
>> release" dispensation?
>
> Tough call. I'm inclined to give people *something* in 3.0 with the
> promise we'll fix it in 3.1, rather than withholding it altogether.

I think that's the right thing to do, because large parts of the API will be the same, and wherever it's possible, we should provide a migration path to the new API (e.g. DeprecationWarnings, etc.).

-Barry
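The latin-1 point debated earlier in the thread, Glenn Linderman's observation that 8-bit binary can be mapped onto Unicode code points with the same numeric values and Nick Coghlan's objection to it, fits in a few lines of Python 3. This is a minimal sketch: `tunnel` and `untunnel` are hypothetical helper names, not part of the email package, and the sketch does not claim to reproduce what the 3.0 email package does internally.

```python
# Hypothetical helpers: latin-1 maps byte value N to code point U+00NN.
def tunnel(data: bytes) -> str:
    """Smuggle arbitrary bytes into a str, one code point per byte."""
    return data.decode("latin-1")


def untunnel(text: str) -> bytes:
    """Recover the bytes; raises UnicodeEncodeError above U+00FF."""
    return text.encode("latin-1")


every_byte = bytes(range(256))
smuggled = tunnel(every_byte)

# Glenn's point: the round trip is lossless and each code point equals
# the numeric value of the byte it came from.
assert [ord(ch) for ch in smuggled] == list(every_byte)
assert untunnel(smuggled) == every_byte

# Nick's objection: nothing in the resulting str records that it held
# bytes. UTF-8 text tunneled this way looks like text but is mojibake.
utf8_payload = "café".encode("utf-8")
print(repr(tunnel(utf8_payload)))  # 'cafÃ©'
```

Decoding never fails, which is what makes the trick convenient, but any code that later treats the result as real text gets the "is it text or bytes?" ambiguity the str/bytes split in 3.0 was meant to remove.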