[Python-Dev] Multilingual programming article on the Red Hat Developer blog

Wed Sep 17 07:10:08 CEST 2014

Steven D'Aprano <steve at pearwood.info> writes:

> On Wed, Sep 17, 2014 at 11:14:15AM +1000, Chris Angelico wrote:
>> On Wed, Sep 17, 2014 at 5:29 AM, R. David Murray <rdmurray at bitdance.com> wrote:
>
>> > Basically, we are pretending that each smuggled
>> > byte is a single character for string parsing purposes...but they don't
>> > match any of our parsing constants.  They are all "any character" matches
>> > in the regexes and what have you.
>> 
>> This is slightly iffy, as you can't be sure that one byte represents
>> one character, but as long as you don't much care about that, it's not
>> going to be an issue.
>
> This discussion would probably be much easier to follow, with fewer
> miscommunications, if there were some examples. Here is my example;
> perhaps someone can tell me whether I'm understanding it correctly.
>
> I want to send an email including the header line:
>
> 'Subject: “NOBODY expects the Spanish Inquisition!”'
>

  >>> from email.header import Header
  >>> h = Header('Subject: “NOBODY expects the Spanish Inquisition!”')
  >>> h.encode('utf-8')
  '=?utf-8?q?Subject=3A_=E2=80=9CNOBODY_expects_the_Spanish_Inquisition!?=\n =?utf-8?q?=E2=80=9D?='
  >>> h.encode()
  '=?utf-8?q?Subject=3A_=E2=80=9CNOBODY_expects_the_Spanish_Inquisition!?=\n =?utf-8?q?=E2=80=9D?='
  >>> h.encode('ascii')
  '=?utf-8?q?Subject=3A_=E2=80=9CNOBODY_expects_the_Spanish_Inquisition!?=\n =?utf-8?q?=E2=80=9D?='
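
A minimal sketch of the same header as a round trip, assuming Python 3's
email.header module. In Python 3 the first positional parameter of
Header.encode() is splitchars, not a charset, which would explain why the
three calls above return the same string; the charset is passed to the
Header constructor instead. The names subject, encoded and decoded are
only illustrative, and the field name is passed separately here via
header_name rather than being part of the value.

```python
from email.header import Header, decode_header, make_header

# Illustrative value: the header text without the "Subject: " field name.
subject = '“NOBODY expects the Spanish Inquisition!”'

# The charset is chosen when the Header is built, not in encode().
# header_name lets the first line's length account for "Subject: ".
h = Header(subject, charset='utf-8', header_name='Subject')

# encode() returns a str made of RFC 2047 encoded words, ASCII only.
encoded = h.encode()
encoded.encode('ascii')  # would raise UnicodeEncodeError if non-ASCII remained

# Decoding the encoded words should give back the original text.
decoded = str(make_header(decode_header(encoded)))
print(decoded == subject)
```

The exact folding of the encoded words may differ from the output above,
since the field name is no longer part of the encoded value.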

--
Akira