[Tutor] why is unicode converted file double spaced?

Pirritano, Matthew MPirritano at ochca.com
Wed Apr 8 17:46:44 CEST 2009

Excellent! Thanks!

After specifying the output encoding as cp1252 I ran this short syntax and got zero errors. Thanks to Jon Peck from the SPSS list who also weighed in.
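
For anyone finding this later, the fix can be sketched roughly as follows. This is a minimal sketch in Python 3 syntax, not the exact script from this thread: the function name convert_utf16_to_cp1252 and both path arguments are hypothetical, and it assumes the input is UTF-16 with a byte order mark. It reuses the strip-and-append-newline step quoted further down.

```python
import io


def convert_utf16_to_cp1252(src_path, dst_path):
    """Re-encode a UTF-16 text file as cp1252, one line at a time.

    Hypothetical helper for illustration; names are not from the thread.
    """
    # Decode explicitly on input and encode explicitly on output, so the
    # implicit ASCII default is never involved.
    with io.open(src_path, "r", encoding="utf-16") as inp, \
         io.open(dst_path, "w", encoding="cp1252", newline="\n") as outp:
        for line in inp:
            # strip() removes leading and trailing whitespace, including any
            # leftover line-ending characters; exactly one "\n" is added back.
            outp.write(line.strip() + "\n")
```

Characters that cp1252 cannot represent would still raise UnicodeEncodeError here; passing an errors argument to the output io.open() call is one way to relax that.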


Matthew Pirritano, Ph.D.
Research Analyst IV
Medical Services Initiative (MSI)
Orange County Health Care Agency
(714) 568-5648

-----Original Message-----
From: tutor-bounces+mpirritano=ochca.com at python.org [mailto:tutor-bounces+mpirritano=ochca.com at python.org] On Behalf Of Kent Johnson
Sent: Wednesday, April 08, 2009 3:40 AM
To: spir
Cc: tutor at python.org
Subject: Re: [Tutor] why is unicode converted file double spaced?

On Wed, Apr 8, 2009 at 3:55 AM, spir <denis.spir at free.fr> wrote:
> On Tue, 7 Apr 2009 17:54:42 -0400,
> Kent Johnson <kent37 at tds.net> wrote:
>> >     outp.write(outLine.strip()+'\n')
>> UnicodeEncodeError: 'ascii' codec can't encode characters in position
>> 640-641: ordinal not in range(128)
> Hem, sorry for stepping into the thread. Isn't the issue that lines previously decoded as utf16 are now extended with an ASCII-by-default NL char? I thought that when one wants to cope with unicode, all literals need to be explicitly formed as unicode strings.

Yes, we have not been explicit about the conversion; that is my fault.
But in this case conversion to ASCII is the desired behaviour, so as
long as the input falls within the ASCII character set, all is well.

> Otherwise Python silently falls back to the ASCII default and one gets these nerve-racking UnicodeEncodeError messages. (A bit as if, when adding an int and a float, Python tried to produce an int result, imo.)
> As the source was utf16, it may contain non-ASCII chars, so the whole output should be managed as unicode, no?

Yes, an explicit call to encode() with the desired error parameter is
one solution, as I indicated in a previous email.
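
To illustrate the explicit encode() approach, here is a minimal sketch in Python 3 syntax. The helper name to_ascii_bytes and the sample string are made up for illustration; only str.encode() and its standard error handlers ("strict", "replace", "ignore", "xmlcharrefreplace") come from Python itself.

```python
def to_ascii_bytes(text, errors="replace"):
    """Strip a decoded line and encode it to ASCII with an explicit policy.

    Hypothetical helper; `errors` is passed straight to str.encode().
    """
    return text.strip().encode("ascii", errors)


sample = u"na\u00efve \u2603 line\n"  # contains two non-ASCII characters

print(to_ascii_bytes(sample, "replace"))            # b'na?ve ? line'
print(to_ascii_bytes(sample, "ignore"))             # b'nave  line'
print(to_ascii_bytes(sample, "xmlcharrefreplace"))  # b'na&#239;ve &#9731; line'

# "strict" is the default policy and reproduces the error from the thread.
try:
    to_ascii_bytes(sample, "strict")
except UnicodeEncodeError as exc:
    print("strict failed:", exc.reason)
```

Which handler is appropriate depends on whether losing or substituting the non-ASCII characters is acceptable for the output file.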

