[Tutor] why is unicode converted file double spaced?
Kent Johnson
kent37 at tds.net
Wed Apr 8 12:39:50 CEST 2009
On Wed, Apr 8, 2009 at 3:55 AM, spir <denis.spir at free.fr> wrote:
> On Tue, 7 Apr 2009 17:54:42 -0400,
> Kent Johnson <kent37 at tds.net> wrote:
>
>> > outp.write(outLine.strip()+'\n')
>
>> UnicodeEncodeError: 'ascii' codec can't encode characters in position
>> 640-641: ordinal not in range(128)
>
> Hmm, sorry for stepping into the thread. Isn't the issue that lines previously decoded as UTF-16 are now extended with a newline literal that is ASCII by default? I thought that when one wants to cope with Unicode, all literals need to be explicitly formed as unicode strings.
Yes, we have not been explicit about the conversion; that is my fault.
But in this case conversion to ascii is the desired behaviour, so as
long as the input falls within the ascii character set all is well.
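
To make the failure mode concrete, here is a minimal sketch of what the ascii conversion does. The helper name to_ascii and the sample strings are made up for illustration; they are not from the original script. It is written so that it also runs on Python 3, where the conversion has to be spelled out.

```python
def to_ascii(text):
    """Encode text to ASCII bytes.

    Uses the default 'strict' error handling, so any character
    outside the ASCII range raises UnicodeEncodeError.
    """
    return text.encode('ascii')


# Pure-ASCII input converts without trouble.
ok = to_ascii(u'plain line\n')

# Input with a non-ASCII character (e-acute, U+00E9) fails.
try:
    to_ascii(u'caf\u00e9\n')
    failed = False
except UnicodeEncodeError:
    failed = True

print(ok, failed)
```

With strict handling a single non-ascii character anywhere in a line is enough to abort the write for that line.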
> Otherwise Python silently falls back to the ASCII default and one gets these irritating UnicodeEncodeError messages. (A bit as if, when adding an int and a float, Python tried to produce an int result, IMO.)
> As the source was UTF-16, it may contain non-ASCII characters, so the whole output should be managed as Unicode, no?
Yes, an explicit call to encode() with the desired error parameter is
one solution, as I indicated in a previous email.
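
As a rough sketch of that approach (the function name encode_line and the sample text are hypothetical, chosen only for illustration), the error parameter decides what happens to characters that have no ascii equivalent:

```python
def encode_line(line, errors='replace'):
    """Strip a decoded (unicode) line, add a newline, and encode to ASCII.

    'errors' is passed straight to encode(): 'strict' raises,
    'replace' substitutes '?', 'ignore' drops the character, and
    'xmlcharrefreplace' writes a numeric character reference.
    """
    return (line.strip() + u'\n').encode('ascii', errors)


sample = u'na\u00efve caf\u00e9  '   # contains two non-ASCII characters

replaced = encode_line(sample, 'replace')            # b'na?ve caf?\n'
ignored = encode_line(sample, 'ignore')              # b'nave caf\n'
as_refs = encode_line(sample, 'xmlcharrefreplace')   # b'na&#239;ve caf&#233;\n'
```

Which handler is appropriate depends on whether losing or mangling the occasional character is acceptable in the output file. The resulting byte strings should be written to a file opened in binary mode.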
Kent