Character encoding & the copyright symbol

Benjamin Kaplan benjamin.kaplan at case.edu
Thu Aug 6 21:54:26 CEST 2009


On Thu, Aug 6, 2009 at 12:41 PM, Robert Dailey<rcdailey at gmail.com> wrote:
> On Aug 6, 11:31 am, "Richard Brodie" <R.Bro... at rl.ac.uk> wrote:
>> "Robert Dailey" <rcdai... at gmail.com> wrote in message
>>
>> news:29ab0981-b95d-4435-91bd-a7a520419ada at b15g2000yqd.googlegroups.com...
>>
>> > UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
>> > position 1650: character maps to <undefined>
>>
>> > The file is defined as ASCII.
>>
>> That's the problem: ASCII is a seven bit code. What you have is
>> actually ISO-8859-1 (or possibly Windows-1252).
>>
>> The different ISO-8859-n variants assign various characters
>> to '\xa9'. Rather than being Western-European centric and assuming
>> ISO-8859-1 by default, Python throws an error when you stray
>> outside of strict ASCII.
>
> Thanks for the help guys. Sorry I left out the code; I wasn't sure at the
> time if it would be helpful. Below is my code:
>
>
> #========================================================
> def GetFileContentsAsString( file ):
>   f = open( file, mode='r', encoding='cp1252' )
>   contents = f.read()
>   f.close()
>   return contents
>
> #========================================================
> def ReplaceVersion( file, version, regExps ):
>   #match = regExps[0].search( 'FILEVERSION 1,45332,2100,32,' )
>   #print( match.group() )
>   text = GetFileContentsAsString( file )
>   print( text )
>
>
> As you can see, I am trying to load the file with encoding 'cp1252',
> which, according to the Python 3.1 docs, translates to windows-1252. I
> also tried 'latin_1', which translates to ISO-8859-1, but this did not
> work either. Am I doing something else wrong?
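Richard's point about 0xA9 and the file-reading step above can both be checked with a short sketch. It writes a throwaway file containing the byte 0xA9 and reads it back through a with-based variant of GetFileContentsAsString. The name get_file_contents and the sample content are made up for illustration. Under these assumptions, every single-byte codec tried decodes the file without complaint, each one to a different character, and only strict ASCII refuses it. Even then it raises a UnicodeDecodeError, not the UnicodeEncodeError from the traceback.

```python
import os
import tempfile
import unicodedata


def get_file_contents(path, encoding):
    """Hypothetical idiomatic take on GetFileContentsAsString.

    The with-block closes the file even if read() raises.
    """
    with open(path, mode="r", encoding=encoding) as f:
        return f.read()


# Throwaway sample file containing the byte 0xA9 (made-up content).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as raw:
    raw.write(b"Copyright \xa9 2009")

decoded = {}
try:
    # Decoding succeeds for each single-byte codec below, but byte 0xA9
    # stands for a different character in each one.
    for codec in ("cp1252", "latin_1", "iso8859_2", "iso8859_5"):
        text = get_file_contents(path, codec)
        decoded[codec] = text
        ch = text[10]  # the character decoded from byte 0xA9
        # ascii() keeps the printed output safe on any console encoding.
        print(codec, ascii(ch), unicodedata.name(ch))

    # Only strict ASCII rejects the byte, and it does so while *decoding*.
    try:
        get_file_contents(path, "ascii")
    except UnicodeDecodeError as exc:
        print("ascii:", exc)
finally:
    os.remove(path)
```

If the reading step behaves like this for the real file, the error has to come from somewhere other than open()/read().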

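This is why we need code and full tracebacks. There's a good chance
that your error is on the print(text) line, not in the file reading:
a UnicodeEncodeError happens when text is converted into bytes, and
reading a file only converts from bytes. When you print your unicode
string, sys.stdout has to encode it using the console's encoding. On a
Windows console that is usually an OEM code page such as cp437 (the
'charmap' in the error message points to a table-based codec like
that), and cp437 has no copyright sign, so the conversion fails and
you get your error. Try using print( text.encode( 'cp1252' ) )
instead. In Python 3 that prints the repr of the bytes object
(b'...\xa9...'), which avoids the error but shows the escape rather
than the symbol itself.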
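A minimal sketch of the output side, assuming the console uses cp437 (the actual code page varies by machine, so treat that as an assumption). It reproduces the 'charmap' error without a console, then shows two workarounds. console_safe is a hypothetical helper that swaps unmappable characters for '?', and the last line is the encode-to-bytes suggestion above. Writing the encoded bytes to sys.stdout.buffer is another option in Python 3, not shown here.

```python
def console_safe(text, encoding):
    """Hypothetical helper: replace characters the console cannot show.

    Encoding with errors="replace" turns each unmappable character into
    '?', and decoding the result gives back a str that encodes cleanly.
    """
    return text.encode(encoding, errors="replace").decode(encoding)


text = "Copyright \xa9 2009"

# Assumption: the console uses cp437, a common Windows OEM code page.
console_encoding = "cp437"

try:
    text.encode(console_encoding)
except UnicodeEncodeError as exc:
    error_message = str(exc)
    # Same 'charmap' wording as in the original traceback.
    print(error_message)

print(console_safe(text, console_encoding))  # Copyright ? 2009

# Encoding to cp1252 avoids the error, but print() shows the bytes repr:
# b'Copyright \xa9 2009'
print(text.encode("cp1252"))
```

The replace approach keeps the rest of the text readable and only loses the symbol; the bytes approach keeps the exact byte but not the readable character.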



More information about the Python-list mailing list