Changing the default text codec
paul at prescod.net
Mon Feb 23 10:48:43 CET 2004
> Sorry if my terminology is wrong..... but I'm having intermittent
> problems dealing with accented characters in python. (Only from the 8
> bit latin-1 character set I think..)
I would say that if you get a 100% failure rate in IDLE and a 100%
success rate from a console program, then your problem is not
intermittent but environment-specific.
> For example - if I run my program from IDLE and give it the word
> 'degri' (containing e-acute) then I get the error :
What do you mean by "give it the word"? Through raw_input()? Through a file?
However you are getting this information, it seems to me that in IDLE
you are getting a Unicode object rather than an 8-bit string object.
Convert it to an 8-bit string by calling its "encode" method with the
codec you want, for example latin-1.
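A minimal sketch of that conversion follows. The helper name to_8bit and the choice of latin-1 are my assumptions for illustration, not something from your program. It encodes only when it is handed a Unicode object, so the same call should behave the same way in IDLE and at the console.

```python
# text_type is the Unicode string type: unicode on Python 2, str on Python 3.
text_type = type(u"")


def to_8bit(value, encoding="latin-1"):
    """Return value as an 8-bit string (hypothetical helper).

    Unicode objects are encoded with the given codec; values that are
    already 8-bit strings are returned unchanged.
    """
    if isinstance(value, text_type):
        return value.encode(encoding)
    return value


# Hypothetical input: a word containing e-acute (U+00E9) as a Unicode object.
word = to_8bit(u"d\xe9gr\xe9")
```

In latin-1 every character becomes exactly one byte, so the result has the same length as the input.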
> if letter in self.valid_letters:
> UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position
> 26: ordinal not in range(128)
Something looks suspicious here. I wouldn't expect self.valid_letters to
have a 0x83 character in it because I would expect it to be hard-coded
to ASCII in your program like:
valid_letters = "abcdefghijklmnopqrstuvwxyzABCDEF..."
On the other hand, I wouldn't expect "letter" to have more than one
character, so how could it have a problem at position 26?
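One possible reading, and it is only a guess since I have not seen your code: when a Unicode object is compared against an 8-bit string, Python first decodes the 8-bit string with the default 'ascii' codec, so the position in the error would refer to valid_letters rather than to "letter". The sketch below reproduces the same kind of error with an explicit decode. The contents of valid_letters here are hypothetical.

```python
# Hypothetical 8-bit string: 26 ASCII letters followed by one non-ASCII byte.
valid_letters = b"abcdefghijklmnopqrstuvwxyz\x83"

error_position = None
bad_byte = None
try:
    # Decoding with the 'ascii' codec fails on any byte above 0x7F.
    valid_letters.decode("ascii")
except UnicodeDecodeError as exc:
    # exc.start is the index of the first byte that could not be decoded.
    error_position = exc.start
    bad_byte = valid_letters[exc.start:exc.start + 1]
```

If something like this is going on, the fix is still the same: make both sides the same type before comparing them.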
> What I'd like to do is switch by default to an 8 bit codec (latin-1 I
> think ?????) and then offer the user the choice of either mapping the
> accented characters to their nearest equivalent (e-acute to e for
> example) *or* treating them as seperate characters.............
Why change the default codec rather than explicitly using the codec you
care about? If you want to work in the 8-bit world rather than the
Unicode world, just use the "encode" method on the Unicode object. If
you want to work in the Unicode world, use the "decode" method on the
8-bit string instead.
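Roughly, the two directions look like this. The sample word, the latin-1 codec, and the contents of valid_letters are assumptions made up for the example.

```python
# Hypothetical data: the same word in each of the two worlds.
unicode_word = u"d\xe9gr\xe9"  # Unicode object
byte_word = b"d\xe9gr\xe9"     # 8-bit string, assumed to be latin-1

# 8-bit world: encode the Unicode object with an explicit codec.
as_bytes = unicode_word.encode("latin-1")

# Unicode world: decode the 8-bit string with an explicit codec.
as_unicode = byte_word.decode("latin-1")

# With both sides the same type, membership tests never touch the
# default codec.
valid_letters = u"abcdefghijklmnopqrstuvwxyz\xe9"  # hypothetical
all_valid = all(letter in valid_letters for letter in as_unicode)
```

Either world works as long as you pick one and name the codec yourself at each conversion.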
> I can't work out how to change the default codec (no matter what the
> locale) ?
I'd advise against fixing the problem in that way. Convert data
appropriately when you bring it from the outside world into the Python
program and ignore the default codec.
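As a sketch of what "convert at the boundary" could look like for file input and output, assuming the data is latin-1: the function names read_text and write_text are hypothetical, and io.open is used because it takes an explicit encoding.

```python
import io


def read_text(path, encoding="latin-1"):
    """Read a file and return its contents as Unicode (hypothetical helper).

    The bytes are decoded with an explicitly named codec on the way in.
    """
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()


def write_text(path, text, encoding="latin-1"):
    """Write Unicode text to a file, encoding it explicitly on the way out."""
    with io.open(path, "w", encoding=encoding) as handle:
        handle.write(text)
```

Inside the program everything is then Unicode, and the default codec never comes into play.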