[Tutor] Unicode encoding and raw_input() in Python 2.7 ?

Dave Angel davea at davea.name
Sat Apr 18 02:37:10 CEST 2015


On 04/17/2015 04:39 AM, Samuel VISCAPI wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> Hi,
>
> This is my first post to that mailing list if I remember correctly, so
> hello everyone !
>

Welcome to the list.

> I've been stuck on a simple problem for the past few hours. I'd just
> like raw_input to work with accentuated characters.

That should mean you want to use unicode.


If you're using raw_input, then you must be using Python 2.x.  The easiest 
first step to doing things right in Unicode would be to switch to 
version 3.x.  But I'll assume that you cannot do this, for the duration 
of this message.



>
> For example:
>
> firstname = str.capitalize(raw_input('First name: '))

If you're serious about Unicode, you're getting an encoded string with 
raw_input, so you'll need to decode it, using whatever encoding your 
console device is using.  If you don't know, you're in big trouble.  But 
if you're in Linux, chances are good that it's utf-8.
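
A minimal sketch of that decoding step. The helper name decode_input and the 
utf-8 fallback are my assumptions, not anything the standard library provides; 
sys.stdin.encoding is what Python reports for the console, and it can be None 
when input is piped.

```python
import sys

def decode_input(raw, encoding=None):
    """Turn the byte string returned by raw_input() into a unicode object."""
    # Prefer an explicit encoding, then the console's, then utf-8 (assumed fallback).
    encoding = encoding or getattr(sys.stdin, 'encoding', None) or 'utf-8'
    if isinstance(raw, bytes):   # in Python 2, str and bytes are the same type
        return raw.decode(encoding)
    return raw                   # already decoded text

# In a Python 2.7 program this would be used as:
#     firstname = decode_input(raw_input('First name: '))
```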

>
> where firstname could be "Valérie", "Gisèle", "Honoré", etc...


>
> I tried -*- coding: utf-8 -*-, u'', unicode(), but to no avail...
>

As Alan says, you're not telling us anything useful.  "No avail" is too 
imprecise to be useful.   I'll comment on them anyway.

The coding statement applies only to literals you use in your source 
code.  It has nothing at all to do with the value returned by raw_input.

u'' likewise is used in your source code.  It has nothing to do with 
what the user may type into your program.

unicode() is a "function" that may decode a string received from 
raw_input, provided you know what the encoding was.  You can also 
accomplish it by using the method str.decode().
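
To illustrate that the two spellings give the same result, here is a small 
sketch.  The byte string stands in for what raw_input would return on a utf-8 
console (an assumption), and the text_type alias is only there so the snippet 
also runs on Python 3, where unicode is spelled str.

```python
raw = b'Gis\xc3\xa8le'            # utf-8 bytes for "Gisèle", as raw_input would return them

try:
    text_type = unicode           # Python 2
except NameError:
    text_type = str               # Python 3 has no separate unicode type

via_function = text_type(raw, 'utf-8')   # unicode(raw, 'utf-8') in Python 2
via_method = raw.decode('utf-8')         # str.decode() in Python 2

# Decoding with the wrong encoding does not fail here, it silently gives garbage.
wrong_guess = raw.decode('latin-1')
```

Both calls produce the same six-character unicode object; the latin-1 guess 
produces seven characters, because each of the two bytes of the è is treated as 
a character of its own.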


> I'm using str.capitalize and str.lower throughout my code, so I guess
> some encoding / decoding will also be necessary at some point.

Those apply to byte strings.  But if you're doing it right, you should have 
unicode objects long before you apply such methods.  So you'd want the 
unicode methods unicode.capitalize and unicode.lower
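
A short sketch of why the order matters, assuming utf-8 input and the default 
"C" locale.  On the encoded bytes, lower() only touches ASCII letters, so an 
accented capital is left alone; on the decoded text it is handled like any 
other letter.

```python
raw = b'VAL\xc3\x89RIE'             # utf-8 bytes for "VALÉRIE"
text = raw.decode('utf-8')          # decode first, then work on the unicode object

lowered_bytes = raw.lower()         # the two bytes of É are not ASCII, so they are unchanged
lowered_text = text.lower()         # É becomes é
capitalized_text = text.capitalize()  # first letter upper, the rest lower
```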



-- 
DaveA

