csv reader
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Tue Dec 15 23:27:09 EST 2009
On Tue, 15 Dec 2009 19:12:01 -0300, Emmanuel <manouchk at gmail.com> wrote:
> Then my problem is different!
>
> In fact I'm reading a csv file saved from openoffice oocalc using
> UTF-8 encoding. I get a list of lists (let's call it tab) with the csv
> data.
> If I do:
>
> print tab[2][4]
> In ipython, I get:
> equação de Toricelli. Tarefa exercícios PVR 1 e 2 ; PVP 1
>
> If I only do:
> tab[2][4]
>
> In ipython, I get:
> 'equa\xc3\xa7\xc3\xa3o de Toricelli. Tarefa exerc\xc3\xadcios PVR 1 e
> 2 ; PVP 1'
>
> Does that mean that my problem is not the one I'm thinking?
Yes. You have a real problem, but not this one. When you say `print
something`, you get a nice view of `something`, basically the result of
doing `str(something)`. When you say `something` alone in the interpreter,
you get a more formal representation, the result of calling
`repr(something)`:
py> x = "ecuação"
py> print x
ecuação
py> x
'ecua\x87\xc6o'
py> print repr(x)
'ecua\x87\xc6o'
The quotes around the text and the \xNN notation allow for an unambiguous
representation. Two strings may look the same on screen but contain
different bytes, and repr shows that.
('ecua\x87\xc6o' is encoded in cp850, a Windows console code page; in
windows-1252 it would be 'ecua\xe7\xe3o', and you should see
'equa\xc3\xa7\xc3\xa3o' in utf-8)
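To check which encoding produced a given run of \xNN bytes, one option is to encode the same text several ways and compare. This is only a sketch written for Python 3 (where encoded data is a separate `bytes` type); the variable names and the choice of three encodings are mine, picked for illustration.

```python
# -*- coding: utf-8 -*-
# Encode the same text with several encodings and compare the raw bytes.
text = u"ecuação"

# Hypothetical selection of encodings to compare.
encodings = ("utf-8", "cp850", "cp1252")

# Map each encoding name to the bytes it produces for the same text.
encoded = {name: text.encode(name) for name in encodings}

for name in encodings:
    # repr() shows the unambiguous \xNN form of each byte string.
    print(name, repr(encoded[name]))
```

The text is identical in all three cases; only the bytes differ, which is why the encoding has to be known before the bytes can be turned back into text.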
> My real problem is when I use that kind of UTF-8 encoded (?) string with
> selenium here.
> If I just switch the following line:
> self.sel.type("q", "equação")
>
> by:
> self.sel.type("q", u"equação")
>
>
> It works fine!
Yes: you should work with unicode most of the time. The "recipe" for
having as few unicode problems as possible says:
- convert the input data (read from external sources, like a file) from
bytes to unicode, using the (known) encoding of those bytes
- handle unicode internally everywhere in your program
- and convert from unicode to bytes as late as possible, when writing
output (to screen, other files, etc.), using the encoding expected by
each destination.
See the Unicode How To: http://docs.python.org/howto/unicode.html
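The three steps of the recipe can be sketched roughly as follows. The function names (`read_text`, `process`, `write_text`) and the sample bytes are hypothetical, chosen only to mark where decoding and encoding happen; the `b"..."` and `u"..."` literals are used so the boundary between bytes and text is explicit.

```python
# -*- coding: utf-8 -*-
# Sketch of the "decode early, unicode inside, encode late" recipe.

def read_text(raw_bytes, encoding="utf-8"):
    # Step 1: bytes from the outside world become unicode text,
    # using the (known) encoding of those bytes.
    return raw_bytes.decode(encoding)

def process(text):
    # Step 2: all internal work happens on unicode text.
    return text.upper()

def write_text(text, encoding="utf-8"):
    # Step 3: convert back to bytes only when producing output.
    return text.encode(encoding)

raw = b"equa\xc3\xa7\xc3\xa3o de Toricelli"   # sample UTF-8 input bytes
text = read_text(raw)
result = write_text(process(text))

# The byte string is longer than the text: each accented
# letter takes two bytes in UTF-8 but is one character.
print(len(raw), len(text))
```

Working on `text` rather than `raw` is what makes operations such as `upper()` or `len()` behave per character instead of per byte.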
> The problem is that the csv.reader does give a "equação" and not a
> u"equação"
The csv module cannot handle unicode text directly, but see the last
example in the csv documentation for a simple workaround:
http://docs.python.org/library/csv.html
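As I understand it, the workaround in the Python 2 documentation decodes each cell after the csv module has parsed the UTF-8 bytes. For comparison, here is a minimal sketch of the same idea under Python 3, where the csv module works on text and the decoding happens before parsing. The helper name `read_csv_as_text` and the sample data are made up for this example; with a real file you would more likely pass `encoding="utf-8"` and `newline=""` to `open()`.

```python
import csv
import io

def read_csv_as_text(raw_bytes, encoding="utf-8"):
    """Decode the bytes first, then parse; every cell comes back as text."""
    # newline="" leaves line endings untouched so csv can handle them itself.
    text_stream = io.StringIO(raw_bytes.decode(encoding), newline="")
    return list(csv.reader(text_stream))

# Hypothetical UTF-8 encoded CSV content, standing in for the oocalc export.
raw = b"topic,task\r\nequa\xc3\xa7\xc3\xa3o de Toricelli,exerc\xc3\xadcios\r\n"

tab = read_csv_as_text(raw)
print(tab[1][0])
```

Every cell in `tab` is then unicode text, so it can be handed to code that expects unicode without a separate decoding step.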
--
Gabriel Genellina