what is this UnicodeDecodeError:....?

John Machin sjmachin at lexicon.net
Tue Oct 10 17:11:13 EDT 2006


kath wrote:
> I have a number of Excel files. In each file the DATE column has a
> different name, and it is also in a different column position. I want
> to read the date from each of those files.
>
> To identify the date field in the different files, I have created a
> file called _globals where I keep all aliases for DATE in an array
> called 'alias_DATE'.

It's actually a list. In Python an array is something else; look at the
docs for the array module if you're interested.
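
A minimal sketch of the difference, for illustration only (the sample
values are made up): a list accepts objects of any type, while an
array.array is restricted to the one C type named by its typecode.

```python
from array import array

aliases = ['TRADEDATE', 'Datum', 42]   # a list holds objects of any type
numbers = array('i', [1, 2, 3])        # an array holds one fixed C type (int here)

try:
    numbers.append('Datum')            # not an int, so the array rejects it
except TypeError:
    rejected = True
else:
    rejected = False

print(type(aliases).__name__)
print(type(numbers).__name__)
print(rejected)
```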

>
> The array alias_DATE looks like this:
>
> alias_DATE=['TRADEDATE', 'Accounting Date', 'Date de VL','Datum',
> 'Kurs-datum', 'Date', 'Fecha Datos', 'Calculation Date', 'ClosingDate',
> 'Pricing Date', 'NAV Date', 'NAVDate', 'NAVDATE', 'ValuationDate',
> 'Datestamp', 'Fecha de Valoración', 'Kurs-','datum',
> """Kurs-\ndatum""", "Kurs-\ndatum"]

Nothing to do with the question you asked, but the last two entries
have the same value; is that intentional?
| >>> """Kurs-\ndatum""" == "Kurs-\ndatum"
| True


>
> Now I want the index of the column that holds the date. I tried the
> following code.
>
>
> >>> b=xlrd.open_workbook('Santander_051206.xls')
> >>> sh=b.sheet_by_index(0)
> >>> sh.cell_value(rowx=0, colx=11)
> u'Fecha de Valoraci\xf3n'
> >>> val=sh.cell_value(rowx=0, colx=11)
> >>> val
> u'Fecha de Valoraci\xf3n'
> >>> print val
> Fecha de Valoración
> >>> import _globals		# the file where I have stored my 'alias_DATE' array
> >>> _globals.alias_DATE.index(val)
> Traceback (most recent call last):
>   File "<interactive input>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf3 in position
> 17: ordinal not in range(128)
> >>>
>
> Although the array contains a matching value, I get this error. Can
> anyone please tell me why this error occurs, and how to get rid of
> it? I have some files that contain more special characters.
>

Hello again, Sudhir.

The text string returned by xlrd is a unicode object (u'Fecha de
Valoraci\xf3n'). The text strings in your list are str objects, encoded
in some unspecified encoding. To find val, list.index() compares it
with each entry in turn. When it reaches the str object 'Fecha de
Valoración', Python tries to convert that str to unicode so that the
two can be compared, using the (default) ascii codec to do the
conversion, and fails on the non-ASCII byte 0xf3.
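
The failing conversion can be reproduced on its own, without xlrd or the
list. This is only a sketch: it assumes the alias was stored as cp1252
bytes (so the accented character is the single byte 0xf3), and it is
written to run on Python 2.6+ as well as Python 3, where the same
distinction is spelled bytes versus str.

```python
# Assumption: the alias was saved as cp1252 bytes, so the accented o is 0xf3.
raw = b'Fecha de Valoraci\xf3n'

try:
    raw.decode('ascii')          # what the implicit conversion attempts
except UnicodeDecodeError as exc:
    message = str(exc)
    error_position = exc.start   # index of the first undecodable byte

text = raw.decode('cp1252')      # decoding with the right codec succeeds

print(message)
print(text == u'Fecha de Valoraci\xf3n')
```

The reported position is the offset of the accented character within the
string, which is why the traceback points at position 17.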

One way to handle this is to specify any non-ASCII strings in your
lookup list as unicode, like this:

contents of sudhir.py:
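| # -*- coding: cp1252 -*-
| # alist spells the accented character with an escape; blist uses the
| # literal character, which Python decodes using the declared encoding.
| alist = ['Datestamp', u'Fecha de Valoraci\xf3n', 'Kurs-','datum']
| blist = ['Datestamp', u'Fecha de Valoración', 'Kurs-','datum']
| assert alist == blist
| val = u'Fecha de Valoraci\xf3n'
| # index() now compares unicode with unicode (or with pure-ASCII str).
| print 'a', alist.index(val)
| print 'b', blist.index(val)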

| OS prompt>sudhir.py
| a 1
| b 1

Note: the encoding "cp1252" is appropriate to my environment, not
necessarily to yours.
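
Another possible approach, sketched here under assumptions rather than
taken from your code, is to decode every byte-string alias to unicode
once, so that each later comparison is unicode against unicode. The
helper names to_text and find_date_column, the sample header row, and
the choice of cp1252 are all hypothetical; substitute the encoding your
alias file was really saved in. The sketch runs on Python 2.6+ and 3.

```python
ALIAS_ENCODING = 'cp1252'  # assumption: the encoding the alias file was saved in


def to_text(value, encoding=ALIAS_ENCODING):
    """Return value as unicode text, decoding byte strings explicitly."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already unicode, or a non-text cell such as a number


def find_date_column(header_cells, aliases):
    """Return the index of the first header cell matching an alias, or -1."""
    wanted = set(to_text(alias) for alias in aliases)
    for colx, cell in enumerate(header_cells):
        if to_text(cell) in wanted:
            return colx
    return -1


# Hypothetical data: byte-string aliases and a header row as xlrd might return it.
aliases = [b'Datestamp', b'Fecha de Valoraci\xf3n', b'Kurs-\ndatum']
header = [u'Fondo', u'ISIN', u'Fecha de Valoraci\xf3n']

print(find_date_column(header, aliases))
```

Returning -1 for "not found" is a design choice made for this sketch;
list.index() raises ValueError instead, which may suit you better if a
missing date column should stop processing.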

You may like to have a look through this:
 http://www.amk.ca/python/howto/unicode

HTH,
John



