Unicode

Vlastimil Brom vlastimil.brom at gmail.com
Mon Dec 17 12:56:06 CET 2012


2012/12/17 Anatoli Hristov <tolidtm at gmail.com>:
>> this seems to be an encoding error of your terminal on printing.
>> You may need to describe (or better post the respective parts of the
>> source) where the text is coming from (external text file, database
>> entry, hardcoded in the python source ...), how it is stored, retrieved
>> and possibly manipulated before you insert it to the database.
>>
> Here is how I get the data using the urllib opener:
>
> def GetSpecsFR(icecat_prod_id):
>     opener = urllib.FancyURLopener({})
>     ffr = opener.open("http://prf.icecat.biz/index.cgi?product_id=%s;mi=start;smi=product;shopname=openICEcat-url;lang=fr" % icecat_prod_id)
>     specsfr = ffr.read()
>     #specsfr = specsfr.decode('utf-8')
>     specsfr = RemoveHTML(specsfr)
>     ##specsfr = "%r" % specsfr
> ##    if specsfr:
> ##        try:
> ##            specsfr = str(specsfr)
> ##        except UnicodeEncodeError:
> ##            specsfr = str(specsfr.encode('utf-16'))
>     return specsfr

Hi,
I don't know what a product ID would look like for this page, but
assuming the catalog pages are UTF-8 encoded, like the error page
I get, decoding should work fine; cf.:

>>> import urllib
>>> opener = urllib.FancyURLopener({})
>>> ffr = opener.open("http://prf.icecat.biz/index.cgi?product_id=%s;mi=start;smi=product;shopname=openICEcat-url;lang=fr" % (1234,))
>>> src = ffr.read()
>>> print src.decode("utf-8")


<!-- This Icecat template is used as head of all pages in Product finder -->


<HTML>
<HEAD>

[... - shortened]

<div align="center">"Désolé, pour ce produit, nous n'avons pas trouvé
d'autres informations produit.<br>Si vous n'êtes pas redirigés
automatiquement, veuillez cliquer" <a href="#" style="font-size:80%"
onclick="history.back()">ici</a>
</div>
<!--
            <td bgcolor="" width="230" align="center"><img
src="/imgs/logo.gif" width="180" height="58"></td>
-->



>>>
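
The decoding step can also be tried without network access. The following
is only a minimal sketch, assuming the server really delivers UTF-8; the
sample bytes stand in for the result of ffr.read(), and decode_page is a
hypothetical helper name, not part of urllib. It is written to run on
both Python 2 and Python 3.

```python
# Stand-in for ffr.read(): the bytes a UTF-8 encoded page would deliver.
# (Assumption: the real page is UTF-8, as the error page appears to be.)
raw = u"D\xe9sol\xe9, vous n'\xeates pas redirig\xe9s".encode("utf-8")


def decode_page(raw_bytes, encoding="utf-8"):
    """Hypothetical helper: turn the raw bytes into a unicode string."""
    return raw_bytes.decode(encoding)


# Correct codec: the accented characters come back intact.
text = decode_page(raw)

# Wrong codec: no exception is raised, but every accented character
# turns into two garbage characters.
garbled = decode_page(raw, "latin-1")
```

If the text looks like the second result after decoding, the codec passed
to decode() does not match the one the page was written in.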

Printing on a Unicode-capable shell works fine (wx PyShell in my case);
inserting into the database should be straightforward too (although I
don't have any experience with the specific db you are using).
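
If the terminal cannot display the accented characters, the error comes
from encoding the text for output, not from the decoded text itself. A
rough sketch of how that could be checked and worked around, assuming
the text is already a unicode string; can_encode and for_stream are
hypothetical helper names, and falling back to "ascii" when the stream
reports no encoding is my own choice:

```python
import sys


def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def for_stream(text, encoding=None):
    """Replace characters the target encoding cannot represent with '?'."""
    # Assumption: fall back to ASCII if the stream does not report an encoding.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, "replace").decode(encoding)


sample = u"D\xe9sol\xe9"
# Safe to print on any terminal; only the display is lossy.
print(for_stream(sample))
```

This only changes what is shown on screen; the original unicode string
should still be the one passed on to the database.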

Are you getting other Unicode errors in other parts of the process,
or do the above steps work differently on your computer?

hth,
  vbr


