unicode confusing

Pet petshmidt at googlemail.com
Tue May 26 09:29:00 CEST 2009


On May 25, 6:07 pm, Paul Boddie <p... at boddie.org.uk> wrote:
> On 25 May, 17:39, someone <petshm... at googlemail.com> wrote:
>
> > Hi,
>
> > reading content of webpage (encoded in utf-8) with urllib2, I can't
> > get parsed data into DB
>
> > Exception:
>
> >   File "/usr/lib/python2.5/site-packages/pyPgSQL/PgSQL.py", line 3111, in execute
> >     raise OperationalError, msg
> > libpq.OperationalError: ERROR:  invalid UTF-8 byte sequence detected near byte 0xe4
>
> > I've already checked several python unicode tutorials, but I have no
> > idea how to solve my problem.
>
> With pyPgSQL, there are a few tricks that you have to take into
> account:
>
> 1. With PostgreSQL, it would appear advantageous to create databases
> using the "-E unicode" option.

Hi,

The DB is in UTF8.
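
With the database itself in UTF8, the error points at the bytes being
sent rather than at the DB. Byte 0xe4 is "ä" in Latin-1 (ISO-8859-1),
so one possibility, and only a guess from the error text, is that some
of the page data is Latin-1 rather than UTF-8. A minimal sketch for
checking where a byte string stops being valid UTF-8 (the helper name
find_invalid_utf8 is made up for illustration; it needs Python 2.6 or
later):

```python
def find_invalid_utf8(raw):
    """Return (offset, offending_bytes) for the first invalid UTF-8
    sequence in the byte string raw, or None if raw decodes cleanly."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start and exc.end delimit the bytes the decoder rejected
        return exc.start, raw[exc.start:exc.end]
    return None
```

Calling it on each byte string just before cursor.execute() should show
which value PostgreSQL is rejecting.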


>
> 2. When connecting, use the client_encoding and unicode_results
> arguments for the connect function call:
>
>   connection = PgSQL.connect(client_encoding="utf-8",
>                              unicode_results=1)

If I set unicode_results=1, then exceptions show up in other places,
e.g. urllib.urlencode(values) can't encode the values.
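
In Python 2, urllib.urlencode() calls str() on every key and value,
which fails for Unicode objects containing non-ASCII characters. That
would fit the exception appearing once unicode_results=1 makes the DB
return Unicode. One workaround is to encode to UTF-8 byte strings just
before the call. A sketch, where encode_values is a hypothetical helper
and the try/except import is only there so the same code also runs on
Python 3:

```python
try:
    from urllib import urlencode          # Python 2
except ImportError:
    from urllib.parse import urlencode    # Python 3

TEXT_TYPE = type(u"")  # unicode on Python 2, str on Python 3


def encode_values(values, encoding="utf-8"):
    """Return a copy of the dict with text keys and values encoded to
    byte strings; anything that is not text is passed through as-is."""
    def _enc(item):
        if isinstance(item, TEXT_TYPE):
            return item.encode(encoding)
        return item

    return dict((_enc(key), _enc(value)) for key, value in values.items())


# Illustrative data only: a single query parameter with an umlaut
query = urlencode(encode_values({u"q": u"K\u00e4se"}))
```

This keeps Unicode objects inside the program and only drops to bytes at
the point where urlencode needs them.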

>
> 3. After connecting, it appears necessary to set the client encoding
> explicitly:
>
>   connection.cursor().execute("set client_encoding to unicode")

I've tried this as well, but I still get exceptions.
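
If the exceptions persist with the client encoding set, a remaining
suspect is the data read with urllib2: it arrives as a byte string, and
its real encoding may differ from what the page claims. A sketch of
decoding once at the input boundary so that only Unicode objects reach
pyPgSQL. Both to_unicode and its latin-1 fallback are assumptions for
illustration, not part of pyPgSQL or urllib2:

```python
def to_unicode(raw, declared="utf-8", fallback="latin-1"):
    """Decode a byte string using the declared charset, falling back to
    another one if the bytes do not match the declaration."""
    if isinstance(raw, type(u"")):
        return raw  # already text, nothing to decode
    try:
        return raw.decode(declared)
    except (UnicodeDecodeError, LookupError):
        # latin-1 maps every byte to a character, so this never raises,
        # but the result is only right if the data really is latin-1
        return raw.decode(fallback)
```

The decoded text (or the pieces parsed from it) can then be passed to
cursor.execute() as Unicode, leaving the encoding for the database to
pyPgSQL.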

>
> I'd appreciate any suggestions which improve on the above, but what
> this should allow you to do is to present Unicode objects to the
> database and to receive such objects from queries. Whether you can
> relax this and pass UTF-8-encoded strings instead of Unicode objects
> is not something I can guarantee, but it's usually recommended that
> you manipulate Unicode objects in your program where possible, and
> here you should be able to let pyPgSQL deal with the encodings
> preferred by the database.
>

Thanks for your suggestions! Sadly, I can't solve my problem...

Pet

> Paul



