split problem if the delimiter is inside the text limiter

imageguy imageguy1206 at gmail.com
Wed Mar 18 20:42:47 EDT 2009


> You have to know the original encoding (I mean, the one used for the csv
> file), else there's nothing you can do. Then it's just a matter of
> decoding (to unicode) then encoding (to utf8), ie (if your source is in
> latin1):
>
> utf_string = latin1_string.decode("latin1").encode("utf8")

The OP mentioned using 'pgdb', which I assumed to mean he is using
PostgreSQL and the PyGreSQL driver.
If that is the case, PostgreSQL has an optional parameter called
'client_encoding'.  If this is set within the Postgres db or as part of
the db transaction, the server will accept the incoming data 'as is'
and do the conversion internally.  That saves the decode/encode step,
may give a bit of a performance boost, and means the client (Python)
application doesn't need to be concerned about it.
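
A rough sketch of what I mean, assuming a DB-API connection such as the
one pgdb.connect() returns and a source file in Latin-1. The helper name
set_client_encoding is made up for illustration; the SET client_encoding
statement itself is standard PostgreSQL. Written for Python 3.

```python
import re


def set_client_encoding(conn, encoding):
    """Tell PostgreSQL which encoding the client will send for this session.

    `conn` is assumed to be a DB-API connection (for example from
    pgdb.connect()); `encoding` is a PostgreSQL encoding name like LATIN1.
    Returns the SQL statement that was executed.
    """
    # The name is interpolated into the statement, so accept only plain
    # encoding names (letters, digits, underscore, hyphen).
    if not re.fullmatch(r"[A-Za-z0-9_-]+", encoding):
        raise ValueError("unexpected encoding name: %r" % (encoding,))
    sql = "SET client_encoding TO '%s'" % encoding
    cur = conn.cursor()
    try:
        cur.execute(sql)
    finally:
        cur.close()
    return sql
```

It would be called once after connecting, e.g.
set_client_encoding(conn, "LATIN1"), before running the inserts.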

As you so correctly point out, Bruno, you do need to know the original
encoding.  My comments above just simplify the db update process.
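
To illustrate why the original encoding matters, here is a small sketch
in Python 3, where the raw file data is a bytes object rather than the
Python 2 str in the quoted one-liner. The helper name to_utf8 and the
sample word are only for illustration.

```python
def to_utf8(raw, source_encoding):
    """Re-encode raw bytes from source_encoding to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# "cafe" with an e-acute, as it would be stored in a Latin-1 csv file.
raw = "caf\u00e9".encode("latin-1")

# Right guess: the accented character survives the round trip.
print(to_utf8(raw, "latin-1"))  # b'caf\xc3\xa9'

# Wrong guess: no error is raised, but byte 0xE9 is decoded as a
# different character, so the UTF-8 output is silently wrong.
print(to_utf8(raw, "cp437"))
```

The wrong guess fails silently, which is why the encoding has to be
known up front rather than discovered from an exception.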

This part of the manual might be helpful:
http://www.postgresql.org/docs/8.1/static/multibyte.html

If 'pgdb' != PostgreSQL then please accept my apologies for intruding
on this thread.

g.



