Ascii to Unicode.
Carey Tilden
carey.tilden at gmail.com
Thu Jul 29 14:18:41 EDT 2010
On Thu, Jul 29, 2010 at 10:59 AM, Joe Goldthwaite <joe at goldthwaites.com> wrote:
> Hi Ulrich,
>
> Ascii.csv isn't really a latin-1 encoded file. It's an ascii file with a
> few characters above the 128 range that are causing Postgresql Unicode
> errors. Those characters work fine in the Windows world but they're not the
> correct byte representation for Unicode. What I'm attempting to do is
> translate those upper range characters into the correct Unicode
> representations so that they look the same in the Postgresql database as
> they did in the CSV file.
Having bytes outside of the ASCII range means, by definition, that the
file is not ASCII encoded. ASCII only defines bytes 0-127. Bytes
outside of that range mean either the file is corrupt, or it's in a
different encoding. In this case, you've been able to determine the
correct encoding (latin-1) for those errant bytes, so the file itself
is known to be in that encoding.
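
To make that concrete, here is a minimal sketch, assuming Python 3 and
assuming latin-1 really is the right encoding for the file. The sample
bytes and the function name reencode_latin1_to_utf8 are made up for
illustration; they are not taken from Ascii.csv. It shows that a byte
above 127 fails to decode as ASCII, decodes cleanly as latin-1, and can
then be re-encoded as UTF-8 for the database.

```python
def reencode_latin1_to_utf8(raw: bytes) -> bytes:
    """Decode bytes as latin-1, then encode the resulting text as UTF-8."""
    # latin-1 maps every byte 0-255 to the code point with the same number,
    # so this decode step cannot raise for any input.
    text = raw.decode("latin-1")
    return text.encode("utf-8")


# Hypothetical sample row: 0xE9 and 0xEF are outside the ASCII range.
sample = b"caf\xe9,na\xefve\n"

try:
    sample.decode("ascii")
except UnicodeDecodeError as exc:
    # ASCII only covers bytes 0-127, so this branch is taken.
    print("not ASCII:", exc.reason)

converted = reencode_latin1_to_utf8(sample)
print(converted.decode("utf-8"))
```

Because latin-1 accepts every possible byte, a successful decode does
not by itself prove the guess was right; it is still worth eyeballing
the decoded text.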
Carey