Reading Windows CSV file with LCID entries under Linux.
Tim Golden
mail at timgolden.me.uk
Mon Sep 22 11:59:41 EDT 2008
Thomas Troeger wrote:
> I've stumbled over a problem with Windows Locale ID information and
> codepages. I'm writing a Python application that parses a CSV file,
> the format of a line in this file is "LCID;Text1;Text2". Each line can
> contain a different locale id (LCID) and the text fields contain data
> that is encoded in some codepage which is associated with this LCID. My
> current data file contains the codes 1033 for English US and 1031 for
> German (as listed in
> http://www.microsoft.com/globaldev/reference/lcid-all.mspx).
> Unfortunately, I cannot find out which Codepage (like cp-1252 or
> whatever) belongs to which LCID.
>
> My question is: How can I convert this data into something more
> reasonable like unicode? Basically, what I want is something like
> "Text1;Text2", both fields encoded as UTF-8. Can this be done with
> Python? How can I find out which codepage I have to use for 1033 and 1031?
The GetLocaleInfo API call can tell you which codepage belongs
to an LCID (ask it for LOCALE_IDEFAULTANSICODEPAGE):
http://msdn.microsoft.com/en-us/library/ms776270(VS.85).aspx
You'll need to use ctypes (or write a C extension) to
use it, and it's only available on Windows itself. Be aware
that if it doesn't succeed you may need
to fall back on cp 65001 -- utf8.
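
Something along these lines might do as a starting point. It's only a
rough sketch in Python 3, not tested against your data. The names
codepage_for_lcid, convert_line and LCID_TO_CODEPAGE are made up for
illustration. Since GetLocaleInfo isn't there under Linux, the sketch
falls back on a small hand-written table; it assumes both of your LCIDs
(1031 and 1033) map to cp1252, which is their default ANSI codepage,
and it assumes ";" never occurs inside the text fields.

```python
import sys

# Hand-maintained table (an assumption for this sketch): extend it with
# whatever LCIDs actually appear in the data file.
LCID_TO_CODEPAGE = {
    1031: "cp1252",  # German (Germany)
    1033: "cp1252",  # English (United States)
}

LOCALE_IDEFAULTANSICODEPAGE = 0x1004  # LCTYPE constant from winnls.h


def codepage_for_lcid(lcid):
    """Return a Python codec name for a Windows LCID."""
    if sys.platform == "win32":
        # Only reachable on Windows: ask the OS for the ANSI codepage.
        import ctypes
        buf = ctypes.create_unicode_buffer(8)
        written = ctypes.windll.kernel32.GetLocaleInfoW(
            lcid, LOCALE_IDEFAULTANSICODEPAGE, buf, len(buf)
        )
        # "0" means the locale has no ANSI codepage (Unicode-only).
        if written and buf.value not in ("", "0"):
            return "cp" + buf.value
    # Elsewhere, or if the call failed: use the table, then utf-8.
    return LCID_TO_CODEPAGE.get(lcid, "utf-8")


def convert_line(raw):
    """Turn one raw b'LCID;Text1;Text2' line into the str 'Text1;Text2'."""
    lcid_field, text1, text2 = raw.rstrip(b"\r\n").split(b";", 2)
    codec = codepage_for_lcid(int(lcid_field.decode("ascii")))
    return text1.decode(codec) + ";" + text2.decode(codec)
```

Open the file in binary mode ("rb"), pass each line through
convert_line, and call .encode("utf-8") on the result when writing
it back out.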
TJG