Reading Windows CSV file with LCID entries under Linux.

Tim Golden mail at timgolden.me.uk
Mon Sep 22 17:59:41 CEST 2008


Thomas Troeger wrote:
> I've stumbled over a problem with Windows Locale ID information and 
> codepages. I'm writing a Python application that parses a CSV file,
> the format of a line in this file is "LCID;Text1;Text2". Each line can 
> contain a different locale id (LCID) and the text fields contain data 
> that is encoded in some codepage which is associated with this LCID. My 
> current data file contains the codes 1031 for German and 1033 for 
> English US (as listed in 
> http://www.microsoft.com/globaldev/reference/lcid-all.mspx). 
> Unfortunately, I cannot find out which codepage (like cp1252 or 
> whatever) belongs to which LCID.
> 
> My question is: How can I convert this data into something more 
> reasonable like Unicode? Basically, what I want is something like 
> "Text1;Text2", both fields encoded as UTF-8. Can this be done with 
> Python? How can I find out which codepage I have to use for 1033 and 1031?


The GetLocaleInfo API call can do that lookup: ask it for
LOCALE_IDEFAULTANSICODEPAGE and it returns the ANSI codepage
associated with a given LCID:

http://msdn.microsoft.com/en-us/library/ms776270(VS.85).aspx

You'll need to use ctypes (or write a C extension) to
call it, and it is only available on Windows. Be aware that
if it doesn't succeed you may need to fall back on code
page 65001 (UTF-8).
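A minimal sketch of the whole conversion, assuming Python 3 and that ";" never appears inside a text field. On Windows it asks GetLocaleInfoW for LOCALE_IDEFAULTANSICODEPAGE through ctypes. Because that API does not exist under Linux, it falls back there to a small hand-written table (FALLBACK_ANSI_CODEPAGES, a hypothetical name covering only 1031 and 1033, both of which use code page 1252). Unknown LCIDs and failed lookups fall back to 65001, decoded as UTF-8. The helper names ansi_codepage, codec_name and convert_line are illustrative, not part of any library.

```python
import sys

# Hypothetical lookup table for non-Windows hosts (e.g. Linux), where
# GetLocaleInfo is unavailable. Extend it for other LCIDs in your data.
FALLBACK_ANSI_CODEPAGES = {
    1031: 1252,  # German (Germany)
    1033: 1252,  # English (United States)
}

CP_UTF8 = 65001
LOCALE_IDEFAULTANSICODEPAGE = 0x1004  # LCType constant from winnls.h


def ansi_codepage(lcid):
    """Return the ANSI code page number for a Windows LCID."""
    if sys.platform == "win32":
        import ctypes
        buf = ctypes.create_unicode_buffer(8)
        # GetLocaleInfoW returns the number of characters written, or 0 on failure.
        n = ctypes.windll.kernel32.GetLocaleInfoW(
            lcid, LOCALE_IDEFAULTANSICODEPAGE, buf, len(buf))
        # Treat failure, or "0" (no ANSI code page for this locale), as UTF-8.
        if n and buf.value not in ("", "0"):
            return int(buf.value)
        return CP_UTF8
    # Not on Windows: use the hand-written table, defaulting to UTF-8.
    return FALLBACK_ANSI_CODEPAGES.get(lcid, CP_UTF8)


def codec_name(codepage):
    """Map a Windows code page number to a Python codec name."""
    # Python knows most Windows code pages as "cpNNNN"; spell UTF-8 out explicitly.
    return "utf-8" if codepage == CP_UTF8 else "cp%d" % codepage


def convert_line(raw):
    """Turn b"LCID;Text1;Text2" into UTF-8 encoded b"Text1;Text2"."""
    lcid, text1, text2 = raw.rstrip(b"\r\n").split(b";", 2)
    encoding = codec_name(ansi_codepage(int(lcid)))
    fields = [field.decode(encoding) for field in (text1, text2)]
    return ";".join(fields).encode("utf-8")
```

To convert the file, open it in binary mode and write convert_line(raw) + b"\n" for each line. On Linux the result is only as good as the table, so any other LCIDs in the data need their own entries.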

TJG