[BangPypers] Internationalization in Python 2.6
Dileep
dileep.ds at gmail.com
Wed Nov 27 10:25:01 CET 2013
Hi,
How does internationalization work in Python 2.6?
I have an input string to the script. I do not know the encoding of the
string.
I want to write that string to an xml file.
I am trying different encodings to decode the input string into unicode,
and then using the same encoding when creating the XML string and writing
it to a file.
Is this approach fine? Or is there another way to support
internationalization when the encoding of the input string is unknown?
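The approach described above, trying candidate encodings in order and remembering which one succeeded, can be sketched roughly as follows. This is only an illustration under assumptions: the function name decode_with_fallback and the candidate list are hypothetical, and the input is assumed to be a byte string. Note that Latin-1 maps every possible byte to a character, so decoding with it never raises UnicodeDecodeError. It therefore has to come last, and a success with it does not prove that the input really was Latin-1.

```python
def decode_with_fallback(raw, candidates=('utf-8', 'latin-1')):
    """Return (text, encoding) for the first candidate that decodes raw.

    raw is assumed to be a byte string (str in Python 2, bytes in Python 3).
    The helper name and the default candidate list are hypothetical.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            # This candidate cannot represent the bytes; try the next one.
            continue
    raise ValueError('none of the candidate encodings could decode the input')


# The byte 0xC9 followed by "t" is not valid UTF-8, so Latin-1 is used.
text, used = decode_with_fallback(b'\xc9test')

# The two bytes 0xC3 0x89 are the UTF-8 form of the same character.
text2, used2 = decode_with_fallback(b'\xc3\x89test')
```

Both calls yield the same unicode text; only the reported encoding differs.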
I get the XML string without any issues; print xml_string works fine.
But when it is written to a file, the tag value is changed, even though I
used the same encoding for writing that I used for decoding.
I have written sample code as below:
import os
import codecs
from xml.dom.minidom import Document


def write_to_xml(output_string, encod_fmt):
    doc = Document()
    root = doc.createElement('root')
    doc.appendChild(root)
    tag_key = doc.createElement('output_string')
    tag_value = output_string
    tag_key.appendChild(doc.createTextNode((tag_value)))
    root.appendChild(tag_key)
    xml_string = doc.toprettyxml(indent=" ", encoding=encod_fmt)
    print xml_string
    fname = os.path.join('/root/output.xml')
    doc.writexml(codecs.open(fname, 'wb', encod_fmt), encoding=encod_fmt)


def convert_string(input_string):
    try:
        input_string_unicode = input_string.decode('utf-8')
        encoding = 'utf-8'
    except UnicodeDecodeError:
        try:
            input_string_unicode = input_string.decode('Latin-1')
            encoding = 'Latin-1'
        except UnicodeDecodeError:
            try:
                input_string_unicode = input_string.decode('iso-8859-1')
                encoding = 'iso-8859-1'
            except UnicodeDecodeError:
                raise
    #output_string = input_string_unicode.encode(encoding)
    write_to_xml(input_string_unicode, encoding)


if __name__ == '__main__':
    input_string = raw_input()
    convert_string(input_string)
Output
---------
[root] python i18n_test.py
Étest
<?xml version="1.0" encoding="Latin-1"?>
<root>
<output_string>
Étest
</output_string>
</root>
But the file content is as below.
<?xml version="1.0" encoding="Latin-1"?><root><output_string><C9>test</output_string></root>
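One way to check whether the file content is really wrong is to read its raw bytes back and decode them with the declared encoding. The sketch below is an illustration under assumptions: it writes to a temporary file instead of /root/output.xml, uses io.open in binary mode rather than codecs.open, and the helper name write_and_read_back is hypothetical. In Latin-1 the character É is the single byte 0xC9, which some viewers display as <C9> when they expect UTF-8.

```python
import io
import os
import tempfile
from xml.dom.minidom import Document


def write_and_read_back(text, encoding):
    """Write text inside a small XML document and return the raw file bytes.

    The helper name and the temporary file are assumptions for illustration.
    """
    doc = Document()
    root = doc.createElement('root')
    doc.appendChild(root)
    node = doc.createElement('output_string')
    node.appendChild(doc.createTextNode(text))
    root.appendChild(node)

    fd, fname = tempfile.mkstemp(suffix='.xml')
    os.close(fd)
    try:
        # toxml(encoding=...) returns an already encoded byte string,
        # so the file is opened in binary mode.
        with io.open(fname, 'wb') as handle:
            handle.write(doc.toxml(encoding=encoding))
        with io.open(fname, 'rb') as handle:
            return handle.read()
    finally:
        os.remove(fname)


raw = write_and_read_back(u'\xc9test', 'Latin-1')

# Decoding the bytes with the declared encoding recovers the original text.
decoded = raw.decode('latin-1')
```

If decoded contains Étest, the bytes on disk match the declared encoding, and the difference is only in how the file is displayed.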
--
Regards
D.S. DIleep