Unicode support

Richy2004 richard.scothern at gmail.com
Fri Aug 6 16:57:44 CEST 2004


code:
import sys, codecs
file = codecs.open("accountmgr_words_arb.txt", "r", "utf-16")
print (file.readline())

output:
File "./test.py", line 5, in ?
print (file.readline())
File "C:\Python23\lib\codecs.py", line 384, in readline
return self.reader.readline(size)
File "c:\Python23\lib\encodings\utf_16.py", line 57, in readline
raise NotImplementedError, '.readline() is not implemented for UTF-16'
NotImplementedError: .readline() is not implemented for UTF-16
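
For comparison, a minimal sketch of one way around the missing readline(): decode the whole file first and split it into lines afterwards. It assumes a modern Python 3 interpreter rather than the Python 2.3 in the traceback above, and the names read_utf16_lines and write_sample are hypothetical helpers for illustration only.

```python
import os
import tempfile


def read_utf16_lines(path):
    """Return the decoded lines of a UTF-16 text file, line endings kept."""
    # newline="" turns off newline translation, so "\r\n" is preserved.
    with open(path, "r", encoding="utf-16", newline="") as handle:
        text = handle.read()
    # Decode first, split afterwards: no readline() call on the codec reader.
    return text.splitlines(True)


def write_sample(text):
    """Hypothetical helper: write text to a temporary UTF-16 file, return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    with open(path, "w", encoding="utf-16", newline="") as handle:
        handle.write(text)
    return path
```

Reading everything at once is only reasonable for files that fit in memory; for a small word list like this one that seems a safe assumption.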

======================================================
code:
import sys, codecs
file = codecs.open("accountmgr_words_arb.txt", "r", "utf-16")
print (file.read())

output:
Traceback (most recent call last):
File "./test.py", line 5, in ?
print (file.read())
File "c:\Python23\lib\encodings\cp850.py", line 18, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-2: character maps to <undefined>
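
The read itself appears to succeed here; the error comes from print, which has to encode the text for a cp850 console that has no Arabic characters. A minimal sketch of one workaround, assuming a modern Python 3 interpreter rather than Python 2.3, is to encode with the "backslashreplace" error handler so that unsupported characters are shown as escapes instead of raising. The name console_safe is hypothetical.

```python
def console_safe(text, encoding="cp850"):
    """Return text with characters the target encoding lacks shown as \\uXXXX escapes."""
    # errors="backslashreplace" substitutes an escape instead of raising UnicodeEncodeError.
    raw = text.encode(encoding, errors="backslashreplace")
    # Decoding again gives a str that the console encoding can always represent.
    return raw.decode(encoding)


line = "\u0646\u0648\u0639 \u062d\u0633\u0627\u0628"  # first two words from the file in this post
print(console_safe(line))
```

This only makes the output printable; to see the Arabic text itself, the console would need a code page or font that supports it.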

======================================================
code:
import sys, codecs
file = codecs.open("accountmgr_words_arb.txt", "rb", "utf-16")
lines = file.readlines()
print lines

This works! Output:
[u'\u0646\u0648\u0639 \u062d\u0633\u0627\u0628 \u062c\u062f\u064a\u062f \u0645\u062e\u062a\u0627\u0631.\r\n']

If I add these lines:
line = lines[0]
tokens = line.split("\\u")
print tokens[0]

I get this: :(
Traceback (most recent call last):
File "./test.py", line 8, in ?
print tokens[0]
File "c:\Python23\lib\encodings\cp850.py", line 18, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-2: character maps to <undefined>
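
As far as I can tell, the \uXXXX sequences in the list output are only how the list display escapes the characters; the string itself contains the Arabic characters, not a backslash followed by "u". So split("\\u") finds nothing to split on, tokens[0] is the whole line, and printing it hits the same cp850 error. A minimal sketch, assuming a modern Python 3 interpreter rather than Python 2.3, with code_points as a hypothetical helper:

```python
line = "\u0646\u0648\u0639 \u062d\u0633\u0627\u0628"  # first two words from the output above


def code_points(text):
    """List each character of text as a U+XXXX label."""
    return ["U+%04X" % ord(ch) for ch in text]


# The string holds no backslash followed by "u", so this split returns it whole.
unsplit = line.split("\\u")

# Splitting on whitespace separates the words.
words = line.split()

# The labels are plain ASCII, so any console can print them.
print(code_points(words[0]))
```

If the goal is to inspect the code points, ord() on each character gives them directly, with no need to parse the escaped form.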

Thanks,
Richard



