Read utf-8 file

moonhkt moonhkt at
Mon Mar 18 10:34:58 CET 2013

The file contains the text "Made in China":
中國 製
UTF-16 (hex) 	0x4E2D (4e2d)
UTF-8 (hex) 	0xE4 0xB8 0xAD (e4b8ad)
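
The two hex forms listed above for 中 can be reproduced with a short sketch. The helper name hex_of is hypothetical (it is not part of the original script), and the code is written so that it should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import binascii


def hex_of(text, encoding):
    """Return the hex digits of `text` after encoding it (hypothetical helper)."""
    return binascii.hexlify(text.encode(encoding)).decode("ascii")


ch = u"\u4e2d"  # the character 中

print(hex_of(ch, "utf-16-be"))  # big-endian UTF-16 code unit, no BOM -> 4e2d
print(hex_of(ch, "utf-8"))      # three-byte UTF-8 sequence -> e4b8ad
```

The point of the sketch is that u'\u4e2d' names the code point, while e4b8ad is what that code point becomes once it is encoded as UTF-8 bytes.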

Read with od -cx utf_a.text:
0000000   中  **  **  國  **  **  製  **  **  \n
            e4b8    ade5    9c8b    e8a3    bd0a

Read with Python. Why does Python display it as below?


u'\u4e2d\u570b\u88fd\n'  <--- Value 中國製
<-- UTF-8 value
u'\u4e2d' 中      CJK UNIFIED IDEOGRAPH-4E2D
u'\u570b' 國      CJK UNIFIED IDEOGRAPH-570B
u'\u88fd' 製      CJK UNIFIED IDEOGRAPH-88FD

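import unicodedata
import codecs         # UNICODE

# The open call was garbled in the archive. codecs.open and the file name
# (taken from the od command above) are a reconstruction.
file = codecs.open('utf_a.text', 'r', 'utf-8')
for line in file:
    #print repr(line)
    #print "========="
    print line.encode("utf-8")
    for keys in line.split(","):
        print repr(keys), " <--- Value", keys.encode("utf-8"), "<-- UTF-8 value"
        for key in keys:
            # The right-hand side was lost in the archive; unicodedata.name is
            # assumed from the output above. The default avoids a ValueError
            # on characters without a name, such as "\n".
            name = unicodedata.name(key, "")
            print "%-9s %-8s %-30s" % (repr(key), key.encode("utf-8"), name)
file.close()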

How can I display the UTF-8 bytes in hex, e.g. e4b8ad for 中, in Python?
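
One possible approach, offered as a sketch and not as the only answer: encode each character to UTF-8 and convert the resulting bytes to hex with binascii.hexlify. The helper utf8_hex and the "<no name>" placeholder are hypothetical, and the line is given as a literal here so that the example is self-contained; a line read through codecs.open with the 'utf-8' codec should behave the same way.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import binascii
import unicodedata


def utf8_hex(ch):
    """Hex digits of the UTF-8 bytes of one character (hypothetical helper)."""
    return binascii.hexlify(ch.encode("utf-8")).decode("ascii")


# Same content as the decoded line shown in the output above.
line = u"\u4e2d\u570b\u88fd\n"

# One (hex, name) pair per character; "\n" has no Unicode name, so a
# placeholder is used instead of letting unicodedata.name raise ValueError.
rows = [(utf8_hex(ch), unicodedata.name(ch, "<no name>")) for ch in line]

for hex_bytes, name in rows:
    print("%-8s %s" % (hex_bytes, name))
```

The hex values produced this way should match the byte sequence in the od dump: e4b8ad, e59c8b, e8a3bd, followed by 0a for the newline.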
