Why does my list turn into Unicode by itself?

Martin Hvidberg Martin at Hvidberg.net
Mon Dec 20 16:08:20 EST 2010


I'm reading a fixed-format text file line by line. The code is below; I have <snipped> out the parts not related to reading the file.
The only relevant detail left out is lstCutters, which looks like this:
[[1, 9], [11, 21], [23, 48], [50, 59], [61, 96], [98, 123], [125, 150]]
It specifies the first and last character positions of each token in the fixed-format input line.
All of this works fine and is only there to explain what I'm doing.
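For reference, a minimal Python 3 sketch of cutting such a line, assuming the positions in lstCutters are 1-based and inclusive as described. The helper cut_fixed_width and the sample line (built from the values in the sample output further down) are hypothetical. A 1-based inclusive range [first, last] corresponds to the Python slice line[first-1:last]; the slice strIn[cc[0]-1:cc[1]-1] in CutLine2List stops one character short, which fits the sample output, where 2001-12-12-15.46.12.734502 comes back as 2001-12-12-15.46.12.73450.

```python
# Hypothetical helper, not the poster's code: cut a fixed-width line into
# fields. Each spec is [first, last], 1-based and inclusive, like lstCutters.
def cut_fixed_width(line, specs):
    # Drop only the line terminator; strip() would also remove leading
    # blanks and shift every position if the first field were empty.
    line = line.rstrip('\r\n')
    fields = []
    for first, last in specs:
        # 1-based inclusive [first, last] -> 0-based half-open [first-1, last)
        fields.append(line[first - 1:last].strip())
    return fields


cutters = [[1, 9], [11, 21], [23, 48], [50, 59], [61, 96], [98, 123], [125, 150]]

# Hypothetical sample line laid out so each value starts at its field's position.
sample = ('I'.ljust(10) + '30'.ljust(12) + '2002-12-11 20:01:19.280'.ljust(27)
          + '563'.ljust(11) + 'FAN\u00d8'.ljust(37)
          + '2001-12-12-15.46.12.734502 ' + '2001-12-12-15.46.12.734502\n')

fields = cut_fixed_width(sample, cutters)
print(fields[5])        # prints: 2001-12-12-15.46.12.734502
print(sample[97:122])   # [98, 123] sliced as in CutLine2List: 2001-12-12-15.46.12.73450
```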

The function body is split into more lines than necessary so that I can monitor the variables step by step.

--- Code start ------

import codecs

<snip>

def CutLine2List(strIn, lstCut):
    strIn = strIn.strip()
    print '>InNextLine>', strIn
    # skip if line is empty
    if len(strIn) < 1:
        return False
    lstIn = list()
    for cc in lstCut:
        strSubline = strIn[cc[0]-1:cc[1]-1].strip()
        lstIn.append(strSubline)
        print '>InSubline2>' + lstIn[len(lstIn)-1] + '<'
    del strIn, lstCut, cc
    print '>InReturLst>', lstIn
    return lstIn

<snip>

filIn = codecs.open(
                     strFileNameIn,
                     mode='r',
                     encoding='utf-8',
                     errors='strict',
                     buffering=1)
for linIn in filIn:
    lstIn = CutLine2List(linIn, lstCutters)

--- Code end ------
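The u prefix comes from the way the file is opened: codecs.open(..., encoding='utf-8') decodes the bytes as they are read, so in Python 2 every linIn is already a unicode object, and so is every slice CutLine2List takes from it. A rough Python 3 equivalent is sketched here with io.open (the same object as the built-in open in Python 3), where decoded lines are str; the helper name read_decoded_lines and the temporary file are illustrative assumptions.

```python
import io
import os
import tempfile


def read_decoded_lines(path, encoding='utf-8'):
    """Return the lines of path as decoded text strings (hypothetical helper)."""
    # The encoding argument makes the file object decode bytes into text,
    # much as codecs.open(..., encoding='utf-8') does in Python 2.
    with io.open(path, mode='r', encoding=encoding, errors='strict') as f:
        return list(f)


# Usage: write the UTF-8 bytes for 'FANØ' to a temporary file and read them back.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as raw:
    raw.write(b'FAN\xc3\x98\n')
try:
    lines = read_decoded_lines(path)
finally:
    os.remove(path)

print(type(lines[0]).__name__)   # prints: str
```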

Sample output for one line of the input file looks like this:

 >InNextLine> I         30          2002-12-11 20:01:19.280    563        FANØ                                 2001-12-12-15.46.12.734502 2001-12-12-15.46.12.734502
 >InSubline2>I<
 >InSubline2>30<
 >InSubline2>2002-12-11 20:01:19.280<
 >InSubline2>563<
 >InSubline2>FANØ<
 >InSubline2>2001-12-12-15.46.12.73450<
 >InSubline2>2001-12-12-15.46.12.73450<
 >InReturLst> [u'I', u'30', u'2002-12-11 20:01:19.280', u'563', u'FAN\xd8', u'2001-12-12-15.46.12.73450', u'2001-12-12-15.46.12.73450']
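Nothing in the list has actually changed. In Python 2, print '>InSubline2>' + ... writes the string itself, while print '>InReturLst>', lstIn formats the whole list, and a list is displayed with repr() of each element; repr() of a unicode object adds the u prefix and escapes non-ASCII characters, so Ø shows up as \xd8. The Python 3 sketch below (an assumption, since the post uses Python 2) shows the same distinction: Python 3's repr() keeps Ø as is, and ascii() produces the escaped form that Python 2's repr() showed, without the u prefix.

```python
# Python 3 sketch. 'FAN\u00d8' is the text FANØ from the sample output.
field = 'FAN\u00d8'
fields = ['I', '30', field]

# A list is displayed as the repr() of every element, not the elements' text.
assert str(fields) == '[' + ', '.join(repr(f) for f in fields) + ']'

# ascii() escapes non-ASCII characters the way Python 2's repr() did,
# minus the u prefix. '\xd8' and '\u00d8' spell the same character, Ø.
escaped = ascii(field)
print(escaped)                 # prints: 'FAN\xd8'
print(field == 'FAN\xd8')      # prints: True
```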


Question:
In the last printout, tagged >InReturLst>, all entries have turned into Unicode. What is happening here?
Look at the word 'FANØ': it changes from 'FANØ' to u'FAN\xd8'. That is a problem for me, and I don't want it to change like this.

What can I do to stop this behavior?
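The values are unicode from the moment codecs.open decodes them, and only the list display shows them in repr() form. To show them without quotes and escapes, join or print the fields yourself; to hand them to code that really needs byte strings, encode them explicitly. A minimal Python 3 sketch; the helper format_fields and the ' | ' separator are hypothetical choices.

```python
def format_fields(fields, sep=' | '):
    """Join decoded fields for display, without repr()-style quotes/escapes."""
    return sep.join(fields)


fields = ['I', '30', '2002-12-11 20:01:19.280', '563', 'FAN\u00d8']

display = format_fields(fields)   # text: I | 30 | 2002-12-11 20:01:19.280 | 563 | FANØ

# Encode only where bytes are really required (e.g. writing to a binary
# stream); UTF-8 stores Ø as the two bytes C3 98.
encoded = [f.encode('utf-8') for f in fields]
print(encoded[4])                 # prints: b'FAN\xc3\x98'
```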

Best Regards
Martin


More information about the Python-list mailing list