reading from file

Sydoruk Yaroslav swift at mirohost.net
Thu Jun 11 17:06:14 EDT 2009


Jeff McNeil <jeff at jmcneil.net> wrote:
> Is the string in your text file literally "\xea\xe0\xea+\xef\xee
> \xe7\xe2\xee\xed\xe8\xf2\xfc" as "plain text?"  My assumption is that
> when you're reading that in, Python is interpreting each byte as an
> ASCII value (and rightfully so) rather than the corresponding '\x'
> escapes.
> 
> As an experiment:
> 
> (t)jeff at marvin:~/t$ cat test.py
> import chardet
> 
> s = "\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc"
> with open('test.txt', 'w') as f:
>        print >>f, s
> 
> print chardet.detect(open('test.txt').read())
> (t)jeff at marvin:~/t$ python test.py
> {'confidence': 0.98999999999999999, 'encoding': 'windows-1251'}
> (t)jeff at marvin:~/t$
> 
> HTH,
> 
> Jeff
> mcjeff.blogspot.com

    
Thank you for your reply.
You are right: Python reads the data from the file as bytes, and in this
case all of it is ASCII (the literal characters \, x, e, a and so on, not
the bytes that those escapes stand for).
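
To make the difference concrete, here is a small sketch, written to run under Python 3. The names `literal` and `raw` are only illustrative; they are not taken from my script. The first value is what the file actually contains, the second is what the escapes are meant to represent.

```python
# What the text file holds: the characters backslash, 'x', 'e', 'a', ...
# (a raw bytes literal keeps the backslashes as-is).
literal = br"\xea\xe0\xea"

# What the escapes stand for: three single bytes 0xEA 0xE0 0xEA.
raw = b"\xea\xe0\xea"

print(len(literal), len(raw))

# Every byte of the literal form is below 128, i.e. plain ASCII,
# so an encoding detector has nothing Cyrillic-looking to work with.
literal_is_ascii = all(b < 128 for b in bytearray(literal))
raw_is_ascii = all(b < 128 for b in bytearray(raw))
print(literal_is_ascii, raw_is_ascii)
```

Only the second form carries the high bytes that a detector can recognise as windows-1251.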


I solved the problem by adding line = line.decode('string_escape'), which
turns the literal \xNN escapes into the bytes they represent:

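import chardet

with open("aword.txt", "r") as f:
    for line in f:
        # Turn the literal "\xNN" text into the raw bytes it spells out.
        line = line.decode('string_escape')
        print chardet.detect(line)
        # The raw bytes are Windows-1251 (cp1251) encoded Cyrillic.
        b = line.decode('cp1251')
        print b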
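
The 'string_escape' codec only exists in Python 2. As a rough sketch of the same idea under Python 3, assuming the file is opened in binary mode and contains only \xNN escapes and ASCII, something like the following should work. The helper name `decode_escaped_cp1251` is made up for this example, and the 'unicode_escape' plus 'latin-1' round trip is a substitute for 'string_escape', not the same codec.

```python
def decode_escaped_cp1251(line):
    """Turn ASCII bytes such as b'\\xea\\xe0' into the Cyrillic text they spell."""
    # unicode_escape maps each \xNN escape to the code point U+00NN ...
    text = line.decode('unicode_escape')
    # ... and latin-1 maps U+00NN straight back to the single byte 0xNN.
    raw = text.encode('latin-1')
    # The raw bytes can now be decoded as Windows-1251.
    return raw.decode('cp1251')


# Same escaped string as in the quoted message above.
sample = br"\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc"
result = decode_escaped_cp1251(sample)  # 'как+позвонить'
```

In a loop over `open("aword.txt", "rb")`, each line could be passed to this helper in place of the two decode calls.
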
-- 
Only one 0_o


