reading from file
Sydoruk Yaroslav
swift at mirohost.net
Thu Jun 11 17:06:14 EDT 2009
Jeff McNeil <jeff at jmcneil.net> wrote:
> Is the string in your text file literally "\xea\xe0\xea+\xef\xee
> \xe7\xe2\xee\xed\xe8\xf2\xfc" as "plain text?" My assumption is that
> when you're reading that in, Python is interpreting each byte as an
> ASCII value (and rightfully so) rather than the corresponding '\x'
> escapes.
>
> As an experiment:
>
> (t)jeff at marvin:~/t$ cat test.py
> import chardet
>
> s = "\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc"
> with open('test.txt', 'w') as f:
>     print >>f, s
>
> print chardet.detect(open('test.txt').read())
> (t)jeff at marvin:~/t$ python test.py
> {'confidence': 0.98999999999999999, 'encoding': 'windows-1251'}
> (t)jeff at marvin:~/t$
>
> HTH,
>
> Jeff
> mcjeff.blogspot.com
Thank you for your reply.
You are right: Python reads the data from the file as bytes, and in this
case all of it is plain ASCII (the "\x.." escapes are stored as literal text).
I solved the problem by adding line = line.decode('string_escape'):
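import chardet

f = open("aword.txt", "r")
for line in f:
    # turn the literal "\xea..." text into the real bytes
    line = line.decode('string_escape')
    print chardet.detect(line)
    # the real bytes are cp1251 (windows-1251) text
    b = line.decode('cp1251')
    print b
f.close()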
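
For anyone reading this on Python 3, where the 'string_escape' codec no
longer exists, the same two-step idea can be sketched roughly as below. This
is only a sketch under assumptions: the helper name decode_escaped_cp1251 is
made up for illustration, and it assumes the input contains only ASCII
characters and backslash escapes such as "\xea", as in my file.

```python
def decode_escaped_cp1251(raw):
    """Turn bytes holding literal '\\xNN' escapes into cp1251-decoded text.

    Hypothetical helper, not part of chardet or the standard library.
    """
    # Step 1: interpret the backslash escapes. 'unicode_escape' yields a str
    # whose code points (0-255) equal the byte values that were spelled out.
    as_code_points = raw.decode('unicode_escape')
    # Step 2: map those code points back to real bytes one-to-one.
    real_bytes = as_code_points.encode('latin-1')
    # Step 3: decode the real bytes with the actual encoding of the text.
    return real_bytes.decode('cp1251')
```

With a file like mine, this would be called once per line on data read in
binary mode, e.g. for line in open("aword.txt", "rb"). Going through
'latin-1' is just one way to recover the raw bytes; it works because that
codec maps code points 0-255 to the same byte values.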
--
Only one 0_o