Reading backslashed characters (in ASCII) from a file
Michael P. Reilly
arcege at shore.net
Thu May 27 12:11:53 EDT 1999
Dinu C. Gherman <gherman at my-deja.com> wrote:
: I'm reading data from a file that contains octal coded characters
: (and other backslashed ones) in ASCII like this:
: \163\160\141\155 and \145\147\147\163
: Using the normal read(<n>) method on file objects I assemble parts of
: the data (containing these octal chars) into Python strings.
: The problem I face now is that I want to convert these escapes
: into the characters they represent in a Python string. I know that
: Python does that itself for string literals, but the string is
: composed piece by piece, and the example above should look like
: this when split:
: ['s','p','a','m', ...]
: instead of:
: ['\\','1','6','3','\\', ...]
: I already thought of using the re module, but so far I'm led to
: believe that this is going to be a backslashing-nightmare. Another
: idea was to use exec() or eval() to get Python to interpret the
: backslashes in the usual way, but that doesn't work either. In
: fact, the same problem holds for getting \n, \t characters, etc.
: Also trying to read a whole chunk of the file to avoid dealing
: with incrementally growing strings does not help as the following
: example shows (BTW, reading with "r" or "rb" makes no difference):
: 1. Write "\163\160\141\155 and \145\147\147\163"
: to an ASCII file named "backslash" using one's favorite
: editor like emacs.
: 2. Call Python and do the following:
:>>> f = open("backslash", "rb")
:>>> f.read()
: '(\\163\\160\\141\\155 and \\145\\147\\147\\163)'
:>>>
: Now I'm quite puzzled. Is there really no way but using the re module
: excessively? Anybody having a better trick?
I was doing something like this for some data encoding. Try this:
$ python
Python 1.5.1 (#3, Jul 16 1998, 10:35:48) [GCC 2.7.2.2] on aix4
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import string
>>> def octalchar_convert(str):
...     # split on backslashes; parts[0] is the text before the first escape
...     parts = string.splitfields(str, '\\')
...     s = [parts[0]]
...     for part in parts[1:]:
...         num = string.atoi(part[:3], 8)  # the octal number
...         s.append(chr(num) + part[3:])
...     return string.joinfields(s, '')
...
>>> # raw string, so the backslashes reach the function as read from a file
>>> octalchar_convert(r'\163\160\141\155 and \145\147\147\163')
'spam and eggs'
>>>
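Process the parts split on the backslash: the first three characters of
each part after the first are taken as an octal number, and the rest of
the part is kept as is. This assumes every backslash starts a three-digit
octal escape; other escapes such as \n or \t need separate handling.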
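For readers on a current Python 3, the same idea can be sketched with a single regular expression and without much backslash trouble. This is only an illustration, not the code from the session above: the names decode_escapes, _SIMPLE_ESCAPES and _ESCAPE_RE are made up for the example, and the set of one-letter escapes it handles (\n, \t, \r, \\) is an assumption about what the file may contain.

```python
import re

# Hypothetical names for this sketch; none of them come from the original post.
# One-letter escapes we choose to recognise (an assumption about the data).
_SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\"}

# A backslash followed by one to three octal digits, or by one of the
# letters in the table above.
_ESCAPE_RE = re.compile(r"\\([0-7]{1,3}|[ntr\\])")


def decode_escapes(text):
    """Replace octal escapes and a few one-letter escapes found in text."""

    def replace(match):
        body = match.group(1)
        if body in _SIMPLE_ESCAPES:
            return _SIMPLE_ESCAPES[body]
        # Otherwise the body is made of octal digits.
        return chr(int(body, 8))

    return _ESCAPE_RE.sub(replace, text)


# The raw string stands in for text read from the file.
raw = r"\163\160\141\155 and \145\147\147\163"
print(decode_escapes(raw))                  # spam and eggs
print(list(decode_escapes(raw))[:4])        # ['s', 'p', 'a', 'm']
```

Escapes the pattern does not recognise are left untouched, so unexpected input is passed through rather than raising an error. Note that if the file is read in pieces with read(<n>), an escape may be cut in half at a chunk boundary, so the decoding is best applied after the pieces are joined.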
-Arcege