Reading backslashed characters (in ASCII) from a file

Michael P. Reilly arcege at shore.net
Thu May 27 12:11:53 EDT 1999


Dinu C. Gherman <gherman at my-deja.com> wrote:
: I'm reading data from a file that contains octal coded characters
: (and other backslashed ones) in ASCII like this:

: \163\160\141\155 and \145\147\147\163

: Using the normal read(<n>) method on file objects I assemble parts of
: the data (containing these octal chars) into Python strings.

: The problem I face now is that I want to convert these characters
: into their ASCII representation in a Python string. I know that
: Python does that itself, but the string is composed bit by bit and
: the example above should look like this when split:

: ['s','p','a','m', ...]

: instead of:

: ['\\','1','6','3','\\', ...]

: I already thought of using the re module, but so far I'm led to
: believe that this is going to be a backslashing-nightmare. Another
: idea was to use exec() or eval() to get Python to interpret the
: backslashes in the usual way, but that doesn't work either. In
: fact, the same problem holds for getting \n, \t characters, etc.

: Also trying to read a whole chunk of the file to avoid dealing
: with incrementally growing strings does not help as the following
: example shows (BTW, reading with "r" or "rb" makes no difference):

: 1. Write "\163\160\141\155 and \145\147\147\163"
:    to an ASCII file named "backslash" using one's favorite
:    editor like emacs.

: 2. Call Python and do the following:

:>>> f = open("backslash", "rb")
:>>> f.read()
: '(\\163\\160\\141\\155 and \\145\\147\\147\\163)'
:>>>

: Now I'm quite puzzled. Is there really no way but using the re module
: excessively? Anybody having a better trick?
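
A note on the doubled backslashes in that f.read() output: they are only how the interpreter displays the string, not extra characters in the data. The small sketch below (written for present-day Python 3, which is an assumption; the variable name data is just for illustration) shows that each escape in the file is one backslash plus three digits.

```python
# Same text as typing \163\160\141\155 into a file with an editor.
# In source code each literal backslash has to be written as "\\".
data = "\\163\\160\\141\\155"

# Four groups of one backslash plus three digits.
print(len(data))

# repr() doubles each backslash for display only.
print(repr(data))

# Printing the string itself shows the single backslashes.
print(data)
```

So the file really contains the four characters backslash, 1, 6, 3 and so on, and they still have to be converted explicitly.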

I was doing something like this for some data encoding.  Try this:

$ python
Python 1.5.1 (#3, Jul 16 1998, 10:35:48)  [GCC 2.7.2.2] on aix4
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import string
>>> def octalchar_convert(str):
...   # split on the literal backslashes; parts[0] precedes the first one
...   parts = string.splitfields(str, '\\')
...   s = [parts[0]]
...   for part in parts[1:]:
...     num = string.atoi(part[:3], 8)  # the octal number
...     s.append( chr(num) + part[3:])
...   return string.joinfields(s, '')
...
>>> # raw string, so the backslashes reach the function unconverted
>>> octalchar_convert(r'\163\160\141\155 and \145\147\147\163')
'spam and eggs'
>>>

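The idea is to split the string on the backslashes and convert the
first three characters of each following part from octal.  Note that
this assumes every backslash is followed by exactly three octal digits;
escapes such as \n or \t would need separate handling.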
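
For comparison, here is a rough sketch of the same conversion done with the re module, which the question feared would turn into a backslashing nightmare. It is written for present-day Python 3 rather than the 1.5.1 session above, and the names decode_escapes, SIMPLE_ESCAPES and ESCAPE_RE are made up for this example. The set of one-character escapes it handles is an assumed, partial one.

```python
import re

# One-character escapes handled by this sketch (an assumed, partial set).
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\"}

# A backslash followed by 1-3 octal digits, or by any single other character.
ESCAPE_RE = re.compile(r"\\([0-7]{1,3}|.)", re.DOTALL)


def decode_escapes(text):
    """Replace backslash escapes in text with the characters they stand for."""
    def replace(match):
        body = match.group(1)
        if body[0] in "01234567":
            # Octal character code, e.g. 163 -> 's'.
            return chr(int(body, 8))
        # Known one-character escape, or leave an unknown escape untouched.
        return SIMPLE_ESCAPES.get(body, match.group(0))
    return ESCAPE_RE.sub(replace, text)


# Raw string, so the backslashes reach the function unconverted.
print(decode_escapes(r"\163\160\141\155 and \145\147\147\163"))
```

Writing the pattern and the test input as raw strings keeps the backslash count manageable; unknown escapes are passed through unchanged rather than raising an error, which is a design choice of this sketch.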

  -Arcege




