more on unescaping escapes

Adam Olsen rhamph at gmail.com
Tue Feb 24 21:16:40 CET 2009


On Feb 23, 7:18 pm, bvdp <b... at mellowood.ca> wrote:
> Gabriel Genellina wrote:
> > En Mon, 23 Feb 2009 23:31:20 -0200, bvdp <b... at mellowood.ca> escribió:
> >> Gabriel Genellina wrote:
> >>> En Mon, 23 Feb 2009 22:46:34 -0200, bvdp <b... at mellowood.ca> escribió:
> >>>> Chris Rebert wrote:
> >>>>> On Mon, Feb 23, 2009 at 4:26 PM, bvdp <b... at mellowood.ca> wrote:
>
> >>>>> [problem with Python and Windows paths using backslashes]
> >>>>>  Is there any particular reason you can't just internally use regular
> >>>>> forward-slashes for the paths? [...]
>
> >>>> you are absolutely right! Just use '/' on both systems and be done
> >>>> with it. Of course I still need to use \x20 for spaces, but that is
> >>>> easy.
> >>> Why is that? "\x20" is exactly the same as " ". It's not like %20 in
> >>> URLs, that becomes a space only after decoding.
>
> >> I need to use the \x20 because of my parser. I'm reading unquoted
> >> lines from a file. The file creater needs to use the form "foo\x20bar"
> >> without the quotes in the file so my parser can read it as a single
> >> token. Later, the string/token needs to be decoded with the \x20
> >> converted to a space.
>
> >> So, in my file "foo bar" (no quotes) is read as 2 tokens; "foo\x20bar"
> >> is one.
>
> >> So, it's not really a problem of what happens when you assign a string
> >> in the form "foo bar", rather how to convert the \x20 in a string to a
> >> space. I think the \\ just complicates the entire issue.
>
> > Just thinking, if you was reading the string from a file, why were you
> > worried about \\ and \ in the first place? (Ok, you moved to use / so
> > this is moot now).
>
> Just cruft introduced while I was trying to figure it all out. Having to
> figure the \\ and \x20 at same time with file and keyboard input just
> confused the entire issue :) Having the user set a line like
> c:\\Program\x20File ... works just fine. I'll suggest he use
> c:/program\x20files to make it bit simple for HIM, not my parser.
> Unfortunately, due to some bad design decisions on my part about 5 years
> ago I'm afraid I'm stuck with the \x20.
>
> Thanks.

You're confusing the Python source with the actual contents of the
string.  The compiler already does one decoding pass on string
literals, which is why \x20 in source is quite literally no different
from a space:

>>> '\x20'
' '

However, the interactive interpreter displays results using repr(x),
so characters that are considered formatting, such as a tab, get
re-escaped when printed:

>>> '\t'
'\t'
>>> len('\t')
1

It really is a tab that gets stored there, not the escape for one.
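
The same point can be checked outside the interactive prompt.  This is
only a minimal sketch; the variable name `tab` is illustrative:

```python
# The literal '\t' is decoded once, by the compiler, into a single tab character.
tab = '\t'

print(len(tab))        # number of characters actually stored in the string
print(repr(tab))       # what the interactive prompt shows: the re-escaped form
print(tab == chr(9))   # compare against the real tab character (code point 9)
print('\x20' == ' ')   # \x20 in source is just another spelling of a space
```

print() shows the string itself, while repr() shows a literal that
would recreate it, which is where the visible backslash comes from.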

Finally, if you give Python an unknown escape, it leaves it alone,
backslash included.  Then, when the interactive interpreter uses
repr(x), it is the backslash itself that gets re-escaped:

>>> '\P'
'\\P'
>>> len('\P')
2
>>> list('\P')
['\\', 'P']

What does this all mean?  If you want to test your parser with Python
literals, you need to escape them twice (once for Python's own literal
decoding and once for your parser), like so:

>>> 'c:\\\\Program\\x20Files\\\\test'
'c:\\\\Program\\x20Files\\\\test'
>>> list('c:\\\\Program\\x20Files\\\\test')
['c', ':', '\\', '\\', 'P', 'r', 'o', 'g', 'r', 'a', 'm', '\\', 'x',
'2', '0', 'F', 'i', 'l', 'e', 's', '\\', '\\', 't', 'e', 's', 't']
>>> 'c:\\\\Program\\x20Files\\\\test'.decode('string-escape')
'c:\\Program Files\\test'
>>> list('c:\\\\Program\\x20Files\\\\test'.decode('string-escape'))
['c', ':', '\\', 'P', 'r', 'o', 'g', 'r', 'a', 'm', ' ', 'F', 'i',
'l', 'e', 's', '\\', 't', 'e', 's', 't']
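
Note that the 'string-escape' codec used above exists only in Python
2.  On Python 3, a rough equivalent is to round-trip through bytes and
the 'unicode_escape' codec.  This is a sketch that assumes pure-ASCII
tokens; the helper name `unescape` is hypothetical:

```python
def unescape(token):
    """Decode backslash escapes in an ASCII token read from a file.

    Assumes pure-ASCII input; non-ASCII text would need different handling.
    """
    return token.encode('ascii').decode('unicode_escape')

# A raw string stands in for text read from a file: the backslashes are literal.
line = r'c:\\Program\x20Files\\test'
print(unescape(line))
```

As in the session above, each doubled backslash collapses to one and
\x20 becomes a real space.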

However, there's an easier way: use raw strings, which prevent Python
from unescaping anything:

>>> r'c:\\Program\x20Files\\test'
'c:\\\\Program\\x20Files\\\\test'
>>> list(r'c:\\Program\x20Files\\test')
['c', ':', '\\', '\\', 'P', 'r', 'o', 'g', 'r', 'a', 'm', '\\', 'x',
'2', '0', 'F', 'i', 'l', 'e', 's', '\\', '\\', 't', 'e', 's', 't']
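
To tie this back to the parser: text read from a file never goes
through Python's literal decoding, so it behaves like a raw string.
The sketch below assumes Python 3 and ASCII tokens, uses io.StringIO
in place of a real file, and uses a hypothetical helper `read_tokens`
that splits on whitespace; the real parser may tokenize differently:

```python
import io

def read_tokens(fileobj):
    """Split each line on whitespace, then unescape each token (ASCII assumed)."""
    result = []
    for line in fileobj:
        for token in line.split():
            # The token still contains a literal backslash, 'x', '2', '0' here;
            # unicode_escape turns that sequence into a real space.
            result.append(token.encode('ascii').decode('unicode_escape'))
    return result

# Stand-in for the user's file; the raw string keeps the backslash literal,
# exactly as it would appear on disk.
fake_file = io.StringIO(r'c:/program\x20files other' + '\n')
print(read_tokens(fake_file))
```

Because the split happens before the unescape, the space produced from
\x20 stays inside its token.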


