raw strings

Michele Simionato mis6 at pitt.edu
Fri Oct 11 21:29:46 CEST 2002


Duncan Booth <duncan at rcp.co.uk> wrote in message 

>> s/regexp1/regexp2/

>... where regexp1 is a regular expression and regexp2 is a string.

Maybe regexp2 is not a regular expression, but it is certainly not a
standard string either, since it can contain group references such as \1.
For instance, in a text I needed to change expressions of the kind

[decimal number] --> (decimal number)

and I used

re.sub(r'\[(\d+)\]', r'(\1)', text)

If the second argument were an ordinary string, the literal text '(\1)'
would be substituted instead of the matched decimal number! With this in
mind I used the term regular expression for regexp2, even though I agree
that it is not a regular expression in the same sense as regexp1. But it is
not a standard string either. For lack of a good term I used the notation
regexp2.
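
To make the distinction concrete, here is a small sketch contrasting a
replacement template with a replacement that is inserted literally. The
sample text is made up for illustration; passing a callable as the second
argument is one way to get a result that re.sub() does not expand.

```python
import re

text = "see [12] and [345]"   # made-up sample input
pattern = r'\[(\d+)\]'

# Replacement template: \1 is expanded to the first captured group.
as_template = re.sub(pattern, r'(\1)', text)

# A callable's return value is inserted as-is, with no \1 expansion.
as_literal = re.sub(pattern, lambda m: r'(\1)', text)

print(as_template)  # see (12) and (345)
print(as_literal)   # see (\1) and (\1)
```

So the second argument behaves like a small template language of its own,
which is neither a regular expression nor a plain string.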

> You could try writing re.sub(regexp1, replacement, string), or using
> your terminology: 
>   re.sub(r'regexp1', r'regexp2', text)
> where regexp2 is not a regular expression.

I had the impression that calling re.sub() without first compiling the
regular expression was quite inefficient. I have now done some profiling and
found that it is indeed slower, but only by 10%, which is practically
nothing. Therefore I will use the non-compiled form in the future.
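
A comparison of this kind can be sketched with the standard timeit module.
This is not the profiling that was actually run: the sample text and the
repetition counts are arbitrary assumptions, and the measured ratio will
vary by machine and Python version. The gap tends to be small because the
re module keeps a cache of recently compiled patterns.

```python
import re
import timeit

text = "see [12] and [345] " * 50   # arbitrary sample input
pattern = r'\[(\d+)\]'
compiled = re.compile(pattern)

def with_module_function():
    # Looks the pattern up in re's internal cache on every call.
    return re.sub(pattern, r'(\1)', text)

def with_compiled_pattern():
    # Uses the pattern object directly, skipping the cache lookup.
    return compiled.sub(r'(\1)', text)

t_module = timeit.timeit(with_module_function, number=500)
t_compiled = timeit.timeit(with_compiled_pattern, number=500)
print(f"re.sub: {t_module:.4f}s  compiled.sub: {t_compiled:.4f}s")
```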

> I think you have a fundamental misunderstanding of what a 'raw 
> string' actually is.

Even if at the time of my first posting I was unsure about the exact
meaning of a raw string, after the reply by Bengt Richter I quickly
realized how things work. This is the reason why I wrote

> The problem seems much more complicated than I expected.

Now I understand well how Python interprets strings, and why defining a
raw_string function is not at all straightforward.
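
The difficulty can be illustrated as follows. The raw_string() below is a
hypothetical best-effort helper written only for this illustration, not
something proposed in the thread: escapes are resolved when the literal is
parsed, so a function only ever sees the resulting characters and cannot
tell how they were originally spelled.

```python
raw = r'a\tb'      # 4 characters: a, backslash, t, b
cooked = 'a\tb'    # 3 characters: a, TAB, b

def raw_string(s):
    # Hypothetical helper: re-escape special characters after the fact.
    return s.encode('unicode_escape').decode('ascii')

# Works when the escape can be guessed back from the character...
print(raw_string(cooked) == raw)        # True

# ...but '\x41' is already just 'A' by the time the function sees it,
# so the original spelling is lost.
print(raw_string('\x41') == r'\x41')    # False
```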

I had already thought of the preprocessor idea suggested by Gerhard Häring,
but I discarded it because I wanted raw_string() to work on variables, not
only on string constants, which is all a preprocessor could handle. That
way I would simply be giving a longer name to the r operation!

Therefore for the moment I will stay with the ugly r notation.

Still, I don't believe I am the only one who thinks the "r" is ugly!
It seems to me more a last-minute hack than a pythonic construct.
At least, IMHO.

Thanks to everyone who answered and helped me understand,

                                  Michele


