f*cking re module

George Sakkis gsakkis at rutgers.edu
Mon Jul 4 15:01:31 CEST 2005


"jwaixs" <jwaixs at gmail.com> wrote:

> Thank you for your replies, it's much more obvious now. I have a better
> idea of what I can and can't do with the re module. But is it possible
> to search for more than one string in the same line?
>
> E.g. I want to replace <python> with " " and </python> with "\n", and
> everything that's not between the two python tags must begin with
> "\nprint \"\"\"" and end with "\"\"\"\n". Or do I need more than one
> call?

You can do it in one call, but it's ugly; as others have already told
you, use HTMLParser or some other parsing package. Now if you
insist...

import re

regex = re.compile(r'''(?:
                        (?:<python>)
                        (.*?)              # group 1: inside tags
                        (?:</python>)
                        ) |                # OR
                        ([^<]+)            # group 2: outside tags (non-empty)
                   ''', re.DOTALL | re.VERBOSE)

def replace(match):
    g1, g2 = match.groups()
    if g1 is not None:
        # Matched a <python>...</python> block: keep its contents verbatim.
        return g1
    else:
        # Matched text outside the tags: wrap it in a print statement.
        return '\nprint """%s"""\n' % g2


text = '''this is <python>a stupid
sentence</python> but still I
<python>have to</python> write it.'''

print(regex.sub(replace, text))

===== Output ==================

print """this is """
a stupid
sentence
print """ but still I
"""
have to
print """ write it."""

=======================
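
For comparison, the parser-based route recommended above might look
roughly like the sketch below. It is only an illustration under a few
assumptions: it uses html.parser (the Python 3 name of the HTMLParser
module), the names PythonTagConverter and convert are made up for this
example, any tags other than <python> are silently dropped, and code
inside <python> that itself contains "<" would confuse the parser.

```python
from html.parser import HTMLParser  # called HTMLParser in Python 2


class PythonTagConverter(HTMLParser):
    """Hypothetical helper: separates text inside and outside <python> tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []      # finished output pieces, in order
        self.outside = []    # text collected since the last </python>
        self.inside = False  # True while between <python> and </python>

    def _flush_outside(self):
        # Wrap the buffered outside text in a single print statement.
        if self.outside:
            self.parts.append('\nprint """%s"""\n' % ''.join(self.outside))
            self.outside = []

    def handle_starttag(self, tag, attrs):
        if tag == 'python':
            self._flush_outside()
            self.inside = True

    def handle_endtag(self, tag):
        if tag == 'python':
            self.inside = False

    def handle_data(self, data):
        # Text inside the tags is copied as-is; the rest is buffered,
        # because the parser may deliver one run of text in several chunks.
        if self.inside:
            self.parts.append(data)
        else:
            self.outside.append(data)

    def result(self):
        self._flush_outside()
        return ''.join(self.parts)


def convert(text):
    parser = PythonTagConverter()
    parser.feed(text)
    parser.close()
    return parser.result()


text = '''this is <python>a stupid
sentence</python> but still I
<python>have to</python> write it.'''

print(convert(text))
```

For the sample text this should give the same result as the regex
version. The outside text is buffered until the next <python> tag, so it
does not matter how the parser splits it up.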

George



