Text Suffix to Prefix Conversion
Peter Otten
__peter__ at web.de
Thu Apr 19 03:11:28 EDT 2007
7stud wrote:
> On Apr 18, 11:08 pm, Steven Bethard <steven.beth... at gmail.com> wrote:
>> EMC ROY wrote:
>> > Original Sentence: An apple for you.
>> > Present: An<AT0> apple<NN1> for<PRP> you<PNP> .<.>
>> > Desire: <AT0>An <NN1>apple <PRP>for <PNP>you <.>.
>> >>> text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'
>> >>> import re
>> >>> re.sub(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3', text)
>>
>> '<AT0>An <NN1>apple <PRP>for <PNP>you <.>.'
>
> If you end up calling re.sub() repeatedly, e.g. for each line in your
> file, then you should "compile" the regular expression so that python
> doesn't have to recompile it for every call:
>
> import re
>
> text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'
> myR = re.compile(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3')
re.compile() doesn't accept a replacement pattern; its second parameter is
flags:
"""
Help on function compile in module re:

compile(pattern, flags=0)
    Compile a regular expression pattern, returning a pattern object.
"""
> re.sub(myR, r'\2\1\3', text)
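To spell out why the re.compile() call above fails: its second positional
parameter is flags, so the replacement string lands where an integer flag
value is expected. A minimal sketch, assuming Python 3 (the thread itself
uses Python 2.4); the names pattern, replacement, error and tagged are only
for illustration:

```python
import re

pattern = r'(\S+)(<[^>]+>)(\s*)'
replacement = r'\2\1\3'
text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'

# Passing the replacement as the second argument hands it to `flags`,
# which must be an integer (or an re flag constant), hence the TypeError.
error = None
try:
    re.compile(pattern, replacement)
except TypeError as exc:
    error = exc
print(type(error).__name__)

# The replacement belongs to sub(), not to compile().
tagged = re.compile(pattern).sub(replacement, text)
print(tagged)
```

The pattern goes to compile() and the replacement goes to sub(); the two
never travel together in one call.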
>
>
> Unfortunately, I must be doing something wrong because I can't get
> that code to work. When I run it, I get the error:
>
> Traceback (most recent call last):
>   File "2pythontest.py", line 3, in ?
>     myR = re.compile(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3')
>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre.py", line 180, in compile
>     return _compile(pattern, flags)
>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre.py", line 225, in _compile
>     p = sre_compile.compile(pattern, flags)
>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre_compile.py", line 496, in compile
>     p = sre_parse.parse(p, flags)
>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre_parse.py", line 668, in parse
>     p = _parse_sub(source, pattern, 0)
>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre_parse.py", line 308, in _parse_sub
>     itemsappend(_parse(source, state))
>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre_parse.py", line 396, in _parse
>     if state.flags & SRE_FLAG_VERBOSE:
> TypeError: unsupported operand type(s) for &: 'str' and 'int'
>
>
> Yet, these two examples work without error:
>
> ------
> import re
>
> text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'
> #myR = re.compile(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3')
> print re.sub(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3', text)
>
> myR = re.compile(r'(hello)')
> text = "hello world"
> print re.sub(myR, r"\1XXX", text)
>
> ---------output:
> <AT0>An <NN1>apple <PRP>for <PNP>you <.>.
> helloXXX world
>
>
> Can anyone help?
You can precompile the regular expression like this:
>>> text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'
>>> r = re.compile(r'(\S+)(<[^>]+>)(\s*)')
>>> r.sub(r'\2\1\3', text)
'<AT0>An <NN1>apple <PRP>for <PNP>you <.>.'
or even
>>> sub = re.compile(r'(\S+)(<[^>]+>)(\s*)').sub
>>> sub(r'\2\1\3', text)
'<AT0>An <NN1>apple <PRP>for <PNP>you <.>.'
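For the per-line case mentioned earlier in the thread, the compiled pattern
can be created once and reused inside the loop. A rough sketch, assuming
Python 3; the list lines is made-up sample input standing in for the lines of
a file, and tag_suffix and converted are names chosen here:

```python
import re

# Compile once, outside the loop.
tag_suffix = re.compile(r'(\S+)(<[^>]+>)(\s*)')

# Hypothetical input; in practice this would be an open file object.
lines = [
    'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>',
    'An<AT0> apple<NN1> .<.>',
]

# Move each tag from behind its token to in front of it.
converted = [tag_suffix.sub(r'\2\1\3', line) for line in lines]
for line in converted:
    print(line)
```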
Note that precompiling gains less than you might think, since re.sub() and
the other module-level re functions look up already compiled regexps in a
cache.
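One way to see the cache at work, as a small sketch assuming CPython 3 (the
caching is an implementation detail, so the identity check below is not
guaranteed elsewhere): compiling the same pattern string twice normally hands
back the same pattern object.

```python
import re

pattern = r'(\S+)(<[^>]+>)(\s*)'
text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'

# The second call finds the pattern in the module's cache instead of
# parsing the string again.
first = re.compile(pattern)
second = re.compile(pattern)
same_object = first is second
print(same_object)

# The module-level function and the compiled pattern give the same result.
via_module = re.sub(pattern, r'\2\1\3', text)
via_pattern = first.sub(r'\2\1\3', text)
print(via_module == via_pattern)
```

What precompiling still saves is the cache lookup on every call, which only
matters in tight loops.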
Peter