Regular expression problem

Asheesh Laroia pan-news at asheeshenterprises.com
Wed Feb 27 22:32:35 EST 2002


This is great, thanks!

Only one problem: I gave it a try, but I can't get the following case to
work:

	<@Trap Body text:>Useful Text

I still need to be able to extract "Useful Text" rather than delete it.

Thanks again!

-- Asheesh.
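
A minimal sketch of one way to keep the trailing text, assuming the tag
contains no nested '>' characters (so it covers the "text:>" and
"text: useless-data>" forms, but not the nested "in the wild" form quoted
further down). The names TAG and text_after_tag are hypothetical, chosen
only for this illustration:

```python
import re

# Hypothetical pattern: the tag name, then anything that is not '>',
# then the closing '>'. Whatever follows the tag is captured as "text".
TAG = re.compile(r"<@Trap Body text[^>]*>(?P<text>.*)", re.DOTALL)

def text_after_tag(line):
    """Return the text that follows the tag, or None if the tag is absent."""
    m = TAG.match(line)
    return m.group("text") if m else None

print(text_after_tag("<@Trap Body text:>Useful Text"))               # Useful Text
print(text_after_tag("<@Trap Body text: useless-data>Useful Text"))  # Useful Text
```

Because "[^>]*" stops at the first '>', a tag whose payload itself contains
'<...>' groups would need a different approach.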

On Wed, 27 Feb 2002 20:19:56 -0500, Wolfgang Grafen wrote:

> import re
> 
> # Raw strings keep \s and \Z from being read as string escapes.
> rc=re.compile(r"<@Trap Body text\s*"
>               r"(?:(?P<assigned>=)|(?P<unassigned>>))\s*"
>               r"(?P<rest>.*?)\s*\Z",
>               re.MULTILINE|re.DOTALL).match
> 
> t1='<@Trap Body text>'
> t2='<@Trap Body text=<FONT "Times">'
> t3="""<@Trap Body text=<FONT "Times"><CCOLOR\n   "Black"><SIZE
> 11><HORIZONTAL 100><LETTERSPACE 0><CTRACK 127><CSSIZE 70><C+SIZE\n
> 58.3><C-POSITION 33.3><C+POSITION 33.3><P><CBASELINE 0><CNOBREAK
> 0><CLEADING -0.05\n  ><GGRID 0><GLEFT 0><GRIGHT 0><GFIRST 19.2><G+BEFORE
> 0><G+AFTER 0><GALIGNMENT \n  "justify\n  "><GMETHOD "proportional"><G&
> "ENGLISH"><GPAIRS 4><G% 120><GKNEXT 0><GKWIDOW \n  1><GKORPHAN\n
> 1><GTABS $><GHYPHENATION 2 36 0><GWORDSPACE 75 100 150><GSPACE -5 0
> 25>>"""
> 
> rc(t1).groups()
> (None, '>', '')
> 
> rc(t2).groups()
> ('=', None, '<FONT "Times">')
> 
> rc(t3).groups()
> ('=', None, '<FONT "Times"><CCOLOR\n   "Black"><SIZE 11><HORIZONTAL
> 100><LETTERSPACE 0><CTRACK 127><CSSIZE 70><C+SIZE\n  58.3><C-POSITION
> 33.3><C+POSITION 33.3><P><CBASELINE 0><CNOBREAK 0><CLEADING -0.05\n
> ><GGRID 0><GLEFT 0><GRIGHT 0><GFIRST 19.2><G+BEFORE 0><G+AFTER
> 0><GALIGNMENT \n  "justify\n  "><GMETHOD "proportional"><G&
> "ENGLISH"><GPAIRS 4><G% 120><GKNEXT 0><GKWIDOW \n  1><GKORPHAN\n
> 1><GTABS $><GHYPHENATION 2 36 0><GWORDSPACE 75 100 150><GSPACE -5 0
> 25>>')
> 
> cheers
> 
> wolfgang
> 
> 
> Asheesh Laroia schrieb:
> 
>> I have some SGML input (PageMaker 6.5 tagged text), and I want to be
>> able to recognize (and delete) a tag.  That tag looks like:
>>
>>         <@Trap Body text:>
>>
>> It may also look like <@Trap Body text: useless-data>.
>>
>> So, I tried the regular expression r"<@.?>".  That doesn't match the
>> above string.  Nor does r"<@.?Trap Body text.?>".  What RE should I be
>> using, and why doesn't this work?
>>
>> Thanks in advance!
>>
>> -- Asheesh Laroia.
>>
>> PS: An example of the tag "in the wild" is the following string:
>>
>> <@Trap Body text=<FONT "Times"><CCOLOR
>>  "Black"><SIZE 11><HORIZONTAL 100><LETTERSPACE 0><CTRACK 127><CSSIZE
>>  70><C+SIZE
>> 58.3><C-POSITION 33.3><C+POSITION 33.3><P><CBASELINE 0><CNOBREAK
>> 0><CLEADING -0.05
>> ><GGRID 0><GLEFT 0><GRIGHT 0><GFIRST 19.2><G+BEFORE 0><G+AFTER
>> 0><GALIGNMENT "justify
>> "><GMETHOD "proportional"><G& "ENGLISH"><GPAIRS 4><G% 120><GKNEXT
>> 0><GKWIDOW 1><GKORPHAN 1><GTABS $><GHYPHENATION 2 36 0><GWORDSPACE 75
>> 100 150><GSPACE -5 0 25>>
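
On the "why doesn't this work?" part of the original question quoted above:
in Python's re syntax, ".?" matches at most one character, so r"<@.?>" can
only match when '>' comes right after "<@" or one character later. The
sketch below shows this and, as one possible alternative for tags without
nested '>' characters, uses "[^>]*" instead. The sample strings are taken
from the messages above; the variable names are illustrative only:

```python
import re

tag = "<@Trap Body text:>"
tag_with_data = "<@Trap Body text: useless-data>"

# '.?' is "zero or one character", so this cannot span the tag name.
print(re.search(r"<@.?>", tag))                                 # None

# The second attempt fails as soon as more than one character
# sits between "text" and the closing '>'.
print(re.search(r"<@.?Trap Body text.?>", tag_with_data))       # None

# '[^>]*' is "any run of characters other than '>'".
print(re.search(r"<@[^>]*>", tag_with_data).group())

# Deleting the tag while keeping whatever follows it.
cleaned = re.sub(r"<@[^>]*>", "", tag_with_data + "Useful Text")
print(cleaned)                                                  # Useful Text
```

The nested "in the wild" tag contains many '>' characters, so "[^>]*" would
stop at the first one; that case needs a pattern along the lines of the one
in the reply above.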


