Regular expression problem
pan-news at asheeshenterprises.com
Fri Mar 1 03:14:16 CET 2002
Actually, I think this is the most elegant solution I've seen so far.
Good thinking; I forgot to "Use the Source," as some put it.
Only one problem: the parser still balks on embedded tags, like:
<@Trap Body text:<P><I><B>>
It leaves an extra '>' character at the end. Any suggestions? I can
write a simple workaround for something like this, but it seems like
it should work "the right way."
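For reference, the workaround I have in mind is roughly the following. It is only a sketch and does not touch sgmllib at all: it assumes the '<' and '>' characters inside the tag are balanced, and the names find_tag_end and strip_at_tags are made up for illustration.

```python
def find_tag_end(text, start):
    """Return the index just past the '>' that closes the tag opening at start.

    Embedded '<' ... '>' pairs are skipped by counting nesting depth.
    Returns -1 if the tag is never closed.
    """
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == '<':
            depth += 1
        elif ch == '>':
            depth -= 1
            if depth == 0:
                return i + 1
    return -1


def strip_at_tags(text):
    """Remove every <@...> construct, including any tags embedded in it."""
    out = []
    pos = 0
    while True:
        start = text.find('<@', pos)
        if start == -1:
            # No more <@ tags: keep the remainder unchanged.
            out.append(text[pos:])
            break
        end = find_tag_end(text, start)
        if end == -1:
            # Unterminated tag: keep the remainder as plain text.
            out.append(text[pos:])
            break
        out.append(text[pos:start])
        pos = end
    return ''.join(out)
```

Because the closing '>' is found by depth rather than by the first '>' seen, the trailing '>' in the example above is consumed as part of the tag instead of being left over.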
Thanks for everything!
On Thu, 28 Feb 2002 01:17:11 -0500, Sean 'Shaleh' Perry wrote:
> On 28-Feb-2002 Asheesh Laroia wrote:
>> I've been trying to use sgmllib, actually, to delete all the other
>> It just doesn't handle the <@ [...] > condition well. It refuses to
>> parse it, treating it as text.
> The reason is this:
> starttagopen = re.compile('<[>a-zA-Z]') tagfind =
> near the top of sgmllib.py.
> Changing them in your code will allow the parser to understand the tag.
> However there is another problem which requires more work. When a tag
> is found the parser tries to run 'start_' + tag. start_@Trap() is not a
> valid python name. You could redefine the function which calls the
> handlers so that it looks for perhaps start_atTrap(). This would allow
> you to use the SGMLParser for all of your parsing needs, but may also be
> overkill for the problem.
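As I understand the renaming idea, it would look something like the sketch below. This is a standalone illustration, not sgmllib's actual code: TagDispatcher, handler_name, and dispatch are hypothetical names, and the choice of replacing '@' with 'at' is just the start_atTrap() suggestion taken literally.

```python
def handler_name(tag):
    """Map a tag such as '@Trap' to a valid Python method name."""
    return 'start_' + tag.replace('@', 'at')


class TagDispatcher:
    """Hypothetical stand-in for the part of a parser that calls handlers."""

    def __init__(self):
        self.seen = []

    def dispatch(self, tag, attrs):
        # Look up the handler by its sanitized name; fall back if missing.
        method = getattr(self, handler_name(tag), None)
        if method is None:
            self.unknown_starttag(tag, attrs)
        else:
            method(attrs)

    def unknown_starttag(self, tag, attrs):
        self.seen.append(('unknown', tag))

    def start_atTrap(self, attrs):
        self.seen.append(('atTrap', attrs))
```

With this, dispatch('@Trap', ...) ends up in start_atTrap(), and any tag without a matching method goes to the fallback.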