Regular expression problem
Sean 'Shaleh' Perry
shalehperry at attbi.com
Thu Feb 28 07:17:11 CET 2002
On 28-Feb-2002 Asheesh Laroia wrote:
> I've been trying to use sgmllib, actually, to delete all the other tags.
> It just doesn't handle the <@ [...] > condition well. It refuses to
> parse it, treating it as text.
The reason is these two patterns, defined near the top of sgmllib.py:
starttagopen = re.compile('<[>a-zA-Z]')
tagfind = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*')
Overriding them from your own code with patterns that also accept '@' will let the parser recognize the tag.
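A minimal sketch of the pattern change, written for Python 3 so it runs without sgmllib (which no longer ships with Python 3). The first pair of patterns is copied from sgmllib.py as quoted above. The second pair is a hypothetical widening that also accepts '@' as the first character of a tag name, and find_tag_name() is an illustrative helper, not part of sgmllib.

```python
import re

# The two patterns as quoted from sgmllib.py.
orig_starttagopen = re.compile('<[>a-zA-Z]')
orig_tagfind = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*')

# Hypothetical widened patterns: '@' is also allowed as the first
# character of a tag name. Add whatever else your input needs.
starttagopen = re.compile('<[>a-zA-Z@]')
tagfind = re.compile(r'[@a-zA-Z][-_.a-zA-Z0-9]*')


def find_tag_name(text, opener=starttagopen, finder=tagfind):
    """Return the tag name right after a leading '<', or None if the
    opener pattern does not see a start tag there (hypothetical helper)."""
    if not opener.match(text):
        return None
    m = finder.match(text, 1)  # the tag name starts just after '<'
    return m.group() if m else None


print(find_tag_name('<@Trap>', orig_starttagopen, orig_tagfind))  # None
print(find_tag_name('<@Trap>'))  # @Trap
print(find_tag_name('<b>'))      # b
```

Under Python 2, where sgmllib still exists, one way to apply this is to assign the widened patterns to sgmllib.starttagopen and sgmllib.tagfind before feeding data to the parser. That assumes SGMLParser reads them as module globals, which is worth checking against your copy of sgmllib.py.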
However, there is another problem, which takes more work to fix. When a tag is
found, the parser looks for a handler method named 'start_' + tag, and
start_@Trap() is not a valid Python name. You could override the method that
dispatches to the handlers (SGMLParser.finish_starttag) so that it looks for
something like start_atTrap() instead. That would let you use SGMLParser for
all of your parsing needs, though it may be overkill for this problem.
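The renaming idea can also be sketched without sgmllib. Everything here is hypothetical: handler_name() and TrapParser are illustrative names, mapping '@' to 'at' is just one possible scheme, and dispatch_start() is a simplified stand-in for the parser's handler lookup (try 'start_' + tag, otherwise fall back to unknown_starttag()).

```python
import re


def handler_name(tag):
    """Map a raw tag name to a valid method name (hypothetical scheme):
    '@' becomes 'at', and any other non-identifier character becomes '_'."""
    name = tag.replace('@', 'at')
    return 'start_' + re.sub(r'\W', '_', name)


class TrapParser:
    """Hypothetical stand-in for an SGMLParser subclass, showing only
    the step that finds and calls a start-tag handler."""

    def __init__(self):
        self.seen = []

    def dispatch_start(self, tag, attrs):
        # Look up the mangled handler name; fall back if it is missing.
        method = getattr(self, handler_name(tag), None)
        if method is None:
            return self.unknown_starttag(tag, attrs)
        return method(attrs)

    def start_atTrap(self, attrs):
        self.seen.append(('trap', attrs))

    def unknown_starttag(self, tag, attrs):
        self.seen.append(('unknown', tag))


p = TrapParser()
p.dispatch_start('@Trap', [])
p.dispatch_start('@Other', [('a', '1')])
print(p.seen)
```

sgmllib lowercases tag names before dispatching, so in a real SGMLParser subclass the handler would most likely need to be spelled start_attrap.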