Overcoming regex memory limits?
hst at empolis.co.uk
Thu Oct 31 17:51:41 CET 2002
It's best to avoid .*? whenever you can because, as you have found, it fails on long runs of matching text. Often you can use a negated character class instead. This is both faster and doesn't fail on long runs of matches. For example, instead of
'<(.*?)>' use '<([^>]*)>'.
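To make the substitution concrete, here is a minimal sketch in Python. The names TAG, tag_contents and long_run are made up for illustration, and the input is a synthetic string rather than anything from the original poster's file.

```python
import re

# Negated character class: consume everything up to the next '>' directly,
# instead of relying on the non-greedy '.*?' construct.
TAG = re.compile(r"<([^>]*)>")

def tag_contents(text):
    """Return the text found between each '<' and the following '>'."""
    return TAG.findall(text)

# A long run between the delimiters, the situation described in this thread.
long_run = "<" + "x" * 100000 + ">"
print(len(tag_contents(long_run)[0]))

# Several short matches in one string.
print(tag_contents("<a> text <b>"))
```

One difference to keep in mind: [^>] also matches newlines, whereas . does not unless re.DOTALL is set, so the two patterns are only interchangeable when the matched text stays on one line or when spanning lines is acceptable.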
Where negated character classes aren't sufficient, other more complex expressions can be used, but it's difficult to generalise. Could you post a sample of what you are trying to match? Jeffrey Friedl's "Mastering Regular Expressions" 2nd edition is an excellent book on the subject.
> -----Original Message-----
> From: Yin [mailto:yin_12180 at yahoo.com]
> Sent: 31 October 2002 15:28
> To: python-list at python.org
> Subject: Overcoming regex memory limits?
> I am using python to parse a large text file. I am using the (.*?)
> construct in regular expressions to do matching. Unfortunately, I
> exceed the limit for the match size in this regular expression due to
> an overflow of the stack.
> I've heard that it may be possible to match without using the (.*?)
> construct and this may solve the problem. Any suggestions short of
> rewriting the parsing routine would be appreciated.
> Thanks in advance.