Overcoming regex memory limits?

Harvey Thomas hst at empolis.co.uk
Thu Oct 31 17:51:41 CET 2002


It's best to avoid .*? whenever you can because, as you have found, it fails on long runs of matching text. Often you can use a negated character class instead, which is both faster and doesn't fail on long runs of matches. For example, instead of
'<(.*?)>' use '<([^>]*)>'.
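
As a rough sketch of that substitution in Python (the names LAZY, NEGATED, tag_contents and the sample string are made up for illustration, since the actual data hasn't been posted):

```python
import re

# The two patterns from the advice above. The names are illustrative only.
LAZY = re.compile(r"<(.*?)>")        # non-greedy: extends one character at a time
NEGATED = re.compile(r"<([^>]*)>")   # negated class: consumes everything up to '>'


def tag_contents(text, pattern=NEGATED):
    """Return the text found between each '<' and the next '>'."""
    return pattern.findall(text)


# Hypothetical input with one very long run between the delimiters.
sample = "<" + "x" * 100_000 + "> some text <b>"
hits = tag_contents(sample)
print(len(hits), len(hits[0]), hits[1])
```

One difference to watch for: [^>]* will match across newlines, whereas .*? stops at a newline unless re.DOTALL is set, so the two patterns are only interchangeable if that behaviour is acceptable for your file.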
Where negated character classes aren't sufficient, other more complex expressions can be used, but it's difficult to generalise. Could you post a sample of what you are trying to match? Jeffrey Friedl's "Mastering Regular Expressions" 2nd edition is an excellent book on the subject.

> -----Original Message-----
> From: Yin [mailto:yin_12180 at yahoo.com]
> Sent: 31 October 2002 15:28
> To: python-list at python.org
> Subject: Overcoming regex memory limits?
> 
> 
> I am using python to parse a large text file.  I am using the (.*?)
> construct in regular expressions to do matching.  Unfortunately, I
> exceed the limit for the match size in this regular expression due to
> an overflow of the stack.
> 
> I've heard that it may be possible to match without using the (.*?)
> construct and this may solve the problem.  Any suggestions short of
> rewriting the parsing routine would be appreciated.
> 
> Thanks in advance.
> Yin
