Regular expression for file name

Christopher T King squirrel at WPI.EDU
Mon Jul 19 09:48:04 EDT 2004


On Sun, 18 Jul 2004, Miki Tebeka wrote:

> In a configuration file there can be ID's and filename tokens.
> The file names have a known suffix (.o or .mls) and I need to get a regular
> expression that will catch filename but not an ID.
> 
> Currently:
> ID = r"[a-zA-Z\.]\w+(?![/\\])"
> FILENAME = r"([a-zA-Z]:)?[\w./\\]+\.((mls)|(o))" 
> 
> However, if I have the filename "Sources/kernel/rom_kernel.mls" then
> "Source" is interpreted as an ID and "s/kernel/rom_kernel.mls" is
> interpreted as a file name.

I'm not familiar with PLY, but my guess is that you get those results 
because it tries to match ID first, and then FILENAME.  The best way to 
solve this is to add another constraint to your REs, namely the delimiter 
that must follow each token (presumably whitespace):

ID = r"[a-zA-Z\.]\w+(?=\s)"
FILENAME = r"([a-zA-Z]:)?[\w./\\]+\.((mls)|(o))(?=\s)" 
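
To see the difference outside of PLY, here is a small sketch using only 
Python's re module.  The classify() helper and the OLD_ID name are made up 
for this illustration, and trying FILENAME before ID is my own choice 
here, not something PLY is known to do:

```python
import re

# ID pattern from the original question (negative lookahead for a slash).
OLD_ID = r"[a-zA-Z\.]\w+(?![/\\])"

# Patterns suggested above: each token must be followed by whitespace.
ID = r"[a-zA-Z\.]\w+(?=\s)"
FILENAME = r"([a-zA-Z]:)?[\w./\\]+\.((mls)|(o))(?=\s)"


def classify(text):
    """Hypothetical helper: return (kind, lexeme) for the token at the
    start of text, or None if neither pattern matches there."""
    for kind, pattern in (("FILENAME", FILENAME), ("ID", ID)):
        m = re.match(pattern, text)
        if m:
            return kind, m.group(0)
    return None


line = "Sources/kernel/rom_kernel.mls\n"

# The old ID backtracks one character to satisfy (?![/\\]) and so
# swallows the front of the path.
print(re.match(OLD_ID, line).group(0))   # Source

# With the whitespace delimiter, ID no longer matches the path at all,
# and FILENAME takes the whole token.
print(re.match(ID, line))                # None
print(classify(line))                    # ('FILENAME', 'Sources/kernel/rom_kernel.mls')
print(classify("rom_kernel \n"))         # ('ID', 'rom_kernel')
```

Because the new ID cannot match any prefix of the path, the result should 
be the same whichever pattern is tried first.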

I'm not sure if PLY supports (?=...) or not, but I assume it does, since 
you used its complement ((?!...)) in your original REs.
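
One caveat with the (?=\s) delimiter: it requires an actual whitespace 
character, so a token at the very end of the input with no trailing 
newline will not match.  If that can happen in your files, a possible 
variant is (?=\s|$).  The sketch below uses plain re rather than PLY, and 
the name FILENAME_EOF is only for illustration:

```python
import re

# Suggested pattern: the token must be followed by whitespace.
FILENAME = r"([a-zA-Z]:)?[\w./\\]+\.((mls)|(o))(?=\s)"

# Variant that also accepts end of input as a delimiter.
FILENAME_EOF = r"([a-zA-Z]:)?[\w./\\]+\.((mls)|(o))(?=\s|$)"

text = "Sources/kernel/rom_kernel.mls"   # note: no trailing newline

print(re.match(FILENAME, text))                # None
print(re.match(FILENAME_EOF, text).group(0))   # Sources/kernel/rom_kernel.mls
```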



