Efficient mechanism for str.startswith on a set.
Fredrik Lundh
fredrik at pythonware.com
Tue Jan 10 09:56:47 EST 2006
Brian Cole wrote:
>I need to iterate through a file, checking whether each line
> 'startswith()' a string that belongs to a set.
>
> Normally, the most efficient way I would do this would be:
> strs=set(['foo','bar'])
> for line in file:
> if line.strip() in strs:
> print line
>
> However, for this case I need to do a startswith like this:
> for line in file:
> for s in strs:
> if line.startswith(s):
> print line
>
> This is obviously the least efficient manner to do this as I'll always
> be iterating over the entire 'strs'. I know I could make a binary tree
> out of 'strs' but that's a little more work that don't have time to do
> today. I know there should be something out there if I just ask nicely
> enough.
the RE approach used in this MultiReplace class can be quite efficient:
http://effbot.org/zone/python-replace.htm
e.g.
strs.sort() # lexical order
strs.reverse() # use longest match first
pattern = string.join(map(re.escape, strs), "|")
match = re.compile(pattern).match
for line in file:
if match(line):
print line
a less general solution would be to check line[:x] against a dictionary, where
x is max(len(s) for s in strs), and use a fallback test if the test fails (which up-
dates the dictionary). this only works well if x is relatively small, the files are
relatively regular and not too large, and most lines match.
</F>
More information about the Python-list
mailing list