
def filter(rule, whatever): if rule.x in whatever.x: return True
rules = get_rules() whatevers = get_whatevers() for rule in rules: for whatever in whatevers: if filter(rule, whatever): cnt = cnt + 1
return cnt
This code seems almost certainly broken as written. Not just suboptimal, but just plain wrong. As a start, it has a return statement outside a function or method body, so it's not even syntactical (also, `cnt` is never initialized). But assuming we just print out `cnt` it still gives an answer we probably don't want. For example, I wrote get_rules() and get_whatevers() functions that produce a list of "things" that have an x attribute. Each thing holds a (distinct) word from a Project Gutenberg book (Title: Animal Locomotion: Or walking, swimming, and flying, with a dissertation on aëronautics; Author: J. Bell Pettigrew). Whatevers omit a few of the words, since the code suggests there should be more rules. In particular: In [1]: len(get_rules()), len(get_whatevers()) Out [1]: (12306, 12301) Running the probably wrong code is indeed slow: In [2]: %%time def filter(rule, whatever): if rule.x in whatever.x: return True rules = get_rules() whatevers = get_whatevers() cnt = 0 for rule in rules: for whatever in whatevers: if filter(rule, whatever): cnt = cnt + 1 print(cnt) Out [2]: 110134 CPU times: user 53.1 s, sys: 190 ms, total: 53.3 s Wall time: 53.6 s It's hard for me to imagine why this is the question one would want answered. It seems much more likely you'd want to know: In [3]: %%time len({thing.x for thing in get_rules()} - {thing.x for thing in get_whatevers()}) Out [3]: CPU times: user 104 ms, sys: 4.89 ms, total: 109 ms Wall time: 112 ms 5 So that's 500 times faster, more Pythonic, and seems to actually answer the question one would want answered. However, let's suppose there really is a reason to answer the question in the original code. Using more sensible basic datatypes, we only get about a 3x speedup: In [4]: %%time rules = {thing.x for thing in get_rules()} whatevers = {thing.x for thing in get_whatevers()} cnt = 0 for rule in rules: for whatever in whatevers: if rule in whatever: cnt += 1 print(cnt) Out [4]: 110134 CPU times: user 18.3 s, sys: 96.9 ms, total: 18.4 s Wall time: 18.5 s I'm sure there is room for speedup if the actual problem being solved was better described. Maybe something involving itertools.product(), but it's hard to say without knowing what the correct behavior actually is. Overall this is similar to saying you could implement bogosort in Python, and it would be much slower than calling Timsort with `sorted(my_stuff)`