[Tutor] Python - help with something most essential

Sun Jun 11 04:33:06 EDT 2017

Japhy Bartlett wrote:

> I'm not sure that they cared about how you used file.readlines(), I think
> the memory comment was a hint about instantiating Counter()s

Then they would have been clueless ;)

Both Schtvveer's original script and his subsequent "Verschlimmbesserung" (a
beautiful German word for making things worse while trying to improve them)
use only two Counters at any given time. The second version is very
inefficient because it builds the same Counter over and over again, but
this does not affect peak memory usage much.

Here's the original version that triggered the comment:

[Schtvveer Schvrveve]

> import sys
> from collections import Counter
> 
> def main(args):
>     filename = args[1]
>     word = args[2]
>     print countAnagrams(word, filename)
> 
> def countAnagrams(word, filename):
> 
>     fileContent = readFile(filename)
> 
>     counter = Counter(word)
>     num_of_anagrams = 0
> 
>     for i in range(0, len(fileContent)):
>         if counter == Counter(fileContent[i]):
>             num_of_anagrams += 1
> 
>     return num_of_anagrams
> 
> def readFile(filename):
> 
>     with open(filename) as f:
>         content = f.readlines()
> 
>     content = [x.strip() for x in content]
> 
>     return content
> 
> if __name__ == '__main__':
>     main(sys.argv)
> 

I'll refer to it as before.py below. Here's a variant that removes
readlines(), range(), and the [x.strip() for x in content] list
comprehension. The goal was to make minimal changes; this is not the code
I would write from scratch.

# after.py
import sys
from collections import Counter

def main(args):
    filename = args[1]
    word = args[2]
    print countAnagrams(word, filename)

def countAnagrams(word, filename):

    fileContent = readFile(filename)
    counter = Counter(word)
    num_of_anagrams = 0

    for line in fileContent:
        if counter == Counter(line):
            num_of_anagrams += 1

    return num_of_anagrams

def readFile(filename):
    # this relies on garbage collection to close the file
    # which should normally be avoided
    for line in open(filename):
        yield line.strip()

if __name__ == '__main__':
    main(sys.argv)
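
If relying on garbage collection to close the file bothers you, one way
around it is to put a with statement inside the generator. This is only a
minimal sketch, not part of either script: the names read_lines() and
count_anagrams() are made up for illustration. The file is closed as soon as
the generator is exhausted or closed early.

```python
from collections import Counter

def read_lines(filename):
    # hypothetical replacement for readFile() in after.py;
    # the with block closes the file when the generator finishes
    # or is closed early
    with open(filename) as f:
        for line in f:
            yield line.strip()

def count_anagrams(word, filename):
    counter = Counter(word)
    # one Counter per line, and only one line held at a time
    return sum(1 for line in read_lines(filename) if Counter(line) == counter)
```

Memory usage should stay flat just like in after.py, because the lines are
still consumed one at a time.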

How to measure memory usage? I found
<https://stackoverflow.com/questions/774556/peak-memory-usage-of-a-linux-unix-process>
and as test data I use two files containing 10**5 and 10**6 integers,
respectively. With that setup (snipping everything but the memory usage from
the time -v output):

$ /usr/bin/time -v python before.py anagrams5.txt 123
6
        Maximum resident set size (kbytes): 17340
$ /usr/bin/time -v python before.py anagrams6.txt 123
6
        Maximum resident set size (kbytes): 117328

$ /usr/bin/time -v python after.py anagrams5.txt 123
6
        Maximum resident set size (kbytes): 6432
$ /usr/bin/time -v python after.py anagrams6.txt 123
6
        Maximum resident set size (kbytes): 6432

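See the pattern? Ten times as many lines cost before.py about 100 MB more,
while after.py stays where it was. before.py uses O(N) memory, N being the
number of lines in the file; after.py uses O(1).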

Run your own tests if you need more datapoints or prefer a different method 
to measure memory consumption.
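
One such different method, as a rough sketch: on Python 3.4 or later the
tracemalloc module reports the peak size of the memory blocks allocated by
Python itself. That is not the same number as the resident set size shown by
time -v, so only the trend is comparable. Everything below is an assumption
made for illustration, not the scripts above: the helper names are invented,
and the test data (the integers 0 to n-1, one per line) is just a guess at a
convenient file format.

```python
import os
import tempfile
import tracemalloc
from collections import Counter

def read_list(filename):
    # before.py style: the whole file ends up in a list
    with open(filename) as f:
        return [line.strip() for line in f]

def read_lazy(filename):
    # after.py style: one line at a time
    with open(filename) as f:
        for line in f:
            yield line.strip()

def count_anagrams(word, lines):
    counter = Counter(word)
    return sum(1 for line in lines if Counter(line) == counter)

def peak_bytes(reader, filename, word="123"):
    # peak size of Python-level allocations while reading and counting
    tracemalloc.start()
    try:
        count = count_anagrams(word, reader(filename))
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return count, peak

def make_test_file(n):
    # made-up test data: the integers 0 .. n-1, one per line
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        for i in range(n):
            f.write("%d\n" % i)
    return path

if __name__ == "__main__":
    for n in (10**4, 10**5):
        path = make_test_file(n)
        try:
            for reader in (read_list, read_lazy):
                count, peak = peak_bytes(reader, path)
                print(n, reader.__name__, count, peak)
        finally:
            os.remove(path)
```

I would expect the peak for read_list to grow roughly in proportion to n and
the peak for read_lazy to stay about the same, but do check the numbers on
your own machine.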