[Tutor] Python - help with something most essential
Peter Otten
__peter__ at web.de
Sun Jun 11 04:33:06 EDT 2017
Japhy Bartlett wrote:
> I'm not sure that they cared about how you used file.readlines(), I think
> the memory comment was a hint about instantiating Counter()s
Then they would have been clueless ;)
Both Schtvveer's original script and his subsequent "Verschlimmbesserung"
(a beautiful German word for making things worse while trying to improve
them) use only two Counters at any given time. The second version is very
inefficient because it builds the same Counter over and over again, but
this does not affect peak memory usage much.
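The second version isn't quoted in this message, so the following is only a hypothetical Python 3 illustration of the pattern described: a loop-invariant Counter rebuilt on every iteration. Both functions (names made up for this example) return the same count. The first one just does redundant work, and since each temporary Counter is released right after the comparison (in CPython), no more than two Counters are alive at once and peak memory stays about the same.

```python
from collections import Counter


def count_rebuilding(word, lines):
    # Wasteful: Counter(word) never changes but is rebuilt on every pass.
    # Both temporaries are dropped right after the comparison, so this
    # costs time rather than peak memory.
    return sum(1 for line in lines if Counter(word) == Counter(line))


def count_hoisted(word, lines):
    # Same result with the invariant Counter built once, outside the loop.
    target = Counter(word)
    return sum(1 for line in lines if Counter(line) == target)
```

For example, `count_hoisted("123", ["123", "321", "12", "1234", "213"])` counts the three lines that are anagrams of "123".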
Here's the original version that triggered the comment:
[Schtvveer Schvrveve]
> import sys
> from collections import Counter
>
> def main(args):
>     filename = args[1]
>     word = args[2]
>     print countAnagrams(word, filename)
>
> def countAnagrams(word, filename):
>
>     fileContent = readFile(filename)
>
>     counter = Counter(word)
>     num_of_anagrams = 0
>
>     for i in range(0, len(fileContent)):
>         if counter == Counter(fileContent[i]):
>             num_of_anagrams += 1
>
>     return num_of_anagrams
>
> def readFile(filename):
>
>     with open(filename) as f:
>         content = f.readlines()
>
>     content = [x.strip() for x in content]
>
>     return content
>
> if __name__ == '__main__':
>     main(sys.argv)
>
I'll refer to it as before.py below. Here's a variant that removes
readlines(), range(), and the [x.strip() for x in content] list
comprehension; the goal was minimal changes, not code as I would write it
from scratch.
# after.py
import sys
from collections import Counter

def main(args):
    filename = args[1]
    word = args[2]
    print countAnagrams(word, filename)

def countAnagrams(word, filename):
    fileContent = readFile(filename)
    counter = Counter(word)
    num_of_anagrams = 0
    for line in fileContent:
        if counter == Counter(line):
            num_of_anagrams += 1
    return num_of_anagrams

def readFile(filename):
    # this relies on garbage collection to close the file
    # which should normally be avoided
    for line in open(filename):
        yield line.strip()

if __name__ == '__main__':
    main(sys.argv)
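For readers on Python 3, here is a sketch of the same streaming idea with the file closed deterministically. Putting the with block inside the generator closes the file once the loop is exhausted (or the generator is closed), instead of relying on garbage collection as the comment in readFile() warns. The names read_stripped_lines and count_anagrams are hypothetical, not taken from either script.

```python
from collections import Counter


def read_stripped_lines(filename):
    # Generator: yields one stripped line at a time, so the whole file
    # is never held in memory. The with block closes the file when the
    # generator finishes or is closed.
    with open(filename) as f:
        for line in f:
            yield line.strip()


def count_anagrams(word, filename):
    # Counter(word) is built once; each Counter(line) is discarded right
    # after the comparison (in CPython), so memory use does not grow
    # with the number of lines.
    target = Counter(word)
    return sum(1 for line in read_stripped_lines(filename)
               if Counter(line) == target)
```

The strip() still matters here: without it every line would carry its trailing "\n", and Counter("321\n") never equals Counter("123").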
How to measure memory usage? I found
<https://stackoverflow.com/questions/774556/peak-memory-usage-of-a-linux-unix-process>
and settled on /usr/bin/time -v; as test data I used files containing 10**5
and 10**6 integers. With that setup (snipping everything but memory usage
from the time -v output):
$ /usr/bin/time -v python before.py anagrams5.txt 123
6
Maximum resident set size (kbytes): 17340
$ /usr/bin/time -v python before.py anagrams6.txt 123
6
Maximum resident set size (kbytes): 117328
$ /usr/bin/time -v python after.py anagrams5.txt 123
6
Maximum resident set size (kbytes): 6432
$ /usr/bin/time -v python after.py anagrams6.txt 123
6
Maximum resident set size (kbytes): 6432
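See the pattern? With ten times the input, before.py's peak grows roughly
sevenfold (17340 to 117328 kbytes) while after.py stays at 6432 kbytes:
before.py uses O(N) memory in the number of lines N, after.py O(1).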
Run your own tests if you need more datapoints or prefer a different method
to measure memory consumption.
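One different method, from inside Python, is the standard library's tracemalloc module. The sketch below is written for Python 3 and is not a replacement for time -v: tracemalloc only counts memory allocated through Python's allocators, not the interpreter's baseline resident set, so its figures will be smaller than the "Maximum resident set size" numbers above. The helper names and the test data layout (the integers 0..n-1, one per line) are assumptions for illustration, and it uses 10**4 and 10**5 lines to keep the run short.

```python
import os
import tempfile
import tracemalloc
from collections import Counter


def count_anagrams_list(word, filename):
    # Mirrors before.py: every stripped line is held in a list at once.
    with open(filename) as f:
        lines = [line.strip() for line in f.readlines()]
    target = Counter(word)
    return sum(1 for line in lines if Counter(line) == target)


def count_anagrams_stream(word, filename):
    # Mirrors after.py: only one line (and one Counter) is alive at a time.
    target = Counter(word)
    with open(filename) as f:
        return sum(1 for line in f if Counter(line.strip()) == target)


def peak_traced_bytes(func, *args):
    """Return (result, peak bytes traced by tracemalloc while func ran)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def write_test_file(path, n):
    # Hypothetical test data: the integers 0..n-1, one per line.
    with open(path, "w") as f:
        for i in range(n):
            f.write("%d\n" % i)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        for n in (10**4, 10**5):
            path = os.path.join(tmpdir, "anagrams%d.txt" % n)
            write_test_file(path, n)
            for func in (count_anagrams_list, count_anagrams_stream):
                count, peak = peak_traced_bytes(func, "123", path)
                print(func.__name__, n, count, peak // 1024, "KiB")
```

The expectation, in line with the time -v figures, is that the list-based peak grows with n while the streaming peak stays roughly flat.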