memory usage multi value hash

Fri Apr 15 04:01:45 EDT 2011

On Friday 15 April 2011 02:13:51 christian wrote:
> Hello,
> 
> I'm not very experienced in Python. Is there a way of doing the
> below more memory-efficiently, and maybe faster?
> I import a 2-column file and then, for every unique value in
> the first column (the key), concatenate the values from the
> second column.
> 
> So the output is something like this:
> A,1,2,3
> B,3,4
> C,9,10,11,12,90,34,322,21
> 
> 
> Thanks in advance & regards,
> Christian
> 
> 
> import csv
> import random
> import sys
> from itertools import groupby
> from operator import itemgetter
> 
> f=csv.reader(open(sys.argv[1]),delimiter=';')
> z=[[i[0],i[1]] for i in f]
> z.sort(key=itemgetter(0))
> mydict = dict((k,','.join(map(itemgetter(1), it)))
>            for k, it in groupby(z, itemgetter(0)))
> del(z)
> 
> f = open(sys.argv[2], 'w')
> for k,v in mydict.iteritems():
>     f.write(k + ',' + v + "\n")
> 
> f.close()

Two alternative solutions follow. The second one, with generators,
is probably the more economical as far as RAM usage is concerned.
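
To illustrate the RAM remark, here is a minimal sketch (not part of the
program below) comparing a list comprehension with a generator
expression. The names n, as_list and as_gen are made up for the
illustration, and sys.getsizeof reports only the shallow size of the
container object, not of the strings it refers to.

```python
import sys

n = 100000  # arbitrary illustrative size

# A list comprehension builds all n strings up front and keeps them.
as_list = [str(i) for i in range(n)]

# A generator expression produces one string at a time, on demand.
as_gen = (str(i) for i in range(n))

list_size = sys.getsizeof(as_list)  # grows with n
gen_size = sys.getsizeof(as_gen)    # small, independent of n

print('list object:     ', list_size, 'bytes')
print('generator object:', gen_size, 'bytes')

# Consuming the generator in a loop never holds more than one item.
total_chars = 0
for item in as_gen:
    total_chars += len(item)
print('characters seen: ', total_chars)
```

The saving applies only while the items are consumed one by one; once
the results are accumulated into a single string or list, that result
occupies memory either way.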

For your example, data1.txt is taken to be as follows:
A, 1
B, 3
C, 9
A, 2
B, 4
C, 10
A, 3
C, 11
C, 12
C, 90
C, 34
C, 322
C, 21

The "two in one" program is:
#!/usr/bin/env python
'''generate.py - Example of reading a long two-column csv list and
grouping the second column by the first.
Thread "memory usage multi value hash"
'''

# Determine the set of unique column 1 values
unique_set = set()
with open('data1.txt') as f:
    for line in f:
        unique_set.add(line.split(',')[0])
print(unique_set)

# First solution: re-read the file once per key, collecting the
# matching column 2 values in a list
with open('data1.txt') as f:
    for x in unique_set:
        ls = [line.split(',')[1].rstrip() for line in f
              if line.split(',')[0].rstrip() == x]
        print(x.rstrip(), ','.join(ls))
        f.seek(0)  # rewind the file for the next key

print ('\n Alternative solution with generators')
with open('data1.txt') as f:
    for x in unique_set:
        # The generator yields one matching value at a time
        gs = (line.split(',')[1].rstrip() for line in f
              if line.split(',')[0].rstrip() == x)
        s = ''
        for ds in gs:
            s = s + ds
        print(x.rstrip(), s)
        f.seek(0)  # rewind the file for the next key

The output is:
{'A', 'C', 'B'}
A  1, 2, 3
C  9, 10, 11, 12, 90, 34, 322, 21
B  3, 4

 Alternative solution with generators
A  1 2 3
C  9 10 11 12 90 34 322 21
B  3 4

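Notice that the rows of the data file need not be sorted or grouped
by key. Each key's values are printed in the order in which they
appear in the file, while the order of the keys themselves follows
the set's iteration order, which is arbitrary.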
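
Both versions above re-read the file once per unique key. If that
becomes slow, a single pass that collects the values in a dictionary of
lists is a possible alternative. This is a different technique from the
program above, given only as a sketch: the function name group_values
and the in-memory sample are made up for the illustration, and the
sample stands in for open('data1.txt').

```python
from collections import defaultdict
import io


def group_values(lines, sep=','):
    """Collect the column 2 values for each column 1 key in one pass."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, value = line.split(sep, 1)
        groups[key.strip()].append(value.strip())
    return groups


# Hypothetical stand-in for the file; any iterable of lines works.
sample = io.StringIO("A, 1\nB, 3\nC, 9\nA, 2\nB, 4\nC, 10\nA, 3\n")
groups = group_values(sample)

for key in sorted(groups):
    print(key + ',' + ','.join(groups[key]))
```

This reads the input only once, but it keeps every value in memory
until the end, so it trades RAM for speed.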

OldAl.

-- 
Algis
http://akabaila.pcug.org.au/StructuralAnalysis.pdf