memory usage multi value hash
Algis Kabaila
akabaila at pcug.org.au
Fri Apr 15 04:01:45 EDT 2011
On Friday 15 April 2011 02:13:51 christian wrote:
> Hello,
>
> I'm not very experienced in Python. Is there a way to do the
> following more memory-efficiently, and maybe faster?
> I import a 2-column file and then, for every unique value in
> the first column (the key), concatenate the values from the
> second column.
>
> So the output is something like this:
> A,1,2,3
> B,3,4
> C,9,10,11,12,90,34,322,21
>
>
> Thanks in advance & regards,
> Christian
>
>
> import csv
> import random
> import sys
> from itertools import groupby
> from operator import itemgetter
>
> f=csv.reader(open(sys.argv[1]),delimiter=';')
> z=[[i[0],i[1]] for i in f]
> z.sort(key=itemgetter(0))
> mydict = dict((k,','.join(map(itemgetter(1), it)))
>               for k, it in groupby(z, itemgetter(0)))
> del(z)
>
> f = open(sys.argv[2], 'w')
> for k,v in mydict.iteritems():
>     f.write(v + "\n")
>
> f.close()
Here are two alternative solutions. The second one, with generators,
is probably the more economical as far as RAM usage is concerned.
For your example, data1.txt is taken as follows:
A, 1
B, 3
C, 9
A, 2
B, 4
C, 10
A, 3
C, 11
C, 12
C, 90
C, 34
C, 322
C, 21
The "two in one" program is:
#!/usr/bin/env python
'''generate.py - Example of reading long two column csv list and
sorting. Thread "memory usage multi value hash"
'''
# Determine a set of unique column 1 values
unique_set = set()
with open('data1.txt') as f:
    for line in f:
        unique_set.add(line.split(',')[0])
print(unique_set)
# First solution: for each key, rescan the file and collect its values in a list
with open('data1.txt') as f:
    for x in unique_set:
        ls = [line.split(',')[1].rstrip() for line in f if
              line.split(',')[0].rstrip() == x]
        print(x.rstrip(), ','.join(ls))
        f.seek(0)  # rewind so the next key is searched from the start
print ('\n Alternative solution with generators')
# Second solution: same rescan, but the values are produced lazily
with open('data1.txt') as f:
    for x in unique_set:
        gs = (line.split(',')[1].rstrip() for line in f if
              line.split(',')[0].rstrip() == x)
        s = ''
        for ds in gs:
            s = s + ds
        print(x.rstrip(), s)
        f.seek(0)  # rewind so the next key is searched from the start
The output is:
{'A', 'C', 'B'}
A 1, 2, 3
C 9, 10, 11, 12, 90, 34, 322, 21
B 3, 4
Alternative solution with generators
A 1 2 3
C 9 10 11 12 90 34 322 21
B 3 4
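The generator variant above concatenates the values without a separator. If the comma-separated form from the original question (A,1,2,3) is wanted, the generator can be handed straight to str.join. This is only a minimal sketch: the helper name values_for is hypothetical, and a short inline list stands in for the lines of data1.txt.

```python
from itertools import chain

def values_for(key, lines):
    """Yield the stripped second-column value of every line whose
    first column equals key (hypothetical helper, for illustration)."""
    for line in lines:
        parts = line.split(',')
        if len(parts) >= 2 and parts[0].strip() == key:
            yield parts[1].strip()

# Inline stand-in for the lines of data1.txt (assumption)
sample = ["A, 1\n", "B, 3\n", "C, 9\n", "A, 2\n", "B, 4\n", "A, 3\n"]

# Put the key in front of its values and join everything with commas
row = ','.join(chain(['A'], values_for('A', sample)))
print(row)
```

With a real file, the open file object can be passed in place of sample, followed by f.seek(0) before the next key, as in the program above.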
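Notice that the rows for the different keys can be interleaved in
any order in the input: each key's values are still collected
together, in the order in which they appear in the file.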
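Both programs above reread the file once for every unique key, which saves RAM at the cost of time. If the grouped values fit in memory, a single pass with a dictionary of lists is a common alternative. The following is only a sketch of that idea, not the approach used above: group_lines is a hypothetical name, the delimiter is assumed to be a comma as in data1.txt, and an inline list again stands in for the file.

```python
from collections import defaultdict

def group_lines(lines, delimiter=','):
    """Collect second-column values under their first-column key
    in one pass (hypothetical helper, for illustration)."""
    groups = defaultdict(list)
    for line in lines:
        parts = line.split(delimiter)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        groups[parts[0].strip()].append(parts[1].strip())
    return groups

# Inline stand-in for the lines of data1.txt (assumption)
sample = ["A, 1\n", "B, 3\n", "C, 9\n", "A, 2\n", "B, 4\n", "A, 3\n"]

groups = group_lines(sample)
for key in sorted(groups):
    print(key + ',' + ','.join(groups[key]))
```

The trade-off is the reverse of the generator version: the file is read only once, but every value is held in RAM until the output is written.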
OldAl.
--
Algis
http://akabaila.pcug.org.au/StructuralAnalysis.pdf