Peter Otten __peter__ at web.de
Thu Mar 20 15:08:31 CET 2014

ishish wrote:

> This might sound weird, but is there a limit how many dictionaries a 
> can create/use in a single script?

> My reason for asking is I split a 2-column-csv (phone#, ref#) file into 
> a dict and am trying to put duplicated phone numbers with different ref 
> numbers into new dictionaries. The script deducts the duplicated 46 
> numbers but it only creates batch1.csv. Since I obviously can't see the 
> wood for the trees here, can someone pls punch me into the right 
> direction....
> ...(No has_key is fine, its python 2.7)
> f = open("file.csv", 'r')

Consider a csv with the data lines

123,first
123,second
456,third

> myDict = {}
> Batch1 = {}
> Batch2 = {}
> Batch3 = {}
> for line in f:
>         if line.startswith('Number' ):
>                 print "First line ignored..."
>         else:
>                 k, v = line.split(',')
>                 myDict[k] = v

the first time around the assignment is

myDict["123"] = "first\n"

the second time it is

myDict["123"] = "second\n"

i.e. you are overwriting the previous value, so only the value from the 
last occurrence of each key is kept.
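
To see the overwriting in isolation, here is a minimal sketch. The three 
sample lines are made up to match the values shown above, and the name 
my_dict is only for illustration:

```python
# Made-up sample lines (an assumption, chosen to match the values above).
rows = ["123,first\n", "123,second\n", "456,third\n"]

my_dict = {}
for line in rows:
    k, v = line.split(",")
    # Plain assignment: a later value for the same key replaces the earlier one.
    my_dict[k] = v

# "123" now maps only to "second\n"; "first\n" is gone.
print(my_dict)
```

The duplicate is lost before any batch logic runs, so nothing is left to 
put into a second file.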

A good approach to solve the problem of keeping an arbitrary number of 
values per key is to make the dict value a list:

myDict = {}
with open("data.csv") as f:
    next(f) # skip first line
    for line in f:
        k, v = line.split(",")
        myDict.setdefault(k, []).append(v)

This will produce a myDict

{
   "123": ["first\n", "second\n"],
   "456": ["third\n"]
}
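
If you prefer, the same grouping can be sketched with 
collections.defaultdict instead of setdefault(). This is only a variant 
under assumptions: the header line "Number,Ref" and the in-memory list of 
lines are hypothetical stand-ins for your file, and here the trailing 
newline is stripped from each value:

```python
from collections import defaultdict

# Hypothetical stand-in for the file contents; the header text is assumed.
lines = ["Number,Ref\n", "123,first\n", "123,second\n", "456,third\n"]

grouped = defaultdict(list)  # missing keys start out as an empty list
it = iter(lines)
next(it)  # skip the header line
for line in it:
    k, v = line.rstrip("\n").split(",")
    grouped[k].append(v)

print(dict(grouped))
```

Note that with the newlines stripped you have to add "\n" yourself when 
writing the values back out.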

You can then proceed to find out the number of batches:

num_batches = max(len(v) for v in myDict.values())

Now write the files:

for index in range(num_batches):
    with open("batch%s.csv" % (index+1), "w") as f:
        for key, values in myDict.items():
            if len(values) > index: # this key has a value at position index
                f.write("%s,%s" % (key, values[index]))
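
Putting the last two steps together, a self-contained sketch could look 
like this. The function name write_batches, the temporary output 
directory, and the sorted key order are my assumptions, not something your 
script needs; the empty-dict check is there because max() raises 
ValueError on an empty sequence:

```python
import os
import tempfile

def write_batches(groups, directory):
    """Write batch1.csv, batch2.csv, ... and return the paths written."""
    if not groups:
        return []  # max() would raise ValueError on an empty dict
    num_batches = max(len(values) for values in groups.values())
    paths = []
    for index in range(num_batches):
        path = os.path.join(directory, "batch%s.csv" % (index + 1))
        with open(path, "w") as f:
            # sorted() only makes the line order predictable
            for key, values in sorted(groups.items()):
                if len(values) > index:  # key has a value at this position
                    f.write("%s,%s\n" % (key, values[index].strip()))
        paths.append(path)
    return paths

groups = {"123": ["first\n", "second\n"], "456": ["third\n"]}
out_dir = tempfile.mkdtemp()  # throwaway directory for the demo
paths = write_batches(groups, out_dir)
for path in paths:
    with open(path) as f:
        print(os.path.basename(path) + ": " + repr(f.read()))
```

With the sample dict this writes two files: batch1.csv gets the first 
value of every key (123,first and 456,third), and batch2.csv gets only 
123,second.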
