[Tutor] counting a list of elements

Sat Apr 2 07:00:13 CEST 2011

On 01.04.2011 21:31, Karim wrote:
> On 04/01/2011 08:41 PM, Knacktus wrote:
>> On 01.04.2011 19:16, Karim wrote:
>>>
>>>
>>> Hello,
>>>
>>> I would like to get advice on the best practice to count elements in a
>>> list (built from scratch).
>>> The number of elements is between 1e3 and 1e6.
>>>
>>> 1) I could create a generator and set a counter (i +=1) in the loop.
>>>
>>> 2) or simply len(mylist).
>>>
>>> I don't need the content of the list; indeed, in terms of memory I don't
>>> want to waste it. But I suppose len() is optimized too (C implementation).
>>>
>>> If you have some thoughts to share, don't hesitate.
>>
>> Just a general suggestion: Provide code examples. I know most of the
>> time you don't have code examples yet as you're thinking of how to
>> solve your problems. But if you post one of the possible solutions the
>> experienced guys here will very likely point you in the right
>> direction. Without code it's hard to understand what you're after.
>>
>> Cheers,
>>
>> Jan
>>
>
> Thank you all for your answers. To clarify, I built a collection of
> dictionaries which represent database queries on a bug tracking system:
>
> backlog_tables, csv_backlog_table = _backlog_database(table=table,
>                                                       periods=intervals_list)
>
> backlog_tables is a dictionary of bug info dictionaries. The keys of
> backlog_tables are time intervals (YEAR-MONTH) as shown below:
>
> backlog_tables = {'2011-01-01': [{'Assigned Date': datetime.date(2010, 10, 25),
> 'Category': 'Customer_claim',
> 'Date': datetime.date(2010, 10, 22),
> 'Duplicate Date': None,
> 'Fixed Reference': None,
> 'Headline': 'Impovement for all test',
> 'Identifier': '23269',
> 'Last Modified': datetime.date(2010, 10, 25),
> 'Priority': 'Low',
> 'Project': 'MY_PROJECT',
> 'Reference': 'MY_PROJECT at 1.7beta2@20101006.0',
> 'Resolved Date': None,
> 'Severity': 'improvement',
> 'State': 'A',
> 'Submitter': 'Somebody'},
> .....
> }
>
> _backlog_database() computes the tuple (backlog_tables, csv_backlog_table).
> In fact csv_backlog_table has the same keys as backlog_tables, but instead
> of the query dictionaries it holds only the number of queries, which I use
> to create a CSV file and a graph over the time range.
>
> _backlog_database() is as follows:
>
> def _backlog_database(table=None, periods=None):
>     """Internal function. Re-arrange the database table
>     according to a time period. Only monthly management
>     is computed in this version.
>
>     @param table the database of the list of defects. Each defect is a
>         dictionary with fixed keys.
>     @param periods the list of monthly intervals; the first element is the
>         starting date and the last element is the ending date, in string
>         format.
>     @return (periods_tables, csv_table), a tuple of the per-period
>         dictionary of tables and a dictionary with the same keys whose
>         values are the defect counts.
>     """
>     if periods is None:
>         raise ValueError('Time interval could not be empty!')
>
>     periods_tables = {}
>     csv_table = {}
>
>     interval_table = []
>
>     for interval in periods:
>         split_date = interval.split('-')
>         for row in table:
>             if not len(split_date) == 3:
>                 limit_date = _first_next_month_day(year=int(split_date[0]),
>                                                    month=int(split_date[1]),
>                                                    day=1)
>                 if row['Date'] < limit_date:
>                     if not row['Resolved Date']:
>                         if row['State'] == 'K':
>                             if row['Last Modified'] >= limit_date:
>                                 interval_table.append(row)
>                         elif row['State'] == 'D':
>                             if row['Duplicate Date'] >= limit_date:
>                                 interval_table.append(row)
>                         # New, Assigned, Opened, Postponed, Forwarded cases.
>                         else:
>                             interval_table.append(row)
>                     else:
>                         if row['Resolved Date'] >= limit_date:
>                             interval_table.append(row)
>
>         periods_tables[interval] = interval_table
>         csv_table[interval] = str(len(interval_table))
>
>         interval_table = []
>
>     return periods_tables, csv_table
>
>
> This is not the whole function; I reduced it to the normal case, but it
> shows what I am doing.
> In fact I chose to have both dictionaries to debug my function and analyse
> what's going on. When everything is fine I will need only the csv table
> (with the number per period) to create the graphs.
> That's why I was asking about computing the length. Honestly, the actual
> number of queries is 500 (bug ids), but it could be more in other projects.
> I was ambitious when I said 1000 to 100000 dictionary elements, but for the
> whole list of products we have internally it could be 50000.

I see some similarity with my coding style (doing things "by the way"), 
which might not be so good ;-).

With this background information I would keep the responsibilities
separated. Your _backlog_database() function is supposed to do one
thing: return a dictionary that maps each interval to a list of result
dicts. You could call this dict interval_to_result_tables (to indicate
that the values are lists). That's all your function should do.

Then you want to print a report. This piece of functionality needs to
know how long the list for each dictionary entry is. The print_report
function should then be responsible for getting the information it
needs, either by creating it itself or by calling another function whose
purpose is to create that information. The latter would be a bit too
much here, as the length is simply:

number_of_tables = len(interval_to_result_tables[interval])
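
As a rough sketch, the report side could then look like the code below.
interval_counts, count_matching and the layout of print_report are
hypothetical, and the sample data is invented. If you end up not needing
the rows at all, summing over a generator counts the matches without
building a list. For a few thousand dicts, though, len() on a list is
perfectly fine, since a list stores its own length.

```python
def interval_counts(interval_to_result_tables):
    """Return {interval: number of result dicts} for the report."""
    return {interval: len(rows)
            for interval, rows in interval_to_result_tables.items()}


def count_matching(rows, predicate):
    """Count the rows satisfying predicate without storing them in a list."""
    return sum(1 for row in rows if predicate(row))


def print_report(interval_to_result_tables):
    """Print one 'interval,count' line per interval, sorted by interval."""
    counts = interval_counts(interval_to_result_tables)
    for interval in sorted(counts):
        print('%s,%d' % (interval, counts[interval]))


# Invented sample data, only to show the call.
sample = {'2011-01': [{'Identifier': '23269'}], '2011-02': []}
print_report(sample)
```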

I hope I understood your goals correctly and could help a bit,

Jan

>
> Regards
> Karim
>
>>
>>>
>>> Karim
>>> _______________________________________________
>>> Tutor maillist - Tutor at python.org
>>> To unsubscribe or change subscription options:
>>> http://mail.python.org/mailman/listinfo/tutor