Update: With the last batch of checkins, all sorts on Kevin's company database are faster (a little to a killer lot) under 2.3a0 than under 2.2.1. A reminder of what this looks like:
A record looks like this after running his script to turn them into Python dicts:
{'Address': '395 Page Mill Road\nPalo Alto, CA 94306',
 'Company': 'Agilent Technologies Inc.',
 'Exchange': 'NYSE',
 'NumberOfEmployees': '41,000',
 'Phone': '(650) 752-5000',
 'Profile': 'http://biz.yahoo.com/p/a/a.html',
 'Symbol': 'A',
 'Web': 'http://www.agilent.com'}
It appears to me that the XML file is maintained by hand, in order of ticker symbol. But people make mistakes when alphabetizing by hand, and there are 37 indices i such that
data[i]['Symbol'] > data[i+1]['Symbol']
So it's "almost sorted" by that measure ... The proper order of Yahoo profile URLs is also strongly correlated with ticker symbol, while both the company name and web address look weakly correlated [and Address, NumberOfEmployees, and Phone are essentially randomly ordered].
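For anyone who wants to repeat the "almost sorted" count on their own data, here is a minimal sketch of that measure. The name count_descents and the tiny sample list are made up for illustration (they are not from Kevin's database), and it assumes every record has the field being checked:

```python
def count_descents(data, field):
    """Count indices i such that data[i][field] > data[i+1][field]."""
    # zip(data, data[1:]) pairs each record with its successor.
    return sum(1 for a, b in zip(data, data[1:]) if a[field] > b[field])


# Hypothetical sample: one symbol was filed out of alphabetical order.
sample = [
    {'Symbol': 'A'},
    {'Symbol': 'AA'},
    {'Symbol': 'AAPL'},
    {'Symbol': 'AAP'},   # out of place: 'AAPL' > 'AAP'
    {'Symbol': 'ABC'},
]

print(count_descents(sample, 'Symbol'))  # 1
```

A count of 0 means the list is already in non-decreasing order on that field; a small count relative to the list length is the kind of input where a sort that exploits existing order has the most to gain.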
Here are the latest (and I expect the last) timings, in milliseconds per sort, on the list of (key, index, record) tuples

    values = [(x.get(fieldname), i, x) for i, x in enumerate(data)]

[I wrote a little generator to simulate 2.3's enumerate() in 2.2.1]

There are 6635 companies in the database, but not all fields are present in all records; .get() plugs in a key of None for those cases, and the index keeps equal-key cases from breaking the tie via expensive dict comparison (each record x is a dict!):

Sorting on field 'Address'
    2.2.1: 41.57
    2.3a0: 40.96

Sorting on field 'Company'
    2.2.1: 40.14
    2.3a0: 29.79

Sorting on field 'Exchange'
    2.2.1: 53.83
    2.3a0: 24.79

Sorting on field 'NumberOfEmployees'
    2.2.1: 47.89
    2.3a0: 45.74

Sorting on field 'Phone'
    2.2.1: 48.09
    2.3a0: 47.15

Sorting on field 'Profile'
    2.2.1: 58.41
    2.3a0: 8.77

Sorting on field 'Symbol'
    2.2.1: 40.78
    2.3a0: 6.30

Sorting on field 'Web'
    2.2.1: 46.79
    2.3a0: 35.64

This may have been sorted more times by now than any other database on Earth <wink>.
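For readers trying the (key, index, record) trick on a current Python, here is a rough sketch. It is not the timing script used above. The names decorate and sort_by_field and the three sample records are hypothetical. One deliberate difference: Python 3 refuses to order None against a string, so this version wraps the key in a (present, key) pair that puts missing fields first, which is where None sorted in 2.x.

```python
def decorate(data, fieldname):
    """Build (key, index, record) tuples for sorting on fieldname."""
    values = []
    for i, x in enumerate(data):
        key = x.get(fieldname)          # None when the field is missing
        # (False, '') sorts before any (True, key), so missing fields
        # come first without ever comparing None to a str.
        wrapped = (key is not None, key if key is not None else '')
        # The unique index i settles ties, so the dict x is never compared.
        values.append((wrapped, i, x))
    return values


def sort_by_field(data, fieldname):
    values = decorate(data, fieldname)
    values.sort()
    return [x for _, _, x in values]


# Hypothetical records; the first has no 'Web' field.
records = [
    {'Symbol': 'B'},
    {'Symbol': 'A', 'Web': 'x'},
    {'Symbol': 'B', 'Web': 'a'},
]

by_web = sort_by_field(records, 'Web')
by_symbol = sort_by_field(records, 'Symbol')
```

Because the index breaks ties, records with equal keys keep their original relative order, and the comparison never reaches the dicts, which Python 3 cannot order at all.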