searching and storing large quantities of xml!

dads wayne.dads.bell at gmail.com
Sat Jan 16 19:10:37 CET 2010


I work in 1st line support and Python is one of my hobbies. We get
quite a few requests for xml from our website and it's a long,
drawn-out process. So I thought I'd try to create a system that deals
with it, for fun.

I've been tidying up the archived xml and have been thinking about
the best way to approach this, as it took a long time to deal with
big quantities of xml. Suppose you have 5 or 6 years' worth of xml,
at 26000+ files of 5-20k each per year. The archived stuff is zipped,
but which is better: 26000 files in one big zip file, 26000 files in
one big zip file but in folders for months and days, or zip files in
zip files?
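
For the "one big zip with month and day folders" option, a rough sketch
of searching the archive without unpacking it could look like the
following. The archive layout (year/month/day folders) and the function
name find_in_zip are assumptions made for illustration, not part of the
existing setup.

```python
import zipfile


def find_in_zip(zip_path, web_number, prefix=""):
    """Return names of members under `prefix` whose content has >web_number<."""
    # Same '>%s<' pattern as the plain-text search, encoded because
    # ZipFile.read() returns bytes.
    needle = (">%s<" % web_number).encode("ascii")
    matches = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Skip directory entries and anything outside the wanted folder.
            if name.endswith("/") or not name.startswith(prefix):
                continue
            if needle in archive.read(name):
                matches.append(name)
    return matches
```

With folders inside the zip, a prefix such as "2009/11/" restricts the
search to one month, so only those members are decompressed. Zip files
nested inside zip files would need the inner archive to be read into
memory or extracted before it can be searched.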

I created an app in wxPython to search the unzipped xml files by
modified date; it just opens them up and uses something like
l.find('>%s<' % fiveDigitNumber) != -1. Is this quicker than parsing
the xml?
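
To make the comparison concrete, here is a minimal sketch of both
approaches side by side. The element name "webnumber" is hypothetical
(the real tag is not given above), and the function names are made up
for this example.

```python
import xml.etree.ElementTree as ET


def contains_number_text(xml_text, number):
    """Plain substring test, equivalent to l.find('>%s<' % n) != -1."""
    return (">%s<" % number) in xml_text


def contains_number_parsed(xml_text, number, tag="webnumber"):
    """Parse the document and compare only the text of `tag` elements."""
    root = ET.fromstring(xml_text)
    return any((elem.text or "").strip() == str(number)
               for elem in root.iter(tag))


sample = "<order><webnumber>12345</webnumber><qty>54321</qty></order>"
print(contains_number_text(sample, 54321))    # matches the <qty> element too
print(contains_number_parsed(sample, 54321))  # only looks at <webnumber>
```

The substring test skips building a tree, so it is likely to be faster
on small files like these, but it matches the number in any element.
Parsing costs more and only matches the element you name. The timeit
module would give real numbers for the actual files.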

Generally the requests are less than 3 months old, so that got me
thinking: should I create a script that finds all the file names and
corresponding web numbers of the old xml and bungs them into db
tables, one for each year, and another script that archives the xml
at the end of every day and, after 3 months, zips it up, bungs the
info into a table, etc.? Sorry for the ramble, I just want other
people's opinions on the matter. =)
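
A minimal sketch of the db index idea, using the sqlite3 module from
the standard library. It uses a single table with an archive column
rather than one table per year, purely to keep the example short; the
table, column and function names are all assumptions.

```python
import sqlite3


def open_index(db_path=":memory:"):
    """Open (or create) the index database and make sure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS xml_index ("
        " web_number TEXT NOT NULL,"
        " file_name TEXT NOT NULL,"
        " archive TEXT NOT NULL)"
    )
    # The index is what makes lookups by web number fast.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_web_number ON xml_index (web_number)"
    )
    return conn


def add_entry(conn, web_number, file_name, archive):
    """Record which archive and member hold the xml for a web number."""
    conn.execute(
        "INSERT INTO xml_index (web_number, file_name, archive)"
        " VALUES (?, ?, ?)",
        (str(web_number), file_name, archive),
    )
    conn.commit()


def lookup(conn, web_number):
    """Return (archive, file_name) pairs for a web number."""
    cur = conn.execute(
        "SELECT archive, file_name FROM xml_index WHERE web_number = ?",
        (str(web_number),),
    )
    return cur.fetchall()
```

The daily archiving script would call add_entry for each file it
stores, so a request becomes one lookup plus reading a single member
from the right zip instead of scanning every file.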


