python and very large data sets???

John Machin sjmachin at lexicon.net
Thu Apr 25 22:41:28 EDT 2002


holger krekel <pyth at devel.trillke.net> wrote in message news:<mailman.1019747686.17849.python-list at python.org>...
> On Thu, Apr 25, 2002 at 07:29:27AM -0700, Rad wrote:
 
> > All four files have dates in them, but they do come in YYYYMMDD
> > format and I was planning to use string comparisons; same for the
> > rest of the data, I was thinking to treat it all as strings.
> 
> Huh? YYMMDD is easily convertible to a 32-bit value. See "help('time')"
> at the Python prompt.

Fortunately, Rad believes/hopes he is getting the dates in YYYYMMDD
format, not YYMMDD.

When filtering the first input file, Rad should be concentrating on
rejecting irrelevant records as cheaply as possible, e.g.

   if transaction_date < '19990701': continue
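
In context, a first-pass filter along those lines might look something
like the sketch below (the comma delimiter, field position and file
names are invented for illustration; Rad's real record layout will
differ):

   CUTOFF = '19990701'     # YYYYMMDD strings compare correctly as strings
   DATE_FIELD = 2          # assumed zero-based position of the date field

   infile = open('transactions.txt', 'r')
   outfile = open('transactions_kept.txt', 'w')
   for line in infile:
       fields = line.split(',')
       if fields[DATE_FIELD] < CUTOFF:
           continue        # reject the record as cheaply as possible
       outfile.write(line)
   infile.close()
   outfile.close()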

Once all irrelevant records are rejected, *IF* he needs to do date
arithmetic, then he can convert his date strings to whatever other
format.
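
For example, the standard time module will turn a YYYYMMDD string into
something he can do arithmetic on (a rough sketch, assuming the strings
are valid dates and time.strptime is available on his platform):

   import time

   def days_between(yyyymmdd_1, yyyymmdd_2):
       # Convert each YYYYMMDD string to seconds since the epoch,
       # then take the difference and round to whole days.
       t1 = time.mktime(time.strptime(yyyymmdd_1, '%Y%m%d'))
       t2 = time.mktime(time.strptime(yyyymmdd_2, '%Y%m%d'))
       return int(round((t2 - t1) / 86400.0))

   # days_between('19990701', '20020425') -> 1029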

Rad is doing some _data processing_. The mxDateTime module seems to
have been designed with data processing in mind. The time module seems
to be provided by Python for Unix compatibility <wink>.
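
For comparison, the mxDateTime version of the same calculation is
shorter (assuming mx.DateTime is installed; the call names below are
from memory, so check its documentation):

   from mx import DateTime

   d1 = DateTime.strptime('19990701', '%Y%m%d')
   d2 = DateTime.strptime('20020425', '%Y%m%d')
   delta = d2 - d1           # a DateTimeDelta instance
   # delta.days -> 1029.0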

> You should really start to generate some example files (made up
> by you) and start to code *right now*.

If Rad has never seen the data before, generating some example files
made up by himself would be an exercise in utter futility. He should
be getting sample test files from the source.


