Need design advice. What's my best approach for storing this data?
lycka at carmen.se
Mon Mar 20 11:00:21 CET 2006
> I am trying to build a tool that analyzes stock data. Therefore I am
> going to download and store quite a vast amount of it. Just for a
> general number - assuming there are about 7000 listed stocks on the two
> major markets plus some extras, 255 tradying days a year for 20 years,
> that is about 36 million entries.
> Obviously a database is a logical choice for that. However I've never
> used one, nor do I know what benefits I would get from using one. I am
> worried about speed, memory usage, and disk space.
This is a typical use case for relational database systems.
With something like DB2 or Oracle here, you can take advantage
of more than 20 years of work by lots of developers trying to
solve the kind of problems you will run into.
You haven't really stated all the facts to decide what product
to choose though. Will this be a multi-user applications?
Do you forsee a client/server application? What operating
system(s) do you need to support?
With relational databases, it's plausible to move some of
the hard work in the data analysis into the server. Using
this well means that you need to learn a bit about how
relational databases work, but I think it's with the trouble.
It could mean that much less data ever needs to reach your
Python program for processing, and that will mean a lot for
your performance. Relational databases are very good at
searching, sorting and simple aggregations of data. SQL is
a declarative language, and in principle, your SQL code
will just declare the correct queries and manipulations that
you want to achieve, and tuning will be a separate activity,
which doesn't need to involve program changes. In reality,
there are certainly cases where changes in SQL code will
influence performance, but to a very large extent, you can
achieve good performance through building indices and by
letting the database gather statistics and analyze the
queries your programs contain. As a bonus, you also have
advanced systems for security, transactional safety, on-
line backup, replication etc.
You don't get these advantages with any other data storage
I'd get Chris Fehily's "SQL Visual Quickstart Guide", which
is as good as his Python book. As database, it depends a bit
on your platform you work with. I'd avoid MySQL. Some friends
of mine have used it for needs similar to yours, and they are
now running into its severe shortcomings. (I did warn them.)
For Windows, I think the single user version of SQL Server
(MSDE?) is gratis. For both Windows and Linux/Unix, there are
(I think) gratis versions of both Oracle 10g, IBM DB2 UDB and
Mimer SQL. Mimer SQL is easy to install, Oracle is a pain, and
I think DB2 is somewhere in between. PostgreSQL is also a good
Either way, it certainly seems natural to learn relational
databases and SQL if you want to work with financial software.
More information about the Python-list