Fastest database solution

M.-A. Lemburg mal at
Fri Feb 6 07:19:55 EST 2009

On 2009-02-06 09:10, Curt Hash wrote:
> I'm writing a small application for detecting source code plagiarism that
> currently relies on a database to store lines of code.
> The application has two primary functions: adding a new file to the database
> and comparing a file to those that are already stored in the database.
> I started out using sqlite3, but was not satisfied with the performance
> results. I then tried using psycopg2 with a local postgresql server, and the
> performance got even worse. My simple benchmarks show that sqlite3 is an
> average of 3.5 times faster at inserting a file, and on average less than a
> tenth of a second slower than psycopg2 at matching a file.
> I expected postgresql to be a lot faster ... is there some peculiarity in
> psycopg2 that could be causing slowdown? Are these performance results
> typical? Any suggestions on what to try from here? I don't think my
> code/queries are inherently slow, but I'm not a DBA or a very accomplished
> Python developer, so I could be wrong.
> Any advice is appreciated.

In general, if you do bulk inserts into a large table, you should consider
turning off indexing on the table and recreating/updating the indexes in one
go afterwards.
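
As a rough sketch of that idea using the sqlite3 module mentioned above:
the table name "lines", its columns, the index name "idx_lines_text" and
the helper bulk_insert_lines() are all made up for illustration and are
not taken from the original application.

```python
import sqlite3


def bulk_insert_lines(conn, rows):
    """Insert (file_id, line_no, text) rows with the index dropped.

    The table and index names are hypothetical; adapt them to your schema.
    """
    cur = conn.cursor()
    # Drop the index so it is not updated once per inserted row.
    cur.execute("DROP INDEX IF EXISTS idx_lines_text")
    # Insert all rows in a single batch.
    cur.executemany(
        "INSERT INTO lines (file_id, line_no, text) VALUES (?, ?, ?)", rows
    )
    # Rebuild the index in one go over the full table.
    cur.execute("CREATE INDEX idx_lines_text ON lines (text)")
    conn.commit()


# Small in-memory demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lines (file_id INTEGER, line_no INTEGER, text TEXT)")
conn.execute("CREATE INDEX idx_lines_text ON lines (text)")

rows = [(1, i, "line %d" % i) for i in range(1000)]
bulk_insert_lines(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM lines").fetchone()[0]
```

Whether this pays off depends on how many rows go in per batch compared
to the size of the table, so it is worth benchmarking both variants.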

But regardless of this detail, I think you should consider a
filesystem-based approach. This is going to be a lot faster than using a
database to store the source code line by line. You can still use
a database for the administration and indexing of the data, e.g.
by storing a hash of each line in the database.
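
A minimal sketch of how such a setup might look: the files themselves
live in a directory, and the database only holds one hash per line.
Everything here is an assumption made for illustration: the
"line_hashes" table, the helper names, the choice of SHA-1, and the
stripping of surrounding whitespace before hashing.

```python
import hashlib
import os
import sqlite3
import tempfile


def init_db(conn):
    # Hypothetical schema: one row per non-blank line of each stored file.
    conn.execute("CREATE TABLE line_hashes (hash TEXT, file TEXT, line_no INTEGER)")
    conn.execute("CREATE INDEX idx_line_hashes_hash ON line_hashes (hash)")


def line_hash(line):
    # Assumption: ignore leading/trailing whitespace when comparing lines.
    return hashlib.sha1(line.strip().encode("utf-8")).hexdigest()


def add_file(conn, store_dir, name, text):
    """Store the file on disk and index its line hashes in the database."""
    with open(os.path.join(store_dir, name), "w", encoding="utf-8") as f:
        f.write(text)
    rows = [
        (line_hash(line), name, line_no)
        for line_no, line in enumerate(text.splitlines(), 1)
        if line.strip()
    ]
    conn.executemany(
        "INSERT INTO line_hashes (hash, file, line_no) VALUES (?, ?, ?)", rows
    )
    conn.commit()


def match_file(conn, text):
    """Return {stored file name: number of lines in text also found in it}."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        query = "SELECT DISTINCT file FROM line_hashes WHERE hash = ?"
        for (fname,) in conn.execute(query, (line_hash(line),)):
            counts[fname] = counts.get(fname, 0) + 1
    return counts


# Small demo with a temporary directory as the file store.
store = tempfile.mkdtemp()
conn = sqlite3.connect(":memory:")
init_db(conn)
add_file(conn, store, "a.py", "x = 1\ny = 2\n")
result = match_file(conn, "y = 2\nz = 3\n")
```

The database then only has to compare short fixed-length hashes, and
the full source text is read from disk only when a match needs to be
shown.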

Marc-Andre Lemburg

More information about the Python-list mailing list