Quickest marshal.loads from unicode?

Alex Martelli aleax at aleax.it
Fri Feb 21 08:26:14 EST 2003


Giles Brown wrote:

> Problem
> -------
> How can the I take the result of a marshal.dumps call and decode
> (encode?) it
> into a unicode string so that the conversion back to a marshal.loads
> compatible string is as quick as possible?

Measuring is better than guessing -- measure on data that
resemble that you'll be marshaling, of course, but, for example
(with Python 2.3a2):

import time

import marshal

trydata = {4: 'scores', 'and': 7, 'years': 'ago' }
dumped = marshal.dumps(trydata)

def trytime(dumped, codecname, N=100000):
    loads = marshal.loads
    repeat = N*[None]

    unidumped = unicode(dumped, codecname)

    start = time.clock()

    for x in repeat:
        rebuilt = loads(unidumped.encode(codecname))

    stend = time.clock()

    return "%.2f %s" % (stend-start, codecname)


for codecname in 'utf-8', 'latin-1', 'iso-8859-1', 'raw-unicode-escape':

    print trytime(dumped, codecname)

[alex at lancelot Python-2.3a2]$ python -O trym.py
0.34 utf-8
0.37 latin-1
0.52 iso-8859-1
0.66 raw-unicode-escape
[alex at lancelot Python-2.3a2]$ python -O trym.py
0.36 utf-8
0.36 latin-1
0.54 iso-8859-1
0.68 raw-unicode-escape
[alex at lancelot Python-2.3a2]$

So, utf-8 and latin-1 seem the best candidate codecs, with the
others out of the running -- and the difference can be as much
as 1 or 2 microseconds per string encoded-and-unmarshaled back,
even for shortish strings such as I'm using here; of course you
probably want to try adding storing/fetching in an Access DB
(whose timing might swamp such differences anyway, perhaps) to
the benchmark.


Alex





More information about the Python-list mailing list