Quickest marshal.loads from unicode?
Alex Martelli
aleax at aleax.it
Fri Feb 21 08:26:14 EST 2003
Giles Brown wrote:
> Problem
> -------
> How can I take the result of a marshal.dumps call and decode
> (encode?) it into a unicode string so that the conversion back to
> a marshal.loads compatible string is as quick as possible?
Measuring is better than guessing -- measure on data that
resemble those you'll be marshaling, of course, but, for example
(with Python 2.3a2):
import time
import marshal

trydata = {4: 'scores', 'and': 7, 'years': 'ago'}
dumped = marshal.dumps(trydata)

def trytime(dumped, codecname, N=100000):
    loads = marshal.loads
    repeat = N*[None]
    unidumped = unicode(dumped, codecname)
    start = time.clock()
    for x in repeat:
        rebuilt = loads(unidumped.encode(codecname))
    stend = time.clock()
    return "%.2f %s" % (stend-start, codecname)

for codecname in 'utf-8', 'latin-1', 'iso-8859-1', 'raw-unicode-escape':
    print trytime(dumped, codecname)
[alex at lancelot Python-2.3a2]$ python -O trym.py
0.34 utf-8
0.37 latin-1
0.52 iso-8859-1
0.66 raw-unicode-escape
[alex at lancelot Python-2.3a2]$ python -O trym.py
0.36 utf-8
0.36 latin-1
0.54 iso-8859-1
0.68 raw-unicode-escape
[alex at lancelot Python-2.3a2]$
So, utf-8 and latin-1 seem the best candidate codecs, with the
others out of the running -- and the difference can be as much
as roughly 3 microseconds per string encoded-and-unmarshaled back
(0.34 vs 0.68 seconds over 100000 iterations), even for shortish
strings such as I'm using here. Of course you probably want to
add the storing/fetching in an Access DB to the benchmark, though
its timing might swamp such differences anyway.
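A note for readers on current Python 3, where marshal.dumps returns
bytes rather than str: latin-1 maps every byte value 0-255 to the code
point with the same number, so the round trip through text is lossless
for arbitrary marshal output, whereas utf-8 can reject marshal output
outright (raw IEEE-754 bytes in a marshaled float need not form valid
UTF-8). The sketch below, which borrows the post's trytime name but is
otherwise illustrative, redoes the check and the timing with timeit:

```python
import marshal
import timeit

data = {4: 'scores', 'and': 7, 'years': 'ago'}
dumped = marshal.dumps(data)          # bytes, not str, on Python 3

# latin-1: each byte 0-255 decodes to the same-numbered code point,
# so decode/encode is an exact round trip for any marshal output.
as_text = dumped.decode('latin-1')
rebuilt = marshal.loads(as_text.encode('latin-1'))
assert rebuilt == data

# utf-8, by contrast, may refuse arbitrary marshal output:
try:
    marshal.dumps(3.14159).decode('utf-8')
except UnicodeDecodeError:
    print('not all marshal output is valid UTF-8')

# the timing comparison itself, redone with timeit; absolute numbers
# will of course differ wildly from the 2003 figures
def trytime(dumped, codecname, N=100000):
    unidumped = dumped.decode(codecname)
    return timeit.timeit(
        lambda: marshal.loads(unidumped.encode(codecname)), number=N)

for codecname in 'latin-1', 'raw_unicode_escape':
    print('%.2f %s' % (trytime(dumped, codecname), codecname))
```

As in the original post, the decode is done once outside the loop, so
only the per-fetch work (encode plus loads) is being timed.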
Alex