[Python-Dev] Can I make marshal.dumps() slower but stabler?

INADA Naoki songofacandy at gmail.com
Thu Jul 12 04:15:39 EDT 2018


On Thu, Jul 12, 2018 at 3:22 PM Serhiy Storchaka <storchaka at gmail.com> wrote:
>
> 12.07.18 08:43, INADA Naoki wrote:
> > I'm working on making pyc stable, via stabilizing marshal.dumps()
> > https://bugs.python.org/issue34093
>
> This is not enough for making pyc stable. The order in frozensets is still
> arbitrary.

But we can set PYTHONHASHSEED to fix the frozenset order and make pyc stable.
Currently, the reference count (refcnt) is the only known remaining issue for
reproducible pyc builds.
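
A minimal sketch of the PYTHONHASHSEED part, assuming a small hypothetical
snippet whose constants include a frozenset. It marshals the same code object
in two fresh interpreters started with the same seed and compares the bytes.
It says nothing about the refcnt issue, which needs the marshal change itself.

```python
import os
import subprocess
import sys

# Child program (illustrative): the compiler turns the set literal in
# `x in {...}` into a frozenset constant, then the code object is marshalled
# and printed as hex.
CHILD = (
    "import marshal;"
    "code = compile('x in {\"a\", \"b\", \"c\"}', '<demo>', 'eval');"
    "print(marshal.dumps(code).hex())"
)


def dump_with_seed(seed):
    """Run CHILD in a fresh interpreter with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


first = dump_with_seed(0)
second = dump_with_seed(0)
print("same bytes with the same seed:", first == second)
```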

>
> > Sadly, it makes marshal.dumps() 40% slower.
> > Luckily, this overhead is small (only 4%) for dumps(compile(source)) case.
>
> What about the memory consumption?

No overhead, because we already use the same hashtable for w_ref.
I just made it two-pass instead of one-pass.
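
To illustrate the two-pass idea, here is a toy sketch in pure Python. It is
not the C patch: the token format and the names `count_occurrences`, `emit`
and `serialize` are made up for this example. Pass 1 counts how often each
object appears inside the tree, and pass 2 marks an object as shared only
when that count is greater than one, so references held outside the tree
cannot change the output.

```python
def count_occurrences(obj, counts):
    """Pass 1: count appearances of each object (by identity) in the tree."""
    key = id(obj)
    counts[key] = counts.get(key, 0) + 1
    # Descend into a container only the first time it is seen.
    if counts[key] == 1 and isinstance(obj, (tuple, list)):
        for item in obj:
            count_occurrences(item, counts)


def emit(obj, counts, refs, out):
    """Pass 2: emit tokens, flagging only objects seen more than once."""
    key = id(obj)
    if key in refs:
        out.append(("ref", refs[key]))  # back-reference to an earlier object
        return
    is_shared = counts[key] > 1
    if is_shared:
        refs[key] = len(refs)  # assign the next reference index
    if isinstance(obj, (tuple, list)):
        out.append(("seq", len(obj), is_shared))
        for item in obj:
            emit(item, counts, refs, out)
    else:
        out.append(("atom", obj, is_shared))


def serialize(obj):
    """Run both passes and return the token list."""
    counts = {}
    count_occurrences(obj, counts)
    out = []
    emit(obj, counts, {}, out)
    return out


shared = ("x", "y")
tree = [shared, shared, ("z",)]
before = serialize(tree)

# Extra references from outside the tree do not affect the result.
outside_refs = [shared] * 100
after = serialize(tree)
print(before == after)
```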

>
> > So my question is:  May I remove the unstable but faster code?
> >
> > Or should I make this optional, so that we maintain two complex code paths?
> > If so, should this option be enabled by default or not?
>
> My concern is that even if we don't make it optional, this will complicate
> the code.

When it's not optional, it adds a near-duplicate of w_object that counts
references in the object tree.
https://github.com/python/cpython/pull/8226/commits/e170116e80dfd27f923c88fc11e42f0d6f687a00

>
> > For example, xmlrpc uses marshal.  But xmlrpc has significant overhead
> > other than marshaling, as in the dumps(compile(source)) case.  So I expect
> > marshal.dumps() performance is not critical for it either.
>
> xmlrpc doesn't use the marshal module. It uses the terms marshalling and
> unmarshalling, but with a different meaning.
>

Oh, I just grepped and misunderstood.

> > Is there any real application for which marshal.dumps() performance is critical?
> EVE Online is a well-known example.
>

Do they use version>=3?
In version 3, FLAG_REF was introduced, and it already added significant
runtime overhead.
If marshaling speed is very important, version<2 should be used.
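
For reference, the format version is the optional second argument of
marshal.dumps(). A small sketch with made-up sample data that dumps the same
value with an older version and with the interpreter's current one; both
load back to the same value, and only the encoding differs.

```python
import marshal

# Sample data (illustrative): the same string object appears three times.
shared = "a string that appears more than once"
data = (shared, shared, [1, 2.5, {"key": shared}])

old_format = marshal.dumps(data, 2)                # a format older than version 3
current_format = marshal.dumps(data, marshal.version)  # the interpreter's default

print(len(old_format), len(current_format))
print(marshal.loads(old_format) == marshal.loads(current_format))
```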

> What if we write a script which loads .pyc files and stabilizes them? This
> could solve the problem for applications which need stable .pyc files,
> with zero impact on common use.
>

Hmm, which do you mean?

* Adding marshal.dump_stable_pyc() and using it like
  `marshal.dump_stable_pyc(marshal.loads(code))`
* Implementing a pure Python marshal.dumps in distutils
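
Either way, the surrounding script would look roughly like the sketch below.
It assumes the 16-byte pyc header of CPython 3.7+ and uses plain
marshal.dumps() as a stand-in for a stable dumper, since
marshal.dump_stable_pyc() does not exist; `redump_pyc` is a hypothetical
helper name. It only shows the load/re-dump plumbing and does not by itself
guarantee stable bytes.

```python
import marshal
import os
import py_compile
import tempfile
import types

HEADER_SIZE = 16  # assumption: pyc header length on CPython 3.7+ (PEP 552)


def redump_pyc(path):
    """Load a .pyc, re-marshal its code object, and rewrite the file in place."""
    with open(path, "rb") as f:
        data = f.read()
    header, body = data[:HEADER_SIZE], data[HEADER_SIZE:]
    code = marshal.loads(body)
    if not isinstance(code, types.CodeType):
        raise ValueError("not a code object: %r" % (path,))
    new_body = marshal.dumps(code)  # stand-in for a stable dumper
    with open(path, "wb") as f:
        f.write(header + new_body)
    return code


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo.py")
    pyc = os.path.join(tmp, "demo.pyc")
    with open(src, "w") as f:
        f.write("GREETING = 'hello'\n")
    py_compile.compile(src, cfile=pyc, doraise=True)

    code = redump_pyc(pyc)

    # The rewritten file still holds a loadable, runnable code object.
    with open(pyc, "rb") as f:
        reloaded = marshal.loads(f.read()[HEADER_SIZE:])
    namespace = {}
    exec(reloaded, namespace)
    greeting = namespace["GREETING"]
    print(greeting)
```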

--
INADA Naoki  <songofacandy at gmail.com>

