[SciPy-Dev] Custom __repr__?

Robert Kern robert.kern at gmail.com
Thu May 13 21:39:52 EDT 2021


On Thu, May 13, 2021 at 9:03 PM Andrew Nelson <andyfaff at gmail.com> wrote:

>> Having a proper __repr__ that allows to recreate objects is nice, but
>> serialization seems to be a misuse of __repr__. We have the __getstate__
>> __setstate__ mechanism in Python for serialization. Could you explain your
>> use case a bit more for my curiosity?
>>
>
> It's the recreation of objects that I wanted to have. My use case is for a
> piece of code that's running in a PyQt5 GUI to output a standalone Python
> script that can then be executed separately (on a cluster). The script has
> to recreate an Object that has a few layers of complexity (Objective -->
> Model --> Structure --> Layers --> Parameters --> Parameter --> Bounds). By
> implementing a __repr__ for all of those it allows me to call
> `repr(Objective)` to create the script. The Bounds are more than likely a
> uniform distribution, which I've implemented myself for speed, but they can
> also be any scipy.stats.rv_continuous distribution. At the moment I can't
> recreate rv_continuous from their __repr__.
>

You'll run into more such objects. Having a round-trippable `__repr__()`
isn't all that common. Even `ndarray` will trip you up, since its repr
elides most of a large array. And that's not even considering what kind of
imports you need to have done in order for even the eval()able
representations to actually evaluate. You will probably need to incorporate
a subsystem in your code generation to get eval()able string
representations of the objects you care about. You can often bound the
effort a bit more than what the class author would need to do since you
likely only deal with a subset of functionality (do you need eval()able
datetime64 arrays? Probably not. Do you need array parameter arguments to
the distributions or just scalars? Maybe). You'll need such a subsystem in
any case, to make sure the imports are there.
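
A minimal sketch of such a subsystem, using only the standard library. The
`Uniform` and `Parameter` classes and the `mymodel` module name are
hypothetical stand-ins, not anything from SciPy or from the code under
discussion. Each supported type registers a generator that returns an
eval()able expression and records the import that expression needs;
unsupported types fail loudly instead of falling back to `repr()`.

```python
import math
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class Uniform:
    """Hypothetical stand-in for a hand-written uniform bounds class."""
    lb: float
    ub: float


@dataclass
class Parameter:
    """Hypothetical stand-in for a parameter with a value and bounds."""
    value: float
    bounds: Uniform


@singledispatch
def to_source(obj, imports):
    """Return an eval()able expression for obj; add needed imports to the set."""
    raise TypeError(f"no code generator registered for {type(obj).__name__}")


@to_source.register(int)
@to_source.register(str)
def _(obj, imports):
    return repr(obj)


@to_source.register(float)
def _(obj, imports):
    # repr(float("inf")) is "inf", which does not evaluate, so spell it out.
    return repr(obj) if math.isfinite(obj) else f"float({repr(obj)!r})"


@to_source.register(Uniform)
def _(obj, imports):
    imports.add("from mymodel import Uniform")  # "mymodel" is a made-up module
    return f"Uniform({to_source(obj.lb, imports)}, {to_source(obj.ub, imports)})"


@to_source.register(Parameter)
def _(obj, imports):
    imports.add("from mymodel import Parameter")  # made-up module, as above
    value = to_source(obj.value, imports)
    bounds = to_source(obj.bounds, imports)
    return f"Parameter({value}, {bounds})"


def generate_script(obj, name="obj"):
    """Build script text: the collected imports, then one assignment."""
    imports = set()
    expr = to_source(obj, imports)
    return "\n".join(sorted(imports) + ["", f"{name} = {expr}"]) + "\n"
```

Calling `generate_script(Parameter(1.5, Uniform(0.0, float("inf"))), "p")`
yields the import lines followed by a single assignment. A generator for
frozen distributions would be registered the same way, limited to whatever
argument types are actually needed.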

> I suppose I could create a script and associated pickle file, with the
> script loading the pickle file. This will be less robust/reproducible, both
> laterally and vertically (laterally = use at same time, but on different
> machine, with roughly similar Python environment; vertically = use N years
> later). The main issue with vertically is that new attributes may get added
> to classes which means that unpickling issues often occur because the
> pickle doesn't possess the new attributes. Writing out classes with a
> __repr__ doesn't have that issue because they're created from scratch.
>
> There is a slightly different aspect to this kind of issue which is also
> important to me, saving the state of a GUI program. At the moment I save
> the state of the GUI into a pickle, but that's not necessarily readable in
> later versions of the program because of the poor forwards compatibility of the
> pickle against updated classes. I'd be interested in knowing how to
> serialise these kinds of complex structures into something that's e.g.
> json/text file based.
>

Having done this many times in many ways, I can say there's no easy road.
You can either have a system that can serialize any object and be subject
to the forward compatibility problem, or you can have a system that has to
know about every object and be told where to put the serialized data from
outdated classes. Moving from pickle to JSON or another format is
irrelevant: those are just file formats. Any serialization system
that can just take any Python object is also going to record the same kind
of "fragile" information that is going to go out of date as you change the
classes. You can build a serialization system that manages the update
information with any format, even pickle. Building that off of pickle has a
lot of advantages as the __getstate__()/__setstate__() machinery is
standard; handling changes of the class itself like the one you describe
above just takes the author treating the serialization as part of its API.
The only thing you have to handle in your serialization system, per se, is
dealing with class renamings/refactorings. The pickle system doesn't
necessarily make that *easy*, but it's not terrible.
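
As a rough illustration of both halves, here is a sketch built on the
standard `pickle` module. The `Parameter` class, its `constraint` attribute,
the `_state_version` key and the `oldpkg.params.Param` name are all
hypothetical. The class versions its own state in
`__getstate__()`/`__setstate__()` so that old pickles pick up defaults for
newer attributes, and an `Unpickler` subclass overrides `find_class()` to
redirect a renamed class.

```python
import io
import pickle


class Parameter:
    """Hypothetical class that treats its pickled state as part of its API."""

    _STATE_VERSION = 2

    def __init__(self, value, name="", constraint=None):
        self.value = value
        self.name = name
        self.constraint = constraint  # attribute introduced in state version 2

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_state_version"] = self._STATE_VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop("_state_version", 1)  # unversioned means oldest
        if version < 2:
            # Pickles written before `constraint` existed get the default.
            state.setdefault("constraint", None)
        self.__dict__.update(state)


# Hypothetical rename table: (old module, old class name) -> current class.
RENAMED = {("oldpkg.params", "Param"): Parameter}


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that redirects references to classes that have moved."""

    def find_class(self, module, name):
        try:
            return RENAMED[(module, name)]
        except KeyError:
            return super().find_class(module, name)


def loads(data):
    """Like pickle.loads(), but aware of the rename table."""
    return RenamingUnpickler(io.BytesIO(data)).load()
```

The rename table is the only part that lives outside the classes themselves;
everything else is the class author keeping `__setstate__()` able to read
every state layout that was ever written.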

Moving to a JSON file format doesn't really solve the problem; it just
makes you rewrite all that machinery from scratch so you have to solve it
at the same time as you're rewriting everything else. The JSON (or HDF5 or
whatever) file format may have other benefits, like communication with
non-Python programs. The "opportunity" for writing everything from scratch
will make you take serialization more seriously and not make changes that
would muck things up by accident.
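
For comparison, a sketch of the "has to know about every object" approach on
top of JSON, assuming hypothetical `Bounds` and `Parameter` dataclasses and
made-up `__type__`/`__version__` keys. Every class must be registered under a
stable tag with a version number and an upgrade function, which is the same
bookkeeping as before, only written by hand. Plain dicts and non-finite
floats are deliberately left out to keep it short.

```python
import json
from dataclasses import dataclass
from typing import Optional

REGISTRY = {}  # type tag -> (class, current version, upgrade function)
TAG_OF = {}    # class -> type tag


def register(tag, version, upgrade=lambda state, version: state):
    """Class decorator: make a class known to the serializer under a stable tag."""
    def decorator(cls):
        REGISTRY[tag] = (cls, version, upgrade)
        TAG_OF[cls] = tag
        return cls
    return decorator


@register("Bounds", version=1)
@dataclass
class Bounds:
    lb: float
    ub: float


def _upgrade_parameter(state, version):
    if version < 2:
        state.setdefault("vary", True)  # field added in version 2
    return state


@register("Parameter", version=2, upgrade=_upgrade_parameter)
@dataclass
class Parameter:
    value: float
    bounds: Optional[Bounds] = None
    vary: bool = True


def encode(obj):
    """Turn a registered object tree into JSON-compatible data."""
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    if isinstance(obj, (list, tuple)):
        return [encode(item) for item in obj]
    tag = TAG_OF.get(type(obj))
    if tag is None:
        raise TypeError(f"{type(obj).__name__} is not registered")
    state = {key: encode(value) for key, value in vars(obj).items()}
    return {"__type__": tag, "__version__": REGISTRY[tag][1], "state": state}


def decode(data):
    """Rebuild objects, upgrading state written by older class versions."""
    if isinstance(data, list):
        return [decode(item) for item in data]
    if isinstance(data, dict):
        cls, _, upgrade = REGISTRY[data["__type__"]]
        state = {key: decode(value) for key, value in data["state"].items()}
        return cls(**upgrade(state, data["__version__"]))
    return data
```

A round trip is `decode(json.loads(json.dumps(encode(obj))))`. Because the
tag is decoupled from the class name, renaming a class only means keeping the
old tag in the registry.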

I have done it all the different ways. They are all painful.

-- 
Robert Kern