I thought I'd share a particularly evil pickle issue. In my refactor of Series to not subclass ndarray, the new pickling tests were breaking. No surprise, because I changed __getstate__ to pickle via the BlockManager. In order to ensure compat I thought I could just fix __setstate__ and figure out what to do based on the returned state (e.g. the len of the state returned as a tuple or dict or whatever).

But no...apparently the reconstruction algorithm takes the class name that it sees and tries to create it w/o using __new__ (or anything else that you can intercept). It uses a builtin function called _reconstruct (I can't figure out how to override it at all, it must be C code only).

And then numpy gets ahold of it (as it's an extension type), and complains because the class I am trying to instantiate actually isn't a sub-class of ndarray (which it presupposes).

So, a bit hacky, but using a custom unpickler, then matching on a compatibility class (that sub-classes from ndarray), allows me to return the correct class.

The good thing here is that this whole routine isn't even called unless there is a TypeError on the original unpickle.

whoosh!
--------
# new module: compat/unpickle_compat.py

import pickle

import numpy as np
import pandas
from pandas.core.series import Series
from pandas.sparse.series import SparseSeries


class Unpickler(pickle.Unpickler):
    pass

# work on a copy so the stock pickle.Unpickler dispatch table is left alone
Unpickler.dispatch = pickle.Unpickler.dispatch.copy()


def load_reduce(self):
    stack = self.stack
    args = stack.pop()
    func = stack[-1]
    # guard against an empty args tuple before inspecting the first argument
    if args and type(args[0]) is type:
        n = args[0].__name__
        if n == 'DeprecatedSeries':
            # bypass _reconstruct and create the new-style object directly
            stack[-1] = object.__new__(Series)
            return
        elif n == 'DeprecatedSparseSeries':
            stack[-1] = object.__new__(SparseSeries)
            return

    value = func(*args)
    stack[-1] = value

Unpickler.dispatch['R'] = load_reduce


def load(file):
    # try to load a compatibility pickle
    # fake the old class hierarchy
    # if it works, then return the new type objects
    try:
        pandas.core.series.Series = DeprecatedSeries
        pandas.sparse.series.SparseSeries = DeprecatedSparseSeries
        with open(file, 'rb') as fh:
            return Unpickler(fh).load()
    finally:
        # always restore the real classes, whether or not the load worked
        pandas.core.series.Series = Series
        pandas.sparse.series.SparseSeries = SparseSeries


class DeprecatedSeries(Series, np.ndarray):
    pass


class DeprecatedSparseSeries(DeprecatedSeries):
    pass
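The same idea can be tried out without pandas or numpy. What follows is only a minimal sketch under stated assumptions: `legacy_series_mod`, `Base`, `_reconstruct`, `CompatUnpickler` and `load_compat` are hypothetical stand-ins (for `pandas.core.series`, `np.ndarray`, numpy's reconstructor and the module above), and it leans on `pickle._Unpickler`, the pure-Python unpickler that CPython 3 ships under a private name.

```python
import io
import pickle
import sys
import types

# Hypothetical module standing in for pandas.core.series.
legacy = types.ModuleType("legacy_series_mod")
sys.modules[legacy.__name__] = legacy


class Base:
    """Stand-in for np.ndarray."""


def _reconstruct(cls):
    # Mimics a reconstructor that rejects classes outside the Base hierarchy.
    if not issubclass(cls, Base):
        raise TypeError("first argument must be a sub-type of Base")
    return cls.__new__(cls)


_reconstruct.__module__ = legacy.__name__
legacy._reconstruct = _reconstruct


class Series(Base):
    """Pre-refactor class: subclasses Base and pickles via _reconstruct."""

    def __init__(self, values):
        self.values = values

    def __reduce__(self):
        return (_reconstruct, (type(self),), {"values": self.values})


Series.__module__ = legacy.__name__
legacy.Series = Series
old_pickle = pickle.dumps(Series([1, 2, 3]))


class Series:  # deliberate redefinition: the post-refactor class
    """No longer a Base subclass, so _reconstruct refuses it."""

    def __init__(self, values):
        self.values = values


Series.__module__ = legacy.__name__
legacy.Series = Series

# The stock loader now fails, just like the legacy pickles in the tests.
try:
    pickle.loads(old_pickle)
    plain_load_failed = False
except TypeError:
    plain_load_failed = True


class DeprecatedSeries(Series, Base):
    """Compatibility class that satisfies the Base check."""


class CompatUnpickler(pickle._Unpickler):
    # Copy the table so the stock unpickler's REDUCE handler stays untouched.
    dispatch = pickle._Unpickler.dispatch.copy()

    def load_reduce(self):
        stack = self.stack
        args = stack.pop()
        func = stack[-1]
        if args and args[0] is DeprecatedSeries:
            # Skip the reconstructor and build the new class directly.
            stack[-1] = object.__new__(Series)
            return
        stack[-1] = func(*args)

    dispatch[pickle.REDUCE[0]] = load_reduce


def load_compat(data):
    legacy.Series = DeprecatedSeries  # fake the old hierarchy
    try:
        return CompatUnpickler(io.BytesIO(data)).load()
    finally:
        legacy.Series = Series  # always restore the real class


restored = load_compat(old_pickle)
```

Here `restored` is an instance of the refactored `Series`, with its `values` filled in by the pickle's state dict after the intercepted REDUCE step.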
On Sun, Apr 21, 2013 at 3:01 PM, Jeff Reback <jeffreback@gmail.com> wrote:
> [...]
Yes, pickle is evil. Will this fix affect pickle.loads/pickle.dumps?

I would prefer to get a msgpack or Avro-based serialization format for Series or DataFrame sorted out before we start gutting the internals of the objects.

- Wes
avro (a better choice than msgpack, I think) will be a very straightforward add-on.

The format should probably be done independently of the internals anyhow, at the price of a bit more code; or it could store the block managers and be somewhat simpler code-wise.

On Apr 21, 2013, at 9:01 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
> [...]
I realized I didn't answer your question.

This just catches on pickle.load:

    try:
        pickle.load
    except (TypeError):
        pickle_compat.load
    except:
        if not PY3:
            raise
        # try to unpickle with an encoding here

On Apr 21, 2013, at 9:12 PM, Jeff Reback <jeffreback@gmail.com> wrote:
> [...]
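The fallback order Jeff describes can be written out as a small runnable sketch. The names `load_with_fallback`, `compat_load` and `_RaisesOnLoad` are hypothetical, and narrowing the bare `except:` to `UnicodeDecodeError` with a `latin1` retry is an assumption about what "unpickle with an encoding" would mean on Python 3, not something stated in the thread.

```python
import pickle


def load_with_fallback(data, compat_load, encoding="latin1"):
    """Unpickle data, escalating only when the stock loader fails."""
    try:
        return pickle.loads(data)
    except TypeError:
        # Class-hierarchy mismatch: hand over to the compatibility loader.
        return compat_load(data)
    except UnicodeDecodeError:
        # Python 2 str payloads may need an explicit encoding on Python 3.
        return pickle.loads(data, encoding=encoding)


class _RaisesOnLoad:
    """Pickles as a call to int(None), which raises TypeError on load."""

    def __reduce__(self):
        return (int, (None,))


good = pickle.dumps({"a": 1})
bad = pickle.dumps(_RaisesOnLoad())

calls = []


def fake_compat_load(data):
    # Placeholder for a compatibility loader; it only records the call.
    calls.append(data)
    return "compat result"


first = load_with_fallback(good, fake_compat_load)   # stock loader succeeds
second = load_with_fallback(bad, fake_compat_load)   # TypeError path
```

The point of this ordering is that ordinary pickles never touch the compatibility code; it only runs after the stock loader has already raised.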
On Sun, Apr 21, 2013 at 6:19 PM, Jeff Reback <jeffreback@gmail.com> wrote:
> [...]
The Deprecated hack is something we have to be careful with, as there could be threading issues. Oh boy.

I'm not sure how much I want to support legacy pickles anyway. It would be better to have a release of pandas that enables pickle -> avro/msgpack serialized form so that people can migrate all their pickle data to that format. Then we can feel free to break all the pickles, or at least versioning of serialized data becomes easier (when pickling/unpickling, we just pack the serialized bytes into the pickle, and that becomes something we can always unserialize).

Sigh, it's 2013 and I've been talking about fixing the pickle/serialization problem since 2011, actually even earlier I think. Weekend project one of these days.

- Wes
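One way to picture the "pack the serialized bytes into the pickle" idea is the rough sketch below. It is not a proposed pandas format: JSON stands in for avro/msgpack only because it is in the standard library, and `FORMAT_VERSION`, `DECODERS`, `dumps_series` and `loads_series` are hypothetical names.

```python
import json
import pickle

FORMAT_VERSION = 1  # hypothetical version tag for the payload layout


def _decode_v1(payload):
    # Version 1 payload: UTF-8 JSON with "index" and "values" lists.
    record = json.loads(payload.decode("utf-8"))
    return record["index"], record["values"]


# One decoder per payload version, so old data stays readable.
DECODERS = {1: _decode_v1}


def dumps_series(index, values):
    """Pickle a small envelope holding version-tagged serialized bytes."""
    payload = json.dumps({"index": index, "values": values}).encode("utf-8")
    envelope = {"version": FORMAT_VERSION, "payload": payload}
    return pickle.dumps(envelope)


def loads_series(data):
    """Unpickle the envelope and decode the payload by its version."""
    envelope = pickle.loads(data)
    version = envelope.get("version")
    decode = DECODERS.get(version)
    if decode is None:
        raise ValueError("unsupported payload version: %r" % (version,))
    return decode(envelope["payload"])


blob = dumps_series(["a", "b"], [1.5, 2.5])
index, values = loads_series(blob)
```

Because the pickle only ever holds builtin types (a dict, an int and bytes), loading it does not depend on the class hierarchy of the object that was saved; internal refactors would then only need a new entry in `DECODERS`.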
participants (2)
- Jeff Reback
- Wes McKinney