Standardize error message for non-pickleable types

When you try to pickle or copy a non-pickleable object, you get an error. In most cases this is a TypeError with one of a few similar but distinct messages:

  "can't pickle XXX objects" (default)
  "Cannot serialize XXX object" (socket, BZ2Compressor, BZ2Decompressor)
  "can not serialize a 'XXX' object" (buffered files in _pyio)
  "cannot serialize 'XXX' object" (FileIO, TextIOWrapper, WinConsoleIO, buffered files in _io, LZMACompressor, LZMADecompressor)
  "cannot serialize {} object" (proposed for SSLContext)

Perhaps some of them were added without deep thinking and were then replicated in different places.

I'm going to replace all of them with a standardized error message. But I'm unsure which variant is better.

1. "pickle" or "serialize"?
2. "can't", "Cannot", "can not" or "cannot"?
3. "object" or "objects"?
4. Use the "a" article or not?
5. Use quotes around type name or not?

Please help me choose the best variant.
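For readers who want to see the behaviour being discussed, here is a minimal sketch. It assumes a threading lock as the example of a non-pickleable object (the lock is not one of the types listed above), and the helper name failure_message is made up for illustration. The exact wording printed depends on the Python version, which is the point of this thread.

```python
import copy
import pickle
import threading


def failure_message(operation, obj):
    """Run operation(obj) and return the TypeError text, or None on success."""
    try:
        operation(obj)
    except TypeError as exc:
        return str(exc)
    return None


# Assumption: a lock stands in for any non-pickleable object.
lock = threading.Lock()

# Both pickling and copying go through the same reduction machinery,
# so both fail with a TypeError.
pickle_error = failure_message(pickle.dumps, lock)
copy_error = failure_message(copy.copy, lock)

print("pickle.dumps:", pickle_error)
print("copy.copy:   ", copy_error)
```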

On Mon, 29 Oct 2018 at 20:42, Serhiy Storchaka <storchaka@gmail.com> wrote:
1. "pickle" or "serialize"?
serialize
2. "can't", "Cannot", "can not" or "cannot"?
cannot
3. "object" or "objects"?
object
4. Use the "a" article or not?
no: "cannot serialize xxx object" (but I'm not a native English speaker, so don't trust me :-))
5. Use quotes around type name or not?
Use repr() in Python, but use '%s' in C, since it would be too complex to write code that properly implements repr() (decode tp_name from UTF-8, handle error, call repr, handle error, etc.).

To use repr() on tp_name, I would prefer to have a new formatter; see last month's thread:
https://mail.python.org/pipermail/python-dev/2018-September/155150.html

Victor

On Mon, Oct 29, 2018 at 08:51:34PM +0100, Victor Stinner wrote:
-1

Serializing is more general; pickle is merely one form of serializing:

https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats

When practical, error messages should be more specific, not less. We don't say "arithmetic operation by zero" for division by zero errors, we specify which arithmetic operation failed.

Unlike most serialization formats, "pickle" is both a noun (the name of the format) and the verb to convert to that format.

--
Steve

On 2018-10-29 19:38, Serhiy Storchaka wrote:
1. If you're pickling, then saying "pickle" is more helpful.

2. In English the usual long form is "cannot". Error messages tend to avoid abbreviations, and also tend to have lowercase after the colon, e.g.:

   "ZeroDivisionError: division by zero"
   "ValueError: invalid literal for int() with base 10: 'foo'"

3. If it's failing on an object (singular), then it's clearer to say "object".

4. Articles tend to be omitted.

5. Error messages tend to have quotes around the type name.

Therefore, my preference is for:

   "cannot pickle 'XXX' object"
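As a rough sketch of what the preferred wording looks like when produced from Python code: the helper cannot_pickle_message and the class Handle below are hypothetical and not part of CPython. The sketch assumes the bare type name is used, and relies on repr() of that name to supply the quotes.

```python
import pickle


def cannot_pickle_message(obj):
    # Hypothetical helper: "{!r}" applies repr() to the type name,
    # which adds the single quotes around it.
    return "cannot pickle {!r} object".format(type(obj).__name__)


class Handle:
    """Hypothetical class that refuses to be pickled."""

    def __reduce_ex__(self, protocol):
        raise TypeError(cannot_pickle_message(self))


try:
    pickle.dumps(Handle())
except TypeError as exc:
    message = str(exc)

print(message)
```

Lowercase first word, no article, singular "object", and a quoted type name are all visible in the printed text.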

On 10/29/2018 5:17 PM, MRAB wrote:
Great idea.
Agree x 3
I had not noticed, but

   IndexError: list index out of range
   NameError: name 'sqrt' is not defined
Grammatically, the two examples above could/should start with 'The'. But that is routinely omitted. Matching a/an to 'xxx' would be a terrible nuisance. "a 'str'" (a string)?, "an 'str'" (an ess tee ar)?
-- Terry Jan Reedy

On 2018-10-30 08:12, Serhiy Storchaka wrote:
Well, the other examples you gave did not say explicitly that all instances of that type would fail. If you look at what 'hash' says:
that would suggest "TypeError: unpicklable type: 'list'", but I'm not sure I'd like too much of "unpicklable", "unmarshallable", "unserializable", etc. :-)
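The message from hash() that this comparison refers to can be reproduced with a short sketch. The helper name hash_error is made up for illustration, and the exact text may differ between Python versions.

```python
def hash_error(value):
    """Return the TypeError text raised by hash(value), or None if it is hashable."""
    try:
        hash(value)
    except TypeError as exc:
        return str(exc)
    return None


# A list is unhashable, so hash() raises TypeError and names the type.
print(hash_error([]))

# A string is hashable, so there is no error to report.
print(hash_error("abc"))
```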

On Mon, 29 Oct 2018 at 22:20, MRAB <python@mrabarnett.plus.com> wrote:
1. If you're pickling, then saying "pickle" is more helpful.
I'm not sure that it's really possible to know whether the error occurs while pickle is trying to serialize an object, or whether a different serialization protocol is involved. We are talking about the very generic __getstate__() method. I'm in favor of being more general and saying "cannot serialize".

Victor
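The point about __getstate__() being generic can be illustrated with a small sketch. The class Resource and the helper error_from are hypothetical. The same __getstate__() is reached from both pickle and copy, and the method itself has no way of telling which caller triggered it.

```python
import copy
import pickle


class Resource:
    """Hypothetical class whose state cannot be extracted."""

    def __getstate__(self):
        # Called by the default object reduction, whoever asked for it.
        raise TypeError(
            "cannot serialize {!r} object".format(type(self).__name__)
        )


def error_from(operation, obj):
    """Return the TypeError text raised by operation(obj), or None."""
    try:
        operation(obj)
    except TypeError as exc:
        return str(exc)
    return None


from_pickle = error_from(pickle.dumps, Resource())
from_copy = error_from(copy.copy, Resource())

print("pickle.dumps:", from_pickle)
print("copy.copy:   ", from_copy)
```

Both calls report the same text, so a message that says "pickle" would be slightly misleading when the failure comes from copy.copy().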

participants (7): Barry Warsaw, Glenn Linderman, MRAB, Serhiy Storchaka, Steven D'Aprano, Terry Reedy, Victor Stinner