
Martin,

I'm looking at Python/marshal.c and there are a lot of places that don't support sequences that are larger than would fit into size(int). I looked for marshal referenced in the PEP and didn't find anything. Was this an oversight or intentional?

To give you some examples of what I mean from the code:

(line 255)
    n = PyString_GET_SIZE(v);
    if (n > INT_MAX) {
        /* huge strings are not supported */
        p->depth--;
        p->error = 1;
        return;
    }
    w_long((long)n, p);
    w_string(PyString_AS_STRING(v), (int)n, p);
...
(line 717)
    n = r_long(p);
    if (n < 0 || n > INT_MAX) {
        PyErr_SetString(PyExc_ValueError, "bad marshal data");
        return NULL;
    }
    v = PyTuple_New((int)n);
    if (v == NULL)
        return v;
    for (i = 0; i < n; i++) {
        v2 = r_object(p);
        if ( v2 == NULL ) {
            if (!PyErr_Occurred())
                PyErr_SetString(PyExc_TypeError,
                                "NULL object in marshal data");
            Py_DECREF(v);
            v = NULL;
            break;
        }
        PyTuple_SET_ITEM(v, (int)i, v2);

Also, the PEP references the ssize_t branch which no longer exists. Is it possible to reference the specific revision: 42382?

Thanks,
n

I'm looking at Python/marshal.c and there are a lot of places that don't support sequences that are larger than would fit into size(int). I looked for marshal referenced in the PEP and didn't find anything. Was this an oversight or intentional?
These changes were only made after merging the ssize_t branch, namely in r42883. They were intentional, in the sense that the ssize_t changes were meant to *only* change the API. Supporting larger strings would have been a change to the marshal format as well, and that was not within the mandate of PEP 353.

Now, if you think the marshal format should change as well to support large strings, that may be worth considering. There are two design alternatives:

- change the 's', 't', and 'u' codes to use an 8-byte argument. That would be an incompatible change that would also blow up marshal data which don't need it (by 4 bytes per string value).
- introduce additional codes (like 'S', 'T', and 'U') that take 8-byte lengths. That would be (forward?) compatible, in that old marshal data can be still read in new implementations, and mostly backwards-compatible, assuming that S/T/U get used only when needed. However, it would complicate the implementation.

I'm still leaning towards "don't change", since I don't expect that such string objects occur in source code, and since I still think source code / .pyc is/should be the major application of marshal.

Regards,
Martin
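To make the second alternative concrete, here is a minimal sketch in C of how a writer could choose between a 4-byte and an 8-byte length header, and how a reader could accept both. This is not code from marshal.c: the function names write_string_header, read_string_header and put_le are hypothetical, and the letter 'S' is taken from the message above purely for illustration, without checking it against the type codes marshal.c already assigns. The sketch assumes lengths are stored least significant byte first, as w_long does for its 4-byte values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store the `nbytes` low-order bytes of `value`,
   least significant byte first. Returns the number of bytes written. */
static size_t put_le(unsigned char *out, uint64_t value, int nbytes)
{
    for (int i = 0; i < nbytes; i++)
        out[i] = (unsigned char)((value >> (8 * i)) & 0xff);
    return (size_t)nbytes;
}

/* Hypothetical writer: emit a type code plus length for a string of
   `len` bytes. `out` must have room for 9 bytes. The existing 's' code
   with a 4-byte length is used whenever the length fits, so the new
   'S' code (an assumption, see the lead-in) appears only when needed.
   Returns the size of the header. */
size_t write_string_header(unsigned char *out, uint64_t len)
{
    if (len <= INT32_MAX) {
        out[0] = 's';
        return 1 + put_le(out + 1, len, 4);
    }
    out[0] = 'S';
    return 1 + put_le(out + 1, len, 8);
}

/* Hypothetical reader: decode either header form into `*len`.
   Returns the size of the header, or 0 for an unknown type code. */
size_t read_string_header(const unsigned char *in, uint64_t *len)
{
    int nbytes;
    uint64_t value = 0;

    if (in[0] == 's')
        nbytes = 4;
    else if (in[0] == 'S')
        nbytes = 8;
    else
        return 0;

    for (int i = 0; i < nbytes; i++)
        value |= (uint64_t)in[1 + i] << (8 * i);
    *len = value;
    return 1 + (size_t)nbytes;
}
```

In this sketch, data containing only short strings is byte-for-byte what it was before, which is the compatibility property described above. An old reader would still fail on the first 'S' it meets, and the extra branch in both functions is a small instance of the added implementation complexity.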

Ah, forgot to mention that a browsable version of the branch is at

http://svn.python.org/view/python/branches/ssize_t/?rev=42382

Unfortunately, you cannot check out that URL. OTOH, you can checkout "peg revisions" (I have no clue what a peg is)

http://svn.python.org/projects/python/branches/ssize_t@42382

but that URL is, unfortunately, not browsable.

Regards,
Martin

On 5/15/07, "Martin v. Löwis" <martin@v.loewis.de> wrote:
I'm looking at Python/marshal.c and there are a lot of places that don't support sequences that are larger than would fit into size(int). I looked for marshal referenced in the PEP and didn't find anything. Was this an oversight or intentional?
These changes were only made after merging the ssize_t branch, namely in r42883.
They were intentional, in the sense that the ssize_t changes were meant to *only* change the API. Supporting larger strings would have been a change to the marshal format as well, and that was not within the mandate of PEP 353.
Now, if you think the marshal format should change as well to support large strings, that may be worth considering. There are two design alternatives:

- change the 's', 't', and 'u' codes to use an 8-byte argument. That would be an incompatible change that would also blow up marshal data which don't need it (by 4 bytes per string value).
- introduce additional codes (like 'S', 'T', and 'U') that take 8-byte lengths. That would be (forward?) compatible, in that old marshal data can be still read in new implementations, and mostly backwards-compatible, assuming that S/T/U get used only when needed. However, it would complicate the implementation.

I'm still leaning towards "don't change", since I don't expect that such string objects occur in source code, and since I still think source code / .pyc is/should be the major application of marshal.
Agreed. I see little use to changing .pyc files to support >2G literals or bytecode.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

On 5/15/07, "Martin v. Löwis" <martin@v.loewis.de> wrote:
I'm looking at Python/marshal.c and there are a lot of places that don't support sequences that are larger than would fit into size(int). I looked for marshal referenced in the PEP and didn't find anything. Was this an oversight or intentional?
I'm still leaning towards "don't change", since I don't expect that such string objects occur in source code, and since I still think source code / .pyc is/should be the major application of marshal.
I agree this is fine. I'll update the PEP with this rationale and the link(s) you provided unless anyone objects. That way we have it clearly documented that this was intentional. I didn't remember this discussion back when ssize_t was done.

n

participants (3)
- "Martin v. Löwis"
- Guido van Rossum
- Neal Norwitz