[Python-Dev] cStringIO.StringIO() buffer behavior

Georg Brandl g.brandl at gmx.net
Mon Aug 6 09:48:25 CEST 2007


Guido van Rossum wrote:
> Methinks that this was a fundamental limitation of cStringIO, not a
> bug. Certainly not something to be "fixed" in a bugfix release.

I'm sorry.

Martin v. Löwis wrote:
>> See bugs #1548891 and #1730114.
>> 
>> In the former, it was reported that cStringIO works differently from StringIO
>> when handling unicode strings; it used GetReadBuffer which returned the raw
>> internal UCS-2 or UCS-4 encoded string.
>> 
>> I changed it to use GetCharBuffer, which converts to a string using the
>> default encoding first. This fix was also in 2.5.1.
>> 
>> The latter bug now complains that this excludes things like array.array()s
>> from being used as an argument to cStringIO.StringIO(), which worked before
>> with GetReadBuffer.
>> 
>> What's the preferred solution here?
> 
> I think the 2.5.0 behavior to accept array.array should be restored (and
> a test case be added). What to do about Unicode strings, I don't know.
> I agree with Guido that they are officially not supported in cStringIO,
> so it would be best to reject them. OTOH, since 2.5.1 already supports
> them, another choice would be to continue supporting them, in the same way
> as they are supported in 2.5.1. Either solution would special-case
> Unicode strings.

Okay, I propose the following patch:

Index: Modules/cStringIO.c
===================================================================
--- Modules/cStringIO.c (revision 56763)
+++ Modules/cStringIO.c (working copy)
@@ -673,12 +673,27 @@
   char *buf;
   Py_ssize_t size;

-  if (PyObject_AsCharBuffer(s, (const char **)&buf, &size) != 0)
+  /* special-case Unicode objects: encode them in the default encoding */
+  if (PyUnicode_Check(s)) {
+    s = PyUnicode_AsEncodedString(s, NULL, NULL);
+    if (s == NULL)
       return NULL;
+  } else {
+    Py_INCREF(s);
+  }

+  if (PyObject_AsReadBuffer(s, (const void **)&buf, &size)) {
+    PyErr_Format(PyExc_TypeError, "expected read buffer, %.200s found",
+                 s->ob_type->tp_name);
+    Py_DECREF(s);  /* drop the reference taken above */
+    return NULL;
+  }
+
   self = PyObject_New(Iobject, &Itype);
-  if (!self) return NULL;
-  Py_INCREF(s);
+  if (!self) {
+    Py_DECREF(s);
+    return NULL;
+  }
   self->buf=buf;
   self->string_size=size;
   self->pbuf=s;

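To make the intended behaviour easier to discuss, here is a rough
pure-Python model of the argument handling in the patch. It is only a
sketch under some assumptions: the name as_read_buffer is hypothetical,
memoryview() stands in for the C-level PyObject_AsReadBuffer() check, and
sys.getdefaultencoding() stands in for the interpreter's default encoding
(which is ASCII on a stock 2.5 build, but may differ wherever this runs).

```python
import array
import sys


def as_read_buffer(obj):
    """Hypothetical model of the patched cStringIO.StringIO() argument check."""
    if isinstance(obj, str):
        # Unicode objects are special-cased: encode them with the
        # default encoding instead of exposing the internal buffer.
        obj = obj.encode(sys.getdefaultencoding())
    try:
        # Anything that exports a read buffer is accepted,
        # which includes array.array() again.
        return memoryview(obj).tobytes()
    except TypeError:
        raise TypeError("expected read buffer, %.200s found"
                        % type(obj).__name__)
```

With this model, a text string, a byte string and array.array("b", b"abc")
are all accepted, while an object without a buffer (an int, say) is
rejected with the same TypeError message the patch uses.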

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


