[Numpy-discussion] Re: Bytes Object and Metadata
Perry Greenfield
perry at stsci.edu
Tue Mar 29 07:46:47 EST 2005
On Mar 28, 2005, at 6:25 PM, Travis Oliphant wrote:
>
> One could see it as a "flaw" in the buffer object, but I prefer to see
> it as a problem with objects that use the PyBufferProcs protocol. It
> is at worst, a "limitation" of the buffer interface that should be
> advertised (in my mind the problem lies with the objects that make use
> of the buffer protocol and also reallocate memory willy-nilly since
> Python does not allow for this). To me, an analogous situation
> occurs when an extension module writes into memory it does not own and
> causes a seg-fault. I suppose a casual observer could say this is a
> Python flaw but clearly the problem is with the extension object.
>
> It certainly does not mean at all that something like a buffer object
> should never exist or that the buffer protocol should not be used. I
> get the feeling sometimes, that some naive (to Numeric and numarray)
> people on python-dev feel that way.
>
Certainly there needs to be something like this (that's why we used it
for numarray after all).
>>
>> I'm not sure how the support for large data sets should be handled. I
>> generally think that it will be very awkward to handle these until
>> Python does as well. Speaking of which...
>>
>> I had been in occasional contact with Martin von Loewis about his
>> work to update Python to handle 64-bit addressing. We weren't
>> planning to handle this in numarray (nor Numeric3, right Travis or
>> do I have that wrong?) until Python did. A few months ago Martin said
>> he was mostly done. I had a chance to talk to him at Pycon about
>> where that work stood. Unfortunately, it is not turning out to be as
>> easy as he hoped. This is too bad. I have a feeling that this work is
>> going to stall without help on our (numpy community) part to help
>> make the changes or drum beating to make it a higher priority. At the
>> moment the Numeric3 effort should be the most important focus, but I
>> think that after that, this should become a high priority.
>>
>
> I would be interested to hear what the problems are. Why can't you
> just change the protocol, replacing all ints with Py_intptr_t? Is
> backward compatibility the problem? This seems like it's on the
> extension code level (and then only on 64-bit systems), and so would
> be easier to force through the change in Python 2.5.
>
As Martin explained it, there is a lot of code that uses int
declarations. If you are saying that it would be easy just to replace
all int declarations in Python, I doubt it is that simple since there
are calls to many other libraries that must use ints. So it means that
there are thousands (so Martin says) of declarations that one must
change by hand. It has to be changed for strings, lists, tuples and
everything that uses them (Guido was open to doing this but everything
had to be updated at once, not just strings or certain objects, and he
is certainly right about that). Martin also said that we would need a
system with enough memory to test all of these. Lists in particular
would need a system with 16 GB of memory to test lists that use more
than the current limit (because of the size of list objects). I'm not
sure I agree with that. It would be nice to have that kind of test, but
I think it would be reasonable to have tested on the largest memory
systems available at the time for our testing. If there are latent list
sequence bugs that surface when 16 GB systems become available, then
the bugs can be dealt with at that time (IMHO). (Anybody out there have
a system with that much memory available for test purposes :-).
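A rough check of where the 16 GB figure comes from, assuming a 64-bit
build where each list slot is one 8-byte pointer and list lengths are
held in a signed 32-bit int. The names below are made up for this sketch
and are not taken from the Python source:

```python
# Back-of-the-envelope estimate, not a measurement.
# Assumptions: 8-byte pointers (64-bit build), lengths stored in a
# signed 32-bit C int.
INT_MAX = 2**31 - 1      # largest length a signed 32-bit int can hold
POINTER_SIZE = 8         # bytes per list slot on a 64-bit system


def list_pointer_array_bytes(n_items, pointer_size=POINTER_SIZE):
    """Bytes needed just for the array of element pointers in a list."""
    return n_items * pointer_size


# Smallest list that exceeds the current limit: INT_MAX + 1 items.
needed = list_pointer_array_bytes(INT_MAX + 1)
print(needed / 2**30)    # 16.0 (GiB)
```

This counts only the list's pointer array; the objects the list refers
to need memory of their own, so an actual test would want somewhat more.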
Of course, this will change the C API for Python too as far as
sequence use goes (or is there some way around that? A compatibility
API and a new one that supports extended indices?) It would be nice if
there were some way of handling that gracefully without requiring all
extensions to change to match. I imagine that this is
going to be the biggest objection to making any changes unless the old
API is supported for a while. Perhaps someone has thought this all out
already. I haven't thought about it at all.
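To illustrate why the old int-based API cannot simply be left in place
for large sequences, here is a small sketch using ctypes. It assumes a
platform where C int is 32 bits, and it uses ctypes.c_ssize_t only as a
stand-in for a pointer-sized index type such as the Py_intptr_t mentioned
above. The helper store_length is hypothetical, written for this example:

```python
import ctypes


def store_length(length, ctype):
    """Round-trip a length through a C integer type and return the result.

    ctypes integer types do no overflow checking, so a value that does
    not fit is silently wrapped, much as an unchecked cast in C would be.
    """
    return ctype(length).value


big = 2**31  # one past the largest value a signed 32-bit int can hold

as_int = store_length(big, ctypes.c_int)        # wraps where int is 32 bits
as_ssize = store_length(big, ctypes.c_ssize_t)  # intact on 64-bit systems

print(as_int, as_ssize)
```

On a typical 64-bit system the first value comes back negative while the
second is preserved, which is the kind of silent breakage an extension
compiled against the old API would risk if it were handed an extended
index.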
Perry