Is this saying that either NULL or a pointer to "B" can be supplied by getbufferproc to indicate to the caller that the array is unsigned bytes? If so, is there a specific reason to put the (minor) complexity of handling this case in the caller's hands, instead of dealing with it internally to getbufferproc? In either case, the wording is a bit unclear, I think.
Yes, the wording could be more clear. I'm trying to make it easy for exporters to change to the new buffer interface.
The main idea I really want to see is that if the caller just passes NULL instead of an address, then it means they are assuming the data will be "unsigned bytes". It is up to the exporter to either allow this or raise an error.
The exporter should always be explicit if an argument for returning the format is provided (I may have thought differently a few days ago).
Understood -- I'm for the exporters being as explicit as possible if the argument is provided.
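The format convention agreed on above can be modeled in a few lines of Python. This is only a sketch of the draft semantics as discussed in this thread, not the C API: `export_format`, `allow_raw_bytes`, and `UNSIGNED_BYTES` are hypothetical names, and `None` stands in for a NULL format pointer.

```python
from typing import Optional

UNSIGNED_BYTES = "B"  # what a consumer is assumed to expect when it passes NULL


def export_format(native_format: str,
                  format_requested: bool,
                  allow_raw_bytes: bool = False) -> Optional[str]:
    """Hypothetical exporter-side rule for the format argument.

    format_requested=True models a consumer that passed an address for the
    format; format_requested=False models a consumer that passed NULL.
    """
    if format_requested:
        # Always be explicit when the consumer asked, even for plain bytes.
        return native_format
    # The consumer assumes unsigned bytes. The exporter either allows that
    # view of its memory or raises an error (assumed policy flag).
    if native_format == UNSIGNED_BYTES or allow_raw_bytes:
        return None
    raise BufferError(
        "consumer assumes unsigned bytes, but the data format is %r" % native_format
    )
```

With this rule a consumer that asks for the format never has to interpret a NULL result, which is the "always explicit" behavior described above.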
The general question is that there are several other instances where getbufferproc is allowed to return ambiguous information which must be handled on the client side. For example, C-contiguous data can be indicated either by a NULL strides pointer or a pointer to a properly-constructed strides array.
Here, I'm trying to be easy on both the exporter and the consumer. If the data is contiguous, then neither the exporter nor the consumer will likely care about the strides. Allowing this to be NULL is like the current array protocol convention, which allows this to be None.
See below. My comments here aren't suggesting that NULL should be disallowed. I'm basically wondering whether it is a good idea to allow NULL and something else to represent the same information. (E.g. as above, an exporter could choose to show C-contiguous data with a NULL returned to the client, or with a trivial strides array). Otherwise two different exporters exporting identical data could provide different representations, which the clients would need to be able to handle. I'm not sure that this is a recipe for perfect interoperability.
Clients that can't handle C-contiguous data (contrived example, I know there is a function to deal with that) would then need to check both for NULL *and* inside the strides array if not NULL, before properly deciding that the data isn't usable by them.
Not really. A client that cannot deal with strides will simply not pass an address for a strides array to the buffer protocol (that argument will be NULL). If the exporter cannot provide memory without stride information, then an error will be raised.
This doesn't really address my question, which I obscured with a poorly-chosen example. The PEP says (or at least that's how I read it) that if the client *does* provide an address for the stride array, then for un-strided arrays, the exporter may either choose to fill in NULL at that address, or provide a strides array. Might it be easier for clients if the PEP required that NULL be returned if the array is C-contiguous? Or at least strongly suggested that? (I understand that there might be cases where a naive exporter "thinks" it is dealing with a strided array but it really is contiguous, and the exporter shouldn't be required to do that detection.) The use-case isn't too strong here, but I think it's clear in the suboffsets case (see below).
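To make the "two checks" concrete, here is a minimal Python sketch of what a client has to do when a C-contiguous array may arrive either as NULL strides or as an explicit, trivial strides array. The function names are hypothetical, `None` models a NULL strides pointer, and the check deliberately ignores corner cases such as length-1 or zero-length dimensions.

```python
from typing import Optional, Sequence, Tuple


def c_contiguous_strides(shape: Sequence[int], itemsize: int) -> Tuple[int, ...]:
    """Byte strides of a C-contiguous array with the given shape."""
    strides = []
    step = itemsize
    # The last dimension varies fastest, so walk the shape backwards.
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def is_c_contiguous(shape: Sequence[int],
                    strides: Optional[Sequence[int]],
                    itemsize: int) -> bool:
    """Client-side test when the exporter may answer in either form."""
    if strides is None:
        # Check 1: the exporter returned NULL.
        return True
    # Check 2: the exporter returned an explicit strides array that
    # happens to describe plain C-contiguous memory.
    return tuple(strides) == c_contiguous_strides(shape, itemsize)
```

If the PEP required NULL for C-contiguous data, the second branch would be unnecessary for clients, at the cost of the exporter performing the equivalent comparison before returning.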
Similarly, the suboffsets can be either all negative or NULL to indicate the same thing. I think it's much easier to check if suboffsets is NULL rather than checking all the entries to see if they are -1 for the very common case (i.e. the NumPy case) of no dereferencing. Also, if you can't deal with suboffsets you would just not provide an address for them.
My point exactly! As written, the PEP allows an exporter to either return NULL, or an array of all negative numbers (in the case that the client requested that information), forcing a fully-conforming client to make *both* checks in order to decide what to do. Especially in this case, it would make sense to require that NULL be returned in the case of no suboffsets. This makes things easier for both clients that can deal with both suboffsets or non-offsets (e.g. they can branch on NULL, not on NULL or all-are-negative), and also for clients that can *only* deal with suboffsets. Now, in these two cases, the use-case is pretty narrow, I agree. Basically it makes things easier for savvy clients that can deal with different data types, by not forcing them to make two checks (strides == NULL or strides array is trivial; suboffsets == NULL or suboffsets are all negative) when one would do. Again, this PEP allows the same information to be passed in two very different ways, when it really doesn't seem like that ambiguity makes life that much easier for exporters. Maybe I'm wrong about this last point, though. Then there comes the trade-off -- should savvy clients bear the complexity of checking two different things? (Simple clients needn't check anything -- they just pass in NULL.) Or should the complexity be pushed to the savvy exporter to do those checks? (Simple exporters just return NULL in those variables.) I guess the question comes down to which side of the API to make the simplest, given that it appears to me that the complexity has to live somewhere. As a separate suggestion, I think a few sentences in the PEP about the protocol design, and what parts are explicitly added to make it easy for simple clients and exporters, would be helpful. Something like: "Clients that cannot deal with strided or suboffset-ed arrays should put NULL values in the corresponding getbufferproc call parameters.
Then exporters will provide data in that format (either because the data are already in that format, or because the exporter chooses to convert it on behalf of the client), or the exporter will set an exception. This simplifies matters greatly for simple clients. Likewise, simple exporters which only provide C-contiguous data, or data with no suboffsets can simply return NULL if those values are requested."
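The trade-off for suboffsets can be sketched the same way. In this hypothetical Python model, `None` stands for a NULL suboffsets pointer and any negative entry means "no dereferencing along this dimension"; neither function is part of the PEP. The first shows the double check a client needs under the current wording, the second shows the one-time normalization an exporter would do if NULL were the required canonical form.

```python
from typing import Optional, Sequence, Tuple


def needs_dereferencing(suboffsets: Optional[Sequence[int]]) -> bool:
    """Client-side check when NULL and 'all negative' mean the same thing."""
    if suboffsets is None:
        # Check 1: exporter returned NULL.
        return False
    # Check 2: exporter returned an array; scan it for a real suboffset.
    return any(offset >= 0 for offset in suboffsets)


def canonical_suboffsets(
    suboffsets: Optional[Sequence[int]],
) -> Optional[Tuple[int, ...]]:
    """Exporter-side normalization: collapse 'all negative' to None (NULL)."""
    if suboffsets is None or all(offset < 0 for offset in suboffsets):
        return None
    return tuple(suboffsets)
```

With the canonical form, a client can branch on `suboffsets is None` alone; the scan moves into the exporter, which is the question of where the complexity should live.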
Might it be more appropriate to specify only one canonical behavior in these cases? Otherwise clients which don't do all the checks on the data might not properly interoperate with providers which format these values in the alternate manner.
It's important to also be easy to use. I don't think clients should be required to ask for strides and suboffsets if they can't handle them.
Again, that wasn't my suggestion. My suggestion was merely that if clients ask for that information, that it come in a canonical form, so that NULL values are meaningful, as opposed to that client potentially needing to check two different things before deciding which code-path to embark upon. As above, this might or might not have the effect of adding extra complexity to the exporters. If not, good; if so, then it's worth specifically deciding which side that complexity ought to live upon.
279 Get the buffer and optional information variables about the buffer.
280 Return an object-specific view object (which may be simply a
281 borrowed reference to the object itself).
This phrasing (and similar phrasing elsewhere) is somewhat opaque to me. What's an "object-specific view object"?
At the moment it's the buffer provider. It is not defined because it could be a different thing for each exporter. We are still discussing this particular point and may drop it.
Fair enough. Definitely worth a clear explanation if it's not dropped, though.
333 The struct string-syntax is missing some characters to fully
334 implement data-format descriptions already available elsewhere (in
335 ctypes and NumPy for example). Here are the proposed additions:
Is the following table just the additions? If so, it might be good to show the full spec, and flag the specific additions. If not, then the additions should be flagged.
Yes, these are just the additions. I don't want to do the full spec, it is already available elsewhere in the Python docs.
Would be useful to link to the full spec from the PEP, in that case.
341 't' bit (number before states how many bits)
vs.
372 According to the struct-module, a number can precede a character
373 code to specify how many of that type there are. The
I'm confused -- could this be phrased more clearly? Does '5t' refer to one field 5 bits wide, or to five 1-bit fields? Is 'ttttt' allowed? If so, is it equivalent to or different from '5t'?
Yes, 'ttttt' is equivalent to '5t', and the distinction between one field 5 bits wide and five 1-bit fields is a confusion that comes from thinking there are fields at all. Both of those are equivalent. If you want "fields" then you have to define names.
In that case, line 341 should be clarified. Right now, that line sort of makes it seem like the struct module should somehow unpack a '5t' into a single Python object of some type, analogous to how the other entities are unpacked into single objects (which may themselves be composite, e.g. the list-of-lists and nested cases). Lines 372-3 make it clear that, say, '5g' would be unpacked into five Python floats, but the fact that the 'bit' definition in line 341 explicitly mentions the number before, while no other definitions do so, almost makes it appear that '5t' is supposed to be treated as a single atomic object in the same way that 'g' alone would be. Since this isn't the case, I would suggest dropping the parenthetical in line 341 as redundant and potentially misleading.
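The repeat-count rule can be checked against the existing struct module. The 't' and 'g' codes are proposed additions and are not available in struct, so this sketch uses 'd' (C double) as a stand-in; by the explanation above, '5t' and 'ttttt' would relate to each other the same way.

```python
import struct

# A count before a code means "that many items of this type".
packed = struct.pack("3d", 1.0, 2.0, 3.0)

# '3d' and 'ddd' describe the same layout ...
same_size = struct.calcsize("3d") == struct.calcsize("ddd")

# ... and both unpack into three separate Python floats,
# not into one composite object.
counted = struct.unpack("3d", packed)
spelled_out = struct.unpack("ddd", packed)
```

Here `counted` and `spelled_out` are the same 3-tuple of floats, which is the behavior lines 372-3 describe.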
In general, the logic of the 'locking mechanism' should be described at a high level at some point. It's described in nitty-gritty details, but at least I would have appreciated a bit more of a discussion about the general how and why -- this would be helpful to clients trying to use the locking mechanism properly.
The point of locking is so that the exporter knows when it can reallocate its buffer. Right now, reference counting is the only way to do that. But reference counting is not specific enough. Perhaps the reference exists because another object is using the same memory, or perhaps it is just another name pointing to exactly the same object.
In the case of NumPy, NumPy needs to know when the resize method can be safely applied. Currently, it is ambiguous and unclear when a NumPy array can re-allocate its own buffer. Also, in the past exposing the array object in Python's memory and then later re-allocating it led to problems.
I'll try and address this more clearly.
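As a rough illustration of why the exporter needs to know about outstanding views, the following uses `bytearray` and `memoryview` as they behave in present-day CPython. It is not the locking API of the draft under discussion; it only shows the general idea that an exporter refuses to reallocate while someone else holds a view of its memory.

```python
buf = bytearray(b"abc")
view = memoryview(buf)      # a consumer now holds a view of buf's memory

try:
    buf.extend(b"def")      # would require reallocating the buffer
    resized_while_exported = True
except BufferError:
    # The exporter knows a view is outstanding and refuses to resize.
    resized_while_exported = False

view.release()              # the consumer is done with the memory
buf.extend(b"def")          # now the exporter is free to reallocate
```

A second name bound to the same `bytearray` (for example `alias = buf`) would not block the resize, which is the distinction that plain reference counting cannot express.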
That makes sense. What I think is needed is a high-ish level introduction to the "moving parts" of locking -- essentially covering what a simple client or exporter needs to know in order to use the interface. I felt that the discussion was a bit too low-level, leaving folks in danger of missing the forest for the trees as it were.
Zach