Ability to specify item sizes for array.array instances in a platform-independent way
Currently, if I know that I want an `array.array` object with an `itemsize` of 4, there is no way to do that without first determining what the item sizes are for `'i'`/`'I'` and `'l'`/`'L'` on the current platform. Presumably, things could get even more hairy with future platforms.
Below are some ideas for how to support explicit, platform-agnostic item size specification.
Allow a non-str sequence with itemsize and signedness members to be given as the `typecode` value.
`a = array.array((4, array.SIGNED))`
Allow a numeric `itemsize` value to be given as the first positional argument instead of a `typecode` string, and have an optional named argument for signedness, signed by default.
`a = array.array(4)  # Signed by default.`
`a = array.array(4, signedness=array.UNSIGNED)`
Allow the "@" and "=" prefixes (same as in `struct` format strings) in `typecode` strings. This is my least preferred option because I won't always have the typecode or prefix choices memorized, and looking them up is an extra step. Also, the appropriate size and signedness might be determined at runtime, so having to write additional code to map from size/signedness to a typecode is an unnecessary annoyance.
`a = array.array('=i')  # Signed integer of "standard" integer size.`
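For comparison, the runtime lookup described above can be written today with only the existing `array` and `struct` modules. This is a minimal sketch, not part of any proposal: the helper name `typecode_for` is hypothetical, and it simply probes the integer typecodes in order of C type rank until one has the requested `itemsize`. The `struct` lines show what the `=` prefix means there: standard sizes that do not depend on the platform, whereas `@` (native) sizes do.

```python
import array
import struct

# Integer typecodes ordered by C type rank (char, short, int, long, long long).
SIGNED_CODES = "bhilq"
UNSIGNED_CODES = "BHILQ"


def typecode_for(itemsize: int, signed: bool = True) -> str:
    """Return the first integer typecode whose itemsize on this platform is `itemsize`."""
    codes = SIGNED_CODES if signed else UNSIGNED_CODES
    for code in codes:
        if array.array(code).itemsize == itemsize:
            return code
    kind = "signed" if signed else "unsigned"
    raise ValueError(f"no {kind} integer typecode with itemsize {itemsize}")


a = array.array(typecode_for(4))                # 'i' on most current platforms
b = array.array(typecode_for(4, signed=False))  # 'I' on most current platforms

# In struct format strings, '=' selects standard sizes and '@' native ones.
print(struct.calcsize("=l"))  # always 4
print(struct.calcsize("@l"))  # platform-dependent (e.g. 8 on 64-bit Linux, 4 on Windows)
```

The probing loop is exactly the "additional code to map from size/signedness to a typecode" that the proposals would make unnecessary.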
Steve Jorgensen wrote:
Currently, if I know that I want an array.array object with itemsize of 4 there is no way to do that without first determining what the item sizes are for 'i'/'I' and 'l'/'L' on the current platform. Presumably, things could get even more hairy with future platforms. Below are some ideas for how to support explicit, platform-agnostic item size specification. …
Other ideas:
Allow supplying an integer `itemsize` value as the `typecode`, with a positive value for unsigned or a negative value for signed.
```
a = array.array(-4)  # Signed
a = array.array(4)   # Unsigned
```
Allow supplying a simple slice as the `typecode`, defining the smallest range of values that an item should be able to represent. The array function object would also support indexing to produce a curried `array` function with the given slice as its first argument. When either the signed or unsigned item type of a given size could encompass the given range, the unsigned type would be used.
```
# Signed short items initialized with [1, 2, 3]
a = array.array(slice(-1, 200), range(1, 4))
# or...
a = array.array[-1:200](range(1, 4))

# Unsigned 4-byte items (int (I) or long (L) depending on platform)
a = array.array(slice(0, 0x90_00_00_00))
# or...
a = array.array[0:0x90_00_00_00]()
# or…
a = array.array[0:0x70_00_00_00]()  # Would fit into signed or unsigned.
```
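The selection rule in the slice idea can be pinned down with a short function that works against today's `array` module. This is a hypothetical sketch, not existing `array` behavior: `typecode_for_range` treats `stop` as exclusive, as a slice does, walks the integer typecodes from smallest to largest, and prefers the unsigned type when both types of a given size would fit.

```python
import array

# Signed/unsigned pairs ordered by C type rank; both members of a pair share one itemsize.
CODE_PAIRS = list(zip("bhilq", "BHILQ"))


def typecode_for_range(start: int, stop: int) -> str:
    """Smallest integer typecode able to hold every value in range(start, stop)."""
    lo, hi = start, stop - 1  # slice-style: stop is exclusive
    if hi < lo:
        raise ValueError("empty range")
    for signed_code, unsigned_code in CODE_PAIRS:
        bits = 8 * array.array(signed_code).itemsize
        # Check unsigned first so it wins when both types of this size would fit.
        if lo >= 0 and hi < (1 << bits):
            return unsigned_code
        if -(1 << (bits - 1)) <= lo and hi < (1 << (bits - 1)):
            return signed_code
    raise OverflowError(f"no integer typecode can hold range({start}, {stop})")


a = array.array(typecode_for_range(-1, 200), range(1, 4))  # array('h', [1, 2, 3])
b = array.array(typecode_for_range(0, 0x90_00_00_00))      # 4-byte unsigned ('I' or 'L')
c = array.array(typecode_for_range(0, 0x70_00_00_00))      # also unsigned: both would fit
```

The `-1` lower bound rules out every unsigned type, and `199` does not fit in a signed char, so the first match is the signed short `'h'`.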
Better yet: provide type codes for the “new” C exact-width integer types: https://pubs.opengroup.org/onlinepubs/009695399/basedefs/stdint.h.html (and any number of other references). I've always thought it was unfortunate that Python inherited C's compiler-specific type definitions. But it's nice to be able to write platform-independent code that deals with binary data. numpy has provided these pretty much forever: https://numpy.org/devdocs/user/basics.types.html In addition to the dtype objects, numpy uses character typecodes where you can specify the number of bytes of the type: e.g. "i4" is a 32-bit (four-byte) signed integer, and "u4" is a 32-bit unsigned integer. It seems the array module could adopt a similar system.
Allow supplying an integer `itemsize` value as the `typecode` with a positive value for unsigned or a negative value for signed.
This seems pretty "magic". Why not simply extend the typecode system as above?
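To make the typecode extension concrete, here is a hypothetical sketch of how numpy-style codes such as `"i4"` and `"u4"` could be resolved to today's native `array` typecodes. The function `sized_array` and the accepted code format (a kind letter `i` or `u` followed by a byte count) are assumptions for illustration only; this is not an existing `array` API.

```python
import array

# Map (kind, byte count) to the first native typecode with that size.
# Codes are listed by C type rank, so setdefault keeps the lowest-rank match.
_SIZED_CODES = {}
for _kind, _codes in (("i", "bhilq"), ("u", "BHILQ")):
    for _code in _codes:
        _SIZED_CODES.setdefault((_kind, array.array(_code).itemsize), _code)


def sized_array(code: str, initializer=()) -> array.array:
    """Create an array from a numpy-style code such as 'i4' (signed) or 'u2' (unsigned)."""
    kind, size = code[:1], code[1:]
    try:
        typecode = _SIZED_CODES[(kind, int(size))]
    except (KeyError, ValueError):
        raise ValueError(f"unsupported sized code: {code!r}") from None
    return array.array(typecode, initializer)


a = sized_array("i4", [1, -2, 3])  # signed 32-bit items
b = sized_array("u8")              # unsigned 64-bit items
```

The lookup table is built once at import time, so the per-call cost is a dictionary lookup rather than repeated probing.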
Allow supplying a simple slice as the `typecode` defining the smallest range of values that an item should be able to represent.
This is even more magic: it's only practical to provide types that are a standard number of bytes (1, 2, 4, 8). And this makes the goal of knowing exactly how many bytes your type is using even harder.
The array function object would also support indexing to produce a curried `array` function with the given slice as its first argument. When either the signed or unsigned item type of a given size could encompass the given range, then the unsigned type would be used.
Even more magic, and more confusing: why would you want this? I have to say, I've been staring at it a bit, and am still not quite sure what this is intended to do, and why. How does a slice object help here???
    # Signed short items initialized with [1, 2, 3]
    a = array.array(slice(-1, 200), range(1, 4))
    # or...
    a = array.array[-1:200](range(1, 4))

    # Unsigned 4-byte items (int (I) or long (L) depending on platform)
    a = array.array(slice(0, 0x90_00_00_00))
    # or...
    a = array.array[0:0x90_00_00_00]()
    # or…
    a = array.array[0:0x70_00_00_00]()  # Would fit into signed or unsigned.
-CHB
Christopher Barker wrote:
Better yet: provide type codes for the “new” C exact-width integer types: https://pubs.opengroup.org/onlinepubs/009695399/basedefs/stdint.h.html (and any number of other references). I've always thought it was unfortunate that Python inherited C's compiler-specific type definitions. But it's nice to be able to write platform-independent code that deals with binary data. numpy has provided these pretty much forever: https://numpy.org/devdocs/user/basics.types.html In addition to the dtype objects, numpy uses character typecodes where you can specify the number of bytes of the type: e.g. "i4" is a 32-bit (four-byte) signed integer, and "u4" is a 32-bit unsigned integer. It seems the array module could adopt a similar system.
Allow supplying an integer itemsize value as the typecode with a positive value for unsigned or a negative value for signed.
This seems pretty "magic". Why not simply extend the typecode system as above?
It doesn't seem too magical to me. Nevertheless, I do like your suggestion a little better than mine.
Allow supplying a simple slice as the typecode defining the smallest range of values that an item should be able to represent.
This is even more magic: it's only practical to provide types that are a standard number of bytes (1, 2, 4, 8). And this makes the goal of knowing exactly how many bytes your type is using even harder.
How often is knowing the itemsize the code will produce the most important goal? More often, one simply wants the most space-efficient type that can handle values within a particular range, whatever that size might be.
The array function object would also support indexing to produce a curried array function with the given slice as its first argument. When either the signed or unsigned item type of a given size could encompass the given range, then the unsigned type would be used.
Even more magic, and more confusing: why would you want this? I have to say, I've been staring at it a bit, and am still not quite sure what this is intended to do, and why. How does a slice object help here???
A slice is a specification of a range of values, so it seems like the most obvious kind of thing to pass as the range of values that an item must be able to represent. The currying is simply a means of allowing slice notation rather than having to create a slice with the `slice` function.
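The currying described here can be emulated today with a callable object that implements `__getitem__`. This is a hypothetical sketch: `RangedArrayFactory` and `array_of` are illustrative names, not a proposed or existing API, and the range-to-typecode rule (smallest size, unsigned preferred when both fit) follows the earlier description of the idea. Subscripting with `[start:stop]` hands `__getitem__` a `slice` object, and `functools.partial` binds it as the first argument.

```python
import array
import functools


def _typecode_for_slice(spec: slice) -> str:
    """Smallest integer typecode covering range(spec.start, spec.stop), unsigned preferred."""
    if spec.start is None or spec.stop is None or spec.step is not None:
        raise ValueError("expected a slice of the form start:stop")
    lo, hi = spec.start, spec.stop - 1  # stop is exclusive, as in a slice
    for signed_code, unsigned_code in zip("bhilq", "BHILQ"):
        bits = 8 * array.array(signed_code).itemsize
        if lo >= 0 and hi < (1 << bits):
            return unsigned_code
        if -(1 << (bits - 1)) <= lo and hi < (1 << (bits - 1)):
            return signed_code
    raise OverflowError(f"no integer typecode covers {spec}")


class RangedArrayFactory:
    """Callable that accepts either a typecode string or a value-range slice."""

    def __call__(self, spec, *args):
        if isinstance(spec, slice):
            spec = _typecode_for_slice(spec)
        return array.array(spec, *args)

    def __getitem__(self, spec: slice):
        # factory[-1:200] receives slice(-1, 200, None); bind it as the first argument.
        return functools.partial(self, spec)


array_of = RangedArrayFactory()

a = array_of(slice(-1, 200), range(1, 4))  # explicit slice object
b = array_of[-1:200](range(1, 4))          # same thing via slice notation
```

Plain typecode strings still pass straight through to `array.array`, so the slice form is purely additive.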
    # Signed short items initialized with [1, 2, 3]
    a = array.array(slice(-1, 200), range(1, 4))
    # or...
    a = array.array[-1:200](range(1, 4))

    # Unsigned 4-byte items (int (I) or long (L) depending on platform)
    a = array.array(slice(0, 0x90_00_00_00))
    # or...
    a = array.array[0:0x90_00_00_00]()
    # or…
    a = array.array[0:0x70_00_00_00]()  # Would fit into signed or unsigned.
-CHB
participants (2)
- Christopher Barker
- Steve Jorgensen