[Python-Dev] PyArg_ParseTuple format specifiers again
Trent Mick
trentm@ActiveState.com
Thu, 20 Jul 2000 08:29:37 -0700
On Thu, Jul 20, 2000 at 12:03:19AM +0200, Jack Jansen wrote:
> Of course, after I added the H specifier to PyArg_ParseTuple it turns
> out that solved ninetysomething% of my problems, but it turns out that
> what I really need is not an "unsigned short" converter but a "16 bit"
> converter, i.e. something that will happily accept anything from
> -32768 to 65535 without complaints. (The old "h" format, in other
> words).
>
It is too bad that a single signed-or-unsigned decision could not be made.
> I need this because all the MacOS API modules are machine generated,
> both the C modules and the .py files with all the constants in it, and
> some header file authors use -1 to mean "16 1 bits", some use
> 0xffff. And I really don't want to hand-massage 40K lines of C and 6K
> lines of Python every time a new MacOS release comes out....
Very fair.
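To illustrate why the two spellings are interchangeable for Jack's purposes, here is a minimal sketch. The helper name low16 is hypothetical and not part of any Python API. It only shows that -1 and 0xffff leave the same 16 bits behind once the value is stored in a 16-bit unsigned field.

```c
#include <stdint.h>

/* Hypothetical helper for illustration only: keep the low 16 bits of a
   value.  Conversion to an unsigned type is defined as reduction modulo
   2^16, so negative inputs wrap in a well-defined way. */
static uint16_t low16(long value)
{
    return (uint16_t)value;
}
```

With this, low16(-1) and low16(0xffff) yield the same bit pattern, which is why a "16 bit" converter would accept both.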
>
> And I also need such a format char for 8 bit values.
>
> Does anyone mind if I change the H specifier to do no value checking
> other than making sure it fits in 16 bits, and add a B specifier for
> unchecked bytes?
Now I think that the best answer is to have a separate
PyArg_UncheckedParseTuple() that gives you what you want, Jack. Or, maybe
more easily, PyArg_ParseTuple could revert to the 1.5.2 form and not do
bounds checking, and a new PyArg_CheckedParseTuple could do bounds checking.
Actually, yes, the latter is a better idea. It would allow us to clean up the
issue of 'L' meaning LONG_LONG rather than unsigned long, if we wanted.
Thereafter PyArg_CheckedParseTuple would be the preferred method where possible.
Hence,
PyArg_ParseTuple():
    b - byte
    h - short
    i - int
    l - long
    L - LONG_LONG
Pros:
- This is as in 1.5.2. All the values are either not bounds checked at all
or (maybe a little better) checked to be within [XXX_MIN, UXXX_MAX],
i.e. in the range of the union of the signed and unsigned ranges. This
will be backward compatible and only break code that had an overflow bug
in it anyway.
- It should make Jack happy, except that he would now have to go change
all his 'H's back to 'h's. :(
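The "(maybe a little better)" option, checking against [XXX_MIN, UXXX_MAX], could look roughly like the sketch below. The function names are made up for illustration and this is not the actual getargs.c code. It assumes the value has already been converted to a C long.

```c
#include <limits.h>

/* Hypothetical helpers, not CPython code: accept any value in the union
   of the signed and unsigned ranges of the target type. */

/* Loose 'h': anything from SHRT_MIN up to USHRT_MAX is allowed. */
static int fits_loose_short(long value)
{
    return value >= SHRT_MIN && value <= USHRT_MAX;
}

/* Loose 'b': anything from SCHAR_MIN up to UCHAR_MAX is allowed. */
static int fits_loose_byte(long value)
{
    return value >= SCHAR_MIN && value <= UCHAR_MAX;
}
```

On a platform with a 16-bit short this accepts -32768 through 65535, which is the range Jack asked for.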
PyArg_CheckedParseTuple():
    b - signed byte
    B - unsigned byte
    h - signed short
    H - unsigned short
    i - signed int
    I - unsigned int
    l - signed long
    L - unsigned long
    q - signed LONG_LONG
    Q - unsigned LONG_LONG
Pros:
- b,h,i,l and their capital equivalents are the same as the formatters in
the struct module, for consistency
- the use of capitals for the unsigned version of each type is intuitive
(or at least straightforward and consistent)
- q,Q for a 'quad' or long long is better than the current 'L'
Cons:
- New function may raise questions of "which do I use, PyArg_ParseTuple or
PyArg_CheckedParseTuple?" This is a documentation issue.
- New function may require parallel maintenance of both functions. Or maybe
not, if one is made to use the other (efficiency problem?)
- 'L' has changed meaning from one function to the other. I would propose
just leaving it as is for 2.0 (unless the code breakage is okay) and then
changing 'L' in PyArg_ParseTuple to mean 'unsigned long' in Py3k.
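On the parallel-maintenance worry, one possible shape is a single range-check worker that both entry points call with different limits. This is only a sketch with invented names (in_range, loose_h, checked_h, checked_H), covering just the short formats, and it says nothing about how the real parser would be organized.

```c
#include <limits.h>

/* Hypothetical shared worker: the only place the comparison lives. */
static int in_range(long value, long lo, long hi)
{
    return value >= lo && value <= hi;
}

/* Loose 'h' as proposed for PyArg_ParseTuple: union of both ranges. */
static int loose_h(long value)
{
    return in_range(value, SHRT_MIN, USHRT_MAX);
}

/* Strict 'h' as proposed for PyArg_CheckedParseTuple: signed short only. */
static int checked_h(long value)
{
    return in_range(value, SHRT_MIN, SHRT_MAX);
}

/* Strict 'H' as proposed for PyArg_CheckedParseTuple: unsigned short only. */
static int checked_H(long value)
{
    return in_range(value, 0, USHRT_MAX);
}
```

The extra call is a couple of comparisons per argument, so the efficiency cost of sharing the worker should be small, though that would need measuring.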
I can code this up after the O'Reilly conference if people (Jack, Guido, Tim,
others?) think that this is a good idea.
Trent
--
Trent Mick
TrentM@ActiveState.com