[Python-Dev] PyArg_ParseTuple format specifiers again

Thu, 20 Jul 2000 08:29:37 -0700

On Thu, Jul 20, 2000 at 12:03:19AM +0200, Jack Jansen wrote:
> Of course, after I added the H specifier to PyArg_ParseTuple it turns
> out that solved ninetysomething% of my problems, but it turns out that 
> what I really need is not an "unsigned short" converter but a "16 bit" 
> converter, i.e. something that will happily accept anything from
> -32768 to 65535 without complaints. (The old "h" format, in other
> words).
> 
It is too bad that a single decision, unsigned OR signed, could not be made.

> I need this because all the MacOS API modules are machine generated,
> both the C modules and the .py files with all the constants in it, and 
> some header file authors use -1 to mean "16 1 bits", some use
> 0xffff. And I really don't want to hand-massage 40K lines of C and 6K
> lines of Python every time a new MacOS release comes out....

Very fair.

> 
> And I also need such a format char for 8 bit values.
> 
> Does anyone mind if I change the H specifier to do no value checking
> other than making sure it fits in 16 bits, and add a B specifier for
> unchecked bytes?

Now I think that the best answer is to have a separate
PyArg_UncheckedParseTuple() that gives you what you want, Jack. Or, maybe
more easily, PyArg_ParseTuple could revert to the 1.5.2 form and not do
bounds checking, and a new PyArg_CheckedParseTuple could do bounds checking.
Actually, yes, the latter is a better idea. It would allow us to clean up the
issue of 'L' meaning LONG_LONG rather than unsigned long, if we wanted.
Thereafter PyArg_CheckedParseTuple would be the preferred method if possible.

Hence,

PyArg_ParseTuple():

 b - byte
 h - short
 i - int
 l - long
 L - LONG_LONG

 Pros:
   - This is as in 1.5.2. All the values are either not bounds checked at all
     or (maybe a little better) checked to be within [XXX_MIN, UXXX_MAX],
     i.e. in the range of the union of the signed and unsigned ranges. This
     will be backward compatible and only break code that had an overflow bug
     in it anyway.
   - It should make Jack happy, except that he would now have to go change
     all his 'H's back to 'h's. :(   
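
To make the "union of the signed and unsigned ranges" idea concrete, here is a
rough sketch of what such a loose check could look like for the 16-bit case.
The function name convert_16bit is made up for illustration and is not part of
the Python C API; the sketch also assumes a platform where short is 16 bits.

```c
#include <limits.h>

/* Hypothetical helper (not a real Python C API function): accept any value
 * in [SHRT_MIN, USHRT_MAX], i.e. the union of the signed and unsigned short
 * ranges, and store its low 16 bits in *out.
 * Returns 1 on success, 0 if the value cannot fit in 16 bits. */
int convert_16bit(long value, unsigned short *out)
{
    if (value < SHRT_MIN || value > USHRT_MAX)
        return 0;                    /* genuinely does not fit in 16 bits */
    /* Conversion to an unsigned type wraps modulo 2^16, so -1 and 0xffff
     * end up as the same bit pattern. */
    *out = (unsigned short)value;
    return 1;
}
```

With this kind of check a header that spells "16 1 bits" as -1 and one that
spells it as 0xffff both convert to the same value, while something like
65536 is still rejected.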

PyArg_CheckedParseTuple():
 b - signed byte
 B - unsigned byte 
 h - signed short
 H - unsigned short
 i - signed int
 I - unsigned int
 l - signed long
 L - unsigned long
 q - signed LONG_LONG
 Q - unsigned LONG_LONG

 Pros:
  - b,h,i,l and the capital equivs are the same as the formatters in the
    struct module for consistency
  - the use of capitals for the unsigned version of each format is intuitive
    (or at least straightforward and consistent)
  - q,Q for a 'quad' or long long is better than the current 'L'
 Cons:
  - New function may raise questions of "which do I use, PyArg_ParseTuple or
    PyArg_CheckedParseTuple?" This is a documentation issue.
  - New function may require parallel maintenance in both functions. Or maybe
    not if one is made to use the other (efficiency problem?)
  - 'L' has changed meaning from one function to the other. I would propose
    just leaving it as is for 2.0 (unless the code breakage is okay) and then
    changing 'L' in PyArg_ParseTuple to mean 'unsigned long' in Py3k.
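
For contrast, the strict checks behind the proposed 'h' and 'H' formats might
look roughly like the sketch below. The helper names checked_short and
checked_ushort are hypothetical, chosen only for this example, and nothing
here is meant as the actual implementation.

```c
#include <limits.h>

/* Hypothetical checked converter for 'h': signed short only.
 * Returns 1 on success, 0 if value is outside [SHRT_MIN, SHRT_MAX]. */
int checked_short(long value, short *out)
{
    if (value < SHRT_MIN || value > SHRT_MAX)
        return 0;
    *out = (short)value;
    return 1;
}

/* Hypothetical checked converter for 'H': unsigned short only.
 * Returns 1 on success, 0 if value is outside [0, USHRT_MAX]. */
int checked_ushort(long value, unsigned short *out)
{
    if (value < 0 || value > USHRT_MAX)
        return 0;
    *out = (unsigned short)value;
    return 1;
}
```

Here -1 is accepted by the 'h' style check but rejected by the 'H' style one,
and 0xffff is the other way around (assuming a 16-bit short), which is exactly
the strictness that machine-generated code with mixed conventions trips over.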

I can code this up after the O'Reilly conference if people (Jack, Guido, Tim,
others?) think that this is a good idea.

Trent

-- 
Trent Mick
TrentM@ActiveState.com