[Python-Dev] Access to bits for a PyLongObject

Eric V. Smith eric+python-dev at trueblade.com
Tue Mar 6 14:45:53 CET 2007


Martin v. Löwis wrote:
> Eric V. Smith wrote:
>> I'm working on PEP 3101, Advanced String Formatting.  About the only 
>> built-in numeric formatting I have left to do is for converting a 
>> PyLongObject to binary.
>>
>> I need to know how to access the bits in a PyLong.  
> 
> I think it would be a major flaw in PEP 3101 if you really needed it.
> The long int representation should be absolutely opaque - even the
> fact that it is a sign+magnitude representation should be hidden.
> 
> Looking at the PEP, I see that a class can implement __format__.
> Wouldn't it be appropriate if the long type implemented that? 
> Implementation-wise, I would expect that long_format already does the
> bulk of what you need to do.

Yes, I think that would be appropriate.  However, it conflicts with the
current implementation strategy, which is to make a stand-alone module
until we can flesh out all of the issues.  Not that our current
implementation should drive the correct decision, of course.

Also, it would either mean duplicating lots of code from the int
formatter, or having a formatter library that both can call.  This is
because __format__ must implement all formats, including padding,
parentheses for negatives, etc., not just the "missing" binary format.
Not that that's necessarily bad, either. But see the next point.
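
To make the "formatter library that both can call" idea concrete, here
is a rough Python sketch.  It is only an illustration: the names
to_digits and format_integer, and the width, fill, and paren_negative
parameters, are hypothetical and are not the PEP 3101 format syntax or
any existing CPython function.

```python
DIGITS = "0123456789abcdef"


def to_digits(magnitude, base):
    """Return the digits of a non-negative integer in the given base (2..16)."""
    if magnitude == 0:
        return "0"
    out = []
    while magnitude:
        magnitude, remainder = divmod(magnitude, base)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


def format_integer(value, base=10, width=0, fill=" ", paren_negative=False):
    """Hypothetical shared helper: digits, then sign handling, then padding."""
    body = to_digits(abs(value), base)
    if value < 0:
        # Negative values get either parentheses or a leading minus sign.
        body = "(" + body + ")" if paren_negative else "-" + body
    # Right-justify to the requested width using the fill character.
    return body.rjust(width, fill)
```

With a helper along these lines, an int formatter and a long formatter
would differ only in how they produce the digits; padding and sign
handling would live in one place.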

> OTOH, also look at _PyString_FormatLong.

I think a solution would be to add 'b' to _PyString_FormatLong, which 
I'm already calling for hex, octal, and decimal formatting.  Does that 
sound reasonable?  It seems to me that if binary is useful enough for 
PEP 3101, it should generally be available in _PyString_FormatLong.

The obvious implementation of this would require adding an nb_binary to
PyNumberMethods.  I'm not sure what the impact of that change would be,
but it sounds really big and probably a show-stopper.  Maybe a direct 
call to a binary formatter would be better.

OTOH, going through _PyString_FormatLong isn't as efficient as I'd like
(for all formatting outputs, not just binary), because it has to build a
string object and then copy data out of it.

Having written all of this, I'm now thinking that Nick's suggestion of 
_PyLong_AsByteArray might be the way to go.  I would use that for all of
my formatting for longs.  I think I can use my output buffer as the 
buffer for _PyLong_AsByteArray, since all formats (binary, decimal, 
octal, hex) are less "bit dense" than the byte array.  As long as I 
read, format, and write the data in the correct order, I'd be okay.  So 
even though I'd copy the data into my buffer, at least I wouldn't be 
allocating more memory or another object just to extract data from the long.
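
The buffer-sharing idea above can be sketched in Python.  This is an
illustration under assumptions, not the actual implementation:
int.to_bytes stands in for the C-level _PyLong_AsByteArray, the function
name to_binary_in_place is hypothetical, and only non-negative values
and binary output are handled.

```python
def to_binary_in_place(value):
    """Format a non-negative int as binary digits using one shared buffer."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    nbytes = max(1, (value.bit_length() + 7) // 8)
    # One buffer sized for the output: 8 characters per input byte.
    buf = bytearray(nbytes * 8)
    # Stand-in for _PyLong_AsByteArray: the big-endian magnitude is
    # written into the front of the output buffer.
    buf[:nbytes] = value.to_bytes(nbytes, "big")
    # Walk from the last byte to the first.  Byte i expands into
    # positions 8*i .. 8*i+7, which never overlap the unread bytes
    # at positions 0 .. i-1.
    for i in range(nbytes - 1, -1, -1):
        byte = buf[i]  # read the byte before its slot is overwritten
        for bit in range(8):
            buf[8 * i + bit] = ord("1") if byte & (0x80 >> bit) else ord("0")
    # Drop leading zero digits, keeping a single "0" for the value zero.
    return buf.decode("ascii").lstrip("0") or "0"
```

Hex would work the same way with 2 characters per byte.  Octal and
decimal digits do not fall on byte boundaries, so they would need more
care than this byte-by-byte expansion.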

Maybe I'm over-emphasizing performance, given the early status of the
implementation.  But I'd like PEP 3101 to be as efficient as possible,
because once it's available I'll replace all of the '%' string
formatting in my code with it.  I think others will consider that as well.

Thank you for your insights.  I apologize for the length of this 
message, but as I believe Pascal said, I did not have time to make it 
shorter.

Eric.


