[Python-Dev] Alignment assumptions

Tim Peters tim.one@comcast.net
Thu, 28 Feb 2002 14:42:35 -0500

[Jack, skip to the end please]

[David Abrahams, on
    Include/objimpl.h:275: double dummy;  /* force worst-case alignment */
> As I read the code, it affects all types (doesn't this header begin every
> object, regardless of its GC flags?)

Nope, only objects that go through _PyObject_GC_Malloc().  It could be a
nightmare if, e.g., every string and int object consumed another (at least)
12 bytes.
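
To make the cost argument concrete, here is a minimal sketch in C of a header-prefixed allocation. It is not the CPython source: the names gc_header, tracked_alloc, header_of, tracked_free and untracked_alloc are hypothetical, and the field layout is an assumption. Only objects allocated through the tracked path pay for the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bookkeeping header (assumed layout, not CPython's). */
typedef union gc_header {
    struct {
        union gc_header *next;
        union gc_header *prev;
        int refs;
    } links;
    double dummy;  /* force worst-case alignment, as in the quoted line */
} gc_header;

/* Tracked allocation: header first, object body immediately after it. */
void *tracked_alloc(size_t body_size)
{
    gc_header *h = malloc(sizeof(gc_header) + body_size);
    if (h == NULL)
        return NULL;
    h->links.next = NULL;
    h->links.prev = NULL;
    h->links.refs = 0;
    return (void *)(h + 1);  /* body starts sizeof(gc_header) bytes in */
}

/* Recover the header from a body pointer returned by tracked_alloc(). */
gc_header *header_of(void *body)
{
    assert(body != NULL);
    return (gc_header *)body - 1;
}

void tracked_free(void *body)
{
    if (body != NULL)
        free(header_of(body));
}

/* Untracked allocation: no header, so no extra bytes per object. */
void *untracked_alloc(size_t body_size)
{
    return malloc(body_size);
}

/* True if p is a multiple of align (align must be nonzero). */
int body_is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}
```

Because sizeof(gc_header) is a multiple of the union's alignment, the body that follows the header is aligned at least as strictly as the dummy member. That is the whole job of the dummy.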

> and I think that's a very unhappy circumstance for your numeric
> community. Remember, the type that raised the alarm here was just a
> long double.

The *Python* numeric community is far more likely to embed a float than a
long double, and in any case seems unlikely to build a container type
mixing long double with PyObject* members (i.e., one that ought to
participate in cyclic gc).

I expect we have a blind spot towards long double in general since Python
doesn't expose or use such a thing, all the developers run on platforms
where (as far as they know <wink>) it's the same as a double, and "long
double" was introduced after K&R (so some old-timers likely aren't even
aware C89 introduced it).

But I'll change the code here to use long double instead -- it's harmless,
as it doesn't make a lick of difference on any platform that matters <0.7 wink>.

>> Only the objimpl.h trick might benefit from maximal alignment.

> I'm not actually after maximal alignment; I look for a minimally-
> sized/aligned type whose alignment is a multiple of the target
> type's alignment. In any case, I was just using the assumption that
> double was maximally aligned since I was linking with Python code
> and the EDG front-end was too slow to handle the metaprogram -- I
> figured that if the assumption was good enough for Python

Well, nobody has complained yet, but the core never needs alignment stricter
than double, and-- as above --an extension type that both did and needed to
participate in GC is unlikely.
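
The selection rule David describes (the smallest type whose alignment is a multiple of the target's) can be sketched at run time in C. This is only an illustration, assuming C11's alignof: his version was a C++ metaprogram, and the candidate list and the name pick_aligner are hypothetical.

```c
#include <stdalign.h>
#include <stddef.h>

/* One candidate builtin type: its name, size and alignment. */
typedef struct {
    const char *name;
    size_t size;
    size_t align;
} candidate;

/*
 * Return the name of the smallest candidate whose alignment is a
 * multiple of target_align, or NULL if no candidate qualifies.
 * Ties on size go to the less strictly aligned type.
 */
const char *pick_aligner(size_t target_align)
{
    static const candidate candidates[] = {
        {"char",        sizeof(char),        alignof(char)},
        {"short",       sizeof(short),       alignof(short)},
        {"int",         sizeof(int),         alignof(int)},
        {"long",        sizeof(long),        alignof(long)},
        {"double",      sizeof(double),      alignof(double)},
        {"long double", sizeof(long double), alignof(long double)},
    };
    const candidate *best = NULL;
    size_t i;

    if (target_align == 0)
        return NULL;

    for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        const candidate *c = &candidates[i];
        if (c->align % target_align != 0)
            continue;  /* alignment is not a multiple of the target's */
        if (best == NULL
            || c->size < best->size
            || (c->size == best->size && c->align < best->align))
            best = c;
    }
    return best != NULL ? best->name : NULL;
}
```

Calling pick_aligner(alignof(long double)) shows whether anything smaller than long double would do on a given platform. A request such as pick_aligner(128), the KSR case, comes back NULL because no builtin type is aligned that strictly.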

> and my clients were depending on it anyway, it was good enough for
> my code (not!).

One of the secrets to Python's success is that we tell unreasonable users to
go away and bother the C++ committee instead.

[128-byte alignment needed for KSR's _subpage type]
> I was aware that this was a theoretical possibility, but not that it
> was a practical one. What's KSR?

Kendall Square Research, my (and Tani's, Tamah's and Steve Breit's) employer
before Dragon.  The address space was carved into 128-byte "subpages", and
the hardware supported Python-style (non-owned non-reentrant) locks directly
on a per-subpage basis (Python's lock.acquire() and lock.release() were one
machine instruction each!).  Subpages were also the unit for cache coherency
across processors.  So use of _subpage in our system code, and in
speed-obsessed app code, was ubiquitous.  I guess the main thing KSR proved
was that you can't stay in business designing custom hardware to execute
Python's semantics directly <wink>.

> ...
> Seriously, though, I think it would be reasonable to stick to aligning
> the standard builtin types, in which case you can do the test without
> calling malloc, FWIW.

I checked this in:

	long double dummy;  /* force worst-case alignment */
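
For anyone who wants to see what the change does on their own platform, here is a small sketch assuming a C11 compiler (alignof did not exist in 2002). The two unions are hypothetical stand-ins with an assumed field layout, not the real header.

```c
#include <stdalign.h>
#include <stddef.h>

/* Stand-in header with the old dummy member. */
typedef union head_double {
    struct { void *next; void *prev; int refs; } links;
    double dummy;
} head_double;

/* Stand-in header with the new dummy member. */
typedef union head_long_double {
    struct { void *next; void *prev; int refs; } links;
    long double dummy;
} head_long_double;

/* Alignment each variant forces on the header. */
size_t align_with_double(void)      { return alignof(head_double); }
size_t align_with_long_double(void) { return alignof(head_long_double); }

/* Size each variant costs per tracked object. */
size_t size_with_double(void)       { return sizeof(head_double); }
size_t size_with_long_double(void)  { return sizeof(head_long_double); }

/*
 * True when an object placed right after the new header is suitably
 * aligned for a long double member.
 */
int body_ok_for_long_double(void)
{
    return sizeof(head_long_double) % alignof(long double) == 0;
}
```

Where long double and double share size and alignment, the two variants report identical numbers, which is the "harmless" case. Elsewhere the second variant may be larger, and that is the price of the stricter guarantee.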

[Guido, on
 	long	aligner;
> The malloc 8-byte align argument doesn't apply, since this struct is
> used in an array.

I was composing email while asleep <wink>.  Gotcha.
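
Guido's point can be checked with a bit of arithmetic: in an array only element 0 inherits malloc's alignment, and every later element sits at a multiple of sizeof(element). The sketch below is an assumption-laden illustration, not the struct under discussion: the entry types, their fixed 32-bit fields and the 16-byte line size are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-word entry: 12 bytes with 32-bit fields. */
typedef struct {
    uint32_t hash, key, value;
} entry_plain;

/* Same entry with a padding word: 16 bytes. */
typedef struct {
    uint32_t hash, key, value, aligner;
} entry_padded;

/*
 * Count how many of the first n array elements cross a line boundary,
 * assuming the array base itself is aligned to `line` bytes.
 */
size_t count_straddlers(size_t elem_size, size_t n, size_t line)
{
    size_t i, crossing = 0;

    if (elem_size == 0 || line == 0)
        return 0;

    for (i = 0; i < n; i++) {
        size_t first = i * elem_size;         /* offset of first byte */
        size_t last = first + elem_size - 1;  /* offset of last byte  */
        if (first / line != last / line)
            crossing++;
    }
    return crossing;
}
```

With 16-byte lines, count_straddlers(12, 8, 16) reports that 4 of the first 8 elements cross a boundary, while count_straddlers(16, 8, 16) reports none. So the padding word changes the layout no matter how well malloc aligns the base.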

> ...
> This was added by Jack Jansen ages ago -- I think he did measure a
> speedup on an old Mac compiler, or he wouldn't have added it, and I
> bet there was a #define USE_CACHE_ALIGNED in his config.h then.
> But that's all history; I agree it should be deleted.

Jack, do you still want this?

fighting-code-rot-ly y'rs  - tim