[Pythonmac-SIG] two-level namespaces enabled?

bbum@mac.com bbum@mac.com
Thu, 2 Jan 2003 17:47:34 -0500


On Thursday, Jan 2, 2003, at 16:11 US/Eastern, Jack Jansen wrote:
> Bill,
> could you explain this, please? Why would building without two-level 
> namespaces cause extra indirection? I'll also look into building with 
> two-level namespace, there was a reason originally why I didn't do it 
> for the framework build, but it may well have been solved in the mean 
> time (I think it was the environ problem). Still, I'd like to know why 
> flat namespaces could cause a significant drop in performance...

It isn't simply the namespaces that are causing the difference in 
performance.   I set the DYLD_FORCE_FLAT_NAMESPACE environment variable 
to force the Python build included with OS X to execute in a flat 
namespace, and there was no change in performance.

So, I misremembered, misread, and/or am lacking in clue.   To rectify 
my cluelessness, I did some testing.   Not exactly the most scientific 
of testing, but I was careful not to actually *do* anything on the 
machine as the tests were running and I let each test run about a dozen 
times just to make sure things were consistent.
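
For anyone who wants to repeat this with less eyeballing, here is a minimal sketch of a repeat-and-compare helper in plain C.   It is not what I ran; time_runs(), struct spread and the spin() workload are hypothetical names, and it uses clock() (CPU time) rather than the wall-clock NSDate calls in the test programs, so the absolute numbers will not match.

```c
#include <time.h>

/* Smallest and largest CPU time, in seconds, seen across the runs. */
struct spread {
    double min;
    double max;
};

/* Hypothetical workload; the volatile counter keeps the loop from
   being optimized away. */
void spin(void) {
    volatile long n = 0;
    long i;
    for (i = 0; i < 1000000; i++)
        n += i;
}

/* Run work() `runs` times and record the fastest and slowest run.
   Returns {-1, -1} when runs is not positive. */
struct spread time_runs(void (*work)(void), int runs) {
    struct spread s = { -1.0, -1.0 };
    int r;
    for (r = 0; r < runs; r++) {
        clock_t start = clock();
        double elapsed;
        work();
        elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (r == 0 || elapsed < s.min) s.min = elapsed;
        if (r == 0 || elapsed > s.max) s.max = elapsed;
    }
    return s;
}
```

Calling time_runs(spin, 12) mirrors the "about a dozen" runs; if max is close to min, the machine was quiet enough for the comparison to mean something.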

Short answer:  calls from the main executable into dynamically linked 
code are slower than calls to statically linked code.   Not by much 
(roughly 2 seconds per 100 million calls in the runs below), but it 
can really add up over the lifespan of an app.
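
One way to picture the extra cost, as a rough model only: a call that crosses into dynamically linked code has to fetch the target address at run time before it can branch, much like a call through a function pointer.   The sketch below imitates that in a single file; sum_direct(), sum_indirect() and add_ptr are hypothetical names, and a volatile function pointer is my stand-in for the dynamic linker's indirection, not the actual mechanism dyld uses.

```c
/* Separate accumulators so the two paths can be checked independently. */
static long total_direct;
static long total_indirect;

static void add_direct(int i)   { total_direct += i; }
static void add_indirect(int i) { total_indirect += i; }

/* Assumption: a volatile pointer forces the address to be loaded on
   every call, loosely imitating a dynamically resolved call. */
static void (*volatile add_ptr)(int) = add_indirect;

/* Call the function directly n times; the compiler knows the target
   and may even inline it. */
long sum_direct(int n) {
    int i;
    total_direct = 0;
    for (i = 0; i < n; i++)
        add_direct(i);
    return total_direct;
}

/* Call through the pointer n times; each call loads the target
   address first, then branches to it. */
long sum_indirect(int n) {
    int i;
    total_indirect = 0;
    for (i = 0; i < n; i++)
        add_ptr(i);
    return total_indirect;
}
```

Both functions compute the same sum; only the way the callee is reached differs.   Timing them says something about indirect calls in general, not about frameworks specifically, so treat it as an illustration rather than a reproduction of the numbers in this message.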

In the output below, case '1:' is a call to a simple function (see 
below) linked in a framework.   Case '2:' is a call to the same 
function [renamed] that is statically linked into the executable.

2003-01-02 17:30:12.327 foo[9579] 1: 8.642446
2003-01-02 17:30:12.329 foo[9579] 2: 6.734165
2003-01-02 17:30:29.401 foo[9580] 1: 8.977464
2003-01-02 17:30:29.403 foo[9580] 2: 6.754741
2003-01-02 17:30:45.818 foo[9581] 1: 8.462819
2003-01-02 17:30:45.820 foo[9581] 2: 6.706616

The main source file:

#import <Foundation/Foundation.h>
#import <stdio.h>
void bar(int i);   /* defined in the framework */
void baz(int i) {
     static int total;
     total = total + i;
     if (total == 100) fprintf(stderr, "Foo bar baz\n");
}
int main (int argc, const char * argv[]) {
     int i;
     NSTimeInterval s, l, e;

     s = [NSDate timeIntervalSinceReferenceDate];
     for(i=0; i<100000000; i++)
         bar(i);
     l = [NSDate timeIntervalSinceReferenceDate];
     for(i=0; i<100000000; i++)
         baz(i);
     e = [NSDate timeIntervalSinceReferenceDate];

     NSLog(@"1: %f", l - s);
     NSLog(@"2: %f", e - l);

     return 0;
}

bar() is exactly like baz(), but in a framework.   Switching the order 
of the tests does not affect the results.

Of course, this would only explain the performance difference Kevin is 
seeing if the python binary effectively contains the interpreter loop 
within itself and constantly makes calls into the Python framework.   
Seems unlikely, but I honestly didn't look.

Now, if the cost is high even for calls within a framework, that would 
explain it...  and it is.   Wow.  That hurts.

2003-01-02 17:42:20.903 foo[9691] 1: 7.284420
2003-01-02 17:42:20.906 foo[9691] 2: 6.735255
2003-01-02 17:42:38.152 foo[9692] 1: 7.334371
2003-01-02 17:42:38.155 foo[9692] 2: 6.752173
2003-01-02 17:42:55.392 foo[9693] 1: 7.282018
2003-01-02 17:42:55.395 foo[9693] 2: 6.757783

In this case, I moved the loop into the framework (and added a loop in 
the main executable for parity).

#import <Foundation/Foundation.h>
#import <stdio.h>
void dobar(void);   /* defined in the framework */
void baz(int i) {
     static int total;
     total = total + i;
     if (total == 100) fprintf(stderr, "Foo bar baz\n");
}
void dobaz() {
     int i;
     for(i=0; i<100000000; i++)
         baz(i);
}
int main (int argc, const char * argv[]) {
     NSTimeInterval s, l, e;

     s = [NSDate timeIntervalSinceReferenceDate];
     dobar();
     l = [NSDate timeIntervalSinceReferenceDate];
     dobaz();
     e = [NSDate timeIntervalSinceReferenceDate];

     NSLog(@"1: %f", l - s);
     NSLog(@"2: %f", e - l);

     return 0;
}

-- and the framework --

#import <stdio.h>
void bar(int i) {
     static int total;
     total = total + i;
     if (total == 100) fprintf(stderr, "Foo bar baz\n");
}
void dobar() {
     int i;
     for(i=0; i<100000000; i++)
         bar(i);
}