The way to a faster python [was Python IS slow !]

Markus Kohler markus_kohler at hp.com
Tue May 4 07:25:44 EDT 1999


Hi all,
I would like to give you a summary of what I think would be the
way to go to get a faster Python (and world domination ;-) ).
Although I don't know whether everything is feasible, because I haven't
looked into Python's interpreter loop in detail, it seems to me it should
be possible, given that Smalltalk is probably as dynamic as Python.


1. One can use a dispatch table instead of a case statement by using
an extension of GNU C. For Squeak (www.squeak.org) that gave me a speedup
of almost a factor of 2 in message sending speed on an HP-UX machine.
The statement used is a computed goto (goto *address). Some other C
compilers might be able to do that as well, and for those that can't,
the old code should still be there.
The code looks like this in Squeak:

  static void *jumpTable[256]= { &&_0,   &&_1,   &&_2,	&&_3,	&&_4,   &&_5,   &&_6,   &&_7,	&&_8,   &&_9, &&_10,  &&_11,  &&_12,  &&_13,  &&_14,  &&_15,  &&_16,  &&_17,  &&_18,  &&_19, &&_20,  &&_21,  &&_22,  &&_23,  &&_24,  &&_25,  &&_26,  &&_27,  &&_28,  &&_29, &&_30,  &&_31,  &&_32,  &&_33,  &&_34,  &&_35,  &&_36,  &&_37,  &&_38,  &&_39, &&_40,  &&_41,  &&_42,  &&_43,  &&_44,  &&_45,  &&_46,  &&_47,  &&_48,  &&_49, &&_50,  &&_51,  &&_52,  &&_53,  &&_54,  &&_55,  &&_56,  &&_57,  &&_58,  &&_59, &&_60,  &&_61,  &&_62,  &&_63,  &&_64,  &&_65,  &&_66,  &&_67,  &&_68,  &&_69, &&_70,  &&_71,  &&_72,  &&_73,  &&_74,  &&_75,  &&_76,  &&_77,  &&_78,  &&_79, &&_80,  &&_81,  &&_82,  &&_83,  &&_84,  &&_85,  &&_86,  &&_87,  &&_88,  &&_89, &&_90,  &&_91,  &&_92,  &&_93,  &&_94,  &&_95,  &&_96,  &&_97,  &&_98,  &&_99, &&_100, &&_101, &&_102, &&_103, &&_104, &&_105, &&_106, &&_107, &&_108, &&_109, &&_110, &&_111, &&_112, &&_113, &&_114, &&_115, &&_116, &&_117, &&_118, &&_119, &&_120, &&_121, &&_122, &&_123, &&_124, &&_125, &&_126, &&_127, &&_128, &&_129, &&_130, &&_131, &&_132, &&_133, &&_134, &&_135, &&_136, &&_137, &&_138, &&_139, &&_140, &&_141, &&_142, &&_143, &&_144, &&_145, &&_146, &&_147, &&_148, &&_149, &&_150, &&_151, &&_152, &&_153, &&_154, &&_155, &&_156, &&_157, &&_158, &&_159, &&_160, &&_161, &&_162, &&_163, &&_164, &&_165, &&_166, &&_167, &&_168, &&_169, &&_170, &&_171, &&_172, &&_173, &&_174, &&_175, &&_176, &&_177, &&_178, &&_179, &&_180, &&_181, &&_182, &&_183, &&_184, &&_185, &&_186, &&_187, &&_188, &&_189, &&_190, &&_191, &&_192, &&_193, &&_194, &&_195, &&_196, &&_197, &&_198, &&_199, &&_200, &&_201, &&_202, &&_203, &&_204, &&_205, &&_206, &&_207, &&_208, &&_209, &&_210, &&_211, &&_212, &&_213, &&_214, &&_215, &&_216, &&_217, &&_218, &&_219, &&_220, &&_221, &&_222, &&_223, &&_224, &&_225, &&_226, &&_227, &&_228, &&_229, &&_230, &&_231, &&_232, &&_233, &&_234, &&_235, &&_236, &&_237, &&_238, &&_239, &&_240, &&_241, &&_242, &&_243, &&_244, &&_245, &&_246, &&_247, 
&&_248, &&_249, &&_250, &&_251, &&_252, &&_253, &&_254, &&_255 } ;

	 
	localIP = ((char *) instructionPointer);
	localSP = ((char *) stackPointer);
	localHomeContext = theHomeContext;
	 
	currentBytecode = (*((unsigned char *) ( ++localIP ))) ;
	while (1) {
		switch (currentBytecode) {
		case  0 : _0 : 
			 
			 
			currentBytecode = (*((unsigned char *) ( ++localIP ))) ;
			 
			 
			(*((int *) ( localSP += 4 )) =   (*((int *) ( ((((char *) receiver)) + 4) + ((0 & 15) << 2) )))  ) ;
			goto *jumpTable[currentBytecode] ;
		case  1 : _1 : 
			 
			 
			currentBytecode = (*((unsigned char *) ( ++localIP ))) ;
			 
			 
			(*((int *) ( localSP += 4 )) =   (*((int *) ( ((((char *) receiver)) + 4) + ((1 & 15) << 2) )))  ) ;
			goto *jumpTable[currentBytecode] ;
		case  2 : _2 : 
			 
			 
			currentBytecode = (*((unsigned char *) ( ++localIP ))) ;
			 
			 
			(*((int *) ( localSP += 4 )) =   (*((int *) ( ((((char *) receiver)) + 4) + ((2 & 15) << 2) )))  ) ;
			goto *jumpTable[currentBytecode] ;
		case  3 : _3 : 
			 
			 
			currentBytecode = (*((unsigned char *) ( ++localIP ))) ;
			 
		.....
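
To make the idea easier to see than in the generated Squeak code, here is
a minimal sketch of the same dispatch technique for a toy stack machine.
The opcodes (OP_PUSH, OP_ADD, ...) and the function run_toy_vm are made up
for illustration; they are neither Squeak's nor Python's. It relies on the
GNU C "labels as values" extension, so it assumes gcc or a compatible
compiler, and it assumes the bytecode is well formed and ends with OP_HALT.

```c
/* Hypothetical opcodes for a toy stack machine (illustration only). */
enum { OP_HALT, OP_PUSH, OP_ADD, OP_MUL, OP_COUNT };

/* Runs `code` and returns the value on top of the stack at OP_HALT.
 * Assumes valid opcodes and a program that ends with OP_HALT. */
long run_toy_vm(const unsigned char *code)
{
    /* One label address per opcode; `&&label` is a GNU C extension. */
    static void *jump_table[OP_COUNT] = {
        &&do_halt, &&do_push, &&do_add, &&do_mul
    };
    long stack[64];
    long *sp = stack;               /* points at the next free slot */
    const unsigned char *ip = code; /* points at the next bytecode  */

/* Fetch the next bytecode and jump straight to its handler,
 * without going back through a central switch statement. */
#define DISPATCH() goto *jump_table[*ip++]

    DISPATCH();

do_push:                            /* operand is the following byte */
    *sp++ = *ip++;
    DISPATCH();
do_add:
    sp--;
    sp[-1] += sp[0];
    DISPATCH();
do_mul:
    sp--;
    sp[-1] *= sp[0];
    DISPATCH();
do_halt:
    return sp > stack ? sp[-1] : 0;

#undef DISPATCH
}
```

Each handler ends with its own indirect jump, which is the point of the
technique: there is no shared switch and no range check per instruction.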



2. Implement a method cache. For Squeak, as far as I can remember, this
improved message sending performance by about 30%.
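
A rough sketch of what I mean by a method cache: a small global table,
hashed on the (class, selector) pair, that is consulted before walking the
superclass chain. All names here (Class, Method, cached_lookup,
method_cache_demo) are hypothetical and not taken from Squeak or Python,
and selectors are assumed to be interned so that they can be compared by
pointer. A real implementation would also have to flush the cache whenever
a method is added or changed.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*Method)(int);

typedef struct Class {
    const struct Class *superclass;
    const char *const *selectors;   /* interned selector names */
    const Method *methods;          /* methods[i] implements selectors[i] */
    size_t count;
} Class;

typedef struct {
    const Class *cls;
    const char *selector;
    Method method;
} CacheEntry;

#define CACHE_SIZE 256              /* must be a power of two */
static CacheEntry cache[CACHE_SIZE];
static unsigned long slow_lookups;  /* counts full lookups, for the demo */

/* Full lookup: walk the class and its superclasses. */
static Method slow_lookup(const Class *cls, const char *selector)
{
    slow_lookups++;
    for (; cls != NULL; cls = cls->superclass)
        for (size_t i = 0; i < cls->count; i++)
            if (cls->selectors[i] == selector)  /* pointer compare */
                return cls->methods[i];
    return NULL;
}

/* Cached lookup: probe one slot, fall back to the full lookup on a miss. */
Method cached_lookup(const Class *cls, const char *selector)
{
    uintptr_t h = ((uintptr_t)cls >> 3) ^ (uintptr_t)selector;
    CacheEntry *e = &cache[h & (CACHE_SIZE - 1)];
    if (e->cls == cls && e->selector == selector)
        return e->method;           /* hit */
    Method m = slow_lookup(cls, selector);
    if (m != NULL) {                /* fill the slot for next time */
        e->cls = cls;
        e->selector = selector;
        e->method = m;
    }
    return m;
}

static int twice(int x) { return 2 * x; }

/* Sends the same selector twice to a subclass instance and returns how
 * many full lookups that needed (0 is returned if a lookup went wrong). */
unsigned long method_cache_demo(void)
{
    static const char sel_twice[] = "twice";
    static const char *const selectors[] = { sel_twice };
    static const Method methods[] = { twice };
    static const Class base = { NULL, selectors, methods, 1 };
    static const Class derived = { &base, NULL, NULL, 0 };

    unsigned long before = slow_lookups;
    Method first = cached_lookup(&derived, sel_twice);
    Method second = cached_lookup(&derived, sel_twice);
    if (first == NULL || first != second || first(21) != 42)
        return 0;
    return slow_lookups - before;
}
```

In the demo the second send is answered from the cache, so only the first
one pays for the walk up the superclass chain.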


3. Implement a just-in-time compiler that compiles to threaded code.
http://www.complang.tuwien.ac.at/forth/threaded-code.html
A direct-threaded-code interpreter was implemented for an older version of
Squeak and improved performance by as much as a factor of two. The
current Squeak does not come with a threaded-code interpreter. There
will be a redesigned one pretty soon.
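
For those who haven't seen direct threading, here is a minimal sketch,
again for a made-up toy stack machine (the T_* opcodes and run_threaded
are hypothetical, not Squeak code). The bytecode is first translated into
an array of handler addresses with operands stored inline, and execution
then jumps from handler to handler without decoding opcodes again. It
assumes GNU C (labels as values), valid opcodes, and a program that ends
with T_HALT and fits into MAX_CELLS cells.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a toy stack machine (illustration only). */
enum { T_HALT, T_PUSH, T_ADD, T_MUL, T_COUNT };

#define MAX_CELLS 128

/* Translates `code` into direct-threaded code, runs it, and returns the
 * value on top of the stack when T_HALT is reached. */
long run_threaded(const unsigned char *code, size_t len)
{
    static void *handlers[T_COUNT] = {
        &&do_halt, &&do_push, &&do_add, &&do_mul
    };
    void *threaded[MAX_CELLS];      /* handler addresses and operands */
    void **ip = threaded;
    long stack[64];
    long *sp = stack;               /* points at the next free slot */
    size_t n = 0;
    size_t i;

    /* Translation pass: replace each opcode by its handler address. */
    for (i = 0; i < len && n + 2 <= MAX_CELLS; i++) {
        unsigned char op = code[i];
        threaded[n++] = handlers[op];
        if (op == T_PUSH)           /* keep the operand as the next cell */
            threaded[n++] = (void *)(intptr_t)code[++i];
    }

/* Jump to the handler stored in the next cell. */
#define NEXT() goto **ip++

    NEXT();

do_push:
    *sp++ = (long)(intptr_t)*ip++;
    NEXT();
do_add:
    sp--;
    sp[-1] += sp[0];
    NEXT();
do_mul:
    sp--;
    sp[-1] *= sp[0];
    NEXT();
do_halt:
    return sp > stack ? sp[-1] : 0;

#undef NEXT
}
```

Compared with the jump table under point 1, the table lookup per
instruction disappears: the translated code already contains the jump
targets.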

Up to here the interpreter would still be fairly portable. It would compile
out of the box on all platforms that gcc runs on.

4. Implement a native just-in-time compiler producing machine code
based on 3.



Markus
-- 
Markus Kohler  mailto:markus_kohler at hp.com



