The way to a faster python [was Python IS slow !]
Markus Kohler
markus_kohler at hp.com
Tue May 4 07:25:44 EDT 1999
Hi all,
I would like to give you a summary of what I think would be the
way to go to get a faster Python (and world domination ;-) ).
Although I don't know whether everything is feasible, because I haven't
looked into Python's interpreter loop in detail, it seems to me it should
be possible, given that Smalltalk is probably as dynamic as Python.
1. One can use a dispatch table instead of a case statement by using
an extension of GNU C (labels as values). For Squeak (www.squeak.org) that
gave me a speedup of almost a factor of 2 in message sending speed on an
HP-UX machine. The statement used is a computed goto, goto *address. Some
other C compilers might be able to do that as well, and for those that
can't, the old switch-based code should still be there.
The code looks like this in Squeak:
static void *jumpTable[256]= { &&_0, &&_1, &&_2, &&_3, &&_4, &&_5, &&_6, &&_7, &&_8, &&_9, &&_10, &&_11, &&_12, &&_13, &&_14, &&_15, &&_16, &&_17, &&_18, &&_19, &&_20, &&_21, &&_22, &&_23, &&_24, &&_25, &&_26, &&_27, &&_28, &&_29, &&_30, &&_31, &&_32, &&_33, &&_34, &&_35, &&_36, &&_37, &&_38, &&_39, &&_40, &&_41, &&_42, &&_43, &&_44, &&_45, &&_46, &&_47, &&_48, &&_49, &&_50, &&_51, &&_52, &&_53, &&_54, &&_55, &&_56, &&_57, &&_58, &&_59, &&_60, &&_61, &&_62, &&_63, &&_64, &&_65, &&_66, &&_67, &&_68, &&_69, &&_70, &&_71, &&_72, &&_73, &&_74, &&_75, &&_76, &&_77, &&_78, &&_79, &&_80, &&_81, &&_82, &&_83, &&_84, &&_85, &&_86, &&_87, &&_88, &&_89, &&_90, &&_91, &&_92, &&_93, &&_94, &&_95, &&_96, &&_97, &&_98, &&_99, &&_100, &&_101, &&_102, &&_103, &&_104, &&_105, &&_106, &&_107, &&_108, &&_109, &&_110, &&_111, &&_112, &&_113, &&_114, &&_115, &&_116, &&_117, &&_118, &&_119, &&_120, &&_121, &&_122, &&_123, &&_124, &&_125, &&_126, &&_127, &&_128, &&_129, &&_130, &&_131, &&_132, &&_133, &&_134, &&_135, &&_136, &&_137, &&_138, &&_139, &&_140, &&_141, &&_142, &&_143, &&_144, &&_145, &&_146, &&_147, &&_148, &&_149, &&_150, &&_151, &&_152, &&_153, &&_154, &&_155, &&_156, &&_157, &&_158, &&_159, &&_160, &&_161, &&_162, &&_163, &&_164, &&_165, &&_166, &&_167, &&_168, &&_169, &&_170, &&_171, &&_172, &&_173, &&_174, &&_175, &&_176, &&_177, &&_178, &&_179, &&_180, &&_181, &&_182, &&_183, &&_184, &&_185, &&_186, &&_187, &&_188, &&_189, &&_190, &&_191, &&_192, &&_193, &&_194, &&_195, &&_196, &&_197, &&_198, &&_199, &&_200, &&_201, &&_202, &&_203, &&_204, &&_205, &&_206, &&_207, &&_208, &&_209, &&_210, &&_211, &&_212, &&_213, &&_214, &&_215, &&_216, &&_217, &&_218, &&_219, &&_220, &&_221, &&_222, &&_223, &&_224, &&_225, &&_226, &&_227, &&_228, &&_229, &&_230, &&_231, &&_232, &&_233, &&_234, &&_235, &&_236, &&_237, &&_238, &&_239, &&_240, &&_241, &&_242, &&_243, &&_244, &&_245, &&_246, &&_247, &&_248, &&_249, &&_250, &&_251, &&_252, &&_253, &&_254, &&_255 } ;
localIP = ((char *) instructionPointer);
localSP = ((char *) stackPointer);
localHomeContext = theHomeContext;
currentBytecode = (*((unsigned char *) ( ++localIP ))) ;
while (1) {
switch (currentBytecode) {
case 0 : _0 :
currentBytecode = (*((unsigned char *) ( ++localIP ))) ;
(*((int *) ( localSP += 4 )) = (*((int *) ( ((((char *) receiver)) + 4) + ((0 & 15) << 2) ))) ) ;
goto *jumpTable[currentBytecode] ;
case 1 : _1 :
currentBytecode = (*((unsigned char *) ( ++localIP ))) ;
(*((int *) ( localSP += 4 )) = (*((int *) ( ((((char *) receiver)) + 4) + ((1 & 15) << 2) ))) ) ;
goto *jumpTable[currentBytecode] ;
case 2 : _2 :
currentBytecode = (*((unsigned char *) ( ++localIP ))) ;
(*((int *) ( localSP += 4 )) = (*((int *) ( ((((char *) receiver)) + 4) + ((2 & 15) << 2) ))) ) ;
goto *jumpTable[currentBytecode] ;
case 3 : _3 :
currentBytecode = (*((unsigned char *) ( ++localIP ))) ;
.....
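To show the idea without the generated Squeak code, here is a minimal sketch of a toy stack machine that uses the GNU C dispatch table when available and falls back to a plain switch otherwise. The opcodes (OP_PUSH1, OP_ADD, OP_HALT), the run() function and the DISPATCH macro are made up for illustration; they are not taken from Squeak or from Python's interpreter.

```c
/* Hypothetical opcodes for a toy stack machine (not Squeak's or Python's). */
enum { OP_PUSH1, OP_ADD, OP_HALT };

/* Use GNU C "labels as values" when available, else a plain switch. */
#if defined(__GNUC__)
#  define USE_COMPUTED_GOTO 1
#else
#  define USE_COMPUTED_GOTO 0
#endif

/* Runs a program that must end with OP_HALT; returns the top of stack. */
int run(const unsigned char *code)
{
    int stack[64];
    int sp = 0;                      /* index of the next free stack slot */
    const unsigned char *ip = code;  /* points at the next opcode */

#if USE_COMPUTED_GOTO
    /* One label address per opcode, indexed by opcode value. */
    static void *jump_table[] = { &&op_push1, &&op_add, &&op_halt };
#  define DISPATCH() goto *jump_table[*ip++]

    DISPATCH();
op_push1:
    stack[sp++] = 1;
    DISPATCH();                      /* jump straight to the next handler */
op_add:
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();
op_halt:
    return stack[sp - 1];
#else
    /* Portable fallback: the old loop around a switch. */
    for (;;) {
        switch (*ip++) {
        case OP_PUSH1: stack[sp++] = 1; break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
        case OP_HALT:  return stack[sp - 1];
        }
    }
#endif
}
```

Each handler ends with its own indirect jump instead of going back to the top of the loop, which is where the saving is supposed to come from. How much it gains will depend on the compiler and the machine.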
2. Implement a method cache. For Squeak, as far as I can remember, this
improved message sending performance by about 30%.
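A method cache could be sketched roughly as follows: a small hash table keyed on (class, selector) that is consulted before the full lookup along the superclass chain. Everything here (the Class struct, integer selectors, CACHE_SIZE, the hash function, the slow_lookups counter) is an assumption made for the example and not how Squeak or Python actually lay things out.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*Method)(int);

/* Hypothetical class layout: a few (selector, method) pairs plus a superclass. */
typedef struct Class {
    const struct Class *super;
    int    selectors[4];
    Method methods[4];
    int    count;
} Class;

typedef struct {
    const Class *cls;
    int          selector;
    Method       method;             /* NULL marks an empty slot */
} CacheEntry;

#define CACHE_SIZE 256               /* must be a power of two */
static CacheEntry cache[CACHE_SIZE];
static unsigned long slow_lookups;   /* counts full lookups, for illustration */

/* Full lookup: walk the superclass chain and scan each method table. */
static Method slow_lookup(const Class *cls, int selector)
{
    slow_lookups++;
    for (; cls != NULL; cls = cls->super)
        for (int i = 0; i < cls->count; i++)
            if (cls->selectors[i] == selector)
                return cls->methods[i];
    return NULL;
}

/* Cached lookup: one hash probe on a hit, slow_lookup() on a miss. */
Method lookup(const Class *cls, int selector)
{
    size_t h = (((uintptr_t)cls >> 4) ^ (uintptr_t)selector) & (CACHE_SIZE - 1);
    CacheEntry *e = &cache[h];

    if (e->method != NULL && e->cls == cls && e->selector == selector)
        return e->method;            /* hit */

    Method m = slow_lookup(cls, selector);
    if (m != NULL) {                 /* only successful lookups are cached */
        e->cls = cls;
        e->selector = selector;
        e->method = m;
    }
    return m;
}

/* Example data: `derived` inherits selector 1 from `base`. */
static int twice(int x) { return 2 * x; }
static const Class base    = { NULL,  { 1 }, { twice }, 1 };
static const Class derived = { &base, { 0 }, { NULL },  0 };
```

Calling lookup(&derived, 1) twice walks the class chain only the first time. In a language as dynamic as Python the cache would have to be flushed whenever a class or its methods change, which this sketch leaves out.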
3. Implement a just-in-time compiler that compiles to threaded code.
http://www.complang.tuwien.ac.at/forth/threaded-code.html
A direct-threaded interpreter has been implemented for an older version of
Squeak and improved performance by as much as a factor of two. The
current Squeak does not come with a threaded code interpreter. There
will be a redesigned one pretty soon.
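As a rough illustration of direct threading, the following sketch first translates a toy bytecode into an array of handler addresses and then executes that array, so no opcode is decoded while running. It needs the GNU C labels-as-values extension (gcc or a compatible compiler). The opcodes (T_PUSH, T_ADD, T_HALT), the Cell union and run_threaded() are invented for the example; the program is assumed to end with T_HALT and to fit into 64 cells.

```c
#include <stddef.h>

/* Hypothetical toy opcodes; T_PUSH is followed by one inline operand. */
enum { T_PUSH, T_ADD, T_HALT };

/* One cell of threaded code: a handler address or an inline operand. */
typedef union { void *handler; int operand; } Cell;

/* Translates `n` ints of bytecode to threaded code, then runs it.
   Assumes the program ends with T_HALT and fits into 64 cells. */
int run_threaded(const int *bytecode, size_t n)
{
    static void *handlers[] = { &&do_push, &&do_add, &&do_halt };
    Cell thread[64];
    int stack[64];
    int sp = 0;                      /* index of the next free stack slot */
    size_t i, t = 0;

    /* "Compile" step: replace each opcode by the address of its handler. */
    for (i = 0; i < n && t + 2 <= 64; i++) {
        int op = bytecode[i];
        thread[t++].handler = handlers[op];
        if (op == T_PUSH)
            thread[t++].operand = bytecode[++i];
    }

    /* Execute step: every handler jumps directly to the next one. */
    Cell *ip = thread;
#define NEXT() goto *(ip++)->handler

    NEXT();
do_push:
    stack[sp++] = (ip++)->operand;
    NEXT();
do_add:
    sp--;
    stack[sp - 1] += stack[sp];
    NEXT();
do_halt:
    return stack[sp - 1];
}
```

Here the translation is redone on every call to keep the example short. A real implementation would translate a method once and keep the threaded code around.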
Up to here the interpreter would still be fairly portable. It would compile
out of the box on all platforms that gcc runs on.
4. Implement a native just-in-time compiler producing machine code,
based on 3.
Markus
--
Markus Kohler mailto:markus_kohler at hp.com