Of course, you still have the actual interpretation of
the top-level module code - if it's not the marshalling
but this part that actually costs performance, this
efficient marshalling algorithm won't help. It would be
interesting to find out which modules have a particularly
high startup cost - perhaps they can be rewritten.

I am afraid this is the case. I wish we could marshal arbitrary application state (not even Python-specific) into a fast-loading dump file (hibernation/snapshot).

We have tried to use lazy importing as much as possible to distribute the importing cost across the application UI states.
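
To illustrate what I mean by lazy importing, here is a minimal sketch of a proxy that defers the real import until the first attribute access. The LazyModule class is a hypothetical helper, not part of the standard library, and it is written in current Python 3 syntax rather than the 2.5 that PyS60 ships. The json module only stands in for something heavy.

```python
import importlib


class LazyModule:
    """Hypothetical proxy: imports the real module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None  # nothing has been imported yet

    def __getattr__(self, attr):
        # Called only for attributes not found on the proxy itself,
        # i.e. for everything that belongs to the wrapped module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


# The import cost is paid here, at first use, not at program startup.
lazy_json = LazyModule("json")
print(lazy_json.dumps([1, 2, 3]))
```

In a UI application the first attribute access typically happens when the user opens the view that needs the module, which is how the cost ends up spread across UI states.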

Off the top of my head I know at least two particular modules which could be refactored. I'd recommend, as a best practice, that everything be imported lazily where possible. However, it looks like the Python community is currently moving in the other direction, since doing explicit imports in __init__ etc. makes APIs cleaner (think Django) and debugging a saner task - Python is mainly used on servers, and limited environments haven't been particularly interesting until lately.

logging - defines lots of classes which are used only if they are specified by logging options. I once hacked this for my personal use to be a little lighter.

urllib - particularly heavy, imports httplib, ftplib and other modules even if they are not used
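
One way to check claims like these on a given interpreter is to list which modules a single import drags in. The sketch below is only an assumption about how one might do it: modules_loaded_by is a hypothetical helper, written for Python 3, that runs the import in a fresh child interpreter so that modules already cached in the current process do not hide the result.

```python
import subprocess
import sys

# Code run in the child: snapshot sys.modules, import the target,
# then print whatever was added.
_PROBE = (
    "import sys\n"
    "before = set(sys.modules)\n"
    "__import__(sys.argv[1])\n"
    "print('\\n'.join(sorted(set(sys.modules) - before)))\n"
)


def modules_loaded_by(module_name):
    """Return the names of modules newly loaded by importing module_name."""
    result = subprocess.run(
        [sys.executable, "-c", _PROBE, module_name],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


if __name__ == "__main__":
    for name in ("logging", "urllib.request"):
        loaded = modules_loaded_by(name)
        print(name, "pulls in", len(loaded), "modules")
```

Note that in Python 3 the URL-opening code lives in urllib.request and httplib is called http.client, so the numbers will not match a 2.5 interpreter.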

Nokia has just released the Python 2.5 based PyS60. I think we'll come back to this after a while with a nice generic profiler which will report the import cost.
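
As a rough idea of what such a profiler could look like (a sketch under assumptions, not the tool we intend to write): temporarily wrap the built-in __import__ hook and accumulate wall-clock time per requested module name. The import_timer context manager is a hypothetical name, and the code uses Python 3 (builtins, time.perf_counter); on 2.5 the module is called __builtin__ and a different clock would be needed.

```python
import builtins
import time
from contextlib import contextmanager


@contextmanager
def import_timer():
    """Record cumulative seconds spent per name passed to __import__."""
    timings = {}
    original_import = builtins.__import__

    def timed_import(name, *args, **kwargs):
        start = time.perf_counter()
        try:
            return original_import(name, *args, **kwargs)
        finally:
            # Inclusive time: a module's figure includes its own imports.
            elapsed = time.perf_counter() - start
            timings[name] = timings.get(name, 0.0) + elapsed

    builtins.__import__ = timed_import
    try:
        yield timings
    finally:
        builtins.__import__ = original_import  # always restore the hook


if __name__ == "__main__":
    with import_timer() as timings:
        import logging
    for name, seconds in sorted(timings.items(), key=lambda kv: -kv[1])[:5]:
        print("%-30s %.6f s" % (name, seconds))
```

Modules that are already in sys.modules return almost instantly, so meaningful numbers need a fresh interpreter.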
 
Merry XMas,
-Mikko