Hi fellow snakemen and lizard ladies,
We have recently been doing lots of Python work on Nokia Series 60 phones and have even managed to roll out some commercial Python-based applications. In the future we plan to create some iPhone Python apps as well.
Python runs fine on phones - after it has been launched. Currently the biggest issue preventing the world dominance of Python-based mobile applications is the start-up time. We cope with the issue by using fancy splash screens and progress indicators, but that doesn't cure the fact that it takes a minute to show the main user interface of the application. Most of the time is spent in imports - executing opcodes and building function and class structures in memory - something which cannot be easily sped up.
Now, we have been thinking. Maemo has a fork()-based Python launcher (http://blogs.gnome.org/johan/2007/01/18/introducing-python-launcher/) which greatly speeds up the start-up time by holding Python in memory all the time. We cannot afford such luxury on Symbian and iPhone, since we do not control the operating system. So how about this:
1. A Python application is launched normally
2. After the VM has initialized, finished importing modules, and reached a static launch state (meaning that the state is the same on every launch), the VM state is written out to disk
3. Application continues execution and starts doing dynamic stuff
4. On subsequent launches, special init code is used which directly blits the VM image from disk back into memory, and we have reached the static state again without the overhead of executing import-related opcodes
5. Also, I have heard a suggestion that the VM image could be defragmented and analyzed offline
Any opinions?
Cheers, Mikko
> Any opinions?
I would use a different marshal implementation. Instead of defining a stream format for marshal, make marshal dump its graph of objects along with the actual memory layout. On load, copying can be avoided; just a few pointers need to be updated. The resulting marshal files would be platform-specific (wrt. endianness and pointer width).
On marshaling, you copy all objects into a contiguous block of memory (8-aligned), and dump that. On unmarshaling, you just map that block. If the target supports true memory mapping with page boundaries, you might be able to store multiple .pyc files into a single page. This reformatting could be done offline also.
A few things need to be considered:
- Compatibility. The original marshal code would probably need to be preserved for the "marshal" module.
- Relative pointers. Code objects, tuples, etc. contain pointers. Assuming the marshaled object cannot be loaded back into the same address, you need to adjust pointers. A common trick is to put a desired load address into the memory block, then try to load into that address. If the address is already taken, load into a different address, and walk through all objects, adjusting pointers.
- Type references. On loading, you will need to patch all ob_type fields. Put the marshal codes into the ob_type field on marshalling, then switch on unmarshalling.
- References to interned strings. On loading, you can either intern them all, or use a "fast interning" algorithm that assigns a fixed table of interned-string numbers.
- Reference counting. Make sure all these objects start out with a reference count of 1, so they will never become garbage.
If you use a container file for multiple .pyc files, you can have additional savings by sharing strings across modules; this should help in particular for references to builtin symbols, and for common method names. A fixed interning might become unnecessary, as the unique single string object in the container will either become the interned string itself, or point to it after being interned once. With such a container system, unmarshalling should be lazy; e.g. for each object, the value of ob_type can be used to determine whether the object was unmarshalled.
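The container idea can be approximated at the Python level. A minimal sketch (a hypothetical format, not the C-level memory-layout dump described above) where all modules live in one marshal file and each module's code is only unmarshalled and executed on first use:

```python
# One file holds marshalled code objects for many modules; modules are
# unmarshalled lazily, on first request.
import marshal
import sys
import types

def build_container(path, sources):
    """Compile {module_name: source_text} and marshal everything
    into a single container file (a stand-in format for this sketch)."""
    table = {name: marshal.dumps(compile(src, name, "exec"))
             for name, src in sources.items()}
    with open(path, "wb") as f:
        marshal.dump(table, f)

class Container:
    def __init__(self, path):
        with open(path, "rb") as f:
            self._raw = marshal.load(f)   # name -> marshalled bytes
        self._loaded = {}

    def load(self, name):
        # Unmarshal and execute only on first access (lazy loading)
        if name not in self._loaded:
            mod = types.ModuleType(name)
            exec(marshal.loads(self._raw[name]), mod.__dict__)
            self._loaded[name] = mod
            sys.modules[name] = mod
        return self._loaded[name]

# Usage
build_container("demo.bin", {"greeting": "MESSAGE = 'hello from container'"})
box = Container("demo.bin")
print(box.load("greeting").MESSAGE)   # -> hello from container
```

This only saves the per-file open/stat overhead, not the copying and pointer fixups that the C-level scheme would avoid, but it shows the lazy-unmarshalling shape of the design.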
Of course, you still have the actual interpretation of the top-level module code - if it's not the marshalling but this part that actually costs performance, this efficient marshalling algorithm won't help. It would be interesting to find out which modules have a particularly high startup cost - perhaps they can be rewritten.
Regards, Martin
> Of course, you still have the actual interpretation of the top-level module code - if it's not the marshalling but this part that actually costs performance, this efficient marshalling algorithm won't help. It would be interesting to find out which modules have a particularly high startup cost - perhaps they can be rewritten.
I am afraid this is the case. I hope we could marshal an arbitrary application state (not even Python-specific) into a fast-loading dump file (hibernation/snapshot).
We have tried to use lazy importing as much as possible to distribute the importing cost across the application UI states.
Off the top of my head I know at least two particular modules which could be refactored. I'd recommend as best practice that everything should be imported lazily where possible. However, it looks like the Python community is currently moving in the opposite direction, since doing explicit imports in __init__ etc. makes APIs cleaner (think Django) and debugging a saner task - Python is mainly used on the server, and limited environments haven't been particularly interesting until lately.
logging - defines lots of classes which are used only if they are specified by logging options. I once hacked this for my personal use to be a little lighter.
urllib - particularly heavy; imports httplib, ftplib and other modules even when they are not used
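Manual lazy importing, as recommended above, can be as simple as moving the import inside the function that needs it (sketched here with the modern Python 3 module name urllib.request; the function name is made up):

```python
def fetch(url):
    # The heavy network modules are only imported when this function is
    # actually called; applications that never fetch a URL never pay.
    # After the first call the module is cached in sys.modules, so the
    # import statement becomes a cheap dictionary lookup.
    import urllib.request
    return urllib.request.urlopen(url).read()
```

The trade-off Nick mentions below the thread applies here too: function-local imports are paid for (cheaply) on every call, and careless use inside threads can interact badly with the import lock.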
Nokia has just released a Python 2.5 based PyS60. I think we'll come back to this after a while with a nice generic profiler which will tell us the import cost.
Merry XMas, -Mikko
Mikko Ohtamaa wrote:
> Off the top of my head I know at least two particular modules which could be refactored. I'd recommend as best practice that everything should be imported lazily where possible.
We actually have a reason for discouraging lazy imports - using them carelessly makes it much easier to accidentally deadlock yourself on the import lock.
I agree that this contributes to the problem of long startup times though.
One sledgehammer approach to lazy imports is to modify the actual import system to use lazy imports by default, rather than having to explicitly enable them in a given module or for each particular import.
Mercurial does this quite nicely by overriding the __import__ implementation [1].
Perhaps PyS60 could install something similar in site.py? The trade-off will be whether enough time is saved in avoiding "wasted" module loads to make up for the extra time spent managing the bookkeeping for the lazy imports.
Cheers, Nick.
[1] From a recent thread on Python-Ideas that Google found for me: http://selenic.com/repo/index.cgi/hg-stable/file/967adcf5910d/mercurial/dema...
>> Of course, you still have the actual interpretation of the top-level module code - if it's not the marshalling but this part that actually costs performance, this efficient marshalling algorithm won't help. It would be interesting to find out which modules have a particularly high startup cost - perhaps they can be rewritten.
> I am afraid this is the case.
Is that an unfounded or a founded fear? IOW, do you have hard numbers proving that it is the actual interpretation time (rather than the marshaling time) that causes the majority of the startup cost?
> I hope we could marshal an arbitrary application state (not even Python-specific) into a fast-loading dump file (hibernation/snapshot).
I understand that this is what you want to get. I'm proposing that there might be a different approach to achieve a similar speedup.
> logging - defines lots of classes which are used only if they are specified by logging options. I once hacked this for my personal use to be a little lighter.
So what speedup did you gain by rewriting it? (i.e. how many microseconds did "import logging" take before, and how many afterwards?) How much of it was parsing/unmarshaling, and how much was byte code interpretation? Of the byte code interpretation, which opcodes in particular?
> urllib - particularly heavy; imports httplib, ftplib and other modules even when they are not used
Same questions here. This doesn't sound like any heavy computation is being done during startup.
> Nokia has just released a Python 2.5 based PyS60. I think we'll come back to this after a while with a nice generic profiler which will tell us the import cost.
Looking forward to hearing your numbers!
Regards, Martin