Python is faster than C
jcarlson at uci.edu
Sun Apr 4 03:49:51 CEST 2004
> No, range should return an object that is a list, as far as you can tell
> from Python, but which is represented more efficiently than an array of
> objects internally. The distinction is between the language level (it
> would be a list, with all operations, etc.) and the implementation
> (there is no reason why all lists should be arrays of PyObjects
You can implement such a thing already. In fact, until recently xrange
supported basically everything that a list object did, except for
mutation. The reason it no longer does is that, for multiple versions
of Python, such behavior was buggy and poorly supported. If
you are bored enough, feel free to make your own xrange-like object that
supports mutation. Heck, it can even subclass 'list', though it need
not have any standard list internals.
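To make that concrete, here is a minimal sketch of such an xrange-like object (my own illustration, not the actual xrange implementation): it stores only start/stop/step but still supports len(), indexing, and iteration, so it looks list-like from Python without allocating an array of objects.

```python
class LazyRange:
    # Hypothetical read-only, list-like range: stores only start, stop,
    # and step instead of an array of PyObjects. Positive step only,
    # to keep the sketch simple.
    def __init__(self, start, stop, step=1):
        assert step > 0
        self.start, self.stop, self.step = start, stop, step

    def __len__(self):
        # Number of elements, computed from the bounds alone.
        return max(0, (self.stop - self.start + self.step - 1) // self.step)

    def __getitem__(self, i):
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self.start + i * self.step

    def __iter__(self):
        for i in range(len(self)):
            yield self[i]
```

Supporting mutation on top of this (or actually subclassing list) is left as the exercise suggested above.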
> Another example would be 'a'*999999999: the result is a string, but
> there is no reason that it takes 100MB of memory. Instead, store it
> into a C structure that contains a pointer to the original string object
> 'a' and the repetition counter, but still give this C structure the
> Python type str, so that the difference doesn't show up and the Python
> language remains simple. (This is a bit difficult to implement
> currently in CPython, but not impossible.)
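The proposed representation can also be sketched in pure Python (a hypothetical illustration; CPython's real str does not work this way): store the base string and a repetition count, and only build the full string when something actually requires it.

```python
class RepeatedStr:
    # Hypothetical lazily repeated string: stores the base string and a
    # count rather than count * len(base) characters in memory.
    def __init__(self, base, count):
        self.base, self.count = base, count

    def __len__(self):
        return len(self.base) * self.count

    def __getitem__(self, i):
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Index into the base string without materializing the repeats.
        return self.base[i % len(self.base)]

    def materialize(self):
        # Fall back to a real str only when one is actually needed.
        return self.base * self.count
```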
Also, you are free to implement such a thing. I believe the reason the
current CPython implementation doesn't follow this route (and other
suggested optimizations) is that it would needlessly complicate the
implementation of CPython.
> Ideally: If you do x=range(100); x='hi' then the interpreter first
> builds this optimized range representation and assigns it to x; and when
> in the next statement you modify this list x it says 'oops! i cannot do
> that with this representation', so it reverts to an array-like
> representation (i.e. it creates all 100 elements) and then changes the
> 50th. No gain here. If on the other hand you only ever do 'easy'
> things with your list, like iterate over it or read elements, then it
> can all be done with the range representation, without falling back to
> the array representation.
Why not instead use a dictionary-based approach for special items? It
would be far more memory efficient, and wouldn't incur the modification
penalty described above.
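One way to read that suggestion (my own sketch, with made-up names) is a dictionary overlay: keep the compact range representation, and record only the indices that have actually been assigned to, so `x = range(100)` followed by changing the 50th element never materializes the other 99.

```python
class PatchedRange:
    # Hypothetical dictionary-overlay list over range(n): only the
    # modified slots cost memory; everything else is computed on demand.
    def __init__(self, n):
        self.n = n
        self.patches = {}  # index -> replacement value

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)
        # Unmodified slots fall through to the implicit range value.
        return self.patches.get(i, i)

    def __setitem__(self, i, value):
        if not 0 <= i < self.n:
            raise IndexError(i)
        self.patches[i] = value
```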
> Again I'm not saying it is trivial to implement it, but that not having
> to expose for optimization purposes 'xrange' and the whole 'iterator'
> part of the language would be worth it, in my opinion.
I think that one of the desires of offering 'iterator' concepts to
users, both new and seasoned, is that it allows people to think in ways
they didn't before. Allowing people to make those optimizations 'by
hand', I believe (as an educator and computer scientist), allows them to
grow as programmers (or computer scientists, as the case may be).
Don't get me wrong, it would be terribly convenient for Python to
abstract away all the nastiness of optimization, but if/when someone
were to do something that had been /very/ fast before, but was now
awfully slow (think the current Queue.Queue object for small vs. large
numbers of items), they are going to jump to the conclusion that Python
is broken in some fundamental way, maybe come here to complain about
Python being broken, and those who are familiar with the internals
would say, "you are ruining Python's early optimization by this thing
that you do".
Which really gets us to the fact that you are asking for the Python
internals to be optimized. In fact, while simultaneously saying "don't
optimize early", you are optimizing early by saying that range should be
optimized, as should string multiplication, etc. Goodness man, pick a
position and stick with it.