Efficient way to break up a list into two pieces
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Sun Feb 21 18:21:27 EST 2010
On Sun, 21 Feb 2010 11:07:24 -0800, marwie wrote:
>> x = SomeReallyBigListOrString
>> for item in x[1:]:
>> process(item)
>>
>> has to copy the entire list or string (less the first item). But
>> honestly, I've never found a situation where it actually mattered.
>
> Good grief, it copies that, too? I assumed that the copying is at least
> delayed until the assignment and that x[1:] is represented by some
> wrapper that just shuffles the indices around (much like the .indices
> method that MRAB suggested).
Such complexity doesn't happen for free. It costs developer time, more
complexity means more bugs, and more runtime overhead, especially for
small lists. 99.9% of programs operate on small amounts of data, and
"small" gets bigger every year.
(I was reading one of my old Uni text books the other day, and one of the
projects was an application to index all the words in a text file. The
author decided to use disk storage rather than in-memory data structures
because he was hoping to read files up to 60,000 words, and considered it
unlikely that any PCs would have enough memory to store 60,000 words!)
Unless you actually profile the code, you're as likely to be *slowing
Python down* by such extra sophistication as you are to speed it up.
Copying blocks of pointers is really fast on modern CPUs.
As a general rule, Python aims for simplicity of implementation rather
than clever, but fragile, tricks.
> Maybe copying chunks of data really isn't that much of an issue as it
> used to be (and maybe I'm getting old). The application I have in mind
> has to do with symbolic computation, and expressions would be
> represented by python lists. It's too early to do any profiling (in
> fact, it's at the "deciding if python is the right language to implement
> it" stage), but it will have to handle expressions with many terms (i.e.
> long lists), and in the end taking these lists apart and putting them
> back together in different order is the only thing the code will do.
Are you likely to have expressions with a hundred or so MILLION terms?
If so, then you should start thinking about clever tricks to minimise
copying. Otherwise, you're engaged in premature optimization, which is
the root of all computer science evil :)
--
Steven
More information about the Python-list
mailing list