[Python-ideas] get() method for list and tuples
Steven D'Aprano
steve at pearwood.info
Thu Mar 2 07:58:36 EST 2017
On Wed, Mar 01, 2017 at 02:56:44AM +0100, Michel Desmoulin wrote:
> > first_item = (alist[0:1] or ["ham"])[0]
>
> Come on, I've been doing Python for more than a decade and never saw
> anybody doing that. Even reading it in a code would make me scratch my
> head for a moment with a "what is it doing that for?".
These days, it might be better to write it as:
    first_item = alist[0] if len(alist) else "ham"
but I remember the days when slicing was normal and using the `or` trick
was standard operating procedure.
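For anyone who hasn't met the idiom, here is a minimal sketch of the two spellings side by side. The function names are made up for illustration and come from no library.

```python
def first_or_default_slice(alist, default):
    # alist[0:1] never raises: for an empty list it is [], which is falsy,
    # so `or` falls through to the one-element fallback list.
    return (alist[0:1] or [default])[0]


def first_or_default_cond(alist, default):
    # The conditional-expression spelling of the same thing.
    return alist[0] if len(alist) else default
```

Note that the slice version tests the truthiness of the one-element slice, not of the item itself, so a falsy first item such as 0 is still returned rather than replaced by the default.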
> You are trying too hard to provide a counter argument here.
In context, all I'm saying is that you don't *have* to catch IndexError.
There are alternatives. That is all.
> Me, I have to deal SOAP government systems, mongodb based API built by
> teenagers, geographer data set exports and FTP + CSV in marina systems
> (which I happen to work on right now).
>
> 3rd party CSV, XML and JSON processing are just a hundred of lines of
> try/except on indexing because they have many listings, data positions
> is important and a lot of system got it wrong, giving you inconsistent
> output with missing data and terrible labeling.
This is all very well and good, and I feel your pain for having to deal
with garbage data, but I don't see how this helps you. You talk about
missing data, but a list cannot have items missing from the middle.
There's no such thing as a list like:
    [0, 1, 2, 3, , , , , , , 10, 11, 12]
where alist[3] and alist[10] will succeed but alist[4] etc will raise
IndexError. So I'm still trying to understand what this proposal gets
you that wouldn't be better solved using (say) itertools.zip_longest or
a pre-processing step to clean up your data.
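As a rough illustration of the zip_longest approach, assume each record is a short list that should line up against a known list of field names. The field names and the pad_row helper below are hypothetical, invented for this sketch.

```python
from itertools import zip_longest

# Hypothetical field names, for illustration only.
FIELDS = ["name", "berth", "length_m"]


def pad_row(row, fillvalue=None):
    # Drop any surplus trailing items, then let zip_longest pad a short
    # row with fillvalue so every field name gets a value.
    trimmed = row[:len(FIELDS)]
    return dict(zip_longest(FIELDS, trimmed, fillvalue=fillvalue))
```

After this clean-up step every record has the same keys, so the code that consumes it needs no try/except around indexing.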
> And because life is unfair, the data you can extract is often a mix of
> heterogeneous mappings and lists / tuples. And your tool must manage the
> various versions of the data format they send to you, some with
> additional fields, or missing ones. Some named, other found by position.
Maybe I'm underestimating just how awful your data is, but I'm having
difficulty thinking of a scenario where you don't know what kind of
object you are processing and have to write completely type-agnostic
code:
    for key_or_index in sequence_of_keys_or_indexes:
        result = sequence_or_mapping[key_or_index]
I'm sure that there is lots of code where you iterate over dicts:
    for key in keys:
        result = mapping.get(key, default)
and likewise code where you process lists:
    for i in indexes:
        try:
            result = sequence[i]
        except IndexError:
            result = default

    # could be re-written using a helper function:
    for i in indexes:
        result = get(sequence, i, default)
but I've never come across a data-processing situation where I didn't
know which was which.
That second version with the helper function would be *marginally* nicer
written as a method call. I grant you that!
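The helper itself is only a few lines. This is a minimal sketch of what such a get() function might look like, not an existing builtin or standard library function; the signature simply mirrors dict.get.

```python
def get(sequence, index, default=None):
    # Return sequence[index], or default if the index is out of range.
    try:
        return sequence[index]
    except IndexError:
        return default
```

Negative indexes keep their usual meaning, so get(seq, -1) returns the last item of a non-empty sequence and falls back to the default only when the sequence is empty.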
> This summer, I had to convert a data set provided by polls in africa
> through an android form, generated from an XML schema, send as json
> using Ajax, then stored in mongodb... to an excel spread sheet (and also
> an HTML table and some JS graphs for good measure).
>
> Needless to say I dealt with a looooot of IndexError. Grepping the
> project gives me:
>
> grep -R IndexError | wc -l
> 33
>
> In contrast I have 32 KeyError (most of them to allow lazy default
> value), and 3 decorators.
So 33 is "a looooot", but 32 KeyErrors is "in contrast" and presumably a
little.
> Apparently IndexError is an important error because if I grep the
> virtualenv of the project:
>
> grep -R IndexError | wc -l
> 733
>
> Ok, it's a pretty large project with 154 dependencies, but it's still
> almost 7 IndexError by package on average. So it's not a rare use case.
You don't know that every one of those can be replaced by list.get().
Some of them might be raising IndexError; some of them may be
documenting that a function or method raises IndexError; etc.
> I also see it regularly in my classes. Students try it because they
> learned it works with dict. It makes sense.
I never said it didn't. But I wonder whether it gives *enough* benefit
to be worthwhile.
--
Steve