[Python-Dev] [NPERS] Re: a feature i'd like to see in python #2: indexing of match objects

Josiah Carlson jcarlson at uci.edu
Thu Dec 7 22:57:49 CET 2006


"Michael Urman" <murman at gmail.com> wrote:
> On 12/6/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> > Special cases aren't special enough to break the rules.
> 
> Sure, but where is this rule that would be broken? I've seen it
> invoked, but I've never felt it myself. I seriously thought of slicing
> as returning a list of elements per range(start,stop,skip), with the
> special case being str (and then unicode followed)'s type
> preservation.

Tuple slicing doesn't return lists.  Array slicing doesn't return lists.
None of Numarray, Numeric, or Numpy returns lists from array slicing.
In the current stdlib and in the major array packages for Python, only
list slicing returns lists.  Someone please correct me if I am wrong.
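The claim is easy to check in an interactive session; a minimal sketch for the built-in sequence types:

```python
# Slicing a built-in sequence returns the same type as the original.
t = (1, 2, 3, 4)
l = [1, 2, 3, 4]
s = "abcd"

assert type(t[1:3]) is tuple   # (2, 3)
assert type(l[1:3]) is list    # [2, 3]
assert type(s[1:3]) is str     # "bc"
```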


> This is done because a list of characters is a pain to work with in
> most contexts, so there's an implicit ''.join on the list. And because
> assuming that the joined string is the desired result, it's much
> faster to have just built it in the first place. A pure practicality
> beats purity argument.

Python returns strings from string slices not because of a "practicality
beats purity" argument, but because not returning a string from string
slicing would be *insane*.  The operations on strings tend not to be
the kind done with lists (insertion, sorting, etc.); they are
typically linguistic, parsing, or data related (chop, append, prepend,
scan for X, etc.).  Also, each item in a list is typically a singular
item, whereas in a string, _blocks of characters_ typically represent a
single item: spaces, words, shorts, longs, floats, doubles, etc. (the
latter being related to packed representations of data structures, as is
the case in some socket protocols).
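To illustrate the "blocks of characters are one item" point, here is a small sketch (the example data is mine): splitting text yields word-sized items, and struct unpacks several bytes into a single value, as in packed socket protocols:

```python
import struct

# Text: runs of characters form one item (a word), not one character each.
line = "scan for X in this sentence"
words = line.split()
assert words[0] == "scan"

# Packed binary data: several bytes represent a single value
# (a short and a float here, in network byte order).
packed = struct.pack("!hf", 7, 2.5)
short_val, float_val = struct.unpack("!hf", packed)
assert short_val == 7
assert abs(float_val - 2.5) < 1e-6
```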

Semantically, lists differ from strings in various *substantial* ways,
which is why you will never see a typical user of Python asking for
list.partition(lst).  Strings have the sequence interface out of
*convenience*, not because strings are like a list.  Don't try to
combine the two ideas.  Also, when I said that strings/unicode were
special and "has already been discussed ad-nauseum", I wasn't kidding.
Take string and unicode out of the discussion and search Google for the
thousands of other threads about why string and unicode are the way
they are.  They don't belong in this conversation.


> We both arrive at the same place in that we have a model describing
> the behavior for list/str/unicode, but they're different models when
> extended outside.

But this isn't about str/unicode (or buffer).  For all other types
available in Python with a slice operation, slicing them returns the
same type as the original sequence.  Expand that to 3rd party modules,
and the case still holds for every package (at least those that I have
seen and used).  If you can point out an example (not str/unicode/buffer)
for which this rule is broken in the stdlib or even any *major* 3rd
party library, I'll buy you a cookie if we ever meet.
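A quick check against a few other stdlib sequence types (using present-day Python; `range` slicing is a later addition than this thread) shows the same type-preserving behavior:

```python
import array

# Each stdlib sequence type returns its own type from a slice.
a = array.array("i", [1, 2, 3, 4])
b = bytearray(b"abcd")
r = range(10)

assert type(a[1:3]) is array.array   # array('i', [2, 3])
assert type(b[1:3]) is bytearray     # bytearray(b'bc')
assert type(r[2:5]) is range         # range(2, 5)
```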


> Now that I see the connection you're drawing between your argument and
> the paper, I don't believe it's directly inspired by the paper. I read
> the paper to say those who could create and work with a set of rules,
> could learn to work with the correct rules. Consistency in Python
> makes things easier on everyone because there's less to remember, not
> because it makes us better learners of the skills necessary for
> programming well. The arguments I saw in the paper only addressed the
> second point.

Right, but if you have a set of rules:
1. RULE 1
2. RULE 2
3. RULE 3

The above will be easier to understand than:
1. RULE 1
2.
  a. RULE 2 if X
  b. RULE 2a otherwise
3. RULE 3

This is the case even ignoring the paper.  Special cases are more
difficult to learn than no special cases.  Want a real world example?
English.  The English language is so fraught with special cases that it
is the only language for which dyslexia is known to exist (or was known
to exist for many years; I haven't kept up on the research).

In the context of the paper, their findings suggested that those who
could work with a *consistent* set of rules could be taught the right
consistent rules.  Toss in an inconsistency?  Who knows whether, in this
case, it will make *any* difference in Python; regular expressions are
already confusing for many people.

This is a special case.  There's a zen aphorism for that: does
"practicality beats purity" apply here?  I don't know.

At this point I've just about stopped caring.  Make the slice return a
list, don't allow slicing, or make it a full-on group variant.  I don't
really care at this point.  Someone write a patch and let's go with it.
Adding slicing that produces a list should be easy after the main patch
is done, and we can emulate it in Python if necessary.
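For what the pure-Python emulation might look like, here is a minimal sketch using a hypothetical wrapper (the class name and design are mine, not a proposal); indexing returns a group and slicing returns a list of groups, which is the "slicing produces a list" option discussed above:

```python
import re

class SliceableMatch:
    """Hypothetical wrapper making a match object indexable and sliceable.

    m[i] returns group i; m[i:j] returns a list of groups.
    """
    def __init__(self, match):
        self._match = match

    def __getitem__(self, key):
        # group(0) is the whole match, followed by the captured groups.
        groups = (self._match.group(0),) + self._match.groups()
        if isinstance(key, slice):
            return list(groups[key])
        return groups[key]

m = SliceableMatch(re.match(r"(\d+)-(\d+)", "10-20"))
assert m[0] == "10-20"
assert m[1:3] == ["10", "20"]
```

(In present-day Python, match objects do support `m[i]` indexing natively, but still not slicing.)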


 - Josiah


