[Python-Dev] An ability to specify start and length of slices

Thu Jun 3 18:07:48 EDT 2004

Hello,

I often find myself wanting a slice of a specific length, rather than a 
slice with a specific end. I suggest adding the syntax 
object[start:>length] (or object[start:>length:step]) alongside the 
existing syntax.

Two examples:

1. Say I have a list with the number of panda bears hunted in each 
month, starting from 1900. Now I want to know how many panda bears were 
hunted in year y. Currently, I have to write something like this:
sum(huntedPandas[(y-1900)*12:(y-1900)*12+12])
If my suggestion is accepted, I would be able to write:
sum(huntedPandas[(y-1900)*12:>12])
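For comparison, here is a runnable version of the first example in today's syntax, using a temporary variable for the start index. The function name hunted_in_year and the sample data are made up for illustration:

```python
def hunted_in_year(hunted_pandas, y, first_year=1900):
    """Sum the monthly counts for year y; the list starts at January of first_year."""
    start = (y - first_year) * 12  # index of January of year y
    # Today the end index must be spelled out as start + 12;
    # under the proposal this would be hunted_pandas[start:>12].
    return sum(hunted_pandas[start:start + 12])


# Made-up data: 1 panda in each month of 1900, 2 in each month of 1901.
hunted = [1] * 12 + [2] * 12
print(hunted_in_year(hunted, 1900))  # 12
print(hunted_in_year(hunted, 1901))  # 24
```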

2. Many data files contain fields of fixed length. For example, say 
I want to get the color of the first pixel of a 24-bit color BMP file, 
and I have a function s2i which converts a 4-byte string into a 
32-bit integer. The four bytes starting at byte 10 hold the size of the 
header, in bytes. Right now, if I don't want to use temporary variables, 
I have to write:
picture[s2i(picture[10:14]):s2i(picture[10:14])+4]
I think this is nicer (and quicker):
picture[s2i(picture[10:>4]):>4]
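A minimal sketch of the BMP example with a temporary variable instead of the repeated call. It assumes s2i (the post's own hypothetical helper) reads the field as little-endian, which is how BMP stores it, and implements it with struct.unpack("<I", ...). Strictly, the value at byte 10 is the offset of the pixel data from the start of the file, i.e. the combined size of the headers and any color table. Since a 24-bit pixel is stored as 3 bytes (blue, green, red), this sketch slices 3 bytes where the example above slices 4:

```python
import struct


def s2i(four_bytes):
    """Hypothetical helper: 4 little-endian bytes -> unsigned 32-bit integer."""
    return struct.unpack("<I", four_bytes)[0]


def first_pixel(picture):
    """Return the 3 bytes (blue, green, red) of the first pixel in the pixel data."""
    offset = s2i(picture[10:14])  # bytes 10..13: where the pixel data starts
    # Proposed form: picture[offset:>3]
    return picture[offset:offset + 3]


# A fake 54-byte header whose bytes 10..13 hold the offset 54,
# followed by a single pixel.
header = bytearray(54)
header[0:2] = b"BM"
header[10:14] = struct.pack("<I", 54)
picture = bytes(header) + b"\x10\x20\x30"
print(first_pixel(picture) == b"\x10\x20\x30")  # True
```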

(I mean to show that when working with data files, it's common to have 
slices of specific length, and that the proposed syntax makes things 
clear and simple. I took BMP just as an example - I know about PIL.)

Other solutions (from comp.lang.python responses):

1. Of course, the longer form may be used, and a temporary variable may 
be used to avoid repeated function calls.
2. The idiom object[start:][:length] may be used. However, it may be 
very inefficient if the list is long, since object[start:] first copies 
the whole tail of the list. Another advantage of the proposed syntax is 
that it can be used in multi-dimensional slices (for example, 
ar[:,x:>3,:]).
3. The programmer may define a helper such as lambda object, start, length: 
object[start:start+length]. This does make expressions quite short, but 
it isn't very readable IMHO, and it doesn't work inside multi-dimensional slices.
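The three alternatives can be compared side by side. Here span is a hypothetical variant of solution 3 that returns a slice object rather than the sliced sequence; because a slice object can sit inside a tuple subscript, this variant can also appear in multi-dimensional indexing (for example with NumPy arrays), although it is still a function call rather than syntax. ShowKey is a throwaway class that just echoes the key it receives:

```python
def span(start, length):
    """Hypothetical helper: a slice covering `length` items from `start`."""
    return slice(start, start + length)


data = list(range(20))
start, length = 5, 3

long_form = data[start:start + length]   # solution 1
two_step = data[start:][:length]         # solution 2: builds data[start:] first
with_helper = data[span(start, length)]  # a variant of solution 3
print(long_form, two_step, with_helper)  # [5, 6, 7] [5, 6, 7] [5, 6, 7]


class ShowKey:
    """Returns the key it is subscripted with, to show what the subscript builds."""
    def __getitem__(self, key):
        return key


print(ShowKey()[:, span(4, 3), :])
# (slice(None, None, None), slice(4, 7, None), slice(None, None, None))
```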

Objections (also from comp.lang.python):

1. There should be only one way to do something in Python.
2. Some don't like how it looks.
3. l[a:b] yields an empty list when a>b, and l[a:>b] doesn't.

My responses:

1. Changes should be taken seriously, and the language must be kept 
simple and easy to read, but that doesn't mean there should be only 
one way to do something. For example, you could write l[:,:,:,3], 
but the ellipsis token lets you write l[...,3].
2. I can't really argue with that, except to say that it looks fine to 
me. The symbol '>' generally means "move to the right". I think that 
l[12345:>10] can easily be read as "start from 12345, and move 10 steps 
to the right. Take all the items you passed over."
3. l[a:>b] doesn't look like l[a:b] and it means something altogether 
different. Besides, l[a:b:-1] doesn't yield an empty list when a > b.
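Two of the points above can be checked in current Python: a slice with a > b is empty only for a positive step, and the ellipsis simply passes an Ellipsis object to __getitem__. Plain lists reject such tuple keys; the ellipsis is meant for multi-dimensional containers such as NumPy arrays. ShowKey is a throwaway class used only to display the key:

```python
l = list(range(10))
print(l[7:3])     # []            a > b with the default step
print(l[7:3:-1])  # [7, 6, 5, 4]  a > b with a negative step


class ShowKey:
    """Returns the key it is subscripted with."""
    def __getitem__(self, key):
        return key


print(ShowKey()[..., 3])  # (Ellipsis, 3)
```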

Some technical details:

My proposal only affects the conversion from Python code into byte-code. 
This is why it is easy to implement and has no side effects, as far as I 
can see.
I changed the definition of "subscript" in the Grammar file from:
subscript: '.' '.' '.' | test | [test] ':' [test] [sliceop]
into:
subscript: '.' '.' '.' | test | ([test] ':' [test] | test ':>' test) [sliceop]
and added the ':>' token to tokenizer.c and token.h.
I then extended compile.c to handle the new syntax.
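One way to see why the new token should not change the meaning of existing slicing code: no valid expression starts with '>', so ':>' inside a subscript is currently a syntax error. The helper parses below is hypothetical and only asks the compiler whether a string is a valid expression:

```python
def parses(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<slice-test>", "eval")
    except SyntaxError:
        return False
    return True


print(parses("x[1:3]"))   # True
print(parses("x[1:>3]"))  # False: ':>' is not valid in a subscript today
```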

The byte code produced is basically simple: Calculate start, calculate 
length, and add start to length to get the usual start, end. It gets a 
bit complicated because you want range(10)[3:>-5], for example, to yield 
an empty list, and using the method described, it will be equivalent to 
range(10)[3:-2], that is, to [3,4,5,6,7]. So the byte-code my 
implementation produces checks to see if the resulting end is negative 
and start is positive, and if so, puts -sys.maxint, instead of 
start+length, as end. -sys.maxint is used instead of the more obvious 
choice, 0, so that range(10)[3:>-5:-1] will yield [3,2,1,0] and not [3,2,1].
This can be optimized, because I expect that usually length will be an 
integer given explicitly in the Python code, in which case no testing 
has to be done in the byte-code.
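The rule described above can be sketched as a plain Python function that builds the equivalent slice object. This is not the attached compile.c code. It uses sys.maxsize because Python 3 has no sys.maxint, and it treats a start of 0 as "positive", which is an assumption on our part:

```python
import sys


def length_slice(start, length, step=None):
    """Sketch of the proposed seq[start:>length:step], following the rule above.

    end = start + length, except when end is negative while start is not;
    then end becomes -sys.maxsize, so the slice runs off the front of the
    sequence instead of wrapping around to count from the end.
    """
    end = start + length
    # Assumption: a start of 0 counts as "positive" here.
    if end < 0 and start >= 0:
        end = -sys.maxsize
    # Not special-cased: a negative start whose end is exactly 0
    # (e.g. start=-3, length=3) gives slice(-3, 0), which is empty.
    return slice(start, end, step)


r = list(range(10))
print(r[3:3 + -5])                 # [3, 4, 5, 6, 7]: naive start+length
print(r[length_slice(3, 4)])       # [3, 4, 5, 6]
print(r[length_slice(3, -5)])      # []
print(r[length_slice(3, -5, -1)])  # [3, 2, 1, 0]
print(r[3:0:-1])                   # [3, 2, 1]: why 0 is not used as the end
```

This also shows why the optimization works: when length is a non-negative literal, a non-negative start gives a non-negative end, and a negative start never satisfies the condition, so the compiler could emit a plain start:start+length slice.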

Attached are the 4 diffs. I'm sorry, they are against the Python 2.3.3 
release (the SourceForge CVS doesn't work for me at the moment), but I 
expect them to work fine with the CVS head.

To summarize, this is a small addition, with no side effects or 
backward-compatibility issues, which will help me and others.

Well, what do you think? I would like to hear your comments.

Best wishes,
Noam Raphael
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Grammar.diff
Type: text/x-patch
Size: 808 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-dev/attachments/20040604/cf70e9a5/Grammar.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: compile.c.diff
Type: text/x-patch
Size: 2278 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-dev/attachments/20040604/cf70e9a5/compile.c.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: token.h.diff
Type: text/x-patch
Size: 767 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-dev/attachments/20040604/cf70e9a5/token.h.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tokenizer.c.diff
Type: text/x-patch
Size: 550 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-dev/attachments/20040604/cf70e9a5/tokenizer.c.bin