[Tutor] don't understand iteration
Steven D'Aprano
steve at pearwood.info
Thu Nov 13 13:05:29 CET 2014
Coming in late to the conversation:
On Sun, Nov 09, 2014 at 04:34:29PM -0800, Clayton Kirkwood wrote:
> I have the following code:
>
> import urllib.request,re,string
> months = ['Jan.', 'Feb.', 'Mar.', 'Apr.', 'May.', 'Jun.', 'Jul.', 'Aug.',
> 'Sep.', 'Oct.', 'Nov.', 'Dec.']
> from urllib.request import urlopen
> for line in urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl'):
>     line = line.decode('utf-8') # Decoding the binary data to text.
>     if 'EST' in line or 'EDT' in line: # look for Eastern Time
>         blah = re.search(
>             r'<\w\w>(\w{3}\.)\s+(\d{2}),\s+(\d{2}).+([AP]M)\s+(E[SD]T)', line)
>         (month, day, time, ap, offset) = blah.group(1,2,3,4,5)
>         print(blah,'\n',ap,month, offset,day, time )
In programming, just like real life, it usually helps to try to isolate
the fault to the smallest possible component. When confused by some
programming feature, eliminate everything you can and focus only on that
feature.
In this case, all that business about downloading a Perl script from the
web, decoding it, iterating over it line by line, is completely
irrelevant. You can recognise this by simplifying the code until either
the problem goes away or you have isolated where the problem is.
In this case, I would simplify the regular expression to something much
simpler, and apply it to a single known string:
import re
text = 'xxxx1234 5678xxx'
regex = r'(\d*) (\d*)' # Match <digits><space><digits>
mo = re.search(regex, text) # "mo" = Match Object
a, b = mo.group(1, 2)
print(a, b)
Now we can focus on the part that is confusing you, namely the need to
manually write out the group numbers. In this case, writing 1,2 is no
big deal, but what if you had twenty groups?
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t = mo.group(1, 2, 3, 4, 5, ...
When you find yourself doing something ridiculously error-prone like
that, chances are there is a better way. And in this case, we have this:
a, b = mo.groups()
mo.groups() returns a tuple of all the groups. You can treat it like any
other tuple:
mo.groups() + (1, 2, 3)
=> returns a new tuple with five items ('1234', '5678', 1, 2, 3)
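Applied to your original regex, that might look something like the sketch
below. The sample line is made up; I am only assuming the page contains
lines shaped roughly like it, and the real markup may differ.

```python
import re

# Hypothetical sample line; the real page's markup may differ.
line = '<BR>Nov. 09, 07:34:29 PM EST'
regex = r'<\w\w>(\w{3}\.)\s+(\d{2}),\s+(\d{2}).+([AP]M)\s+(E[SD]T)'

mo = re.search(regex, line)
if mo is not None:  # re.search returns None when nothing matches
    # groups() hands back every group at once, so no 1,2,3,4,5 is needed.
    month, day, time, ap, offset = mo.groups()
    print(ap, month, offset, day, time)
```

The check for None matters in a loop over real input, since most lines will
not match and None has no groups() method.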
mo.groups() gives you *all* the groups. What if you only wanted some of
them? Well, it's a tuple, so once you have it, you can slice it the same
as any other tuple:
mo.groups()[1:] # skip the zeroth item, keep all the rest
mo.groups()[:-1] # skip the last item, keep all the rest
mo.groups()[3::2] # every second item starting from index 3 (the fourth item)
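Slices are easier to see with more than two groups. Here is a small sketch
using a made-up string and a five-group regex (both are just for
illustration, not taken from your code):

```python
import re

# Made-up text with five easily recognised pieces.
text = 'a1 b2 c3 d4 e5'
mo = re.search(r'(\w\d) (\w\d) (\w\d) (\w\d) (\w\d)', text)

groups = mo.groups()       # all five groups as a tuple
rest = groups[1:]          # drop the item at index 0
all_but_last = groups[:-1] # drop the final item
every_other = groups[1::2] # items at index 1, 3, ...

print(groups)
print(rest)
print(all_but_last)
print(every_other)
```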
Let's go back to the mo.group(1, 2) again, and suppose that there are
more than two groups. Let's pretend that there are 5 groups. How can you
do it using range?
mo.group(*range(1, 5+1))
In this case, the asterisk * behaves like a unary operator. Binary
operators take two arguments:
10-6
Unary operators take one:
-6
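Used this way, the single * isn't a "real" operator, because it is only
legal in particular places such as a function call (Python 3.5 and later
also allow it inside list, tuple and set displays). But it takes a single
operand, which must be some sort of iterable, like range, lists, tuples,
strings, even dicts.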
With a small helper function, we can experiment with this:
def test(a, b, *args):
    print("First argument:", a)
    print("Second argument:", b)
    print("All the rest:", args)
And in use:
py> test(*[1, 2, 3, 4])
First argument: 1
Second argument: 2
All the rest: (3, 4)
What works with our test() function will work with mo.group() as well,
and what works with a hard-coded list will work with range:
py> test(*range(1, 10))
First argument: 1
Second argument: 2
All the rest: (3, 4, 5, 6, 7, 8, 9)
There is no need to turn the range() object into a list first.
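Going back to the Match Object, here is a sketch of the same trick without
hard-coding the number of groups. It relies on the compiled pattern being
available as mo.re, and on its groups attribute holding the number of
capturing groups in the pattern.

```python
import re

text = 'xxxx1234 5678xxx'
mo = re.search(r'(\d+) (\d+)', text)

count = mo.re.groups                       # capturing groups in the pattern
by_range = mo.group(*range(1, count + 1))  # same as mo.group(1, 2)

print(count)
print(by_range)
print(by_range == mo.groups())
```

One caveat: when group() is given exactly one argument it returns a single
string rather than a tuple, so for a pattern with only one group the two
spellings are not interchangeable. mo.groups() always returns a tuple.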
Unpacking with * does require an iterable object. You can't iterate
over integers:
py> for x in 10:
...     pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable
nor can you unpack them:
py> test(*10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: test() argument after * must be a sequence, not int
Note that the error message is a bit too conservative: in fact, any
iterable is allowed, not just sequences.
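To see that for yourself, here is a sketch with a few iterables that are
not lists. The collect() function is a hypothetical variant of test() that
returns its arguments instead of printing them.

```python
def collect(a, b, *args):
    """Hypothetical variant of test() that returns what it received."""
    return a, b, args

from_string = collect(*'abcd')                      # one character each
from_dict = collect(*{'x': 1, 'y': 2, 'z': 3})      # iterates over the keys
from_generator = collect(*(n * n for n in range(4)))  # not a sequence at all

print(from_string)
print(from_dict)
print(from_generator)
```

Unpacking a dict with a single * gives you its keys, not its values.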
> This works fine, but in the (month... line, I have blah.group(1,2,3,4,5),
> but this is problematic for me. I shouldn't have to use that 1,2,3,4,5
> sequence. I tried to use many alternatives using: range(5) which doesn't
> work, list(range(5)) which actually lists the numbers in a list, and several
> others. As I read it, the search puts out a tuple. I was hoping to just
> assign the re.search to month, day, time, ap, offset directly. Why wouldn't
> that work? Why won't a range(5) work? I couldn't find a way to get the len
> of blah.
The length of a Match Object is not meaningful. What do you mean by the
length of it?
- the total number of groups in the regular expression?
- the number of groups which actually matched something?
- the total number of characters matched?
- something else?
The idea of the Match Object itself having a length is problematic.
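Each of those readings can be computed explicitly, which is one way to
see that they really are different numbers. A sketch, using a made-up
regex whose third group is optional and does not take part in the match:

```python
import re

text = 'xxxx1234 5678xxx'
mo = re.search(r'(\d+) (\d+)( \d+)?', text)

total_groups = mo.re.groups  # groups in the pattern
matched_groups = sum(g is not None for g in mo.groups())  # groups that matched
matched_chars = len(mo.group(0))  # characters in the whole match

print(total_groups, matched_groups, matched_chars)
```

A group that did not take part in the match shows up as None in
mo.groups(), which is why the first two numbers can differ.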
--
Steven