[Tutor] need help generating table of contents

Tue Aug 28 06:14:37 EDT 2018

From: Tutor <tutor-bounces+sjeik_appie=hotmail.com at python.org> on behalf of Peter Otten <__peter__ at web.de>
Sent: Monday, August 27, 2018 6:43 PM
To: tutor at python.org
Subject: Re: [Tutor] need help generating table of contents

Albert-Jan Roskam wrote:

> 
> From: Tutor <tutor-bounces+sjeik_appie=hotmail.com at python.org> on behalf
> of Peter Otten <__peter__ at web.de> Sent: Friday, August 24, 2018 3:55 PM
> To: tutor at python.org
> <snip>
>> The following reshuffle of your code seems to work:
>> 
>> print('\r\n** Table of contents\r\n')
>> pattern = r'/Title \((.+?)\).+?/Page ([0-9]+)(?:\s+/Count ([0-9]+))?'
>> 
>> def process(triples, limit=None, indent=0):
>>     for index, (title, page, count) in enumerate(triples, 1):
>>         title = indent * 4 * ' ' + title
>>         print(title.ljust(79, ".") + page.zfill(2))
>>         if count:
>>             process(triples, limit=int(count), indent=indent+1)
>>         if limit is not None and limit == index:
>>             break
>> 
>> process(iter(re.findall(pattern, toc, re.DOTALL)))
> 
> Hi Peter, Cameron,
> 
> Thanks for your replies! The code above indeed works as intended, but I
> don't really understand *why*. If I were to assign a name to the condition
> "if limit is not None and limit == index", what would be the most
> descriptive name? I often use "is_*" names for boolean variables. Would
> "is_deepest_nesting_level" be a good name?

> No, it's not necessarily the deepest level. Every subsection eventually ends 
> at this point, so you might call it reached_end_of_current_section.
> 
> Or just 'limit' ;) 

LOL. Ok, now I get it :-)

> The None is only there for the outermost level, where no /Count is provided. 
> In that case the loop simply runs until the iterator is exhausted.
> 
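A minimal sketch of how the break condition and the shared iterator interact. It is not the thread's exact code: it collects lines instead of printing them, drops the dot leaders, names the condition as suggested above, and runs on invented (title, page, count) tuples where count is assumed to be the number of direct children.

```python
def process(triples, limit=None, indent=0, lines=None):
    """Collect indented TOC lines; a sketch of the recursive process() above."""
    if lines is None:
        lines = []
    for index, (title, page, count) in enumerate(triples, 1):
        lines.append(indent * 4 * ' ' + title + ' ' + page)
        if count:
            # The nested call keeps pulling from the SAME iterator.
            process(triples, limit=int(count), indent=indent + 1, lines=lines)
        # limit is None only for the outermost call, which never breaks early:
        # it just runs until the shared iterator is exhausted.
        reached_end_of_current_section = limit is not None and limit == index
        if reached_end_of_current_section:
            break
    return lines


# Invented sample data: (title, page, count), count = number of direct children.
entries = [
    ('Intro', '1', '2'),
    ('Background', '2', ''),
    ('Scope', '3', '1'),
    ('Out of scope', '4', ''),
    ('Method', '5', ''),
]

for line in process(iter(entries)):
    print(line)
```

Here "Scope" ends two sections at once: its own (after one child) and the one opened by "Intro" (after two children).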
> If you find it easier to understand, you can calculate the outer count (a.k.a.
> limit) as the number of matches minus the sum of the counts:
> 

<snip useful info>
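A minimal sketch of that outer-count calculation, assuming (as process() does) that each /Count is the number of direct children; the sample tuples are invented and stand in for the findall() results.

```python
# Invented sample data: (title, page, count) tuples as findall() would return them.
triples = [
    ('Intro', '1', '2'),
    ('Background', '2', ''),
    ('Scope', '3', '1'),
    ('Out of scope', '4', ''),
    ('Method', '5', ''),
]

# Every non-top-level entry is counted exactly once, as a child of its parent,
# so whatever is left over must be a top-level entry.
child_total = sum(int(count) for _, _, count in triples if count)
outer_limit = len(triples) - child_total
print(outer_limit)  # 2 (Intro and Method)
```

With that value the outermost call could be made as process(iter(triples), limit=outer_limit), so that no call needs limit=None.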

>> Also, I don't understand why iter() is required here, and why finditer()
>> is not an alternative.

> finditer() would actually work -- I didn't use it because I wanted to make 
> as few changes as possible to your code. What does not work is a list like 
> the result of findall(). This is because the inner for loops (i.e. the ones 
> in the nested calls of process) are supposed to continue the iteration 
> instead of restarting it. A simple example to illustrate the difference:
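A minimal sketch of that difference; the consume() helper is hypothetical and stands in for one nested process() call that stops after n items.

```python
def consume(items, n):
    """Take up to n items with a for loop, like a nested process() call."""
    taken = []
    for i, item in enumerate(items, 1):
        taken.append(item)
        if i == n:
            break
    return taken


data = [1, 2, 3, 4]

# With a list, every for loop starts again at the first element.
print(consume(data, 2), consume(data, 2))  # [1, 2] [1, 2]

# With an iterator, the second loop continues where the first one stopped.
it = iter(data)
print(consume(it, 2), consume(it, 2))      # [1, 2] [3, 4]
```

In process() the restart is worse than duplicated output: with a list, the nested call for the first entry would see that same entry again and keep recursing until Python raises RecursionError.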

Ah, with finditer() each item is a match object rather than a tuple, so it cannot be unpacked in the "for" line of the loop. This works:

import re

def process(matches, limit=None, indent=0):
    for index, match in enumerate(matches, 1):
        title, page, count = match.groups()  # a Match object: unpack its groups here
        title = indent * 4 * ' ' + title
        print(title.ljust(79, ".") + page.zfill(2))
        if count:
            process(matches, limit=int(count), indent=indent+1)
        if limit is not None and limit == index:
            break

process(re.finditer(pattern, toc, re.DOTALL))
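An alternative not suggested in the thread: keep the unpacking for loop unchanged and wrap finditer() in a generator expression that yields match.groups(). A generator expression is itself a single iterator, so the nested calls still share it. The toc text below is invented sample input, and process() collects lines instead of printing them so the result can be checked.

```python
import re

# Hypothetical sample input; the real toc text extracted from the PDF is not shown.
toc = """
/Title (Intro) /Page 1 /Count 2
/Title (Background) /Page 2
/Title (Scope) /Page 3
/Title (Method) /Page 5
"""

pattern = r'/Title \((.+?)\).+?/Page ([0-9]+)(?:\s+/Count ([0-9]+))?'


def process(triples, limit=None, indent=0, lines=None):
    """Same logic as process() above, but collects lines instead of printing."""
    if lines is None:
        lines = []
    for index, (title, page, count) in enumerate(triples, 1):
        title = indent * 4 * ' ' + title
        lines.append(title.ljust(79, ".") + page.zfill(2))
        if count:  # None (no /Count) is falsy, just like '' from findall()
            process(triples, limit=int(count), indent=indent + 1, lines=lines)
        if limit is not None and limit == index:
            break
    return lines


# match.groups() turns each Match into a (title, page, count) tuple.
triples = (m.groups() for m in re.finditer(pattern, toc, re.DOTALL))
for line in process(triples):
    print(line)
```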

If I don't do this, I get this error:
  File "Q:/toc/toc.py", line 64, in <module>
    process(re.finditer(pattern, toc, re.DOTALL))
  File "Q:/Ctoc/toc.py", line 56, in process
    for index, (title, page, count) in enumerate(triples, 1):
TypeError: '_sre.SRE_Match' object is not iterable

Process finished with exit code 1
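A small sketch of why the unpacking fails and what match.groups() returns instead; the one-entry sample text is invented. Note that an optional group that did not participate is None in groups() but '' in findall(); the "if count:" test treats both as false.

```python
import re

pattern = r'/Title \((.+?)\).+?/Page ([0-9]+)(?:\s+/Count ([0-9]+))?'
text = '/Title (Method) /Page 5'  # hypothetical one-entry sample

match = re.search(pattern, text)

# A Match object is not iterable, so it cannot be unpacked directly.
try:
    title, page, count = match
except TypeError as exc:
    print('TypeError:', exc)

print(match.groups())             # ('Method', '5', None): unmatched group is None
print(match.groups(default=''))   # ('Method', '5', ''): same shape as findall()
print(re.findall(pattern, text))  # [('Method', '5', '')]
```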

Thanks again, Peter! Very insightful!

Albert-Jan