[Tutor] Need help generating a table of contents

Albert-Jan Roskam sjeik_appie at hotmail.com
Fri Aug 24 11:16:01 EDT 2018


Hello,

I have Ghostscript (pdfmark) files containing a table of contents (TOC), and I would like to use this information to generate a human-readable TOC. The problem is that I can't get the nested hierarchy right.
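In pdfmark, nesting is not expressed by indentation (leading whitespace in PostScript is insignificant, and in the sample below the Subsub entries are indented no deeper than the Sub entries). It is expressed by /Count: an /OUT entry with /Count N is followed by its N direct children, and a negative N marks the entry as collapsed. A minimal parsing sketch, assuming every entry lists /Title, then /Page, then an optional /Count; the names ENTRY and parse_outline are illustrative only:

```python
import re

# Assumption: each outline entry lists /Title (...), then /Page N, then an
# optional /Count N. Titles containing parentheses are not handled.
ENTRY = re.compile(
    r"/Title \((?P<title>.+?)\)\s+/Page (?P<page>\d+)"
    r"(?:\s+/Count (?P<count>-?\d+))?"
)


def parse_outline(text):
    """Return (title, page, child_count) tuples in document order."""
    entries = []
    for m in ENTRY.finditer(text):
        # A negative /Count only means "collapsed"; the number of children is abs(count).
        count = abs(int(m.group("count") or 0))
        entries.append((m.group("title"), int(m.group("page")), count))
    return entries


sample = """\
[ /Title (Appendix)
  /Page 16
  /Count -2
  /OUT pdfmark
    [ /Title (Sub1)
      /Page 17
      /OUT pdfmark
    [ /Title (Sub2)
      /Page 40
      /OUT pdfmark
"""

print(parse_outline(sample))
# [('Appendix', 16, 2), ('Sub1', 17, 0), ('Sub2', 40, 0)]
```

The `-?\d+` group matters: a pattern using `[0-9]+` would silently skip a negative count, so a collapsed entry would be treated as having no children.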

import re

toc = """\
[ /PageMode /UseOutlines
  /Page 1
  /View [/XYZ null null 0]
  /DOCVIEW pdfmark
[ /Title (Title page)
  /Page 1
  /View [/XYZ null null 0]
  /OUT pdfmark
[ /Title (Document information)
  /Page 2
  /View [/XYZ null null 0]
  /OUT pdfmark
[ /Title (Blah)
  /Page 3
  /View [/XYZ null null 0]
  /OUT pdfmark
[ /Title (Appendix)
  /Page 16
  /Count 4
  /View [/XYZ null null 0]
  /OUT pdfmark
    [ /Title (Sub1)
      /Page 17
      /Count 4
      /OUT pdfmark
    [ /Title (Subsub1)
      /Page 17
      /OUT pdfmark
    [ /Title (Subsub2)
      /Page 18
      /OUT pdfmark
    [ /Title (Subsub3)
      /Page 29
      /OUT pdfmark
    [ /Title (Subsub4)
      /Page 37
      /OUT pdfmark
    [ /Title (Sub2)
      /Page 40
      /OUT pdfmark
    [ /Title (Sub3)
      /Page 49
      /OUT pdfmark
    [ /Title (Sub4)
      /Page 56
      /OUT pdfmark
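"""
print('\n** Table of contents\n')
# Capture the title, the page number and the optional /Count
# (the number of entries directly below this one).
pattern = r'/Title \((.+?)\).+?/Page ([0-9]+)(?:\s+/Count ([0-9]+))?'
indent = 0    # current indentation, in spaces
start = True  # flag used to decide when to indent or dedent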
for title, page, count in re.findall(pattern, toc, re.DOTALL):
    title = (indent * ' ') + title
    count = int(count or 0)
    print(title.ljust(79, ".") + page.zfill(2))
    if count:
        count -= 1
        start = True
    if count and start:
        indent += 2
        start = False
    if not count and not start:
        indent -= 2
        start = True

This generates the following TOC, in which Subsub2 to Subsub4 are dedented one level too far:


** Table of contents

Title page.....................................................................01
Document information...........................................................02
Blah...........................................................................03
Appendix.......................................................................16
  Sub1.........................................................................17
    Subsub1....................................................................17
  Subsub2......................................................................18
  Subsub3......................................................................29
  Subsub4......................................................................37
  Sub2.........................................................................40
  Sub3.........................................................................49
  Sub4.........................................................................56
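The dedent happens because the code never remembers how many children are still outstanding: `count` is re-read for every entry, and the single `start` flag cannot tell that Sub1 still has three children left after Subsub1, so it dedents right after printing Subsub1. A common fix is to keep a stack holding the number of children still expected at each open level; the depth of the next entry is then simply the length of the stack. A sketch, assuming the entries are already parsed into (title, page, count) tuples with non-negative counts; render_toc is a hypothetical helper name:

```python
def render_toc(entries, width=79):
    """entries: (title, page, child_count) tuples in document order."""
    lines = []
    pending = []  # pending[i] = children still expected at nesting level i + 1
    for title, page, count in entries:
        depth = len(pending)
        line = (" " * (2 * depth) + title).ljust(width, ".")
        lines.append(line + str(page).zfill(2))
        # This entry uses up one of its parent's remaining child slots.
        if pending:
            pending[-1] -= 1
        # Open a new level for this entry's own children.
        if count:
            pending.append(count)
        # Close every level whose children have all been seen.
        while pending and pending[-1] == 0:
            pending.pop()
    return lines


entries = [
    ("Appendix", 16, 4),
    ("Sub1", 17, 4),
    ("Subsub1", 17, 0), ("Subsub2", 18, 0), ("Subsub3", 29, 0), ("Subsub4", 37, 0),
    ("Sub2", 40, 0), ("Sub3", 49, 0), ("Sub4", 56, 0),
]
for line in render_toc(entries):
    print(line)
```

With this input, Subsub1 to Subsub4 are all indented four spaces and Sub2 to Sub4 two spaces, because the stack still holds Appendix's remaining count of 3 while Sub1's children are printed.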

What is the best approach to do this?
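If the hierarchy is needed for more than printing, the flat list can first be turned into a tree by letting each entry recursively consume its own /Count children. A sketch, assuming the same (title, page, count) tuples with non-negative counts; build_tree and render are hypothetical names, and each node is a plain dict:

```python
def build_tree(entries):
    """Turn flat (title, page, child_count) tuples into nested nodes."""
    it = iter(entries)

    def take(n):
        # Read n sibling nodes, each immediately followed by its own children.
        # A /Count larger than the remaining entries makes next() raise StopIteration.
        nodes = []
        for _ in range(n):
            title, page, count = next(it)
            nodes.append({"title": title, "page": page, "children": take(count)})
        return nodes

    # Top-level entries have no parent count, so read until the input runs out.
    roots = []
    for title, page, count in it:
        roots.append({"title": title, "page": page, "children": take(count)})
    return roots


def render(nodes, depth=0, width=79):
    """Yield one dotted TOC line per node, indenting two spaces per level."""
    for node in nodes:
        line = (" " * (2 * depth) + node["title"]).ljust(width, ".")
        yield line + str(node["page"]).zfill(2)
        yield from render(node["children"], depth + 1, width)


entries = [
    ("Blah", 3, 0),
    ("Appendix", 16, 2),
    ("Sub1", 17, 1),
    ("Subsub1", 17, 0),
    ("Sub2", 40, 0),
]
for line in render(build_tree(entries)):
    print(line)
```

Building the tree first keeps parsing and layout separate; the cost is holding the whole outline in memory, which is negligible for a TOC.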

Thanks in advance!

Albert-Jan

