[Tutor] need help generating table of contents
Albert-Jan Roskam
sjeik_appie at hotmail.com
Fri Aug 24 11:16:01 EDT 2018
Hello,
I have Ghostscript pdfmark files that contain a table of contents (TOC), and I would like to use this information to generate a human-readable TOC. The problem is that I can't get the nested hierarchy right.
import re
toc = """\
[ /PageMode /UseOutlines
/Page 1
/View [/XYZ null null 0]
/DOCVIEW pdfmark
[ /Title (Title page)
/Page 1
/View [/XYZ null null 0]
/OUT pdfmark
[ /Title (Document information)
/Page 2
/View [/XYZ null null 0]
/OUT pdfmark
[ /Title (Blah)
/Page 3
/View [/XYZ null null 0]
/OUT pdfmark
[ /Title (Appendix)
/Page 16
/Count 4
/View [/XYZ null null 0]
/OUT pdfmark
[ /Title (Sub1)
/Page 17
/Count 4
/OUT pdfmark
[ /Title (Subsub1)
/Page 17
/OUT pdfmark
[ /Title (Subsub2)
/Page 18
/OUT pdfmark
[ /Title (Subsub3)
/Page 29
/OUT pdfmark
[ /Title (Subsub4)
/Page 37
/OUT pdfmark
[ /Title (Sub2)
/Page 40
/OUT pdfmark
[ /Title (Sub3)
/Page 49
/OUT pdfmark
[ /Title (Sub4)
/Page 56
/OUT pdfmark
"""
print('\r\n** Table of contents\r\n')
# raw string, so the backslash escapes reach the re module unchanged
pattern = r'/Title \((.+?)\).+?/Page ([0-9]+)(?:\s+/Count ([0-9]+))?'
indent = 0
start = True
for title, page, count in re.findall(pattern, toc, re.DOTALL):
    title = (indent * ' ') + title
    count = int(count or 0)
    print(title.ljust(79, ".") + page.zfill(2))
    if count:
        count -= 1
        start = True
    if count and start:
        indent += 2
        start = False
    if not count and not start:
        indent -= 2
        start = True
This generates the following TOC, in which Subsub2 to Subsub4 are dedented one level too far:
** Table of contents
Title page.....................................................................01
Document information...........................................................02
Blah...........................................................................03
Appendix.......................................................................16
  Sub1.........................................................................17
    Subsub1....................................................................17
  Subsub2......................................................................18
  Subsub3......................................................................29
  Subsub4......................................................................37
  Sub2.........................................................................40
  Sub3.........................................................................49
  Sub4.........................................................................56
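One direction that might work, sketched here under assumptions rather than as a tested solution: keep a stack with one counter per open parent, where each counter holds the number of children still to come. This assumes /Count is the number of immediate children of an entry (a negative value would mean a collapsed bookmark, so the sketch takes the absolute value), and that /Count, when present, directly follows /Page as in the data above. The names SAMPLE, outline_levels and format_toc are made up for this illustration, and SAMPLE is a shortened, hypothetical outline in the same layout.

```python
import re

# Hypothetical, shortened outline in the same pdfmark layout as the toc above.
SAMPLE = """\
[ /Title (Blah)
/Page 3
/OUT pdfmark
[ /Title (Appendix)
/Page 16
/Count 2
/OUT pdfmark
[ /Title (Sub1)
/Page 17
/Count 2
/OUT pdfmark
[ /Title (Subsub1)
/Page 17
/OUT pdfmark
[ /Title (Subsub2)
/Page 18
/OUT pdfmark
[ /Title (Sub2)
/Page 40
/OUT pdfmark
"""

# Same pattern as above, except that /Count may carry a minus sign.
PATTERN = re.compile(
    r'/Title \((.+?)\).+?/Page ([0-9]+)(?:\s+/Count (-?[0-9]+))?',
    re.DOTALL,
)


def outline_levels(text):
    """Return a list of (level, title, page); level 0 is the top level."""
    remaining = []  # one counter per open parent: children still expected
    entries = []
    for title, page, count in PATTERN.findall(text):
        # The nesting level is the number of parents that are still open.
        entries.append((len(remaining), title, int(page)))
        if remaining:
            remaining[-1] -= 1  # this entry uses up one slot of its parent
        children = abs(int(count or 0))
        if children:
            remaining.append(children)  # this entry opens a new level
        # Close every parent whose children have all been seen.
        while remaining and remaining[-1] == 0:
            remaining.pop()
    return entries


def format_toc(entries, width=79, step=2):
    """Format the entries as dotted TOC lines, indented by step per level."""
    return [
        (' ' * (step * level) + title).ljust(width, '.') + str(page).zfill(2)
        for level, title, page in entries
    ]


print('\n'.join(format_toc(outline_levels(SAMPLE))))
```

Because each parent is only closed once its own counter reaches zero, the remaining Subsub entries would stay at the deeper level until all of them have been printed.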
What is the best approach to do this?
Thanks in advance!
Albert-Jan