Page layout in Python
qualsivoglia at dovetipare.nz
Fri Jul 25 16:01:48 CEST 2014
The first step in grabbing information from a PDF file is to translate it
into text format with the pdftotext -layout command.
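
To make that step concrete, here is a minimal sketch of how I would call
the command from Python. It assumes pdftotext (from poppler or xpdf) is
installed and on the PATH. The helper names pdftotext_command and
pdf_to_layout_text are made up for illustration.

```python
import subprocess


def pdftotext_command(pdf_path):
    """Build the argument list for `pdftotext -layout`.

    The trailing "-" asks pdftotext to write the text to stdout
    instead of creating a .txt file next to the PDF.
    """
    return ["pdftotext", "-layout", str(pdf_path), "-"]


def pdf_to_layout_text(pdf_path):
    """Run pdftotext and return the layout-preserving text as one string.

    Raises subprocess.CalledProcessError if pdftotext exits with an error,
    and FileNotFoundError if the pdftotext binary is not installed.
    """
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With this, pdf_to_layout_text("report.pdf") would give the whole document
as a single string, where "report.pdf" is only a placeholder file name.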
Is there a specific Python tool or library available to describe the
layout of a page with ASCII characters and to help identify and
extract the useful pieces of information? For example, a function
that selects N characters on line I, starting from column Y.
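
To show what I mean by such a function, here is a rough sketch using
only string slicing. The name select and the 0-based line and column
numbering are my own assumptions, not an existing library API.

```python
def select(page_text, line, column, count):
    """Return `count` characters from `line`, starting at `column`.

    Lines and columns are 0-based (an assumption of this sketch).
    A line that does not exist yields an empty string, and a line that
    is shorter than requested yields whatever characters are there.
    """
    if line < 0 or column < 0 or count < 0:
        raise ValueError("line, column and count must be non-negative")
    lines = page_text.splitlines()
    if line >= len(lines):
        return ""
    return lines[line][column:column + count]


sample = "Invoice 0042\nTotal due"
print(select(sample, 0, 8, 4))  # the four characters after "Invoice "
```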
If no such tool is available, what do you think is the best structure
to describe a two-dimensional page layout in Python?
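
One naive candidate, only as a starting point for discussion: keep the
page as a list of rows padded to the same width, so that any rectangle
can be cut out by slicing. The class name PageGrid and its methods are
hypothetical, and I do not claim this is the best structure.

```python
class PageGrid:
    """A page stored as a rectangular grid of characters."""

    def __init__(self, text):
        rows = text.splitlines()
        # Width of the page is the length of its longest line.
        self.width = max((len(row) for row in rows), default=0)
        # Pad shorter lines with spaces so every row has the same width.
        self.rows = [row.ljust(self.width) for row in rows]

    @property
    def height(self):
        return len(self.rows)

    def region(self, top, left, height, width):
        """Return a rectangular block as a list of strings (0-based)."""
        return [row[left:left + width]
                for row in self.rows[top:top + height]]

    def column(self, left, width):
        """Return a vertical strip spanning every row of the page."""
        return self.region(0, left, self.height, width)


# Build a tiny three-line page with a name field and a right-aligned
# quantity field, 7 and 3 characters wide.
lines = [
    "Name".ljust(7) + "Qty",
    "Bolt".ljust(7) + "4".rjust(3),
    "Nut".ljust(7) + "12".rjust(3),
]
page = PageGrid("\n".join(lines))
print(page.column(7, 3))
```

Padding the rows costs some memory, but it means a slice never comes
back shorter than expected, unless the request runs past the right edge
of the page.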