Page layout in Python

maurog qualsivoglia at dovetipare.nz
Fri Jul 25 16:01:48 CEST 2014


The first step in grabbing information from a PDF file is to translate it 
into text format with the pdftotext -layout command. 
Is there a specific Python tool or library for describing the layout of a 
page made of ASCII characters, and for helping to identify and extract the 
useful pieces of information? For example, a function that selects 
N characters on line I, starting from column Y. 
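
To make the request concrete, here is a rough sketch of the kind of helper 
I have in mind. The name select() and its parameters are made up for 
illustration and do not come from any existing library. It assumes the 
pdftotext output for one page is already in a string, and that lines and 
columns are counted from 0.

```python
def select(page_text, line, column, count):
    """Return `count` characters from row `line`, starting at `column`.

    Hypothetical helper: both `line` and `column` are 0-based.
    """
    rows = page_text.splitlines()
    if not 0 <= line < len(rows):
        # Asking for a row that is not on the page yields an empty string.
        return ""
    # Pad short rows with spaces so every row behaves like a full-width line.
    padded = rows[line].ljust(column + count)
    return padded[column:column + count]


# A tiny two-line "page" in the style of pdftotext -layout output.
page = "Name      Qty\nWidget      3\n"
item = select(page, 1, 0, 6)
quantity = select(page, 1, 10, 3).strip()
```

Padding with spaces means a field that runs past the end of a short line 
still comes back with a fixed width, which seems convenient for 
column-aligned text.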

If no such tool is available, what do you think is the best structure 
for describing a two-dimensional page layout in Python?


