I'm sure it can be done but there is no reason to reinvent the wheel unless it's for a programming exercise. You can use pdftohtml and run it from a Python program if you want. http://pdftohtml.sourceforge.net/