[XML-SIG] Creating XML with Python

J. Clifford Dyer jcd at unc.edu
Thu Jul 24 14:30:18 CEST 2008

On Thu, 2008-07-24 at 14:04 +0200, Fredrik Lundh wrote:
> Eric Chao wrote:
> > I've been trying to convert some text that has some odd coding to xml. I 
> > am trying to use python to create a program that will process this text:
> > 
> > <CN>CHAPTER 1</CN>
> > <SH>The Creation</SH>
> > <C>{{01:1}}1 <RA>In the beginning <RB>God <RC>created the heavens and 
> > the earth.
> > <V>{{01:1}}2 The earth was <$FOr {a waste and 
> > emptiness}>><N1><RA>formless and void, and <RB>darkness was over the 
> > <V>{{01:1}}3 Then <RA>God said, ``Let there be light"; and there was light.
> > 
> > to something like this:
> > 
> > <book osisID="Gen">
> > <chapter sID="Gen.1"/>
> > <p><verse sID="Gen.1.1"/>In the beginning God created the heaven and the 
> > earth.<verse eID="Gen.1.1"/></p>
> > <p><verse sID="Gen.1.2"/>And the earth was without form, and void; and 
> > darkness was upon the face of the deep. And the Spirit of God moved upon 
> > the face of the waters.<verse eID="Gen.1.2"/></p>
> > <p><verse sID="Gen.1.3"/>And God said, Let there be light: and there was 
> > light.<verse eID="Gen.1.3"/></p>
> > 
> > I am not very good with Python and I was hoping someone could offer some 
> > advice on how to get started. I tried to write a program that produces 
> > XML, but I think I need more of a find-and-replace type of program. Thanks!
> that looks a rather daunting task even for an experienced Python 
> programmer (especially mapping between different translations ;-).
> I'd concentrate on parsing the original file format first, before even 
> thinking about how to write it out in XML.
> it might be some kind of SGML, in which case the standard sgmllib 
> library might be helpful:
>      http://effbot.org/librarybook/sgmllib.htm
> if that seems to work, try building some suitable data structure from 
> the incoming data (lists of strings might work, but you might want to 
> create some simple container objects that hold the lists for you).

If it turns out not to be valid SGML, you may need to look into using
pyparsing.  There was a good introduction to it in a recent issue of
Python Magazine.  There are also a number of tutorials online.
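
To give a feel for the "parse first, write XML later" approach, here is a
rough sketch.  It uses only the standard library (the re module rather
than pyparsing or sgmllib), so treat it as a stand-in for a real grammar.
Everything about the format is guessed from the sample in this thread: I
am assuming that <C> or <V> followed by {{book:chapter}}verse starts a
verse, that <$F ... >> is a footnote to drop, and that the remaining
tags such as <RA> or <N1> are reference markers.  The names Verse,
parse_verses, to_osis and the BOOK_IDS table are made up for
illustration.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Sample input copied from the thread (including its truncated second verse).
SAMPLE = '''<CN>CHAPTER 1</CN>
<SH>The Creation</SH>
<C>{{01:1}}1 <RA>In the beginning <RB>God <RC>created the heavens and
the earth.
<V>{{01:1}}2 The earth was <$FOr {a waste and
emptiness}>><N1><RA>formless and void, and <RB>darkness was over the
<V>{{01:1}}3 Then <RA>God said, ``Let there be light"; and there was light.
'''

# Assumed grammar, inferred from the sample only.
VERSE_START = re.compile(r"<[CV]>\{\{(\d+):(\d+)\}\}(\d+)\s*")
FOOTNOTE = re.compile(r"<\$F.*?>>", re.DOTALL)   # e.g. <$FOr {...}>>
MARKER = re.compile(r"</?[A-Z][A-Z0-9]*>")       # e.g. <RA>, <N1>

# Hypothetical lookup from the numeric book code to an OSIS book ID.
BOOK_IDS = {1: "Gen"}


@dataclass
class Verse:
    """Simple container for one parsed verse."""
    book: int
    chapter: int
    number: int
    text: str


def parse_verses(raw):
    """Split the raw text at verse markers and strip the inline markup."""
    matches = list(VERSE_START.finditer(raw))
    verses = []
    for i, m in enumerate(matches):
        # A verse body runs up to the next verse marker (or end of input).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw)
        body = FOOTNOTE.sub("", raw[m.end():end])
        body = MARKER.sub("", body)
        text = " ".join(body.split())  # collapse line breaks and extra spaces
        verses.append(Verse(int(m.group(1)), int(m.group(2)),
                            int(m.group(3)), text))
    return verses


def to_osis(verses):
    """Serialize a non-empty list of verses using sID/eID milestones."""
    book = ET.Element("book", osisID=BOOK_IDS[verses[0].book])
    current_chapter = None
    for v in verses:
        prefix = f"{BOOK_IDS[v.book]}.{v.chapter}"
        if v.chapter != current_chapter:
            ET.SubElement(book, "chapter", sID=prefix)
            current_chapter = v.chapter
        osis_id = f"{prefix}.{v.number}"
        p = ET.SubElement(book, "p")
        start = ET.SubElement(p, "verse", sID=osis_id)
        start.tail = v.text  # verse text sits between the two milestones
        ET.SubElement(p, "verse", eID=osis_id)
    return ET.tostring(book, encoding="unicode")


if __name__ == "__main__":
    print(to_osis(parse_verses(SAMPLE)))
```

Keeping the parsed verses in plain objects means you can inspect and fix
them before any XML is produced, and ElementTree takes care of escaping
for you.  If the real files have nested or irregular markup that these
regular expressions cannot cope with, that is the point where pyparsing
would start to pay off.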

J. Cliff Dyer
Carolina Digital Library and Archives
UNC Chapel Hill

