EDI parsing

Onno Ebbinge ebbingeo at logica.com
Thu Sep 12 09:23:05 EDT 2002


"Emile van Sebille" <emile at fenx.com> wrote in message news:<mailman.1031755813.4692.python-list at python.org>...
> Onno Ebbinge:
<snip>
> > I need to parse (a subset of) the UN EDIFACT (aka EDI) standard
> > (homepage: http://www.unece.org/trade/untdid/welcome.htm). After doing
> > the initial reading of the standard it looks like it is quite hard to
> > make a parser for this standard (and 'constructor').
> 
> My experience is with the UCC UCS standards, so some of this may be
> different in the UN standard.
> 
> Once you have the full specification of the documents you'll be
> exchanging in a form that python can digest, it's not that hard to build
> the parser and/or constructor.  The full specs are available to members
> on CD in PDF form.  I once wrote an extremely ugly extraction utility
> that preprocessed using pdf2text and then parsed the resulting mess into
> something I could then use as something like
> 
> from edixn import EdiXn
> for source in sourcemessages:
>     tempIn = EdiXn(source)
> 
> where sourcemessages is a python list of the individual transactions
> extracted from the edi envelope.  EdiXn knew about 810, 850, 852, 855,
> 875 and 880 transactions, allowing me to write appropriate handlers for
> each type.

Sounds like a very similar situation.

> >
> > My questions:
> >
> > 1) Is there a python EDI module? I can't find any :(
> 
> I didn't then either.

:(
 
> > 2) Is there a (python) EDI to XML converter?
> 
> That wouldn't be hard.  If there's already a DTD for edi, that would
> help a lot.  ;-)

No DTD...
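
For what it's worth, a converter could look roughly like the sketch below. It assumes the message has already been split into (tag, elements) pairs, with each element a list of component strings. Since there is no DTD, the XML names used here ("message", "segment", "element", "component") are invented for illustration and do not come from any standard.

```python
import xml.etree.ElementTree as ET


def segments_to_xml(segments):
    """Build an XML tree from a list of (tag, elements) pairs.

    The element names are hypothetical; no EDI DTD is assumed.
    """
    root = ET.Element("message")
    for tag, elements in segments:
        seg_el = ET.SubElement(root, "segment", tag=tag)
        for components in elements:
            el = ET.SubElement(seg_el, "element")
            for value in components:
                ET.SubElement(el, "component").text = value
    return root


# Made-up sample data: a message header and a beginning-of-message segment.
segments = [
    ("UNH", [["1"], ["ORDERS", "D", "96A", "UN"]]),
    ("BGM", [["220"], ["PO123"], ["9"]]),
]
xml_text = ET.tostring(segments_to_xml(segments), encoding="unicode")
print(xml_text)
```

A generic layout like this needs no knowledge of individual message types, which is also its weakness: it says nothing about what each element means.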

> > 3) Is there anything that I can use to easily interface with EDI?
> >    (preferably in lib and DLL form)
> 
> Try some of the commercial suppliers.  There's big money (both expense
> and potential) in this, and that's probably why we don't find
> implementations strewn about the web.

Commercial suppliers are not an option! But I agree with your
remark about the 'big money'. Maybe when I'm done ;-)

> > If the above is not available or fails...
> >
> > [You have to know that I've never written a parser in Python before.
> >  The last (big) parser I wrote was in C, a few years back, with the
> >  help of lex and yacc if memory serves me right.]
> >
> > What is the best approach to writing an EDI parser in Python?
> 
> To parse the received messages, I've always found it easiest to use and
> parse the raw ascii coming in.  There are commercial packages that allow
> you to map and export or even map-to-map import, but most of the
> customers I've done this for were typically forced into it by demands of
> the channel, and the commercial offerings rarely seemed as nimble as
> simply writing a one-to-one utility taking the edi order to the database
> or the resulting invoice to edi.  I have generally used a commercially
> available transport package, KMart at that time being the notable
> exception.  Outbound messages follow suit.

I'm in favour of the raw parsing too. Maybe a one-to-one mapping,
maybe with ebXML in between, I'm not sure yet.
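
A minimal sketch of what the raw parsing could look like, assuming the default EDIFACT service characters (' ends a segment, + separates data elements, : separates components, ? is the release character). It ignores a UNA service string that would override those defaults, and the function names are made up for this example.

```python
def split_escaped(text, sep, release="?"):
    """Split text on sep, leaving released (escaped) characters untouched."""
    parts, buf, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == release and i + 1 < len(text):
            # Keep the release pair intact so later splits still see it.
            buf.append(text[i:i + 2])
            i += 2
        elif ch == sep:
            parts.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(ch)
            i += 1
    parts.append("".join(buf))
    return parts


def unescape(text, release="?"):
    """Drop each release character and keep the character it protects."""
    out, i = [], 0
    while i < len(text):
        if text[i] == release and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def parse_segments(raw):
    """Return a list of (tag, elements); each element is a list of components."""
    segments = []
    for seg in split_escaped(raw, "'"):
        seg = seg.strip()  # tolerate line breaks between segments
        if not seg:
            continue
        fields = split_escaped(seg, "+")
        elements = [[unescape(c) for c in split_escaped(f, ":")]
                    for f in fields[1:]]
        segments.append((fields[0], elements))
    return segments


# Made-up sample: the "?+" in the BGM segment is a released (literal) plus sign.
sample = "UNH+1+ORDERS:D:96A:UN'\nBGM+220+PO?+123+9'"
for tag, elements in parse_segments(sample):
    print(tag, elements)
```

This only handles the syntax level. Knowing which segments and elements are allowed in a given message still has to come from the message specifications.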
 
> If you are writing this as a part of a portable b2b application then
> you've got your hands full as you've entered the map-making utility
> supplier market.  You'll need to address the entire specification, which
> my hardcopy of my deprecated 003050UCS shows to be some ~1000 _dense_
> pages.  Make sure you've got the funding to do it right, as this is no
> low-budget add-on.

I'm in luck then! I only need to parse about 10 to 15 messages.
I don't want the 'quick and dirty' solution because I _know_ it
will bite me if there are changes and/or new messages. And they
_will_ come.
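
One way to keep new messages cheap to add is a handler table keyed on message type, so that a new message means one new function and nothing else changes. The sketch below is only an illustration: the decorator, the handler bodies and the return values are invented, and ORDERS and INVOIC are used simply as example message names.

```python
HANDLERS = {}


def handles(message_type):
    """Decorator that registers a function as the handler for one message type."""
    def register(func):
        HANDLERS[message_type] = func
        return func
    return register


@handles("ORDERS")
def handle_orders(segments):
    # Placeholder body: a real handler would map the segments to an order.
    return "order with %d segments" % len(segments)


@handles("INVOIC")
def handle_invoic(segments):
    # Placeholder body: a real handler would map the segments to an invoice.
    return "invoice with %d segments" % len(segments)


def dispatch(message_type, segments):
    """Look up the handler for message_type and call it, failing loudly if absent."""
    try:
        handler = HANDLERS[message_type]
    except KeyError:
        raise ValueError("no handler for message type %r" % message_type)
    return handler(segments)


print(dispatch("ORDERS", [("UNH", []), ("BGM", [])]))
```

Failing loudly on an unknown type is deliberate here: an unexpected message should show up as an error rather than be silently skipped.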

> The 'edi specification' always felt akin to the 'rs232 specification' to
> me, only instead of 25 pins to muck up, they offer 100s.
> 
> Not-sure-how-this-could-help-ly y'rs,

Thanks anyway, now I know that parsing is the way to go.

Regards,

Onno


