EDI parsing

Emile van Sebille emile at fenx.com
Wed Sep 11 16:50:12 CEST 2002

Onno Ebbinge:
> I'm writing an application (in Python) that is part of a bigger
> infrastructure for B2B messaging. One communication channel is an EDI
> interface to the B2B messaging system.
> I need to parse a (subset) of the UN EDIFACT (aka EDI) standard
> (homepage: http://www.unece.org/trade/untdid/welcome.htm). After doing
> the initial reading of the standard it looks like it is quite hard to
> make a parser for this standard (and 'constructor').

My experience is with the UCC UCS standards, so some of this may be
different in the UN standard.

Once you have the full specification of the documents you'll be
exchanging in a form that python can digest, it's not that hard to build
the parser and/or constructor.  The full specs are available to members
on CD in PDF form.  I once wrote an extremely ugly extraction utility
that preprocessed the PDFs using pdf2text and then parsed the resulting
mess into a module I could then use something like this:

from edixn import EdiXn
for source in sourcemessages:
    tempIn = EdiXn(source)

where sourcemessages is a python list of the individual transactions
extracted from the edi envelope.  EdiXn knew about 810, 850, 852, 855,
875 and 880 transactions, allowing me to write appropriate handlers for
each type.
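
The per-type handlers can be sketched roughly as below.  This is not the
EdiXn module, only a minimal illustration under stated assumptions: each
transaction is taken to be a list of segments, each segment a list of
element strings, with the ST segment first and the transaction set ID in
its second position.  The names HANDLERS, dispatch, handle_invoice and
handle_purchase_order are hypothetical.

```python
# Hypothetical sketch: route each transaction set to a handler chosen by
# its type.  Assumes segments are already split into lists of elements
# and that the ST segment (ST01 = transaction set ID) comes first.

def handle_invoice(segments):
    # Placeholder handler for 810 (invoice) transactions.
    return "invoice with %d segments" % len(segments)

def handle_purchase_order(segments):
    # Placeholder handler for 850 (purchase order) transactions.
    return "purchase order with %d segments" % len(segments)

# Map transaction set IDs to handler functions; extend as needed.
HANDLERS = {
    "810": handle_invoice,
    "850": handle_purchase_order,
}

def dispatch(segments):
    header = segments[0]
    if header[0] != "ST":
        raise ValueError("transaction must start with an ST segment")
    set_id = header[1]
    try:
        handler = HANDLERS[set_id]
    except KeyError:
        raise ValueError("no handler for transaction set %s" % set_id)
    return handler(segments)
```

A dictionary keeps the list of supported transaction types in one place,
so adding another type means writing one function and one table entry.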

> My questions:
> 1) Is there a python EDI module? I can't find any :(

I didn't then either.

> 2) Is there a (python) EDI to XML converter?

That wouldn't be hard.  If there's already a DTD for edi, that would
help a lot.  ;-)
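
As a rough illustration of why the conversion itself is the easy part,
here is a minimal sketch using the standard library's
xml.etree.ElementTree.  It assumes segments are already split into lists
of element strings, and the XML layout (a "segment" element with an "id"
attribute, "element" children with a "pos" attribute) is invented for
this example, not taken from any published DTD.

```python
import xml.etree.ElementTree as ET

def segments_to_xml(segments, root_tag="transaction"):
    # Hypothetical layout: one <segment id="..."> per EDI segment, with
    # one <element pos="NN"> child per data element, numbered from 01.
    root = ET.Element(root_tag)
    for segment in segments:
        seg_el = ET.SubElement(root, "segment", id=segment[0])
        for position, value in enumerate(segment[1:], start=1):
            el = ET.SubElement(seg_el, "element", pos="%02d" % position)
            el.text = value
    return root

if __name__ == "__main__":
    sample = [["ST", "850", "0001"], ["SE", "2", "0001"]]
    print(ET.tostring(segments_to_xml(sample), encoding="unicode"))
```

This only mirrors the segment structure; giving the elements meaningful
names would still require the specification for each transaction type.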

> 3) Is there anything that I can use to easily interface with EDI?
>    (preferably in lib and DLL form)

Try some of the commercial suppliers.  There's big money (both expense
and potential) in this, and that's probably why we don't find
implementations strewn about the web.

> If the above is not available or fails...
> [You have to know that I've never written a parser in Python before.
>  The last (big) parser I wrote was in C a few years back, with the
>  help of lex and yacc if memory serves me right.]
> What is the best approach to writing an EDI parser in Python?

To parse the received messages, I've always found it easiest to work
directly with the raw ascii coming in.  There are commercial packages
that allow you to map and export or even map-to-map import, but most of
the customers I've done this for were forced into EDI by demands of the
channel, and the commercial offerings rarely seemed as nimble as simply
writing a one-to-one utility taking the edi order to the database or the
resulting invoice to edi.  For transport I have generally used a
commercially available package, KMart at that time being the notable
exception.  Outbound messages follow suit.
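
Working on the raw ascii can be sketched as follows.  This is a minimal
illustration, assuming X12-style data with "*" between elements and "~"
ending each segment; real interchanges declare their delimiters in the
envelope header, so those defaults are assumptions, and the function
names are hypothetical.

```python
def parse_segments(raw, element_sep="*", segment_term="~"):
    # Split raw text into segments, then each segment into elements.
    segments = []
    for chunk in raw.split(segment_term):
        chunk = chunk.strip()  # tolerate newlines after the terminator
        if chunk:
            segments.append(chunk.split(element_sep))
    return segments

def split_transactions(segments):
    # Collect each ST ... SE run as one transaction, skipping the
    # envelope segments around them.
    transactions = []
    current = None
    for segment in segments:
        if segment[0] == "ST":
            current = [segment]
        elif current is not None:
            current.append(segment)
            if segment[0] == "SE":
                transactions.append(current)
                current = None
    return transactions

def build_message(segments, element_sep="*", segment_term="~"):
    # Outbound direction: join elements and segments back into text.
    return "".join(element_sep.join(seg) + segment_term for seg in segments)
```

No validation is attempted here (segment counts, control numbers,
required elements); that is where the specification comes in.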

If you are writing this as a part of a portable b2b application then
you've got your hands full as you've entered the map-making utility
supplier market.  You'll need to address the entire specification, which
my hardcopy of the deprecated 003050UCS shows to be some 1000 _dense_
pages.  Make sure you've got the funding to do it right, as this is no
low-budget add-on.

The 'edi specification' always felt akin to the 'rs232 specification' to
me, only instead of 25 pins to muck up, they offer 100s.

Not-sure-how-this-could-help-ly y'rs,


Emile van Sebille
emile at fenx.com

