EDI parsing
Onno Ebbinge
ebbingeo at logica.com
Thu Sep 12 09:23:05 EDT 2002
"Emile van Sebille" <emile at fenx.com> wrote in message news:<mailman.1031755813.4692.python-list at python.org>...
> Onno Ebbinge:
<snip>
> > I need to parse a subset of the UN EDIFACT (aka EDI) standard
> > (homepage: http://www.unece.org/trade/untdid/welcome.htm). After doing
> > the initial reading of the standard it looks like it is quite hard to
> > make a parser (and 'constructor') for this standard.
>
> My experience is with the UCC UCS standards, so some of this may be
> different in the UN standard.
>
> Once you have the full specification of the documents you'll be
> exchanging in a form that python can digest, it's not that hard to build
> the parser and/or constructor. The full specs are available to members
> on CD in PDF form. I once wrote an extremely ugly extraction utility
> that preprocessed using pdf2text and then parsed the resulting mess into
> something I could then use as something like
>
> from edixn import EdiXn
> for source in sourcemessages:
>     tempIn = EdiXn(source)
>
> where sourcemessages is a python list of the individual transactions
> extracted from the edi envelope. EdiXn knew about 810, 850, 852, 855,
> 875 and 880 transactions, allowing me to write appropriate handlers for
> each type.
Sounds like a very similar situation.
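For what it's worth, the "one handler per transaction type" idea could be sketched roughly like this. It is only an illustration: the handler names, the dict-shaped message and the dispatch() function are hypothetical and have nothing to do with the real EdiXn module, whose interface I haven't seen.

```python
# Hypothetical sketch: route each parsed message to a handler keyed by its type code.
# The message layout ({"type": ..., "segments": [...]}) is assumed for illustration.

def handle_invoice(message):
    return "invoice with %d segments" % len(message["segments"])

def handle_order(message):
    return "order with %d segments" % len(message["segments"])

# Type codes taken from the transaction numbers mentioned above.
HANDLERS = {
    "810": handle_invoice,
    "850": handle_order,
}

def dispatch(message):
    """Look up the handler for this message type and call it."""
    handler = HANDLERS.get(message["type"])
    if handler is None:
        raise ValueError("no handler for transaction type %r" % message["type"])
    return handler(message)
```

Supporting a new message type would then mean writing one function and adding one entry to HANDLERS.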
> >
> > My questions:
> >
> > 1) Is there a python EDI module? I can't find any :(
>
> I didn't then either.
:(
> > 2) Is there a (python) EDI to XML converter?
>
> That wouldn't be hard. If there's already a DTD for edi, that would
> help a lot. ;-)
No DTD...
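Without a DTD, a converter could still emit a generic structure. Below is a minimal sketch, assuming the message has already been split into nested lists (segment, then elements, then components). The element names "message", "segment", "element" and "component" are made up for illustration and do not come from any EDI or ebXML schema.

```python
import xml.etree.ElementTree as ET

def segments_to_xml(segments):
    """Build a generic XML tree from segments given as nested lists.

    Each segment is a list of elements, each element a list of component
    strings; the first element holds the segment tag (e.g. "BGM").
    """
    root = ET.Element("message")
    for seg in segments:
        seg_node = ET.SubElement(root, "segment", {"name": seg[0][0]})
        for elem in seg[1:]:
            elem_node = ET.SubElement(seg_node, "element")
            for comp in elem:
                ET.SubElement(elem_node, "component").text = comp
    return root

# Made-up sample data: one segment with three simple elements.
example = [
    [["BGM"], ["220"], ["PO123"], ["9"]],
]
xml_text = ET.tostring(segments_to_xml(example), encoding="unicode")
```

This keeps the XML purely positional. Mapping positions to meaningful names would still need the message specifications.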
> > 3) Is there anything that I can use to easily interface with EDI?
> > (preferably in lib and DLL form)
>
> Try some of the commercial suppliers. There's big money (both expense
> and potential) in this, and that's probably why we don't find
> implementations strewn about the web.
Commercial suppliers are not an option! But I agree with your
remark about the 'big money'. Maybe when I'm done ;-)
> > If the above is not available or fails...
> >
> > [You have to know that I've never written a parser in Python before.
> > The last (big) parser I wrote was in C a few years back, with the help
> > of lex and yacc, if memory serves me right.]
> >
> > What is the best approach to writing an EDI parser in Python?
>
> To parse the received messages, I've always found it easiest to use and
> parse the raw ascii coming in. There are commercial packages that allow
> you to map and export or even map-to-map import, but most of the
> customers I've done this for were typically forced into it by demands of
> the channel, and the commercial offerings rarely seemed as nimble as
> simply writing a one-to-one utility taking the edi order to the database
> or the resulting invoice to edi. I have generally used a commercially
> available transport package, KMart at that time being the notable
> exception. Outbound messages follow suit.
I'm in favour of the raw parsing too. Maybe a one-to-one mapping,
maybe with ebXML in between, I'm not sure yet.
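As a starting point for the raw parsing, here is a minimal sketch of a tokenizer. It assumes the default UN/EDIFACT service characters (':' between components, '+' between data elements, '?' as release character, "'" as segment terminator) and honours an optional leading UNA segment that overrides them. The function name parse_edifact and the nested-list result are my own choices for illustration, and the sample interchange is made up.

```python
def parse_edifact(raw):
    """Split a raw EDIFACT string into segments -> elements -> components."""
    comp_sep, elem_sep, release, seg_term = ":", "+", "?", "'"
    if raw.startswith("UNA"):
        # UNA carries six service characters: component separator, element
        # separator, decimal mark, release character, reserved, terminator.
        comp_sep, elem_sep, _decimal, release, _reserved, seg_term = raw[3:9]
        raw = raw[9:]

    segments = []
    elements = [[""]]  # current segment: list of elements, each a list of components
    chars = iter(raw)
    for ch in chars:
        if ch == release:
            # The character after the release character is taken literally.
            elements[-1][-1] += next(chars, "")
        elif ch == comp_sep:
            elements[-1].append("")
        elif ch == elem_sep:
            elements.append([""])
        elif ch == seg_term:
            segments.append(elements)
            elements = [[""]]
        elif ch in "\r\n" and elements == [[""]]:
            continue  # skip line breaks between segments
        else:
            elements[-1][-1] += ch
    if elements != [[""]]:
        segments.append(elements)  # tolerate a missing final terminator
    return segments

sample = "UNA:+.? 'UNH+1+ORDERS:D:96A:UN'\nBGM+220+PO?+123+9'UNT+3+1'"
parsed = parse_edifact(sample)
```

Here parsed[0][0][0] is the segment tag "UNH", and the escaped '?+' in the BGM segment ends up as a literal '+' inside a single element. Validating segments against the message definitions would be a separate layer on top of this.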
> If you are writing this as a part of a portable b2b application then
> you've got your hands full as you've entered the map-making utility
> supplier market. You'll need to address the entire specification, which
> my hardcopy of my deprecated 003050UCS shows to be some ~1000 _dense_
> pages. Make sure you've got the funding to do it right, as this is no
> low-budget add-on.
I'm in luck then! I only need to parse about 10 to 15 messages.
I don't want the 'quick and dirty' solution because I _know_ it
will bite me if there are changes and/or new messages. And they
_will_ come.
> The 'edi specification' always felt akin to the 'rs232 specification' to
> me, only instead of 25 pins to muck up, they offer 100s.
>
> Not-sure-how-this-could-help-ly y'rs,
Thanks anyway, now I know that parsing is the way to go.
Regards,
Onno