evenprimes at gmail.com
Tue Nov 2 15:45:26 CET 2004
Thanks for both your comments.
My rationale for wanting to change the line terminator was twofold,
as you guessed:
1. It's easy and I'm lazy ;)
2. I'm guessing that lots of that code is relatively optimized C, so
it should be fairly fast.
I had considered the FSM tactic, and in fact the rest of my EDI code
usually uses that technique. Why didn't I? Again, 2 reasons:
1. I'm lazy (Is there a pattern here? :)
2. I was thinking that OS buffering would help me avoid the worst of
the .read() overhead. Still, the function calling overhead in Python
probably hurts more.
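
For comparison, here is a minimal sketch of the character-at-a-time FSM approach, not the code from the recipe or from anyone's EDI library. The name fsm_segments and the delimiters "~" (segment terminator) and "*" (element separator) are assumptions made for illustration; real interchanges declare their own delimiters. It is written for current Python 3.

```python
import io

def fsm_segments(stream, seg_term="~", elem_sep="*"):
    """Yield each segment as a list of elements, reading one character at a time.

    Hypothetical helper: the delimiters are assumed defaults, not taken
    from the interchange header as a real EDI parser would do.
    """
    IN_SEGMENT, AFTER_TERMINATOR = 0, 1   # the two states of this toy machine
    state = AFTER_TERMINATOR
    elements, current = [], []
    while True:
        ch = stream.read(1)               # one Python-level call per character
        if not ch:
            break
        if state == AFTER_TERMINATOR:
            if ch in "\r\n":
                continue                  # skip line breaks between segments
            state = IN_SEGMENT
        if ch == seg_term:
            elements.append("".join(current))
            yield elements
            elements, current = [], []
            state = AFTER_TERMINATOR
        elif ch == elem_sep:
            elements.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current or elements:               # unterminated trailing segment
        elements.append("".join(current))
        yield elements

segments = list(fsm_segments(io.StringIO("ISA*00*X~\nGS*PO~IEA*1~")))
```

The stream.read(1) call inside the loop is where the per-character function-call overhead mentioned above would show up, regardless of how well the OS buffers the underlying file.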
I'll probably go back and add a buffer (a few K) and run some tests.
While any speed improvement would be welcome, I won't consider anything
less than 1.5x faster a real success.
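
A rough sketch of what the buffered version might look like, assuming a single-character segment terminator ("~" here, purely as an example) and a buffer of a few K. The name iter_segments and its parameters are hypothetical, and the code uses current Python 3 syntax.

```python
import io

def iter_segments(stream, terminator="~", bufsize=4096):
    """Yield segments from a file-like object, reading bufsize characters at a time.

    Hypothetical helper: terminator and bufsize are assumed defaults.
    """
    pending = ""                          # text after the last terminator seen so far
    while True:
        chunk = stream.read(bufsize)      # one call per chunk, not per character
        if not chunk:
            break
        pending += chunk
        # Everything before the last terminator is complete; the tail may
        # be a partial segment that continues in the next chunk.
        *complete, pending = pending.split(terminator)
        for segment in complete:
            yield segment
    if pending.strip():                   # unterminated trailing data, if any
        yield pending

# A tiny bufsize forces segments to straddle chunk boundaries.
result = list(iter_segments(io.StringIO("ISA*00~GS*PO~IEA*1~"), bufsize=4))
```

Timing this against the unbuffered reader on the same input (for example with the standard timeit module) would show whether it clears the 1.5x bar.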
The psyco optimizations are very significant: as the recipe says, psyco
more or less doubles the speed. I'm wondering how that will compare
with a buffered FSM.
I'd love to see a different implementation as well, Jeremy. Some of
the issues I've had to deal with may have been atypical and caused me
to lose some efficiency for an odd case. You also have more
experience with EDI than I (~1.5 yrs) so I'd like to see what kind of
design decisions you'll make.
On Tue, 02 Nov 2004 09:22:09 -0500, Jeremy Jones <zanesdad at bellsouth.net> wrote:
> Peter Hansen wrote:
> > Chris Cioffi wrote:
> >> Are there any docs or examples of extending the file type? I work
> >> with EDI messages that are very like text files, just with a few
> >> quirks. ;-) I was wondering if I could stretch and twist the
> >> built-in file type to make things a bit faster and more full featured.
> >> Specifically I would need to alter the iterator and ideally the line
> >> terminator.
> > It's unclear what you want to do. Can you provide an example?
> > Also consider whether you can achieve what you want merely
> > by creating your own "file-like" object that wraps the
> > standard file type. This is the usual way to proceed.
> > -Peter
> I should really wait for the OP, but I've had too much caffeine this
> morning to just sit still. An EDI message is a (fairly well) structured
> string of text (let's just say a file for now) that may consist of
> multiple interchanges, each interchange consisting of multiple segments
> (at least two, and each segment having a specific character denoting its
> end - all segments in an interchange will have the same segment
> terminator) and each segment consisting of multiple elements. I believe
> the OP wants to be able to specify what character the file object will
> recognize as a line terminator (rather than the standard \n or \r\n),
> presumably so he can tell it that a segment terminator is the line
> terminator, do a readline(), and get an EDI segment instead of a
> "traditional" line of text. Having dealt with EDI for nearly 6 years, I
> could see the benefit of this. While I'm currently headed down the FSM
> route, it would be interesting to see the above mentioned alternative.
> Jeremy Jones