Ideas for parsing this text?

Paul McGuire ptmcg at austin.rr.com
Thu Apr 24 18:08:10 CEST 2008


On Apr 24, 10:42 am, "Eric Wertman" <ewert... at gmail.com> wrote:
> I'm sure there are cooler ways to do some of that.  I spent most of my
> time expanding the characters that constitute content.  I'm concerned
> that over time I'll have things break as other characters show up.
> Specifically a few of the nodes are of German locale, so I could get
> some odd international characters.
>
If you want to support international characters without going to
Unicode, a first cut would be to add pyparsing's string constant
"alphas8bit" (the 8-bit Latin-1 letters) to your content characters.
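As a rough illustration of what the 8-bit letters add, here is a standard-library sketch of a content-character check. It assumes that pyparsing's 8-bit letter constant (exported as alphas8bit) covers code points 0xC0 through 0xFF minus the multiplication and division signs; latin1_letters and is_content_word are hypothetical names. In pyparsing itself you would pass a similar string to Word(), e.g. Word(alphanums + alphas8bit).

```python
import string


def latin1_letters() -> str:
    """Hypothetical helper: the Latin-1 letters U+00C0..U+00FF, excluding
    the multiplication sign (U+00D7) and the division sign (U+00F7).
    Assumed to match what pyparsing exposes as alphas8bit."""
    excluded = {0xD7, 0xF7}
    return "".join(chr(cp) for cp in range(0xC0, 0x100) if cp not in excluded)


# Content characters: ASCII letters and digits plus the 8-bit letters,
# so German words such as "Größe" are accepted.
CONTENT_CHARS = set(string.ascii_letters + string.digits + latin1_letters())


def is_content_word(token: str) -> bool:
    """Return True if token is non-empty and made only of content characters."""
    return bool(token) and all(ch in CONTENT_CHARS for ch in token)


print(is_content_word("Größe"))   # True
print(is_content_word("Müller"))  # True
print(is_content_word("a[b]"))    # False
```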

> It looks like pyparsing has a constant for printable characters.  I'm
> not sure if I can just use that, without worrying about it?
>
I would discourage you from using printables, since it also includes
'[', ']', and '"', which are significant to other elements of the
parser. (You could, however, create your own variable initialized
from printables and then use replace("[", "") and so on to strip out
the offending characters.) I'm also a little concerned that you needed
to add \t and \n to the content word; was this really necessary? None
of your examples showed such words, and I would rather you let
pyparsing skip over the whitespace, which is its default behavior.
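A minimal sketch of both suggestions, using only the standard library: it rebuilds a printables-style string (assumed to match pyparsing's definition, printable ASCII minus whitespace), strips the grammar's delimiter characters with replace(), and splits text into content words without ever putting \t or \n in the character set. content_chars and content_words are hypothetical names; in a real grammar, Word(content_chars) together with pyparsing's default whitespace skipping would do this job.

```python
import string
from itertools import groupby

# Assumed to mirror pyparsing's printables: printable ASCII minus whitespace.
printables = "".join(c for c in string.printable if c not in string.whitespace)

# Drop the characters that other parts of the grammar treat as delimiters.
content_chars = printables.replace("[", "").replace("]", "").replace('"', "")


def content_words(text: str) -> list:
    """Hypothetical tokenizer: return each maximal run of content characters.

    Whitespace and the stripped delimiters are not content characters, so
    they end a word. In pyparsing, the whitespace would be skipped by default
    and the delimiters matched by other grammar elements."""
    return [
        "".join(run)
        for is_content, run in groupby(text, key=lambda c: c in content_chars)
        if is_content
    ]


print(content_words('node1 [ "x" ]\n\tnode2'))  # ['node1', 'x', 'node2']
```

Because whitespace is simply not a content character, tabs and newlines end a word instead of having to be listed in it.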

-- Paul



More information about the Python-list mailing list