[Tutor] Parsing problem
Liam Clarke
cyresse at gmail.com
Mon Jul 25 06:49:05 CEST 2005
Hi Paul,
My apologies. As I was jumping into my car after sending that email, it
clicked in my brain.
"Oh yeah... initial & body..."
But it's good to know how to accept valid numbers.
Sorry, getting a bit too quick to fire off emails here.
Regards,
Liam Clarke
On 7/25/05, Paul McGuire <paul at alanweberassociates.com> wrote:
>
> Liam -
>
> The two arguments to Word work this way:
> - the first argument lists valid *initial* characters
> - the second argument lists valid *body* or subsequent characters
>
> For example, in the identifier definition,
>
> identifier = pp.Word(pp.alphas, pp.alphanums + "_/:.")
>
> identifiers *must* start with an alphabetic character, and then may be
> followed by 0 or more alphanumeric, "_", "/", ":", or "." characters. If
> only one argument is supplied, then the same string of characters is used
> as both initial and body. Identifiers are very typical for two-argument
> Words, as they often start with alphas, but then accept digits and other
> punctuation.
> No whitespace is permitted within a Word. The Word matching will end when
> a non-body character is seen.
>
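A minimal sketch of the initial/body distinction described above, assuming
pyparsing is imported as pp as in the identifier example. The helper
full_match and the name digits are hypothetical, introduced only for this
illustration; parseAll=True is used so that leftover input counts as a
failure.

```python
import pyparsing as pp

def full_match(expr, text):
    """Return True if expr matches the whole of text (hypothetical helper)."""
    try:
        expr.parseString(text, parseAll=True)
        return True
    except pp.ParseException:
        return False

# Two arguments: initial characters, then body characters.
identifier = pp.Word(pp.alphas, pp.alphanums + "_/:.")
# One argument: the same characters serve as both initial and body.
digits = pp.Word(pp.nums)

# Matching stops at the first non-body character (here, the space).
first_token = identifier.parseString("abc_1/x:y.z rest")[0]
print(first_token)

print(full_match(identifier, "x9"))   # starts with an alpha
print(full_match(identifier, "9x"))   # starts with a digit
print(full_match(digits, "2005"))
```

Without parseAll=True, parseString simply stops after the first match and
ignores whatever follows, which is why the first call above succeeds even
though " rest" is left over.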
> Using this definition:
>
> integer = pp.Word(pp.nums+"-+.", pp.nums)
>
> It will accept "+123", "-345", "678", and ".901". But in a real number, a
> period may occur anywhere in the number, not just as the initial character,
> as in "3.14159". So your bodyCharacters must also include a ".", as in:
>
> integer = pp.Word(pp.nums+"-+.", pp.nums+".")
>
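To see the difference between the two definitions side by side, here is a
small sketch. The names digits_only_body and dot_in_body are hypothetical
labels for the two versions of "integer" above, and pyparsing is assumed to
be imported as pp.

```python
import pyparsing as pp

# First version: "." is allowed only as the initial character.
digits_only_body = pp.Word(pp.nums + "-+.", pp.nums)
# Second version: "." is also allowed in the body.
dot_in_body = pp.Word(pp.nums + "-+.", pp.nums + ".")

# Print the first token each definition extracts from the same input.
for text in ["+123", "-345", "678", ".901", "3.14159"]:
    first = digits_only_body.parseString(text)[0]
    second = dot_in_body.parseString(text)[0]
    print(text, "->", first, "|", second)
```

The two agree on every input except "3.14159", where the first version stops
at the "." because it is not a body character.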
> Let me say, though, that this is a very permissive definition of integer -
> for one thing, we really should rename it something like "number", since it
> now accepts non-integers as well! But also, there is no restriction on the
> frequency of body characters. This definition would accept a "number" that
> looks like "3.4.3234.111.123.3234". If you are certain that you will only
> receive valid inputs, then this simple definition will be fine. But if you
> will have to handle and reject erroneous inputs, then you might do better
> with a number definition like:
>
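> point = pp.Literal(".")  # assumed definition, if point is not already defined
> number = pp.Combine( pp.Word( "+-"+pp.nums, pp.nums ) +
>     pp.Optional( point + pp.Optional( pp.Word( pp.nums ) ) ) )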
>
> This will handle "+123", "-345", "678", and "0.901", but not ".901". If you
> want to accept numbers that begin with a ".", then you'll need to tweak this
> a bit further.
>
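A sketch comparing the permissive and stricter definitions, plus one possible
tweak for a leading ".". This assumes point is Literal(".") and that pyparsing
is imported as pp. The names permissive, sign, digits, number_with_leading_dot
and full_match are hypothetical, and the tweak is only one of several ways to
do it, not necessarily what Paul had in mind.

```python
import pyparsing as pp

def full_match(expr, text):
    """Return True if expr matches the whole of text (hypothetical helper)."""
    try:
        expr.parseString(text, parseAll=True)
        return True
    except pp.ParseException:
        return False

point = pp.Literal(".")  # assumption: point is a literal period

# Permissive version: any number of "." characters in the body.
permissive = pp.Word(pp.nums + "-+.", pp.nums + ".")

# Stricter version from the message above: at most one ".".
number = pp.Combine(pp.Word("+-" + pp.nums, pp.nums) +
                    pp.Optional(point + pp.Optional(pp.Word(pp.nums))))

# Possible tweak: optional sign, then either digits with an optional
# fraction, or a leading "." that must be followed by digits.
sign = pp.Optional(pp.oneOf("+ -"))
digits = pp.Word(pp.nums)
number_with_leading_dot = pp.Combine(
    sign + (digits + pp.Optional(point + pp.Optional(digits)) | point + digits))

for text in ["+123", "0.901", ".901", "3.4.3234.111.123.3234"]:
    print(text,
          full_match(permissive, text),
          full_match(number, text),
          full_match(number_with_leading_dot, text))
```

Combine joins the matched pieces into a single token and requires them to be
adjacent, so "0.901" comes back as one string rather than three.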
> One last thing: you may want to start using setName() on some of your
> expressions, as in:
>
> number = pp.Combine( pp.Word( "+-"+pp.nums, pp.nums ) +
>     pp.Optional( point + pp.Optional( pp.Word( pp.nums ) ) )
>     ).setName("number")
>
> Note, this is *not* the same as setResultsName. Here setName is attaching a
> name to this pattern, so that when it appears in an exception, the name will
> be used instead of an encoded pattern string (such as W:012345...). No need
> to do this for Literals; the literal string is used when it appears in an
> exception.
>
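A small sketch of the setName / setResultsName difference, assuming pyparsing
is imported as pp. It uses a plain Word rather than the full number
expression to keep things short; the names integer, named, and "value" are
hypothetical. The exact wording of the exception message varies between
pyparsing versions, so only the presence of the name is relied on here.

```python
import pyparsing as pp

# setName labels the pattern itself, for use in error messages.
integer = pp.Word(pp.nums).setName("integer")

try:
    integer.parseString("abc")
    message = ""
except pp.ParseException as err:
    message = str(err)
print(message)  # mentions "integer" rather than an encoded pattern string

# setResultsName labels the matched tokens, for lookup in the results.
named = integer.setResultsName("value")
result = named.parseString("42")
print(result["value"])
```

So setName helps whoever reads the error, while setResultsName helps the code
that picks values out of a successful parse.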
> -- Paul
>
>
>
--
'There is only one basic human right, and that is to do as you damn well
please.
And with it comes the only basic human duty, to take the consequences.'