Regular Expression - old regex module vs. re module
Paul McGuire
ptmcg at austin.rr._bogus_.com
Fri Jun 30 13:38:15 EDT 2006
"Jim Segrave" <jes at nl.demon.net> wrote in message
news:12aaigaohtou291 at corp.supernews.com...
>
> It fails for floats specified as ###. or .###, it outputs an integer
> format and the decimal point separately. It also ignores \# which
> should prevent the '#' from being included in a format.
>
True. What is the spec for these formatting strings, anyway? I Googled for a
while, and this does not appear to be a Perl string formatting technique,
despite the OP's comments to the contrary. And I'm afraid my limited regex
knowledge leaves the OP's example impenetrable to me. I got lost among the
'\'s and parens.
I actually thought that "###." was *not* intended to be floating point, but
instead represented an integer before a sentence-ending period. You do have
to be careful about making *both* leading and trailing digits optional, or else
simple sentence-punctuating periods will get converted to "%1f"!
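To make that pitfall concrete, here is a minimal sketch using the standard re module rather than pyparsing. The pattern names, the helper to_float_format, and the sample sentence are all made up for illustration; they are not taken from the OP's code.

```python
import re

def to_float_format(match):
    # Hypothetical helper: total width is the whole match,
    # precision is the number of '#' after the '.'.
    text = match.group(0)
    decimals = len(text) - text.index(".") - 1
    return "%%%d.%df" % (len(text), decimals)

# Both the leading and the trailing digits are optional,
# so a bare "." also counts as a "float".
too_loose = re.compile(r"#*\.#*")

# Trailing digits are required, so a bare "." is left alone.
safer = re.compile(r"#*\.#+")

sample = "Pay ###.## now. Thanks."
loose_result = too_loose.sub(to_float_format, sample)
safer_result = safer.sub(to_float_format, sample)

print(loose_result)
print(safer_result)
```

With the looser pattern, every sentence-ending period becomes a float format; requiring at least one trailing '#' avoids that, at the cost of not treating "###." as a float.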
As for *ignoring* "\#", it would seem to me we would rather convert this to
"#", since "#" shouldn't be escaped in normal string interpolation.
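A quick check of that reasoning, as a sketch: a bare "#" passes through the % operator untouched, while a backslash left in place would simply show up in the output. The value 5 is an arbitrary sample.

```python
# What "\##" becomes if the escape is converted to a plain '#'
converted = "I want #%1d, please."
# What we would get if the backslash were left in the format string
unconverted = "I want \\#%1d, please."

with_hash = converted % 5
with_backslash = unconverted % 5

print(with_hash)
print(with_backslash)
```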
The following modified version adds handling for "\#", "\<" and "\>", and
real numbers with no integer part. The resulting program isn't radically
different from the first version. (I've highlighted the changes with "<==="
marks.)
-- Paul
------------------
from pyparsing import Combine,Word,Optional,Regex
"""
read Perl-style formatting placeholders and replace with
proper %x string interp formatters
###### -> %6d
##.### -> %6.3f
<<<<< -> %-5s
>>>>> -> %5s
"""
# set up patterns to be matched
# (note use of results name in realFormat, for easy access to
# decimal places substring)
intFormat = Word("#")
realFormat = Combine(Optional(Word("#"))+"."+                  # <===
                     Word("#").setResultsName("decPlaces"))
leftString = Word("<")
rightString = Word(">")
escapedChar = Regex(r"\\[#<>]") # <===
# define parse actions for each - the matched tokens are the third
# arg to parse actions; parse actions will replace the incoming tokens with
# value returned from the parse action
intFormat.setParseAction( lambda s,l,toks: "%%%dd" % len(toks[0]) )
realFormat.setParseAction( lambda s,l,toks: "%%%d.%df" %
                           (len(toks[0]),len(toks.decPlaces)) )
leftString.setParseAction( lambda s,l,toks: "%%-%ds" % len(toks[0]) )
rightString.setParseAction( lambda s,l,toks: "%%%ds" % len(toks[0]) )
escapedChar.setParseAction( lambda s,l,toks: toks[0][1] )  # <===
# collect all formatters into a single "grammar"
# - note reals are checked before ints
formatters = rightString | leftString | realFormat | intFormat | escapedChar  # <===
# set up our test string, and use transform string to invoke parse actions
# on any matched tokens
testString = r"""
This is a string with
ints: #### # ###############
floats: #####.# ###.###### #.# .###
left-justified strings: <<<<<<<< << <
right-justified strings: >>>>>>>>>> >> >
int at end of sentence: ####.
I want \##, please.
"""
print(testString)
print(formatters.transformString( testString ))
------------------
Prints:
This is a string with
ints: #### # ###############
floats: #####.# ###.###### #.# .###
left-justified strings: <<<<<<<< << <
right-justified strings: >>>>>>>>>> >> >
int at end of sentence: ####.
I want \##, please.
This is a string with
ints: %4d %1d %15d
floats: %7.1f %10.6f %3.1f %4.3f
left-justified strings: %-8s %-2s %-1s
right-justified strings: %10s %2s %1s
int at end of sentence: %4d.
I want #%1d, please.
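For comparison, roughly the same transformation can be sketched with the standard re module alone. This is an illustrative equivalent under my own assumptions, not the OP's regex: the names PLACEHOLDER, convert and convert_placeholders are hypothetical, and the rules simply mirror the pyparsing grammar above (reals need at least one decimal place, and reals are tried before ints).

```python
import re

# One alternation, written in verbose mode. '#' must be escaped outside
# character classes here, because verbose mode treats it as a comment marker.
PLACEHOLDER = re.compile(r"""
    \\(?P<esc>[#<>])                  # escaped character
  | (?P<real>\#*\.(?P<dec>\#+))       # real: optional integer part, required decimals
  | (?P<int>\#+)                      # integer
  | (?P<left><+)                      # left-justified string
  | (?P<right>>+)                     # right-justified string
""", re.VERBOSE)

def convert(match):
    # Return the %-style formatter for whichever alternative matched.
    if match.group("esc"):
        return match.group("esc")     # drop the backslash, keep the character
    text = match.group(0)
    if match.group("real"):
        return "%%%d.%df" % (len(text), len(match.group("dec")))
    if match.group("int"):
        return "%%%dd" % len(text)
    if match.group("left"):
        return "%%-%ds" % len(text)
    return "%%%ds" % len(text)

def convert_placeholders(template):
    return PLACEHOLDER.sub(convert, template)

print(convert_placeholders(r"floats: #####.# .###  int at end: ####.  I want \##"))
```

The alternation order plays the same role as the order of the "|" operands in the pyparsing grammar: the real pattern has to be tried before the integer pattern, or "###.##" would be split into two integers around a period.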