Regular Expression - old regex module vs. re module

Fri Jun 30 13:38:15 EDT 2006

"Jim Segrave" <jes at nl.demon.net> wrote in message
news:12aaigaohtou291 at corp.supernews.com...
>
> It fails for floats specified as ###. or .###, it outputs an integer
> format and the decimal point separately. It also ignores \# which
> should prevent the '#' from being included in a format.
>

True.  What is the spec for these formatting strings, anyway?  I Googled a
while, and it does not appear that this is really a Perl string formatting
technique, despite the OP's comments to the contrary.  And I'm afraid my
limited Regex knowledge leaves the OP's example impenetrable to me.  I got
lost among the '\'s and parens.

I actually thought that "###." was *not* intended to be floating point, but
instead represented an integer before a sentence-ending period.  You do have
to be careful of making *both* leading and trailing digits optional, or else
simple sentence punctuating periods will get converted to "%1f"!
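
To make that pitfall concrete, here is a small sketch using the re module
rather than pyparsing.  The names too_loose, stricter and to_float_format
are made up for this illustration and are not part of anyone's posted code;
the sample string is also invented.

```python
import re

# Both digit runs optional: a bare sentence-ending "." also matches.
too_loose = re.compile(r"#*\.#*")
# Leading digits optional, trailing digits required.
stricter = re.compile(r"#*\.#+")

def to_float_format(match):
    # Total width is the length of the whole placeholder; the number of
    # decimal places is whatever follows the "."
    text = match.group(0)
    decimals = len(text) - text.index(".") - 1
    return "%%%d.%df" % (len(text), decimals)

sample = "Total: ##.## units."
loose_result = too_loose.sub(to_float_format, sample)
strict_result = stricter.sub(to_float_format, sample)
# loose_result  -> "Total: %5.2f units%1.0f"   (the period got converted)
# strict_result -> "Total: %5.2f units."
```

Requiring at least one "#" after the "." is the simplest way I can see to
keep ordinary punctuation alone, at the cost of never treating "###." as a
float.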

As for *ignoring* "\#", it would seem to me we would rather convert this to
"#", since "#" shouldn't be escaped in normal string interpolation.

The following modified version adds handling for "\#", "\<" and "\>", and
real numbers with no integer part.  The resulting program isn't radically
different from the first version.  (I've highlighted the changes with "<==="
marks.)

-- Paul

------------------
from pyparsing import Combine,Word,Optional,Regex

"""
read Perl-style formatting placeholders and replace with
proper %x string interp formatters

   ###### -> %6d
   ##.### -> %6.3f
   <<<<<  -> %-5s
   >>>>>  -> %5s

"""

# set up patterns to be matched
# (note use of results name in realFormat, for easy access to
# decimal places substring)
intFormat = Word("#")
realFormat = Combine(Optional(Word("#"))+"."+                 # <===
                     Word("#").setResultsName("decPlaces"))
leftString = Word("<")
rightString = Word(">")
escapedChar = Regex(r"\\[#<>]")               # <===

# define parse actions for each - the matched tokens are the third
# arg to parse actions; parse actions will replace the incoming tokens with
# value returned from the parse action
intFormat.setParseAction( lambda s,l,toks: "%%%dd" % len(toks[0]) )
realFormat.setParseAction( lambda s,l,toks: "%%%d.%df" %
                              (len(toks[0]),len(toks.decPlaces)) )
leftString.setParseAction( lambda s,l,toks: "%%-%ds" %  len(toks[0]) )
rightString.setParseAction( lambda s,l,toks: "%%%ds" %  len(toks[0]) )
escapedChar.setParseAction( lambda s,l,toks: toks[0][1] )  # <===

# collect all formatters into a single "grammar"
# - note reals are checked before ints
formatters = rightString | leftString | realFormat | intFormat | escapedChar  # <===

# set up our test string, and use transform string to invoke parse actions
# on any matched tokens
testString = r"""
    This is a string with
        ints: ####  # ###############
        floats: #####.#  ###.######  #.# .###
        left-justified strings: <<<<<<<<  << <
        right-justified strings: >>>>>>>>>>  >> >
        int at end of sentence: ####.
        I want \##, please.
    """

print(testString)
print(formatters.transformString( testString ))

------------------
Prints:

    This is a string with
        ints: ####  # ###############
        floats: #####.#  ###.######  #.# .###
        left-justified strings: <<<<<<<<  << <
        right-justified strings: >>>>>>>>>>  >> >
        int at end of sentence: ####.
        I want \##, please.

    This is a string with
        ints: %4d  %1d %15d
        floats: %7.1f  %10.6f  %3.1f %4.3f
        left-justified strings: %-8s  %-2s %-1s
        right-justified strings: %10s  %2s %1s
        int at end of sentence: %4d.
        I want #%1d, please.
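
For comparison with the re module, roughly the same conversion can be
sketched with a single compiled pattern and a replacement function.  This is
only my reading of the rules used in the pyparsing version above, not a spec
for the placeholder format; the names placeholder, to_percent_format and
convert are invented for this example.

```python
import re

# One alternative per placeholder kind.  In VERBOSE mode a bare "#" starts
# a comment, so literal "#" is written as "\#" (or inside a character class).
placeholder = re.compile(r"""
    (?P<esc>\\[#<>])               # escaped marker: \#  \<  \>
  | (?P<right>>+)                  # right-justified string
  | (?P<left><+)                   # left-justified string
  | (?P<real>\#*\.(?P<dec>\#+))    # real, integer part optional
  | (?P<int>\#+)                   # integer (tried after real)
""", re.VERBOSE)

def to_percent_format(match):
    text = match.group(0)
    if match.group("esc"):
        return text[1]                      # drop the backslash
    if match.group("right"):
        return "%%%ds" % len(text)
    if match.group("left"):
        return "%%-%ds" % len(text)
    if match.group("real"):
        return "%%%d.%df" % (len(text), len(match.group("dec")))
    return "%%%dd" % len(text)

def convert(template):
    # Replace every placeholder in the template with a % formatter.
    return placeholder.sub(to_percent_format, template)

converted = convert(r"#### ##.### .### <<< >> ####. I want \##, please.")
```

As in the pyparsing grammar, the real alternative has to come before the
integer one, and the trailing digits are required so that "####." is still
read as an integer followed by a period.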