Regular Expression Help
Graham Breed
x31eq at cnntp.org
Mon Apr 13 02:22:55 EDT 2009
Jean-Claude Neveu wrote:
> Hello,
>
> I was wondering if someone could tell me where I'm going wrong with my
> regular expression. I'm trying to write a regexp that identifies whether
> a string contains a correctly-formatted currency amount. I want to
> support dollars, UK pounds and Euros, but the example below deliberately
> omits Euros in case the Euro symbol gets mangled anywhere in email or
> listserver processing. I also want people to be able to omit the
> currency symbol if they wish.
If Euro symbols can get mangled, so can Pound signs.
They're both outside ASCII.
> My regexp that I'm matching against is: "^\$\£?\d{0,10}(\.\d{2})?$"
>
> Here's how I think it should work (but clearly I'm wrong, because it
> does not actually work):
>
> ^\$\£? Require zero or one instance of $ or £ at the start of the
> string.
^[$£]? is correct. And, as you're using re.match, the ^ is
superfluous. (A previous message suggested ^[\$£]? which
will also work. You generally need to escape a Dollar sign,
but not inside a character class.)
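To see the difference between the two spellings, here is a minimal sketch. It assumes Python 3, where plain string literals are already Unicode, and the names original, fixed and samples are only illustrative.

```python
import re

# The pattern from the question: a mandatory "$" followed by an optional "£".
original = re.compile(r"^\$\£?\d{0,10}(\.\d{2})?$")
# The same pattern with a character class: an optional "$" or "£".
fixed = re.compile(r"^[$£]?\d{0,10}(\.\d{2})?$")

samples = ["$12.42", "£12.42", "12.42", "$£12"]
original_hits = [s for s in samples if original.match(s)]
fixed_hits = [s for s in samples if fixed.match(s)]

print(original_hits)  # only strings that start with "$"
print(fixed_hits)     # strings with "$", "£" or no symbol at all
```

The original pattern rejects "£12.42" and "12.42" because the "?" applies only to the Pound sign, while the Dollar sign stays mandatory.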
You should also think about the encoding. In my terminal,
"£" is identical to '\xc2\xa3'. That is, two bytes for a
single code point encoded as UTF-8. If you assume this
encoding, it's best to make it explicit. And if you don't
assume a specific encoding it's best to convert to unicode
to do the comparisons, so for 2.x (or portability) your
string should start with u"
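A rough sketch of why the encoding matters, assuming the input arrives as UTF-8 bytes. The names byte_rex, text_rex and amount are made up for illustration. In a byte pattern the character class sees the two bytes of the Pound sign as two separate alternatives, so it can never consume the whole symbol.

```python
import re

# The Pound sign as UTF-8 bytes: two bytes for one code point.
pound_bytes = u"\u00a3".encode("utf-8")

# A byte pattern: the class holds "$", 0xC2 and 0xA3 as three separate bytes.
byte_rex = re.compile(b"^[$\xc2\xa3]?\\d+$")
# A text pattern: the class holds "$" and the single character "£".
text_rex = re.compile(u"^[$\u00a3]?\\d+$")

amount = pound_bytes + b"12"

# Matching the raw bytes fails; decoding first makes the class work.
bytes_ok = byte_rex.match(amount) is not None
text_ok = text_rex.match(amount.decode("utf-8")) is not None
print(bytes_ok, text_ok)
```

Decoding to Unicode before matching avoids the problem whatever the terminal or file encoding happens to be.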
> d{0,10} Next, require between zero and ten alpha characters.
There's a backslash missing, but not from your original
expression. Digits are not "alpha characters".
> (\.\d{2})? Optionally, two characters can follow. They must be preceded
> by a decimal point.
That works. Of course, \d{2} is longer than the simpler \d\d.
Note that you can comment the original expression like this:
rex = u"""(?x)
    ^[$£]?      # Zero or one instance of $ or £
                # at the start of the string.
    \d{0,10}    # Between zero and ten digits
    (\.\d{2})?  # Optionally, two digits.
                # They must be preceded by a decimal point.
    $           # End of line
"""
Then anybody (including you) who comes to read this in the
future will have some idea what you were trying to do.
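As a quick check, a commented pattern of this kind can be compiled and tried against a few inputs. This is only a sketch, assuming Python 3 and a raw string so the backslashes reach the re module untouched. The sample list is illustrative.

```python
import re

rex = re.compile(r"""(?x)
    ^[$£]?       # Zero or one instance of $ or £
    \d{0,10}     # Between zero and ten digits
    (\.\d{2})?   # Optionally, a decimal point and two digits
    $            # End of the string
""")

samples = ["$12.42", "$12", "£12.42", "$12,482.96", "$12.4", ""]
results = {s: bool(rex.match(s)) for s in samples}

for text, ok in results.items():
    print(repr(text), ok)
```

Since every part of the pattern is optional, the empty string also matches, which is probably not what you want for a currency amount.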
> Examples of acceptable input should be:
>
> $12.42
> $12
> £12.42
> $12,482.96 (now I think about it, I have not catered for this in my
> regexp)
Yes, you need to think about that.
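One possible way to allow thousands separators is sketched below. It is an assumption about what you want rather than the only answer: it requires at least one digit, drops the ten-digit limit, and only accepts commas that split the digits into groups of three. The name grouped is hypothetical.

```python
import re

grouped = re.compile(r"""(?x)
    ^[$£]?                      # Optional currency symbol
    (?: \d{1,3} (?: ,\d{3} )*   # Digits grouped in threes by commas
      | \d+                     # or a plain run of digits
    )
    (?: \.\d{2} )?              # Optional decimal point and two digits
    $                           # End of the string
""")

samples = ["$12,482.96", "$12482.96", "$12", "£12.42", "$1,23.00", "$1234,567"]
accepted = [s for s in samples if grouped.match(s)]
print(accepted)
```

Misplaced commas such as "$1,23.00" or "$1234,567" are rejected, while both "$12,482.96" and "$12482.96" pass.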
Graham