[Tutor] Regex not working as desired
Cameron Simpson
cs at cskk.id.au
Tue Feb 27 00:13:00 EST 2018
On 26Feb2018 11:01, Roger Lea Scherer <rls4jc at gmail.com> wrote:
> The first step is to input data and then I want to check to make sure
>there are only digits and no other type of characters. I thought regex
>would be great for this.
Many people do :-) They are a reasonable tool for an assortment of text
matching tasks, but as you're discovering they can be easy to get wrong and
hard to debug when you do. That's not to say you shouldn't use them, but many
people use them for far too much.
>The program works great, but no matter what I
>enter, the regex part does the same thing. By same thing I mean this:
[...]
>Please enter an integer less than 10,000 greater than 0: 4jkk33
>No match
>Please enter an integer less than 10,000 greater than 0: 4k33
>No match
>Please enter an integer less than 10,000 greater than 0: 4jjk4
>No match
>Please enter an integer less than 10,000 greater than 0: 4334
>No match
So, "no match regardless of the input".
>So I don't know what I'm doing wrong. The cipher will still draw, but I
>want to return an "error message" in this case print("No match"), but it
>does it every time, even when there are only digits; that's not what I
>want. Please help. Below is my code:
Thank you for the code! Many people forget to include it. I'm going to trim for
readability...
[...]
>digits = input("Please enter an integer less than 10,000 greater than 0: ")
>
>""" ensure input is no other characters than digits
>sudocode: if the input has anything other than digits
> return digits """
>
>#def digit_check(digits):
># I thought making it a function might h
>p = re.compile(r'[^\D]')
This seems a slightly roundabout way to match a digit. You're matching "not a
nondigit". You could just use \d to match a digit, which is more readable.
Note also that this regular expression matches just a _single_ digit.
>m = p.match(digits)
Note that match() matches at the beginning of the string.
I notice that all your test strings start with a digit. That is why the regular
expression always matches.
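A quick way to confirm this is to run the pattern from the post against the four inputs that were tried. This is only a sketch for illustration; the names `samples` and `results` are made up here and are not part of the original program.

```python
import re

# The pattern from the original post: "not a nondigit", i.e. one digit.
p = re.compile(r'[^\D]')

# The four inputs tried in the original post.
samples = ["4jkk33", "4k33", "4jjk4", "4334"]

# match() only looks at the start of the string, and every sample starts
# with a digit, so every one of them matches.
results = {s: bool(p.match(s)) for s in samples}
for s, matched in results.items():
    print(s, "->", "matched" if matched else "no match")

# A string that starts with a nondigit does not match at all.
starts_with_letter = bool(p.match("k433"))
print("k433", "->", "matched" if starts_with_letter else "no match")
```

Since the `if m:` branch is the one that prints "No match", all four inputs produce that message.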
>if m:
> print("No match")
This seems upside down, since your expression matches a digit.
Ah, I see what you've done.
The "^" marker has 2 purposes in regular expressions. At the start of a regular
expression it requires the expression to match at the start of the string. At
the start of a character range inside [] it means to invert the range. So:
    \d      A digit.
    \D      A nondigit.
    ^\D     A nondigit at the start of the string.
    [^\D]   "not a nondigit" ==> a digit.
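To see the four patterns side by side, here is a small sketch that applies each one to two sample strings. The helper `first_hit` and the sample strings are hypothetical, chosen only for illustration; `re.search` is used so that each pattern is free to match anywhere unless it anchors itself with "^".

```python
import re

patterns = [r'\d', r'\D', r'^\D', r'[^\D]']
samples = ["4k33", "k433"]

def first_hit(pattern, text):
    """Return the first text matched by pattern, or None if no match."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# Map (pattern, sample) to whatever the pattern found first.
table = {(p, s): first_hit(p, s) for p in patterns for s in samples}

for (p, s), hit in table.items():
    print(f"{p!r:10} on {s!r:8} -> {hit!r}")
```

Note that only `^\D` cares where in the string the character sits: it finds nothing in "4k33" because that string starts with a digit.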
The other thing that you may have missed is that the \d, \D etc shortcuts for
various common characters do not need to be inside [] markers.
So I suspect you wanted to at least start with "a nondigit at the start of the
string". That would be:
^\D
with no [] characters.
Now your wider problem seems to be to make sure your string consists entirely
of digits. Since your logic looks like a match for invalid input, your regexp
might look like this:
\D
and you could use .search instead of .match to find the nondigit anywhere in
the string instead of just at the start.
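As a rough sketch of that "look for anything invalid" approach (the function name `has_nondigit` is made up for this example):

```python
import re

nondigit = re.compile(r'\D')

def has_nondigit(text):
    """Return True if text contains at least one nondigit character."""
    # search() scans the whole string, unlike match(), which only
    # tries at the start.
    return nondigit.search(text) is not None

for text in ["4jkk33", "4k33", "4334", ""]:
    print(repr(text), "->", "No match" if has_nondigit(text) else "looks ok")
```

One thing this sketch makes visible: the empty string contains no nondigit, so it passes this test even though it is not a usable number.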
Usually, however, it is better to write validation code which matches exactly
what you actually want instead of trying to think of all the things that might
be invalid. You want an "all digits" string, so you might write this:
^\d*$
which matches a string containing only digits from the beginning to the end.
That's:
    ^    start of string
    \d   a digit
    *    zero or more of the digit
    $    end of string
Of course you really want at least one digit, so you would use "+" instead of
"*".
So your code might look like:
    valid_regexp = re.compile(r'^\d+$')
    m = valid_regexp.match(digits)
    if m:
        # input is valid
        pass
    else:
        # input is invalid
        pass
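The difference between "*" and "+" shows up on empty input, which the sketch below checks. It also shows one subtlety of "$": in Python it matches at the very end of the string and also just before a trailing newline. That does not matter for text from input(), which has no trailing newline, but `re.fullmatch` avoids the question entirely. The helper name `is_all_digits` is hypothetical.

```python
import re

star = re.compile(r'^\d*$')   # zero or more digits
plus = re.compile(r'^\d+$')   # one or more digits

# "*" accepts the empty string, "+" does not.
empty_ok_star = star.match("") is not None
empty_ok_plus = plus.match("") is not None

# "$" also matches just before a trailing newline.
newline_ok_plus = plus.match("4334\n") is not None

def is_all_digits(text):
    """Return True if text is one or more digits and nothing else."""
    # fullmatch() requires the whole string to match, so no ^ or $ needed.
    return re.fullmatch(r'\d+', text) is not None

print(empty_ok_star, empty_ok_plus, newline_ok_plus)
for text in ["4334", "4k33", "", "4334\n"]:
    print(repr(text), "->", is_all_digits(text))
```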
Finally, you could also consider not using a regexp for this particular task.
Python's "int" class can be called with a string, and will raise an exception
if that string is not a valid integer. This also has the advantage that you get
an int back, which is easy to test for your other constraints (less than 10000,
greater than 0). Now, because int() raises an exception for bad input you need
to phrase the test differently:
    try:
        value = int(digits)
    except ValueError:
        # invalid input, do something here
        pass
    else:
        if value >= 10000 or value <= 0:
            # value out of range, do something here
            pass
        else:
            # valid input, use it
            pass
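One way to package that test, sketched here under the assumption that "invalid" and "out of range" can be handled the same way. The function name `parse_in_range` and its `low`/`high` parameters are invented for this example; the bounds follow the prompt in the original post (greater than 0, less than 10,000).

```python
def parse_in_range(text, low=1, high=9999):
    """Return text as an int if it is a valid integer within low..high,
    otherwise return None."""
    try:
        value = int(text)
    except ValueError:
        # Not an integer at all, e.g. "4k33" or the empty string.
        return None
    if value < low or value > high:
        # A real integer, but outside the allowed range.
        return None
    return value

for text in ["4334", "4k33", "0", "10000", "-5", ""]:
    value = parse_in_range(text)
    if value is None:
        print(repr(text), "-> No match")
    else:
        print(repr(text), "->", value)
```

Be aware that int() is a little more permissive than the \d+ regexp: it accepts a leading sign and surrounding whitespace, so " 42 " converts to 42. The range check is what rejects negative numbers here.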
Cheers,
Cameron Simpson <cs at cskk.id.au> (formerly cs at zip.com.au)