[Tutor] Regex not working as desired

Tue Feb 27 00:13:00 EST 2018

On 26Feb2018 11:01, Roger Lea Scherer <rls4jc at gmail.com> wrote:
>  The first step is to input data and then I want to check to make sure
>there are only digits and no other type of characters. I thought regex
>would be great for this.

Many people do :-) They are a reasonable tool for an assortment of text 
matching tasks, but as you're discovering they can be easy to get wrong and 
hard to debug when you do. That's not to say you shouldn't use them, but many 
people use them for far too much.

>The program works great, but no matter what I
>enter, the regex part does the same thing. By same thing I mean this:
[...]
>Please enter an integer less than 10,000 greater than 0:  4jkk33
>No match
>Please enter an integer less than 10,000 greater than 0:  4k33
>No match
>Please enter an integer less than 10,000 greater than 0:  4jjk4
>No match
>Please enter an integer less than 10,000 greater than 0:  4334
>No match

So, "no match regardless of the input".

>So I don't know what I'm doing wrong. The cipher will still draw, but I
>want to return an "error message" in this case print("No match"), but it
>does it every time, even when there are only digits; that's not what I
>want. Please help. Below is my code:

Thank you for the code! Many people forget to include it. I'm going to trim for 
readability...

[...]
>digits = input("Please enter an integer less than 10,000 greater than 0:  ")
>
>""" ensure input is no other characters than digits
>sudocode: if the input has anything other than digits
> return digits  """
>
>#def digit_check(digits):
># I thought making it a function might h
>p = re.compile(r'[^\D]')

This seems a slightly roundabout way to match a digit. You're matching "not a 
nondigit". You could just use \d to match a digit, which is more readable.

Note also that this regular expression matches only a _single_ digit.

>m = p.match(digits)

Note that match() matches at the beginning of the string.

I notice that all your test strings start with a digit. That is why the regular 
expression always matches.
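
To see this concretely, here is a small sketch that runs your pattern against 
the four inputs from your transcript. The names "samples" and "results" are 
mine, purely for illustration:

```python
import re

# The pattern from the original post: "not a nondigit", i.e. a single digit.
p = re.compile(r'[^\D]')

# The four inputs from the transcript above.
samples = ["4jkk33", "4k33", "4jjk4", "4334"]

# match() only looks at the start of the string, and every sample starts
# with "4", so each one produces a match object.
results = {s: bool(p.match(s)) for s in samples}

for s, matched in results.items():
    print(s, "->", "match" if matched else "no match")
```

Every line reports a match, which is why the "if m:" branch runs every time.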

>if m:
>    print("No match")

This seems upside down, since your expression matches a digit.

Ah, I see what you've done.

The "^" marker has 2 purposes in regular expressions. At the start of a regular 
expression it requires the expression to match at the start of the string. At 
the start of a character range inside [] it means to invert the range. So:

  \d    A digit.
  \D    A nondigit.
  ^\D   A nondigit at the start of the string
  [^\D] "not a nondigit" ==> a digit

The other thing that you may have missed is that the \d, \D etc shortcuts for 
various common characters do not need to be inside [] markers.

So I suspect you wanted to at least start with "a nondigit at the start of the 
string". That would be:

  ^\D

with no [] characters.
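
If it helps, the difference between the two uses of "^" can be checked at the 
prompt. This is only a sketch; the variable names and the two test strings are 
made up for the example:

```python
import re

# "^" inside []: inverts the class, so this is "not a nondigit", i.e. a digit.
digit_class = re.compile(r'[^\D]')

# "^" at the start of the expression: anchors a nondigit to the start of the string.
leading_nondigit = re.compile(r'^\D')

starts_with_digit = "4k33"
starts_with_letter = "k433"

a = bool(digit_class.match(starts_with_digit))        # digit at the start
b = bool(digit_class.match(starts_with_letter))       # letter at the start
c = bool(leading_nondigit.match(starts_with_digit))   # digit at the start
d = bool(leading_nondigit.match(starts_with_letter))  # letter at the start

print(a, b, c, d)
```

The two patterns give opposite answers for each string.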

Now your wider problem seems to be to make sure your string consists entirely 
of digits. Since your logic looks like a match for invalid input, your regexp 
might look like this:

  \D

and you could use .search instead of .match to find the nondigit anywhere in 
the string instead of just at the start.
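
A minimal sketch of that approach follows. The function name "has_nondigit" is 
hypothetical, not something from your program:

```python
import re

# Matches any single character that is not a digit.
invalid_regexp = re.compile(r'\D')

def has_nondigit(text):
    """Return True if text contains a nondigit anywhere in it."""
    # search() scans the whole string, unlike match(), which only
    # tries at the start.
    return invalid_regexp.search(text) is not None

for text in ["4jkk33", "433x", "4334"]:
    print(repr(text), "->", "No match" if has_nondigit(text) else "ok")
```

Note that an empty string contains no nondigits, so this check alone would 
let empty input through.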

Usually, however, it is better to write validation code which matches exactly 
what you actually want instead of trying to think of all the things that might 
be invalid. You want an "all digits" string, so you might write this:

  ^\d*$

which matches a string containing only digits from the beginning to the end.  
That's:

  ^     start of string
  \d    a digit
  *     zero or more of the digit
  $     end of string

Of course you really want at least one or more, so you would use "+" instead of 
"*".
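
The difference between "*" and "+" only shows up on empty input, which is 
easy to overlook. A quick sketch, with names invented for the example:

```python
import re

zero_or_more = re.compile(r'^\d*$')  # accepts the empty string
one_or_more = re.compile(r'^\d+$')   # requires at least one digit

for text in ["", "4334", "4k33"]:
    star_ok = bool(zero_or_more.match(text))
    plus_ok = bool(one_or_more.match(text))
    print(repr(text), "star:", star_ok, "plus:", plus_ok)
```

Both patterns accept "4334" and reject "4k33"; only the "*" version accepts 
the empty string.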

So your code might look like:

  valid_regexp = re.compile(r'^\d+$')
  m = valid_regexp.match(digits)
  if m:
    # input is valid
    print("Valid input")
  else:
    # input is invalid
    print("No match")

Finally, you could also consider not using a regexp for this particular task.  
Python's "int" class can be called with a string, and will raise an exception 
if that string is not a valid integer. This also has the advantage that you get 
an int back, which is easy to test for your other constraints (less than 10000, 
greater than 0). Now, because int() raises an exception for bad input you need 
to phrase the test differently:

  try:
    value = int(digits)
  except ValueError:
    # invalid input, do something here
    print("No match")
  else:
    if value >= 10000 or value <= 0:
      # value out of range, do something here
      print("Out of range")
    else:
      # valid input, use it
      print("Valid input:", value)
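
If you do want this as a function, one possible shape is sketched below. The 
name "parse_cipher_number" and the "low"/"high" parameters are my own 
invention, and returning None for bad input is just one choice among several:

```python
def parse_cipher_number(text, low=1, high=9999):
    """Return text as an int if it lies in low..high, otherwise None."""
    try:
        value = int(text)
    except ValueError:
        # Not a valid integer at all, e.g. "4jkk33" or "".
        return None
    if value < low or value > high:
        # A valid integer, but outside the permitted range.
        return None
    return value

for text in ["4334", "4jkk33", "0", "10000"]:
    print(repr(text), "->", parse_cipher_number(text))
```

Be aware that int() is more lenient than the regexp: it accepts surrounding 
whitespace and a leading sign, so " 42 " and "+42" both parse as 42.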

Cheers,
Cameron Simpson <cs at cskk.id.au> (formerly cs at zip.com.au)