Regex Group case change
Cameron Simpson
cs at cskk.id.au
Thu Oct 1 21:24:08 EDT 2020
It is good to see a nice small piece of code which we can run. Thank
you.
So there are a number of things to comment about in the code below;
comments inline under the relevant piece of code (we prefer the "inline
reply" style here, it reads like a conversation):
On 01Oct2020 15:15, Raju <ch.nagaraju008 at gmail.com> wrote:
>import re
>import os
>import sys
>
>#word = "7 the world" # 7 The world
>#word = "Brian'S" # Brian's
>#word = "O'biran"# O'Brian
>#word = "Stoke-On-Trent" # Stoke-on-Trent; here i need to lower the case of middle word(i.e -On-)
There is an opinion, often held, that regexps are overused. To lowercase
the "on" I would reach for str.split, for example:
left, middle, right = word.split('-', 2)
middle = middle.lower()
modified_word = '-'.join([left, middle, right])
I have broken that out for readability, and it is hardwired for a 3 part
word. See the docs for str.split and str.join:
https://docs.python.org/3/library/stdtypes.html#str.join
https://docs.python.org/3/library/stdtypes.html#str.split
So: no regexps, which I'm sure you now realise can be tricky to get
correct, and are hard to read.
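If you want something which is not hardwired to exactly 3 parts, here is
a minimal sketch. The function name lower_middle_parts and its sep
parameter are my own invention for illustration, and I am assuming you
want every part between the first and last lowercased:

```python
def lower_middle_parts(word, sep="-"):
    """Lowercase every part of word except the first and the last."""
    parts = word.split(sep)
    if len(parts) < 3:
        # No middle part to change, return the word untouched.
        return word
    middle = [part.lower() for part in parts[1:-1]]
    return sep.join([parts[0], *middle, parts[-1]])


print(lower_middle_parts("Stoke-On-Trent"))
print(lower_middle_parts("Brian"))
```

Note that the 3 part version above raises ValueError for a word with
fewer than 2 hyphens, because the unpacking expects exactly 3 values;
this sketch returns such words unchanged instead.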
>def wordpattern(word):
> output = ''
> if re.match("^\d+|w*$",word):
> output = word.upper()
> elif re.match("\w+\'\w{1}$",word):
> output = word.capitalize()
> elif re.match("(\d+\w* )(Hello)( \w+)",word))
> group(1)group(2).title()group(3)
> else:
> output.title()
First off, please try to use raw strings for regular expressions; it
avoids many potential accidents to do with backslash treatment by Python
and regexps. So rewritten:
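To show the sort of accident I mean, here is a small sketch using "\b"
(my own example, it is not in your code). In an ordinary string Python
turns "\b" into a backspace character before the re module ever sees
it; in a raw string the backslash survives and the regexp sees a word
boundary:

```python
import re

plain = "\bcat\b"    # Python converts each \b to a backspace character
raw = r"\bcat\b"     # backslashes kept: \b reaches re as "word boundary"

print(len(plain), len(raw))

text = "a cat sat"
plain_match = re.search(plain, text)   # looks for backspace, "cat", backspace
raw_match = re.search(raw, text)       # looks for the whole word "cat"
print(plain_match, raw_match)
```

Your "\d" and "\w" happen to work without the r prefix because Python
leaves unrecognised escapes alone, but that is not something to rely on.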
>def wordpattern(word):
> output = ''
> if re.match(r"^\d+|w*$",word):
> output = word.upper()
> elif re.match(r"\w+\'\w{1}$",word):
> output = word.capitalize()
> elif re.match(r"(\d+\w* )(Hello)( \w+)",word))
> group(1)group(2).title()group(3)
> else:
> output.title()
First up, this function does not return a value - it has no return
statement. You probably want:
return output
at the end. Also, your default output seems to be ''; would it not be
better to return word unchanged? So I'd start with:
output = word
up the front.
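A minimal sketch of the difference; both function names here are made
up for the demonstration. A function with no return statement hands
back None, whatever it assigned internally:

```python
def without_return(word):
    # The result is computed and then thrown away.
    output = word.upper()


def with_return(word):
    # Start from the unchanged word so every path has a sensible result.
    output = word
    if word.islower():
        output = word.upper()
    return output


print(without_return("abc"))
print(with_return("abc"))
print(with_return("Abc"))
```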
Then there's a bunch of small issues in the main code:
> if re.match(r"^\d+|w*$",word):
> output = word.upper()
You probably want "\w", not "w" (missing backslash). A plain "w" matches
the letter "w". Also, you probably want "\w+", not "\w*" - meaning "at
least one" instead of "zero or more" aka "at least 0". With the "*" it
can match zero characters (the empty string).
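Both points are easy to check at the interactive prompt. A small sketch,
looking only at the second half of your pattern; the variable names are
mine:

```python
import re

# A plain "w" matches only the letter w, so "hello" does not match.
plain_w = re.match(r"w*$", "hello")
# "\w" matches any word character, so "hello" matches in full.
word_class = re.match(r"\w*$", "hello")

# "*" accepts zero characters, so even the empty string matches.
empty_star = re.match(r"\w*$", "")
# "+" demands at least one character, so the empty string is rejected.
empty_plus = re.match(r"\w+$", "")

print(plain_w, word_class, empty_star, empty_plus)
```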
> elif re.match(r"\w+\'\w{1}$",word):
The "\w{1}" can just be written "\w" - the default repetition for a
subpattern is "exactly once", which is what "{1}" means. So not
incorrect, just more complicated than required.
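If you want to convince yourself the two spellings behave the same, a
quick sketch like this compares them over a few sample words (the
samples and names are mine). I have also dropped the backslash before
the quote, which the regexp does not need:

```python
import re

explicit = re.compile(r"\w+\'\w{1}$")   # your pattern, as written
implicit = re.compile(r"\w+'\w$")       # the same thing, spelled shorter

samples = ["Brian'S", "O'biran", "it's", "Brians"]
explicit_results = [bool(explicit.match(s)) for s in samples]
implicit_results = [bool(implicit.match(s)) for s in samples]

print(explicit_results)
print(implicit_results)
```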
> output = word.capitalize()
> elif re.match(r"(\d+\w* )(Hello)( \w+)",word))
Typically people put the whitespace outside the group, because they
usually want the word and not the spaces around it. Of course, the cost
of that is that you would need to put the spaces back in later. So in
fact this works for your use case.
> group(1)group(2).title()group(3)
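You need to join these together, and assign the result to output. Also,
group is a method of the match object which re.match returns, so you
need to keep that object instead of discarding it. Calling it m (my
name for it):
m = re.match(r"(\d+\w* )(Hello)( \w+)", word)
output = m.group(1) + m.group(2).title() + m.group(3)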
> else:
> output.title()
You need to assign the result to output:
output = output.title()
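Putting the above fixes together, here is a sketch of the whole
function. It is only the corrections applied to your branches as you
wrote them; I am not assuming it produces the results in your comments
(for example "7 the world" comes out fully uppercased, because that is
what your first branch does). The name m for the match object is mine.

```python
import re


def wordpattern(word):
    # Default to the unchanged word rather than ''.
    output = word
    if re.match(r"^\d+|\w+$", word):
        output = word.upper()
    elif re.match(r"\w+'\w$", word):
        output = word.capitalize()
    else:
        # Keep the match object so that its groups can be used.
        # Note: this pattern needs leading digits, and any word with
        # leading digits is already taken by the first branch, so in
        # this order the groups are never actually used.
        m = re.match(r"(\d+\w* )(Hello)( \w+)", word)
        if m:
            output = m.group(1) + m.group(2).title() + m.group(3)
        else:
            output = word.title()
    return output


for sample in ["7 the world", "Brian'S", "O'biran", "Stoke-On-Trent"]:
    print(repr(sample), "->", repr(wordpattern(sample)))
```

If you want the "Hello" branch to fire at all you would need to test it
before the digits test, and "Stoke-On-Trent" still needs separate
handling such as the str.split approach.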
Cheers,
Cameron Simpson <cs at cskk.id.au>