a -very- case sensitive search
ptmcg at austin.rr._bogus_.com
Sun Nov 26 12:11:35 CET 2006
"Ola K" <olakh at walla.co.il> wrote in message
news:1164490795.866046.133230 at 45g2000cws.googlegroups.com...
> I am pretty new to Python and I want to make a script that will search
> for the following options:
> 1) words made of uppercase characters -only- (like "YES")
> 2) words made of lowercase characters -only- (like "yes")
> 3) and words with only the first letter capitalized (like "Yes")
> * and I need to do all these considering the fact that not all letters
> are indeed English letters.
> I went through different documentation sections but couldn't find the right
> condition, function or method for it.
> Suggestions will be very much appreciated...
You may be new to Python, but are you new to regular expressions too? I am
no wiz at them, but here is a script that takes a stab at what you are
trying to do. (For more regular expression info, see
The script has these steps:
- create strings containing all Unicode characters that are considered "lower"
and "upper", using the unicode.is* methods
- use these strings to construct 3 regular expressions (or "re"s), one for
words of all lowercase letters, one for words of all uppercase letters, and
one for words that start with an uppercase letter followed by at least one
lowercase letter
- use each re to search the string u"YES yes Yes", and print the found
words
I've used unicode strings throughout, so this should be applicable to your
text consisting of letters beyond the basic Latin set (since Outlook Express
is trying to install Israeli fonts when reading your post, I assume these
are the characters you are trying to handle). You may have to do some setup
of your locale for proper handling of unicode.isupper, etc., but I hope this
gives you a jump start on your problem.
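
To illustrate how the case predicates behave on letters beyond basic Latin, here is a minimal sketch. It assumes Python 3, where every str is Unicode, whereas the script in this post targets Python 2. The classify helper and the sample words are hypothetical and only for illustration; they are not part of the original script.

```python
def classify(word):
    """Label a single word the way the three regular expressions would.

    Checks are done per character, so a word like "A1" is NOT counted
    as all upper (str.isupper() on the whole word would accept it).
    """
    if word and all(ch.isupper() for ch in word):
        return "all upper"
    if word and all(ch.islower() for ch in word):
        return "all lower"
    if len(word) > 1 and word[0].isupper() and all(ch.islower() for ch in word[1:]):
        return "title case"
    return None  # mixed case, digits, or letters that have no case at all


# Latin, Greek and Cyrillic samples, plus a Hebrew word (Hebrew has no case).
for w in ["YES", "yes", "Yes", "ΝΑΙ", "ναι", "Ναι", "ДА", "да", "Да", "כן"]:
    print(w, "->", classify(w))
```

Scripts without upper/lower case, such as Hebrew, fall into none of the three groups, so how to treat those words is a decision the caller still has to make.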

import re
import sys

# Build strings of every Unicode character that is upper / lower case.
uppers = u"".join( unichr(i) for i in range(sys.maxunicode)
                    if unichr(i).isupper() )
lowers = u"".join( unichr(i) for i in range(sys.maxunicode)
                    if unichr(i).islower() )

allUpperRe = ur"\b[%s]+\b" % uppers
allLowerRe = ur"\b[%s]+\b" % lowers
capWordRe = ur"\b[%s][%s]+\b" % (uppers,lowers)

regexes = [
    (allUpperRe, "all upper"),
    (allLowerRe, "all lower"),
    (capWordRe, "title case"),
    ]
for reString,label in regexes:
    # re.UNICODE makes \b treat non-ASCII letters as word characters
    reg = re.compile(reString, re.UNICODE)
    result = reg.findall(u" YES yes Yes ")
    print label, ":", result

Prints:
all upper : [u'YES']
all lower : [u'yes']
title case : [u'Yes']
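
For readers on Python 3, the same approach can be sketched roughly as follows. This is an adaptation under stated assumptions, not the script as posted: chr() stands in for unichr(), plain r"" strings replace ur"", and the find_words helper and PATTERNS list are names introduced here for illustration only.

```python
import re
import sys

# One pass over every code point, collecting cased characters.
uppers, lowers = [], []
for i in range(sys.maxunicode + 1):
    ch = chr(i)
    if ch.isupper():
        uppers.append(ch)
    elif ch.islower():
        lowers.append(ch)

# re.escape is a precaution; cased letters should not include regex specials.
upper_class = "[" + re.escape("".join(uppers)) + "]"
lower_class = "[" + re.escape("".join(lowers)) + "]"

# In Python 3, str patterns are Unicode-aware by default, so \b needs no flag.
PATTERNS = [
    ("all upper", re.compile(r"\b" + upper_class + r"+\b")),
    ("all lower", re.compile(r"\b" + lower_class + r"+\b")),
    ("title case", re.compile(r"\b" + upper_class + lower_class + r"+\b")),
]


def find_words(text):
    """Return {label: [matching words]} for the three case patterns."""
    return {label: rx.findall(text) for label, rx in PATTERNS}


for label, words in find_words(" YES yes Yes ").items():
    print(label, ":", words)
# all upper : ['YES']
# all lower : ['yes']
# title case : ['Yes']
```

Building the character classes scans the whole code point range once, so it is worth doing at import time and reusing the compiled patterns rather than rebuilding them per search.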