ignore case only for a part of the regex?
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Mon Dec 31 23:14:26 EST 2012
On Sun, 30 Dec 2012 10:20:19 -0500, Roy Smith wrote:
> The way I would typically do something like this is build my regexes in
> all lower case and .lower() the text I was matching against them. I'm
> curious what you're doing where you want to enforce case sensitivity in
> one part of a header, but not in another.
Well, sometimes you have things that are case sensitive, and other things
which are not, and sometimes you need to match them at the same time. I
don't think this is any more unusual than (say) wanting to match an
otherwise lowercase word whether or not it comes at the start of a
sentence:
"[Pp]rogramming"
is conceptually equivalent to "match case-insensitive `p`, and
case-sensitive `rogramming`".
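Since Python 3.6, which postdates this thread, the `re` module can express this directly with a scoped inline flag, `(?i:...)`. The flag turns on case-insensitivity for just that group and leaves the rest of the pattern case-sensitive. The following is a minimal sketch; the helper name `is_programming` and the sample words are illustrative only:

```python
import re

# Only the first letter is case-insensitive; "rogramming" must match exactly.
# Scoped inline flags such as (?i:...) require Python 3.6 or later.
PATTERN = re.compile(r"(?i:p)rogramming")

def is_programming(word: str) -> bool:
    """Return True if `word` is 'programming' with either case on the first letter."""
    return PATTERN.fullmatch(word) is not None

for word in ["programming", "Programming", "PROGRAMMING", "pROGRAMMING"]:
    print(word, is_programming(word))
```

The character class `[Pp]rogramming` still works on every version. The scoped flag becomes more convenient when the case-insensitive part is longer than one character, such as a header name that is followed by a case-sensitive value.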
By the way, although there is probably nothing you can (easily) do about
this prior to Python 3.3, which added str.casefold(), converting to
lowercase is not the right way to do case-insensitive matching. It happens
to work correctly for ASCII, but it is not correct for every alphabetic
character:
py> 'Straße'.lower()
'straße'
py> 'Straße'.upper()
'STRASSE'
The right way is to casefold first, then match:
py> 'Straße'.casefold()
'strasse'
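Here is a small sketch of the casefold-then-match approach. It assumes the pattern itself is already written in casefolded form, here simply the literal 'strasse', and it compares .lower() with .casefold() on the same inputs. The helper names are hypothetical:

```python
import re

# The pattern is written already casefolded, so it must be compared
# against casefolded text, not merely lowercased text.
PATTERN = re.compile("strasse")

def matches_lower(text: str) -> bool:
    """Search after .lower(): 'ß' stays 'ß', so 'Straße' is missed."""
    return PATTERN.search(text.lower()) is not None

def matches_casefold(text: str) -> bool:
    """Search after .casefold(): 'ß' folds to 'ss', so both spellings match."""
    return PATTERN.search(text.casefold()) is not None

for text in ["Hauptstraße", "HAUPTSTRASSE", "Hauptstrasse"]:
    print(text, matches_lower(text), matches_casefold(text))
```

Keep one caveat in mind. Casefolding can change the length of a string because 'ß' becomes two characters. As a result, match offsets refer to the folded text and do not necessarily line up with the original.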
Curiously, there is an uppercase ß in old German. In recent years some
typographers have started using it instead of SS, but it's still rare,
and the official German rules have ß transform into SS and vice versa.
It's in Unicode, but few fonts show it:
py> import unicodedata
py> unicodedata.lookup('LATIN CAPITAL LETTER SHARP S')
'ẞ'
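For completeness, here is a quick check of how Python's str methods treat the capital sharp s (U+1E9E) compared with the ordinary ß (U+00DF). Uppercasing ß still produces 'SS' rather than 'ẞ', which matches the SS convention described above:

```python
import unicodedata

CAPITAL_SHARP_S = unicodedata.lookup("LATIN CAPITAL LETTER SHARP S")  # U+1E9E
SMALL_SHARP_S = "ß"  # U+00DF

print(hex(ord(CAPITAL_SHARP_S)))   # 0x1e9e
print(CAPITAL_SHARP_S.lower())     # ß  (simple lowercase mapping)
print(CAPITAL_SHARP_S.casefold())  # ss (full case folding)
print(SMALL_SHARP_S.upper())       # SS (not ẞ)
```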
--
Steven