[Python-bugs-list] [ python-Bugs-635595 ] Misleading description of \w in regexs

noreply@sourceforge.net noreply@sourceforge.net
Fri, 08 Nov 2002 09:29:23 -0800


Bugs item #635595, was opened at 2002-11-08 08:29
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=635595&group_id=5470

Category: Documentation
Group: Python 2.2
Status: Open
Resolution: None
Priority: 5
Submitted By: Greg Chapman (glchapman)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: Misleading description of \w in regexs

Initial Comment:
In the Regular Expression Syntax doc page 
(http://www.python.org/dev/doc/devel/lib/re-syntax.html), the 
description of \w is misleading (the same goes for \W).  
The description says that, with the locale flag in effect, 
\w also matches "characters defined as letters" for the current 
locale.  Reading that, I took "letters" to mean characters 
for which isalpha returns true.  In fact, \w matches every 
character defined as alphanumeric for the current locale 
(so \w works pretty much the same way with the locale 
flag as with the unicode flag).  For example, using '\xb2' 
(superscript two), which is not a letter:

Python 2.2.2 (#37, Oct 14 2002, 17:02:34) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'English_United States.1252'
>>> import re
>>> re.match(r'\w', '\xb2', re.L).group()
'\xb2'
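
To make the isalpha/alphanumeric distinction concrete, here is a minimal
sketch.  It assumes a current Python 3 interpreter and uses the default
Unicode matching for str patterns rather than the locale flag from the
session above, so it is not locale dependent.  The helper name classify
is hypothetical and only for illustration.

```python
import re


def classify(ch):
    """Return (isalpha, isalnum, matches_w) for a single character.

    matches_w reports whether the pattern \\w matches the character
    under Python 3's default Unicode rules for str patterns.
    """
    return (ch.isalpha(), ch.isalnum(), re.match(r'\w', ch) is not None)


# A letter, a digit, superscript two, and a punctuation character.
for ch in ('a', '5', '\xb2', '!'):
    print(repr(ch), classify(ch))
```

The third field follows isalnum rather than isalpha for these four
characters, which is the behaviour the report describes.  The underscore
is a separate case: \w matches it even though it is not alphanumeric.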

