[Python-checkins] python/nondist/sandbox/string alt292.py, 1.2, 1.3 curry292.py, 1.2, 1.3

Tue Sep 7 06:41:58 CEST 2004

Update of /cvsroot/python/python/nondist/sandbox/string
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31196

Modified Files:
	alt292.py curry292.py 
Log Message:
* Adopted Martin's suggestion for trapping end user placeholder name errors.
  The pattern now depends on the Unicode definition of alphanumeric rather
  than on locale specific definitions.  The resulting code is cleaner and
  will run the same across all platforms and locale settings.

* Reformatted the comments in the doctests.
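
For readers skimming the log message, the effect of the negative lookahead can
be sketched roughly as follows.  This is an illustration only, not the code in
alt292.py or curry292.py: it assumes Python 3 (where str patterns are already
Unicode-aware, so re.UNICODE is spelled out just to mirror the commit), it
copies only the three pattern alternatives visible in the diff, and the names
PLACEHOLDER, NAIVE and classify are hypothetical.

```python
import re

# The three alternatives shown in the diff.  Groups: 1 = named identifier,
# 2 = brace delimited identifier, 3 = catchall for ill-formed expressions.
PLACEHOLDER = re.compile(r"""
    \$([_a-z][_a-z0-9]*(?!\w))|   # $ and a Python identifier
    \${([_a-z][_a-z0-9]*)}|       # $ and a brace delimited identifier
    \$(\S*)                       # Catchall for ill-formed $ expressions
""", re.IGNORECASE | re.VERBOSE | re.UNICODE)

# Same identifier rule without the lookahead, for comparison only.
NAIVE = re.compile(r"\$([_a-z][_a-z0-9]*)", re.IGNORECASE)


def classify(text):
    """Return (kind, token) for the first $-expression in text, or None."""
    m = PLACEHOLDER.search(text)
    if m is None:
        return None
    named, braced, bogus = m.groups()
    if named is not None:
        return ("named", named)
    if braced is not None:
        return ("braced", braced)
    return ("invalid", bogus)


t = "Returning $ma\u00f1ana or later."
print(NAIVE.search(t).group(1))   # partial match: 'ma'
print(classify(t))                # whole token falls through to the catchall
print(classify("the $xxx and"))   # ordinary identifier still matches
```

Without the lookahead the identifier match stops silently at the first
non-ASCII letter.  With it, the first alternative fails and the whole token
reaches the catchall, where it can be reported as an invalid placeholder.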



Index: alt292.py
===================================================================
RCS file: /cvsroot/python/python/nondist/sandbox/string/alt292.py,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -d -r1.2 -r1.3

--- alt292.py	7 Sep 2004 04:26:43 -0000	1.2
+++ alt292.py	7 Sep 2004 04:41:54 -0000	1.3
@@ -8,7 +8,9 @@
 'the 10 and'
 
 
-Next, it makes sure the return type is a str if all the inputs are a str.  Any unicode components will cause a unicode output.  This matches the behavior of other re and string ops:
+Next, it makes sure the return type is a str if all the inputs are a str.  Any
+unicode components will cause a unicode output.  This matches the behavior of
+other re and string ops:
 
 >>> dollarsub('the $xxx and', xxx='10')
 'the 10 and'
@@ -28,7 +30,8 @@
 u'the 10 and'
 
 
-The ValueErrors are now more specific.  They include the line number and the mismatched token:
+The ValueErrors are now more specific.  They include the line number and the
+mismatched token:
 
 >>> t = """line one
 ... line two
@@ -40,18 +43,30 @@
 ValueError: Invalid placeholder on line 3:  '@malformed'
 
 
-Also, the re pattern was changed just a bit to catch an important class of locale specific errors where a user may use a non-ASCII identifier.  The previous implementation would match up to the first non-ASCII character and then return a KeyError if the abbreviated is (hopefully) found.  Now, it returns a value error highlighting the problem identifier.  Note, we still only accept Python identifiers but have improved error detection:
+Also, the re pattern was changed just a bit to catch an important class of
+language specific errors where a user may use a non-ASCII identifier. The
+previous implementation would match up to the first non-ASCII character and
+then return a KeyError if the abbreviated is (hopefully) found.  Now, it
+returns a value error highlighting the problem identifier.  Note, we still
+only accept Python identifiers but have improved error detection:
 
->>> import locale
->>> savloc = locale.setlocale(locale.LC_ALL)
->>> _ = locale.setlocale(locale.LC_ALL, 'spanish')
 >>> t = u'Returning $ma\u00F1ana or later.'
 >>> dollarsub(t, {})
 Traceback (most recent call last):
  . . .
 ValueError: Invalid placeholder on line 1:  u'ma\xf1ana'
 
->>> _ = locale.setlocale(locale.LC_ALL, savloc)
+
+Exercise safe substitution:
+
+>>> safedollarsub('$$ $name ${rank}', name='Guido', rank='BDFL')
+'$ Guido BDFL'
+>>> safedollarsub('$$ $name ${rank}')
+'$ $name ${rank}'
+>>> safedollarsub('$$ $@malformed ${rank}')
+Traceback (most recent call last):
+ . . .
+ValueError: Invalid placeholder on line 1:  '@malformed'
 
 '''
 
@@ -65,11 +80,11 @@
   \$([_a-z][_a-z0-9]*(?!\w))|   # $ and a Python identifier
   \${([_a-z][_a-z0-9]*)}|       # $ and a brace delimited identifier
   \$(\S*)                       # Catchall for ill-formed $ expressions
-""", _re.IGNORECASE | _re.VERBOSE | _re.LOCALE)
+""", _re.IGNORECASE | _re.VERBOSE | _re.UNICODE)
 # Pattern notes:
 #
 # The pattern for $identifier includes a negative lookahead assertion
-# to make sure that the identifier is not followed by a locale specific
+# to make sure that the identifier is not followed by a Unicode
 # alphanumeric character other than [_a-z0-9].  The idea is to make sure
 # not to partially match an ill-formed identifiers containing characters
 # from other alphabets.  Without the assertion the Spanish word for

Index: curry292.py
===================================================================
RCS file: /cvsroot/python/python/nondist/sandbox/string/curry292.py,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -d -r1.2 -r1.3
--- curry292.py	7 Sep 2004 04:26:44 -0000	1.2
+++ curry292.py	7 Sep 2004 04:41:54 -0000	1.3
@@ -8,7 +8,9 @@
 'the 10 and'
 
 
-Next, it makes sure the return type is a str if all the inputs are a str.  Any unicode components will cause a unicode output.  This matches the behavior of other re and string ops:
+Next, it makes sure the return type is a str if all the inputs are a str. Any
+unicode components will cause a unicode output.  This matches the behavior of
+other re and string ops:
 
 >>> Template('the $xxx and')(xxx='10')
 'the 10 and'
@@ -28,7 +30,8 @@
 u'the 10 and'
 
 
-The ValueErrors are now more specific.  They include the line number and the mismatched token:
+The ValueErrors are now more specific.  They include the line number and the
+mismatched token:
 
 >>> t = """line one
 ... line two
@@ -40,18 +43,19 @@
 ValueError: Invalid placeholder on line 3:  '@malformed'
 
 
-Also, the re pattern was changed just a bit to catch an important class of locale specific errors where a user may use a non-ASCII identifier.  The previous implementation would match up to the first non-ASCII character and then return a KeyError if the abbreviated is (hopefully) found.  Now, it returns a value error highlighting the problem identifier.  Note, we still only accept Python identifiers but have improved error detection:
+Also, the re pattern was changed just a bit to catch an important class of
+language specific errors where a user may use a non-ASCII identifier. The
+previous implementation would match up to the first non-ASCII character and
+then return a KeyError if the abbreviated is (hopefully) found.  Now, it
+returns a value error highlighting the problem identifier.  Note, we still
+only accept Python identifiers but have improved error detection:
 
->>> import locale
->>> savloc = locale.setlocale(locale.LC_ALL)
->>> _ = locale.setlocale(locale.LC_ALL, 'spanish')
 >>> t = u'Returning $ma\u00F1ana or later.'
 >>> Template(t)({})
 Traceback (most recent call last):
  . . .
 ValueError: Invalid placeholder on line 1:  u'ma\xf1ana'
 
->>> _ = locale.setlocale(locale.LC_ALL, savloc)
 
 Exercise safe substitution:
 
@@ -80,11 +84,11 @@
       \$([_a-z][_a-z0-9]*(?!\w))|   # $ and a Python identifier
       \${([_a-z][_a-z0-9]*)}|       # $ and a brace delimited identifier
       \$(\S*)                       # Catchall for ill-formed $ expressions
-    """, _re.IGNORECASE | _re.VERBOSE | _re.LOCALE)
+    """, _re.IGNORECASE | _re.VERBOSE | _re.UNICODE)
     # Pattern notes:
     #
     # The pattern for $identifier includes a negative lookahead assertion
-    # to make sure that the identifier is not followed by a locale specific
+    # to make sure that the identifier is not followed by a Unicode
     # alphanumeric character other than [_a-z0-9].  The idea is to make sure
     # not to partially match an ill-formed identifiers containing characters
     # from other alphabets.  Without the assertion the Spanish word for
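
The doctests in the diffs above show two behaviours whose implementation is
not part of the quoted hunks: errors that name the line of the offending token,
and safe substitution that leaves unknown placeholders in place.  A minimal
sketch of how both could work, assuming Python 3, is given below.  The "$$"
alternative is inferred from the doctests rather than shown in the diff, and
safe_dollarsub_sketch is a hypothetical name, not the sandbox function.

```python
import re

# Assumed pattern: the three alternatives visible in the diff plus a "$$"
# escape, which the doctests imply but the quoted hunks do not show.
_pattern = re.compile(r"""
    \$(\$)|                       # escaped dollar sign (assumption)
    \$([_a-z][_a-z0-9]*(?!\w))|   # $ and a Python identifier
    \${([_a-z][_a-z0-9]*)}|       # $ and a brace delimited identifier
    \$(\S*)                       # Catchall for ill-formed $ expressions
""", re.IGNORECASE | re.VERBOSE)


def safe_dollarsub_sketch(template, **mapping):
    """Substitute known placeholders, keep unknown ones, reject bad ones."""
    def convert(m):
        escaped, named, braced, bogus = m.groups()
        if escaped is not None:
            return "$"
        if bogus is not None:
            # Line number = newlines before the match start, plus one.
            lineno = template.count("\n", 0, m.start()) + 1
            raise ValueError("Invalid placeholder on line %d:  %r"
                             % (lineno, bogus))
        key = named if named is not None else braced
        if key in mapping:
            return str(mapping[key])
        return m.group(0)         # unknown name: leave the text untouched
    return _pattern.sub(convert, template)


print(safe_dollarsub_sketch("$$ $name ${rank}", name="Guido", rank="BDFL"))
print(safe_dollarsub_sketch("$$ $name ${rank}"))
```

Counting newlines up to the match start keeps the line lookup on the error
path only, so well-formed templates pay nothing for it.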