[Tutor] superscripts in a regex

Wed Jul 31 12:15:00 CEST 2013

Hi,

In the script below I want to extract the digits without retaining the digit grouping symbol, if there is one. The weird thing is that re.findall returns the expected result (group 1 with digits, and optionally group 2 too), but re.sub does not (it just returns the entire string). I tried the flags re.LOCALE, re.UNICODE, and re.DEBUG for solutions or clues, but no luck.

# -*- coding: utf-8 -*-
#Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
import re

regex = r"(^\d+)[.,]?(\d*)[ \w]+"
surfaces = ["79 m\xb2", "1.000 m\xb2", "2,000 m\xb2"]
for i, surface in enumerate(surfaces):
    #surface = surface.replace("\xb2", "2")  # solves it, but maybe some day there will be (tm), (c), etc. symbols!
    print str(i).center(79, "-")
    print re.sub(regex, r"\1\2", surface)  # huh?!
    print re.findall(regex, surface)  # works as expected

It's a no-no to ask this (esp. because it concerns a builtin) but: is this a b-u-g?
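
One possible reading of the output, as a minimal sketch rather than a definitive diagnosis: re.sub rewrites only the span that the pattern matched and copies the rest of the string through unchanged, whereas re.findall reports only the captured groups. With no flags, \w in a byte-string pattern matches only ASCII letters, digits and underscore, so the match stops before \xb2 and that byte survives in the re.sub result. The helper strip_grouping and the variant pattern ending in .* are illustrative names and assumptions, not part of the script above; bytes literals are used so that the sketch should behave the same on Python 2.7 and Python 3.

```python
import re

# Same pattern as in the script, as a byte string. With no flags, \w
# matches only ASCII letters, digits and underscore, so not \xb2.
REGEX = br"(^\d+)[.,]?(\d*)[ \w]+"
# Illustrative variant (an assumption, not from the script): .* consumes
# everything after the digits, leaving no unmatched tail for re.sub to keep.
REGEX_REST = br"(^\d+)[.,]?(\d*).*"

surfaces = [b"79 m\xb2", b"1.000 m\xb2", b"2,000 m\xb2"]


def strip_grouping(pattern, surface):
    """Replace the matched span of `surface` with groups 1 and 2."""
    return re.sub(pattern, br"\1\2", surface)


for surface in surfaces:
    match = re.match(REGEX, surface)
    # The unmatched tail: whatever lies after the end of the match.
    print(repr(surface[match.end():]))
    # re.sub keeps that tail, so the superscript byte is still there.
    print(repr(strip_grouping(REGEX, surface)))
    # The variant matches the whole string, so only the digits remain.
    print(repr(strip_grouping(REGEX_REST, surface)))
```

If this reading is right, the two functions agree on what was matched; they differ only in what they do with the part of the string that was not matched.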

Regards,
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a 
fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~