How to escape # hash character in regex match strings

Brian D briandenzer at gmail.com
Thu Jun 11 16:22:44 CEST 2009


On Jun 11, 2:01 am, Lie Ryan <lie.1... at gmail.com> wrote:
> 504cr... at gmail.com wrote:
> > I've encountered a problem on my RegEx learning curve: how to
> > escape hash characters (#) in strings being matched, e.g.:
>
> >>>> string = re.escape('123#abc456')
> >>>> match = re.match(r'\d+', string)
> >>>> print match
> > <_sre.SRE_Match object at 0x00A6A800>
> >>>> print match.group()
> > 123
>
> > The correct result should be:
>
> > 123456
>
> > I've tried to escape the hash symbol in the match string, without
> > success.
>
> > Any ideas? Is the answer something I overlooked in my lurching Python
> > schooling?
>
> Since you weren't clear about what you want, I'm guessing this is
> what you're after:
>
> >>> s = '123#abc456'
> >>> re.match(r'\d+', re.sub(r'#\D+', '', s)).group()
> '123456'
> >>> s = '123#this is a comment and is ignored456'
> >>> re.match(r'\d+', re.sub(r'#\D+', '', s)).group()
> '123456'
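For reference: `#` has no special meaning in an ordinary Python regular expression, so it needs no escaping. It only starts a comment when the pattern is compiled with `re.VERBOSE`, where a literal hash must be written as `\#` or `[#]`. In the original example `re.escape` was applied to the string being searched rather than to the pattern, and `\d+` simply stops at the first non-digit. A small sketch in Python 3 syntax (the thread itself uses Python 2):

```python
import re

s = '123#abc456'

# In a normal pattern, '#' is an ordinary literal character.
m = re.match(r'(\d+)#', s)
print(m.group(1))  # 123

# Under re.VERBOSE an unescaped '#' starts a comment, so the literal
# hash is written as '\#' here.
pattern = re.compile(r"""
    (\d+)    # leading digits
    \#       # a literal hash
    """, re.VERBOSE)
print(pattern.match(s).group(1))  # 123

# \d+ stops at the first non-digit; to collect every run of digits,
# use findall and join the pieces.
print(''.join(re.findall(r'\d+', s)))  # 123456
```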

Sorry I wasn't more clear. I really appreciate your reply; it
provides half of what I'm hoping to learn. The hash character is
actually a useful hook for identifying a data entity in a scraping
routine I'm developing, but it's not a character I want in the scrubbed
data.
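Since the hash is wanted as a hook but not in the scrubbed data, one option is to match it outside a capturing group, or to require it with a lookbehind, so that only the digits come back. A minimal sketch; the sample `text` is invented for illustration:

```python
import re

text = 'Order #12345 shipped; ref A1234509 unchanged; see #67890.'

# The '#' is part of the match, but findall returns only the group.
ids = re.findall(r'#(\d+)', text)
print(ids)  # ['12345', '67890']

# Same result with a lookbehind: the '#' must precede the digits
# but is not part of the matched text at all.
ids_lb = re.findall(r'(?<=#)\d+', text)
print(ids_lb)  # ['12345', '67890']
```

Identifiers without a leading hash, such as `A1234509`, are left alone by both patterns.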

In my application, the hash distinguishes one string of alphanumeric
characters from other alphanumeric strings. The strings I'm looking for
are manually entered identifiers; a genuine machine-generated
identifier should not contain a hash character. The correct form is
'A1234509', but it is often entered simply as '#12345' when the first
character (a letter code for the month) and the last two characters
(a two-digit year) can be assumed. Matching the hash character with a
regex is a way of trapping such a string and transforming it into its
correct machine-generated form.
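The trap-and-transform step could be sketched with `re.sub` and a replacement function. This assumes the shorthand is always `#` followed by exactly five digits, and that the caller supplies the month letter and two-digit year, since how those are inferred in the real data isn't specified. `SHORTHAND` and `expand_shorthand_ids` are hypothetical names:

```python
import re

# Hypothetical rule: '#' followed by exactly five digits is shorthand.
SHORTHAND = re.compile(r'#(\d{5})\b')

def expand_shorthand_ids(text, month_letter, year2):
    """Rewrite '#12345' as e.g. 'A1234509' (hypothetical helper).

    month_letter and year2 are the parts normally assumed by whoever
    typed the shorthand; the caller must supply them as strings.
    """
    return SHORTHAND.sub(
        lambda m: f'{month_letter}{m.group(1)}{year2}', text)

print(expand_shorthand_ids('see #12345 and A7654308', 'A', '09'))
# see A1234509 and A7654308
```

The `\b` keeps a longer run such as `#123456` from being partially rewritten; it is left untouched instead.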

I'm surprised how hard it has been to find an example of a hash
character in a RegEx pattern for exactly this kind of situation, since
putting a pound symbol in front of a number is so common in the real
world.

Thanks!



More information about the Python-list mailing list