How to escape # hash character in regex match strings

504crank at 504crank at
Thu Jun 11 10:29:45 EDT 2009

On Jun 11, 2:01 am, Lie Ryan <lie.1... at> wrote:
> 504cr... at wrote:
> > I've encountered a problem with my RegEx learning curve -- how to
> > escape hash characters # in strings being matched, e.g.:
> >>>> string = re.escape('123#abc456')
> >>>> match = re.match('\d+', string)
> >>>> print match
> > <_sre.SRE_Match object at 0x00A6A800>
> >>>> print match.group()
> > 123
> > The correct result should be:
> > 123456
> > I've tried to escape the hash symbol in the match string without
> > result.
> > Any ideas? Is the answer something I overlooked in my lurching Python
> > schooling?
> As you're not being clear on what you wanted, I'm just guessing this is
> what you wanted:
> >>> s = '123#abc456'
> >>> re.match('\d+', re.sub('#\D+', '', s)).group()
> '123456'
> >>> s = '123#this is a comment and is ignored456'
> >>> re.match('\d+', re.sub('#\D+', '', s)).group()
> '123456'

Sorry I wasn't more clear. I positively appreciate your reply. It
provides half of what I'm hoping to learn. The hash character is
actually a desirable hook to identify a data entity in a scraping
routine I'm developing, but not a character I want in the scrubbed

In my application, the hash makes a string of alphanumeric characters
unique from other alphanumeric strings. The strings I'm looking for
are actually manually-entered identifiers, but a real machine-created
identifier shouldn't contain that hash character. The correct pattern
should be 'A1234509', but is instead often merely entered as '#12345'
when the first character, representing an alphabet sequence for the
month, and the last two characters, representing a two-digit year, can
be assumed. Identifying the hash character in a RegEx match is a way
of trapping the string and transforming it into its correct machine-
generated form.
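A minimal sketch of that trap-and-transform step might look like the following. The five-digit length is only inferred from the '#12345' example, and the names SHORT_ID, expand_ids, month_letter and year2 are hypothetical; the month letter and two-digit year are passed in by the caller because the shorthand form does not carry them.

```python
import re

# Assumed shorthand form: a literal '#' followed by exactly five digits.
# In a normal (non-verbose) pattern '#' matches itself, no backslash needed.
SHORT_ID = re.compile(r'#(\d{5})(?!\d)')

def expand_ids(text, month_letter, year2):
    """Rewrite '#12345'-style shorthand as month letter + digits + year."""
    def _full(match):
        # match.group(1) holds the five digits without the hash.
        return month_letter + match.group(1) + year2
    return SHORT_ID.sub(_full, text)

print(expand_ids('ref #12345 received', 'A', '09'))  # ref A1234509 received
```

The negative lookahead (?!\d) keeps a longer run of digits after the hash from being partially rewritten; whether that is the right rule depends on the real data.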

Other patterns the strings can take in their manually-created


Garbage in, garbage out -- I know. I wish I could tell the people
entering the data how challenging it is to work with what they
provide, but it is, after all, a screen-scraping routine.

I'm surprised it's been so difficult to find an example of the hash
character in a RegEx string -- for exactly this type of situation,
since it's so common in the real world that people want to put a pound
symbol in front of a number.
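For what it's worth, here is a small sketch of the hash in a pattern, using the '123#abc456' string from the start of the thread. In the default mode '#' is an ordinary character and needs no escaping; only under re.VERBOSE does a bare '#' start a comment, in which case it can be written as \# or [#]. The variable names are only illustrative.

```python
import re

s = '123#abc456'

# Default mode: '#' matches itself, so no backslash is required.
m = re.search(r'#(\D+)', s)
print(m.group(1))                 # abc

# re.VERBOSE: an unescaped '#' begins a comment, so the literal hash
# is written as \# here.
verbose = re.compile(r'''
    (\d+)    # leading digits
    \#       # a literal hash
    \D*      # non-digits after the hash
    (\d+)    # trailing digits
''', re.VERBOSE)
m2 = verbose.match(s)
print(m2.group(1) + m2.group(2))  # 123456
```

Note that re.escape() is meant for building a pattern out of literal text; applying it to the string being searched, as in the first example of the thread, only inserts a backslash into the data.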

