How to grab a number from inside a .html file using regex

Νίκος nikos.the.gr33k at gmail.com
Sat Aug 7 15:37:54 EDT 2010


On 7 Aug, 22:17, MRAB <pyt... at mrabarnett.plus.com> wrote:
> Νίκος wrote:
> > On 7 Aug, 21:24, MRAB <pyt... at mrabarnett.plus.com> wrote:
>
> >> Use group capture:
>
> >>      import re
> >>      page_id = re.match(r'<!-- (\d+) -->', firstline).group(1)
> >>      print(page_id)
>
> > Worked like a charm! Thanks a lot!
>
> > So the match method here not only searched for the string representation
> > of the number but also converted it to an integer as well?
>
> > Does r stand for 'retrieve the string' here?
>
> > And group?
>
> > When a regex searches a .txt file, does it always retrieve what it finds
> > as a string, or can it get it as a number as well?
>
> The 'r' prefix makes it a 'raw string literal'. That means that the
> string literal won't treat backslashes as special. Before raw string
> literals were added to the Python language I would have needed to write:
>
>      '<!-- (\\d+) -->'
>
> instead.
>
> (Actually, that's not strictly true in this case, because \d doesn't
> have a special meaning in Python strings, but it's a good idea to use raw
> string literals habitually when writing regexes in order to reduce the
> chance of forgetting them when they _are_ necessary. Well, that's what I
> think, anyway. :-))
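To make the raw-string point concrete, here is a small sketch (assuming Python 3; the sample text 'page id 42' is made up). It checks that r'\d' and '\\d' spell the same pattern, then shows a case where the raw prefix really is necessary: in a normal literal '\b' becomes the backspace character, while the regex engine needs backslash followed by 'b' to mean a word boundary.

```python
import re

# In a raw string literal the backslash is an ordinary character,
# so these two literals produce exactly the same pattern string.
assert r'<!-- (\d+) -->' == '<!-- (\\d+) -->'

# '\b' in a normal literal is the backspace character (\x08), so this
# pattern searches for literal backspaces around 'id'.
plain = re.search('\bid\b', 'page id 42')

# r'\b' keeps backslash + 'b', which the regex engine reads as a word boundary.
raw = re.search(r'\bid\b', 'page id 42')

print(plain)        # None: the text contains no backspace characters
print(raw.group())  # id
```

Forgetting the prefix here raises no error at all; the search just silently fails, which is why using raw strings habitually for regexes is the safer choice.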

Couldn't agree more!

As the saying goes, better safe than sorry! :-)
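On the other questions quoted above (whether match also converts the number to an integer, and what group does): a minimal sketch, assuming Python 3 and a made-up firstline in the "<!-- N -->" format used in the thread. group(0) is the whole match, group(1) is only the text captured by the first parenthesised group, and both are always str. Getting a number needs an explicit int().

```python
import re

# Hypothetical first line of the .html file; the "<!-- N -->" format comes
# from the thread, the value 42 is invented for illustration.
firstline = '<!-- 42 -->'

m = re.match(r'<!-- (\d+) -->', firstline)
if m is None:
    raise ValueError('first line has no <!-- N --> marker')

whole = m.group(0)    # the entire matched text: '<!-- 42 -->'
digits = m.group(1)   # only what the first (...) captured: '42'
print(type(digits).__name__, repr(digits))  # str '42'

# Regex matching never converts anything; turn the captured text into a number.
page_id = int(digits)
print(page_id + 1)    # 43
```

Checking for None before calling group() avoids an AttributeError when the first line doesn't carry the marker.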



More information about the Python-list mailing list