How to grab a number from inside a .html file using regex

MRAB python at mrabarnett.plus.com
Sat Aug 7 14:24:52 EDT 2010


Νίκος wrote:
> Hello guys! Need your precious help again!
> 
> In every html file I have, in the very first line, a page_id for counter
> counting purposes, in the format of a comment like this:
> 
> <!-- 1 -->
> <!-- 2 -->
> <!-- 3 -->
> 
> and so on. Every html file has its own page_id.
> 
> How can I grab that string representation of a number from inside
> the .html file using regex and convert it to an integer value?
> 
> # ==============================
> # open current html template and get the page ID number
> # ==============================
> 
> f = open( '/home/webville/public_html/' + page )
> 
>  #read first line of the file
> firstline = f.readline()
> 
> page_id = re.match( '<!-- \d -->', firstline )
> print ( page_id )

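Use a capture group to pull out the digits, then convert them with int():

     # (\d+) captures one or more digits; group(1) returns them as a string
     page_id = int(re.match(r'<!-- (\d+) -->', firstline).group(1))
     print(page_id)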
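
Note that re.match() returns None when the first line does not contain the comment, so calling .group(1) directly on the result raises AttributeError for such a file. A minimal sketch of a more defensive version follows. The helper name parse_page_id and the choice to tolerate variable whitespace (\s*) inside the comment are my assumptions, not something from the original post:

```python
import re

# Assumption: allow any amount of whitespace around the number.
# The original pattern requires exactly one space on each side.
PAGE_ID_RE = re.compile(r'<!--\s*(\d+)\s*-->')

def parse_page_id(firstline):
    """Return the page ID as an int, or None if the line has no ID comment.

    parse_page_id is a hypothetical helper name used for illustration.
    """
    m = PAGE_ID_RE.match(firstline)  # match() anchors at the start of the line
    if m is None:
        return None
    return int(m.group(1))           # group(1) is the captured digit string
```

You would call it as page_id = parse_page_id(firstline) and check for None before using the value.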


