What do I do to read html files on my pc?
michele.cecere at gmail.com
Tue Aug 28 12:09:11 CEST 2012
On Monday, 27 August 2012 12:59:02 UTC+2, mikcec82 wrote:
> I have an html file on my pc and I want to read it to extract some text.
> Can you help on which libs I have to use and how can I do it?
> thank you so much.
Thank you to all.
Hi Chris, thank you for your hint. I'll try to do as you said and to be clear:
I have to work on an HTML file. This file is not a website file, nor does it come from the internet.
It is a file created by a local software (where "local" means "on my pc").
On this file, I need to perform these operations:
1) Open the file
2) Check for occurrences of the strings:
2a) XXXX, in this case I have this code:
<tr style="font-size: 10" align="left">
DTC CODE Read:
2b) NOT PASSED, in this case I have this code:
<tr style="color: red" align="left">
: NOT PASSED
Note: color in "<tr style="color: red" align="left">" can be "red" or "orange"
2c) OK or PASSED
3) Then, I need to fill in an Excel file following these rules:
3a) If 2a or 2b occurs in the HTML file, I'll write NOK in the Excel file
3b) If 2c occurs in the HTML file, I'll write OK in the Excel file
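Rules 3a and 3b could be sketched roughly as below. This is only an illustration under assumptions: the function names `classify` and `write_results` are hypothetical, the sample row (including its `<td>` cell) is made up, and a CSV file is used as a stand-in that Excel can open, rather than a real .xls workbook.

```python
import csv
import io

def classify(html_text):
    """Return 'NOK' if rule 3a applies, 'OK' if rule 3b applies, else None."""
    # Test the failure markers first: 'NOT PASSED' contains 'PASSED',
    # so checking 'PASSED' first would misclassify a failed file.
    if "XXXX" in html_text or "NOT PASSED" in html_text:
        return "NOK"
    if "PASSED" in html_text or "OK" in html_text:
        return "OK"
    return None

def write_results(rows, out):
    """Write (filename, result) pairs as CSV rows to the open file `out`."""
    writer = csv.writer(out)
    writer.writerow(["file", "result"])
    writer.writerows(rows)

# Made-up sample row; the real file layout may differ.
sample = '<tr style="color: red" align="left"><td>CODE CHECK : NOT PASSED</td></tr>'
buffer = io.StringIO()  # stands in for open("results.csv", "w", newline="")
write_results([("sample.html", classify(sample))], buffer)
print(buffer.getvalue())
```

Note that a plain substring test for "OK" also matches longer words that happen to contain those two letters, so a stricter check may be needed on real files.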
1) In this example, in case 2b, I have "CODE CHECK" in the code, but I could also have "TEXT CHECK" or "CHAR CHECK".
2) The search for occurrences can be done either by tag ("<tr style="color: red" align="left">") or by text (NOT PASSED, PASSED). I would prefer to use the first method.
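The first method (searching by tag) might look something like the sketch below. It assumes Python 3's standard `html.parser` module (on Python 2 the module is named `HTMLParser`); the class name `FailedRowFinder` and the sample table, including its `<td>` cells, are hypothetical.

```python
from html.parser import HTMLParser

class FailedRowFinder(HTMLParser):
    """Collect the text of every <tr> whose style is color red or orange."""

    def __init__(self):
        super().__init__()
        self.in_failed_row = False
        self.failed_rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # Normalise 'color: red' / 'COLOR:red;' to 'color:red'.
            style = (dict(attrs).get("style") or "")
            style = style.replace(" ", "").lower().rstrip(";")
            self.in_failed_row = style in ("color:red", "color:orange")
            if self.in_failed_row:
                self.failed_rows.append("")

    def handle_endtag(self, tag):
        if tag == "tr":
            self.in_failed_row = False

    def handle_data(self, data):
        # Accumulate the text found inside the current failed row.
        if self.in_failed_row:
            self.failed_rows[-1] += data

# Made-up sample; the real file layout may differ.
sample = (
    '<table>'
    '<tr align="left"><td>TEXT CHECK : PASSED</td></tr>'
    '<tr style="color: red" align="left"><td>CODE CHECK : NOT PASSED</td></tr>'
    '</table>'
)
finder = FailedRowFinder()
finder.feed(sample)
finder.close()
print([row.strip() for row in finder.failed_rows])
```

If `failed_rows` ends up non-empty, the file would count as NOK under rule 3a.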
In my script I have used the second way of searching, i.e.:
fileorig = "C:\Users\Mike\Desktop\\2012_05_16_1___p0201_13.html"
f = open(fileorig, 'r')
nomefile = f.read()
for x in nomefile:
    if 'XXXX' in nomefile:
But this works on characters and not on strings (i.e., this way the loop goes through the text character by character, NOT string by string).
I hope I was clear.
Thanks for your help
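The reason is that f.read() returns one single string, and a for loop over a string yields one character at a time. Looping over the file object itself yields one line per iteration instead. A minimal sketch, assuming a hypothetical helper `find_lines` and made-up sample text:

```python
import io

def find_lines(lines, needle):
    """Return (line_number, text) for every line that contains needle."""
    return [
        (number, line.strip())
        for number, line in enumerate(lines, start=1)
        if needle in line
    ]

# Iterating over a string gives single characters.
print(list("abc"))

# io.StringIO stands in for open(fileorig, 'r'); both yield whole lines.
sample = io.StringIO(
    "DTC CODE Read: XXXX\n"
    "CODE CHECK : NOT PASSED\n"
    "TEXT CHECK : PASSED\n"
)
print(find_lines(sample, "NOT PASSED"))
```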