Reading an HP Printer Web Interface

rbt rbt at athop1.ath.vt.edu
Mon Dec 27 21:15:31 CET 2004


Hello there,

Depending on the firmware version of the HP printer and the model type, 
one will encounter a myriad of combinations of the following strings 
while reading the index page:

hp
HP
color
Color
Printer
Printer Status
Status:
Device:
Device Status
laserjet
LaserJet

How can I determine whether a site is in fact the Web interface to 
an HP printer? The goal is to remove all HP printers from a list of 
publicly available Web sites. I've tried the approach below, but it gets 
messy quickly when I try to account for all the combinations 
that HP uses:

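import urllib2

f = urllib2.urlopen("http://%s" % host)
data = f.read()
f.close()
# Each string needs its own "in data" test; a bare 'hp' is always true.
if (('hp' in data or 'HP' in data)
        and ('color' in data or 'Color' in data)
        and ('Printer' in data or 'Printer Status' in data)):
    pass  # DISREGARD THE IP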
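
For what it's worth, case-insensitive matching would at least collapse 
the hp/HP and color/Color variants. Below is a rough sketch of that idea 
only, not a tested fingerprint: the helper name looks_like_hp_printer 
and the way the strings are grouped are assumptions made for 
illustration, and the word-boundary test is there so that "hp" does not 
match inside words such as "PHP".

```python
import re

# Hypothetical keyword groups built from the strings listed above.
# The grouping is an assumption, not a verified HP fingerprint.
PATTERNS = [
    re.compile(r"\bhp\b", re.IGNORECASE),
    re.compile(r"\b(?:laserjet|color|printer)\b", re.IGNORECASE),
    re.compile(r"\b(?:device|status)\b", re.IGNORECASE),
]


def looks_like_hp_printer(data):
    """Return True if every keyword group matches somewhere in data."""
    return all(pattern.search(data) for pattern in PATTERNS)
```

With something like this, the check above would shrink to 
"if looks_like_hp_printer(data):", and new firmware strings would only 
need to be added to one of the patterns.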

I'm sure there's a more graceful way to go about this while maintaining 
a high degree of accuracy and as few false positives as possible. Any 
tips or pointers?

Thanks in advance!



More information about the Python-list mailing list