[FAQTS] Python Knowledge Base Update -- November 1st, 2000
Fiona Czuczman
Fiona Czuczman <fiona@sitegnome.com>
1 Nov 2000 08:52:51 -0000
Greetings,
The latest entries into http://python.faqts.com
regards,
Fiona
## Edited Entries ##############################################
-------------------------------------------------------------
Where can I best learn how to parse out both HTML and Javascript tags to extract text from a page?
http://www.faqts.com/knowledge-base/view.phtml/aid/3680
-------------------------------------------------------------
Paul Allopenna, Matthew Schinckel, Magnus Lyckå
Python Documentation
If you want to (quickly) strip all HTML tags from a string of data, try
using the re module:
import re
f = open(filename, 'r')
data = f.read()
f.close()
# Remove comments first, or a '>' inside a comment will be interpreted
# as the end of the (comment) tag.
# (?s) lets '.' match newlines, so multi-line comments and tags go too.
text = re.sub(r'(?s)<!--.*?-->', '', data)
text = re.sub(r'(?s)<.*?>', '', text)
This will also strip any javascript, but only if the page has been made
'properly', that is, the javascript is within HTML comments.
If you want to know how it works, read the 're' chapter in the library
reference, as it discusses the usefulness of 'non-greedy' regular
expressions.
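As a hedged illustration of the caveat about scripts, the sketch below
adds one more substitution that drops whole <script> elements before the
tags are stripped, so inline javascript that is not wrapped in comments
is removed as well. The function name strip_markup is hypothetical, and
regular expressions remain a quick approximation, not a real HTML parser.

```python
import re

def strip_markup(data):
    """Return data with comments, script elements, and tags removed (rough sketch)."""
    # (?s) makes '.' match newlines; (?i) ignores case in the tag name.
    text = re.sub(r'(?s)<!--.*?-->', '', data)                   # comments first
    text = re.sub(r'(?is)<script\b.*?</script\s*>', '', text)    # whole script elements
    text = re.sub(r'(?s)<.*?>', '', text)                        # any remaining tags
    return text

sample = '<p>Hi<!-- a > b --><script>var x = "<b>";</script> there</p>'
print(strip_markup(sample))
```

The order matters: comments go first so a '>' inside them cannot end a
tag early, and script elements go before the generic tag pattern so
their bodies are not left behind as text.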
-------------------------------------------------------------
How do I check and retrieve the error conditions & message of script executed via the Python/C API (without using PyErr_Print)
http://www.faqts.com/knowledge-base/view.phtml/aid/6234
-------------------------------------------------------------
Kostas Karanikolas, Fiona Czuczman
Alex Martelli
See Section 4, "Exception Handling", in the Python/C API
Reference Manual. PyErr_Occurred() returns a borrowed
reference to the type object of the latest exception if
one is pending, or NULL if no error is pending. To check
whether the exception is one you want to handle, pass
this non-NULL pointer as the first argument to
PyErr_GivenExceptionMatches, with the exception type or
class you want to handle as the second argument; it
returns non-zero on a match.
For finer control, you can call PyErr_Fetch with three
PyObject** arguments, for the type, value, and traceback
objects. If no error is pending, each pointed-to pointer
is set to NULL; otherwise at least the first (the type
object) is non-NULL, and a reference will have been added
to each object returned in this way. PyErr_Fetch also
clears the error indicator. What you mean by "message"
is probably what you get by calling PyObject_Str on the
value object pointer (if it is non-NULL).
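The C calls above have rough Python-level counterparts, which may help
when reasoning about what each one returns. A sketch, assuming a
hypothetical helper named describe_error and a hypothetical
failing_script: sys.exc_info() yields the same (type, value, traceback)
triple that PyErr_Fetch hands back, issubclass stands in for
PyErr_GivenExceptionMatches, and str(value) corresponds to PyObject_Str
on the value object. It is an analogy only, not C API code.

```python
import sys

def describe_error(func, wanted=Exception):
    """Run func(); on error return (type name, message, matches wanted)."""
    try:
        func()
    except Exception:
        # Same triple that PyErr_Fetch fills in on the C side.
        exc_type, exc_value, exc_tb = sys.exc_info()
        return exc_type.__name__, str(exc_value), issubclass(exc_type, wanted)
    return None  # no error was raised

def failing_script():
    # Hypothetical stand-in for the embedded script.
    raise ValueError("bad input")

print(describe_error(failing_script, ValueError))
```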
-------------------------------------------------------------
I'm getting an error stating that "None" object has no attribute "groups" during setup of numpy, any ideas?
http://www.faqts.com/knowledge-base/view.phtml/aid/6245
-------------------------------------------------------------
Michael Risser, Nicholas Hendley
Oleg Broytmann
It seems you are doing something like the following, matching or
searching for a regular expression:
match_object = re.search(pattern, string)
groups = match_object.groups()
But your regexp didn't match anything in the string, so match_object
here is really None. Test it before doing anything with it:
match_object = re.search(pattern, string)
if match_object:
    groups = match_object.groups()
else:
    print("No match!")
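The same check can be wrapped in a small helper so callers never touch a
None match object. This is a minimal sketch; the name find_groups and
the sample pattern are made up for illustration.

```python
import re

def find_groups(pattern, string):
    """Return the tuple of groups, or None when the pattern does not match."""
    match_object = re.search(pattern, string)
    if match_object is None:
        return None
    return match_object.groups()

print(find_groups(r'(\d+)\.(\d+)', 'version 1.5'))
print(find_groups(r'(\d+)\.(\d+)', 'no digits here'))
```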
-------------------------------------------------------------
How do you access the printer from Python under Linux???
http://www.faqts.com/knowledge-base/view.phtml/aid/6376
-------------------------------------------------------------
Dave Berry, Donovan Baarda
This question is very open. There are three main levels at which you
can access the printer under Linux: user, device, and IO port.
The usual user-level access is to print documents using the 'lpr'
command. This will spool documents for printing in /var/spool/lpr to be
printed in order when the printer is available, and allows multiple
users to print documents without causing conflicts. This requires that
the Linux box has lpd configured and running. Typically lpd is
configured to use something like magic-filter to automatically convert
different types of documents into the format understood by the printer.
This means most standard file types (PostScript, PNG, text, etc.) can be
printed directly. (Note that there are alternatives to lpd, such as
CUPS, that do basically the same thing.) From Python, things can be
printed as follows:
import os
filename='~/file.ps'
os.system('lpr %s' % filename)
or
p=os.popen('lpr','w')
p.write('printing test text string\n')
p.close()
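On Python versions that ship the subprocess module (an assumption; it is
newer than this entry), the pipe variant can be written without going
through a shell, which also makes the spooler's exit status available.
The function name send_to_spooler and its command parameter are
hypothetical.

```python
import subprocess

def send_to_spooler(data, command=("lpr",)):
    """Pipe bytes to the spooler command's standard input; return its exit status."""
    proc = subprocess.Popen(list(command), stdin=subprocess.PIPE)
    proc.communicate(data)   # writes data, closes stdin, waits for exit
    return proc.returncode

# Typical use (needs a working lpr on the PATH):
# status = send_to_spooler(b"printing test text string\n")
```

Passing the command as a list avoids shell quoting problems with odd
filenames; a non-zero return value suggests the spooler rejected the job.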
Device-level access involves directly opening the Linux device file and
writing to it. This requires that the user has write access to
the device, and does not allow shared access to the printer. This is
not normally what you want to do unless you are writing something like
your own lpd replacement. Gurus might be able to do some ioctl magic
on the device to do things like get the printer status, but otherwise
this is pretty simple:
# Use the built-in open(); os.open() returns a raw descriptor with no write().
p = open('/dev/lp1', 'w')
p.write('printing text test string\n')
p.close()
IO port level access is the lowest level. You do not want to do this
unless you really want to do weird things with your printer port. An
example of this might be plugging in some strange home-built hardware.
There are a few ways to do this, but the easiest is using the
Linux '/dev/port' device, which allows direct access to IO ports. The
user must have write access to this device. WARNING!!! Making a mistake
when accessing '/dev/port' can seriously stuff up your system! You must
know exactly what you are doing when you use this device. I probably
shouldn't be writing this, because if you know enough to try this, you
probably already know how :-)
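IOports = open("/dev/port", "r+b", 0)   # unbuffered binary access

def GetChar(address):
    # Read one byte from the given IO port address.
    IOports.seek(address)
    return ord(IOports.read(1))

def PutChar(address, c):
    # Write the integer c (0-255) to the given IO port address.
    IOports.seek(address)
    IOports.write(bytes(bytearray([c])))   # one byte on Python 2.6+ and 3

class lpt:
    # Data, status, and control registers sit at port, port+1, port+2.
    def __init__(self, port=0x378):
        self.address = port
    def Put_Data(self, c):
        PutChar(self.address, c)
    def Get_Data(self):
        return GetChar(self.address)
    def Put_Status(self, c):
        PutChar(self.address + 1, c)
    def Get_Status(self):
        return GetChar(self.address + 1)
    def Put_Control(self, c):
        PutChar(self.address + 2, c)
    def Get_Control(self):
        return GetChar(self.address + 2)

p = lpt()
p.Put_Data(ord('a'))   # PutChar expects an integer, not a character
c = p.Get_Status()
IOports.close()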
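Given the warning above, it may be worth trying the seek/read/write
pattern against an in-memory buffer before pointing it at '/dev/port'.
The sketch below assumes a hypothetical PortFile class that takes any
seekable binary file object; the 0x378 base and the +0/+1/+2 offsets are
the ones used in the example above, and the zero-filled buffer only
imitates the device.

```python
import io

class PortFile:
    """Hypothetical wrapper: byte-wide register access on a seekable binary file."""
    def __init__(self, fileobj, base=0x378):
        self.f = fileobj
        self.base = base

    def read_reg(self, offset):
        # Read one byte at base + offset and return it as an integer.
        self.f.seek(self.base + offset)
        return ord(self.f.read(1))

    def write_reg(self, offset, value):
        # Write the low 8 bits of value at base + offset.
        self.f.seek(self.base + offset)
        self.f.write(bytes(bytearray([value & 0xFF])))

# Dry run: a zero-filled buffer stands in for the IO port space.
fake = io.BytesIO(bytes(bytearray(0x400)))
lpt = PortFile(fake)
lpt.write_reg(0, ord('a'))   # data register
lpt.write_reg(2, 0x0C)       # control register
print(lpt.read_reg(0), lpt.read_reg(1), lpt.read_reg(2))
```

Swapping the BytesIO object for open("/dev/port", "r+b", 0) would give
the real thing, with all the risks already described.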
-------------------------------------------------------------
How do you keep track of elements of "ravel"ed Numeric arrays
http://www.faqts.com/knowledge-base/view.phtml/aid/4286
-------------------------------------------------------------
Nathan Wallace, Fiona Czuczman, Rob Hooft
Hans Nowak, Snippet 93, Alex
"""
Packages: maths.numeric
"""
"""
Hi, again.
In case anyone else needs to do something like sorting an array and
keeping track of what belongs where in the array, I've found the
following function useful. It's not super-fast, but I guess it's doing
a lot.
"""
import operator
from Numeric import *

def flatten_indices (array):
    index_array = indices (array.shape)
    index_list = map (ravel, tuple (index_array))
    index_list = apply (map, (None, ) + tuple (index_list))
    values = map (operator.getitem, \
                  len (index_list) * [array], \
                  index_list)
    return map (None, values, index_list)
"""
E.g.
>>> b = reshape (arange (4), (2, 2))
>>> flatten_indices (b)
[(0, (0, 0)), (1, (0, 1)), (2, (1, 0)), (3, (1, 1))]
>>>
"""
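For readers without Numeric installed, the same pairing of values with
their indices can be sketched in plain Python on nested lists. This is
not the snippet author's method: shape_of is a hypothetical helper, the
nested list is assumed to be rectangular, and itertools.product is
swapped in for the indices/ravel combination.

```python
from itertools import product

def shape_of(nested):
    """Return the shape of a rectangular nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return tuple(shape)

def flatten_indices(nested):
    """Return [(value, index_tuple), ...] in row-major (ravel) order."""
    result = []
    for index in product(*[range(n) for n in shape_of(nested)]):
        value = nested
        for i in index:          # walk down one axis at a time
            value = value[i]
        result.append((value, index))
    return result

b = [[0, 1], [2, 3]]
pairs = flatten_indices(b)
print(pairs)
# Sorting the pairs keeps each value attached to its original position.
print(sorted(pairs, reverse=True)[0])
```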