Greetings,

The latest entries into http://python.faqts.com

regards,

Fiona

## Edited Entries ##############################################

-------------------------------------------------------------
Where can I best learn how to parse out both HTML and Javascript
tags to extract text from a page?
http://www.faqts.com/knowledge-base/view.phtml/aid/3680
-------------------------------------------------------------
Paul Allopenna, Matthew Schinckel, Magnus Lyckå
Python Documentation

If you want to (quickly) strip all HTML tags from a string of data,
try using the re module:

    import re

    file = open(filename, 'r')
    data = file.read()
    file.close()

    # Remove comments first, or '>' in comments will be
    # interpreted as end of (comment) tag.
    # (?s) makes '.' match newlines too, so comments and tags
    # that span several lines are also removed.
    text = re.sub('(?s)<!--.*?-->', '', data)
    text = re.sub('(?s)<.*?>', '', text)

This will also strip any javascript, but only if the page has been
made 'properly' - that is, the javascript is within HTML comments.

If you want to know how it works, read the 're' chapter in the
library reference, as it discusses the usefulness of 'non-greedy'
regular expressions.

-------------------------------------------------------------
How do I check and retrieve the error conditions & message of
script executed via the Python/C API (without using PyErr_Print)
http://www.faqts.com/knowledge-base/view.phtml/aid/6234
-------------------------------------------------------------
Kostas Karanikolas, Fiona Czuczman
Alex Martelli

See Section 4, "Exception Handling", in the Python/C API Reference
Manual.

PyErr_Occurred() returns a borrowed-reference pointer to the latest
exception-type PyObject if any is pending, NULL if no error is
pending; to check whether the exception is one you want to handle,
pass this non-NULL pointer as the first argument to function
PyErr_GivenExceptionMatches, second argument being the
exception-type or class you want to handle -- it will return non-0
if matching.

For finer control, you can call PyErr_Fetch, with three PyObject**
arguments, for type, value, and traceback objects -- if no error is
pending, each pointed-to pointer will be set to 0; else, at least
the first (to type-object) will be non-0 (a reference will have been
added to objects returned in this way).

What you mean by "message" is probably what you get by PyObject_Str
on the value object pointer (if non-0).

-------------------------------------------------------------
I'm getting an error stating that "None" object has no attribute
"groups" during setup of numpy, any ideas?
http://www.faqts.com/knowledge-base/view.phtml/aid/6245
-------------------------------------------------------------
Michael Risser, Nicholas Hendley
Oleg Broytmann

It seems you are trying to do the following - match or search for a
regular expression:

    match_object = re.search(pattern, string)
    groups = match_object.groups()

But your regexp didn't match anything in the string, so
match_object here is really None. Test it before doing anything
with it:

    match_object = re.search(pattern, string)
    if match_object:
        groups = match_object.groups()
    else:
        print("No match!")

-------------------------------------------------------------
How do you access the printer from Python under Linux???
http://www.faqts.com/knowledge-base/view.phtml/aid/6376
-------------------------------------------------------------
Dave Berry, Donovan Baarda

This question is very open. There are three main levels at which
you can access the printer under Linux: user, device, and IO port.

The usual user level access is to print documents using the 'lpr'
command. This will spool documents for printing in a spool
directory (typically under /var/spool/lpd) to be printed in order
when the printer is available, and allows multiple users to print
documents without causing conflicts. This requires that the Linux
box has lpd configured and running.
Typically lpd is configured to use something like magic-filter to
automatically convert different types of documents into the format
understood by the printer. This means most standard file types
(postscript, png, text, etc) can be printed directly. (Note that
there are alternatives to lpd, such as CUPS, that do basically the
same thing.)

From Python, things can be printed as follows:

    import os
    filename = '~/file.ps'
    os.system('lpr %s' % filename)

or

    p = os.popen('lpr', 'w')
    p.write('printing test text string\n')
    p.close()

Device level access involves directly opening the Linux device file
and writing to it. This requires that the user has write access to
the device, and does not allow shared access to the printer. This
is not normally what you want to do unless you are writing
something like your own lpd replacement. Gurus might be able to do
some ioctl magic on the device to do things like get the printer
status, but otherwise this is pretty simple:

    # Use the built-in open(); os.open() returns a bare file
    # descriptor, which has no write() method.
    p = open('/dev/lp1', 'w')
    p.write('printing text test string\n')
    p.close()

IO port level access is the lowest level. You do not want to do
this unless you really want to do weird things with your printer
port. An example of this might be plugging in some strange
home-built hardware. There are a few ways to do this, but the
easiest is using the Linux '/dev/port' device that allows direct
access to IO ports. The user must have write access to this device.

WARNING!!! Making a mistake when accessing '/dev/port' can
seriously stuff up your system! You must know exactly what you are
doing when you use this device.
I probably shouldn't be writing this, because if you know enough to
try this, you probably already know how :-)

    # Unbuffered binary access to the IO port space.
    IOports = open("/dev/port", "r+b", 0)

    def GetChar(address):
        IOports.seek(address)
        return ord(IOports.read(1))

    def PutChar(address, c):
        # c is an integer byte value (0-255).
        IOports.seek(address)
        IOports.write(chr(c))

    class lpt:
        # Registers: data at base, status at base+1, control at base+2.
        def __init__(self, port=0x378):
            self.address = port
        def Put_Data(self, c):
            PutChar(self.address, c)
        def Get_Data(self):
            return GetChar(self.address)
        def Put_Status(self, c):
            PutChar(self.address+1, c)
        def Get_Status(self):
            return GetChar(self.address+1)
        def Put_Control(self, c):
            PutChar(self.address+2, c)
        def Get_Control(self):
            return GetChar(self.address+2)

    p = lpt()
    p.Put_Data(ord('a'))    # PutChar expects an integer, not a string
    c = p.Get_Status()
    IOports.close()

-------------------------------------------------------------
How do you keep track of elements of "ravel"ed Numeric arrays
http://www.faqts.com/knowledge-base/view.phtml/aid/4286
-------------------------------------------------------------
Nathan Wallace, Fiona Czuczman, Rob Hooft
Hans Nowak, Snippet 93, Alex

"""
Packages: maths.numeric
"""

"""
Hi, again. In case anyone else needs to do something like sorting
an array and keeping track of what belongs where in the array, I've
found the following function useful. It's not super-fast, but I
guess it's doing a lot.
"""

import operator
from Numeric import *

def flatten_indices (array):
    index_array = indices (array.shape)
    index_list = map (ravel, tuple (index_array))
    index_list = apply (map, (None, ) + tuple (index_list))
    values = map (operator.getitem, \
                  len (index_list) * [array], \
                  index_list)
    return map (None, values, index_list)

"""
E.g.
b = reshape (arange (4), (2, 2))
flatten_indices (b)
[(0, (0, 0)), (1, (0, 1)), (2, (1, 0)), (3, (1, 1))]
"""