[FAQTS] Python Knowledge Base Update -- November 1st, 2000

Fiona Czuczman <fiona@sitegnome.com>
1 Nov 2000 08:52:51 -0000


The latest entries into http://python.faqts.com



## Edited Entries ##############################################

Where can I best learn how to parse out both HTML and Javascript tags to extract text from a page?
Paul Allopenna, Matthew Schinckel, Magnus Lyckå
Python Documentation

If you want to (quickly) strip all HTML tags from a string of data, try 
using the re module:

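import re

f = open(filename, 'r')
data = f.read()
f.close()

# (?s) lets '.' match newlines, so comments and tags that span
# several lines are removed too.
# Remove comments first, or a '>' inside a comment will be
# interpreted as the end of the (comment) tag.
text = re.sub('(?s)<!--.*?-->', '', data)
text = re.sub('(?s)<.*?>', '', text)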

This will also strip any JavaScript, but only if the JavaScript is 
enclosed in HTML comments.

If you want to know how it works, read the 're' chapter in the library 
reference, which discusses the usefulness of 'non-greedy' regular 
expressions.
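
As a rough illustration of why the non-greedy form matters, here is a
minimal sketch in current Python syntax. The function name strip_tags
and the sample string are made up for this example; re.sub and
re.DOTALL are the standard library calls.

```python
import re

def strip_tags(html):
    """Remove HTML comments, then any remaining tags, from a string."""
    # re.DOTALL lets '.' match newlines, so multi-line comments are caught.
    without_comments = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    return re.sub(r"<.*?>", "", without_comments, flags=re.DOTALL)

# Hypothetical sample input, not taken from a real page.
sample = "<p>Hello <b>world</b></p><!-- a > b -->"

# Greedy: '.*' runs to the last '>' and swallows the text in between.
greedy = re.sub(r"<.*>", "", sample)

# Non-greedy: '.*?' stops at the first '>' that closes each tag.
stripped = strip_tags(sample)
```

With this sample, the greedy pattern leaves an empty string, while
strip_tags leaves 'Hello world'.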

How do I check and retrieve the error conditions & message of script executed via the Python/C API (without using PyErr_Print)
Kostas Karanikolas, Fiona Czuczman
Alex Martelli

See Section 4, "Exception Handling", in the Python/C API
Reference Manual.  PyErr_Occurred() returns a borrowed-reference
pointer to the latest exception-type PyObject if
any is pending, or NULL if no error is pending; to check
whether the exception is one you want to handle, pass
this non-NULL pointer as the first argument to
PyErr_GivenExceptionMatches, with the exception type or
class you want to handle as the second argument -- it will
return non-0 if they match.

For finer control, you can call PyErr_Fetch, with three
PyObject** arguments, for type, value, and traceback
objects -- if no error is pending, each pointed-to pointer
will be set to 0; else, at least the first (to type-object)
will be non-0 (a reference will have been added to objects
returned in this way).  Note that PyErr_Fetch also clears
the error indicator.  What you mean by "message"
is probably what you get by PyObject_Str on the value
object pointer (if non-0).
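
For comparison only, the same type/value/traceback triple can be seen
from the Python side through sys.exc_info().  The sketch below is a
Python-level analogy to the C calls described above, not C API code;
the helper name describe_pending_error is hypothetical.

```python
import sys

def describe_pending_error(handled_type):
    """Return (matches, message) for the exception being handled, or None."""
    # Same triple that PyErr_Fetch hands back at the C level.
    exc_type, exc_value, exc_traceback = sys.exc_info()
    if exc_type is None:
        return None  # no exception is currently being handled
    # Rough analogue of PyErr_GivenExceptionMatches for class exceptions.
    matches = issubclass(exc_type, handled_type)
    # Rough analogue of PyObject_Str on the value object.
    return matches, str(exc_value)

try:
    int("not a number")
except Exception:
    result = describe_pending_error(ValueError)
```

Here result holds a true match flag and the text of the ValueError.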

I'm getting an error stating that "None" object has no attribute "groups" during setup of numpy, any ideas?
Michael Risser, Nicholas Hendley
Oleg Broytmann

It seems you are doing something like the following, matching or
searching for a regular expression:

match_object = re.search(pattern, string)
groups = match_object.groups()

But your regexp didn't match anything in the string, so match_object
here is really None. Test it before doing anything with it:

match_object = re.search(pattern, string)
if match_object:
   groups = match_object.groups()
else:
   print("No match!")
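
The same check can be wrapped in a small helper.  This is only a
sketch; the name find_groups and the sample pattern are invented for
the example.

```python
import re

def find_groups(pattern, string):
    """Return the captured groups, or None when the pattern does not match."""
    match_object = re.search(pattern, string)
    if match_object is None:
        # re.search returns None on failure, so there is no .groups() to call.
        return None
    return match_object.groups()

# Hypothetical pattern: two dot-separated numbers.
version_pattern = r"(\d+)\.(\d+)"
found = find_groups(version_pattern, "version 1.5")
missing = find_groups(version_pattern, "no digits here")
```

The caller then tests the result for None instead of running into the
"None object has no attribute groups" error.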

How do you access the printer from Python under Linux???
Dave Berry, Donovan Baarda

This question is very open. There are three main levels at which you 
can access the printer under Linux: user, device, and IO port.

The usual user-level access is to print documents using the 'lpr' 
command. This spools documents in a spool directory such as 
/var/spool/lpd, to be printed in order when the printer is available, 
and allows multiple users to print documents without causing 
conflicts. This requires that the Linux box has lpd configured and 
running. Typically lpd is configured to use something like 
magic-filter to automatically convert different types of documents 
into the format understood by the printer. This means most standard 
file types (PostScript, PNG, text, etc.) can be printed directly. 
(Note that there are alternatives to lpd, such as CUPS, that do 
basically the same thing.) From Python, things can be printed as 
follows:

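  import os

  # print an existing file
  os.system('lpr %s' % filename)

  # print text directly by piping it to lpr
  p = os.popen('lpr', 'w')
  p.write('printing test text string\n')
  p.close()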
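
In current Python the same piping idea is usually written with the
subprocess module.  The following is a sketch under assumptions: the
helper name print_text is made up, and the default command assumes an
'lpr' program is installed and on the PATH.

```python
import subprocess

def print_text(text, command=("lpr",)):
    """Pipe text to a spooler command's standard input; return its exit status."""
    # The command is passed as a list, so no shell quoting is involved.
    result = subprocess.run(list(command), input=text.encode("utf-8"), check=False)
    return result.returncode
```

Calling print_text('printing test text string\n') would hand the text
to lpr; a non-zero return value suggests the spooler rejected the job.
The command parameter is there so that another spooler front end can
be substituted.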

Device-level access involves directly opening the Linux device file 
and writing to it. This requires that the user has write access to 
the device, and does not allow shared access to the printer. This is 
not normally what you want to do unless you are writing something 
like your own lpd replacement. Gurus might be able to do some ioctl 
magic on the device to do things like get the printer status, but 
otherwise this is pretty simple:

  p = open('/dev/lp0', 'w')   # assumed device name; varies by system
  p.write('printing text test string\n')

IO port level access is the lowest level. You do not want to do this 
unless you really want to do weird things with your printer port. An 
example of this might be plugging in some strange home-built hardware. 
There are a few ways to do this, but the easiest is using the 
Linux '/dev/port' device that allows direct access to IO ports. The 
user must have write access to this device. WARNING!!! Making a 
mistake when accessing '/dev/port' can seriously stuff up your system! 
You must know exactly what you are doing when you use this device. I 
probably shouldn't be writing this, because if you know enough to try 
this, you probably already know how :-)

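    # Assumes /dev/port is opened unbuffered for reading and writing.
    IOports = open('/dev/port', 'r+b', 0)

    def GetChar(address):
        IOports.seek(address)
        return ord(IOports.read(1))
    def PutChar(address,c):
        # c is a single-character (one-byte) string
        IOports.seek(address)
        IOports.write(c)

    class lpt:
        # Data, status and control registers sit at consecutive addresses.
        def __init__(self,port=0x378):
            self.address = port
        def Put_Data(self,c):
            PutChar(self.address,c)
        def Get_Data(self):
            return GetChar(self.address)
        def Put_Status(self,c):
            PutChar(self.address+1,c)
        def Get_Status(self):
            return GetChar(self.address+1)
        def Put_Control(self,c):
            PutChar(self.address+2,c)
        def Get_Control(self):
            return GetChar(self.address+2)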

How do you keep track of elements of "ravel"ed Numeric arrays
Nathan Wallace, Fiona Czuczman, Rob Hooft
Hans Nowak, Snippet 93, Alex

Packages: maths.numeric

Hi, again.

In case anyone else needs to do something like sorting an array and
keeping track of what belongs where in the array, I've found the
following function useful.  It's not super-fast, but I guess it's doing
a lot.

import operator
from Numeric import *

def flatten_indices (array):
    index_array = indices (array.shape)
    index_list = map (ravel, tuple (index_array))
    index_list = apply (map, (None, ) + tuple (index_list))
    values = map (operator.getitem, \
                  len (index_list) * [array], \
                  index_list)
    return map (None, values, index_list)


>>> b = reshape (arange (4), (2, 2))
>>> flatten_indices (b)
[(0, (0, 0)), (1, (0, 1)), (2, (1, 0)), (3, (1, 1))]
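
For readers without Numeric, the pairing of values with their indices
can be sketched in plain Python on nested lists.  This is an
illustration of the idea only, not the snippet's method: the function
below takes the shape as an explicit argument, which is an assumption
of this sketch.

```python
from itertools import product

def flatten_indices(nested, shape):
    """Return (value, index_tuple) pairs for a nested list, in row-major order."""
    pairs = []
    # product() walks every index combination, last axis varying fastest.
    for index in product(*(range(n) for n in shape)):
        item = nested
        for i in index:
            item = item[i]  # descend one axis at a time
        pairs.append((item, index))
    return pairs

pairs = flatten_indices([[0, 1], [2, 3]], (2, 2))

# Sorting the pairs orders by value while keeping each original position.
by_value_descending = sorted(pairs, reverse=True)
```

On the 2x2 example this gives the same list of pairs as the
interactive session above, and the sorted list still records where
each value came from.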