[FAQTS] Python Knowledge Base Update -- June 15th, 2000

Thu Jun 15 09:49:27 EDT 2000

Hi Guys,

Below are the entries that made it into http://python.faqts.com tonight.

Cheers,

Fiona Czuczman

## New Entries #################################################

-------------------------------------------------------------
Which linux distros have Python by default?
Is there a handy list somewhere of which Linux distributions can be expected to have Tkinter installed?
http://www.faqts.com/knowledge-base/view.phtml/aid/3774
-------------------------------------------------------------
Fiona Czuczman
Thomas Weholt, John W. Baxter, William Park, Michael Ströder, François Pinard, Dana Booth

Mandrake 7.0 and 7.1 have it installed, and 7.1 has PIL included too. A
very good distro.

RedHat 6.1 and 6.2 install Python, at least in the way we install it.

Slackware-7.0 has a Python package in the D series.

S.u.S.E. comes with packages for Python, Tkinter and other handy Python 
modules. Well, the person who installs it has to choose it in the 
install application called YaST (series d).

[Yes, but you choose Python explicitly only for simpler profiles.
Python gets installed automatically in more sophisticated profiles.
If you tune a simple profile yourself (which is what I usually do), you
merely confirm the installation of Python once, when it is a dependency
of a package you add to the profile.  This is more and more likely, as
more and more packages need Python.  In my last SuSE installation, a few
days ago, I did not have to explicitly select either Python or 
`pygtk'.]

The new Debian potato also has packages for Python and several modules.
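
Whatever the distribution, a quick way to see what a given install
actually provides is to try the imports. The following is only a sketch
for a current Python 3, where the module is spelled `tkinter' (it was
`Tkinter' in the Python of this entry); the helper name can_import is
made up for illustration.

```python
import importlib


def can_import(name):
    """Return True if the module called `name` imports cleanly."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # tkinter and PIL are the two extras mentioned in the answers above.
    for name in ("tkinter", "PIL"):
        print(name, "available" if can_import(name) else "missing")
```

Importing the module, rather than just looking for its files, also
catches installs where the Python side of Tkinter is present but the
compiled part is not.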

-----------

It's best to choose not to put on Python at install time, and then just
retrieve the latest version from the Internet. Before you build it, 
uncomment the lines pertaining to Tkinter in the Modules/Setup file; 
then you're assured of having the latest version.

Of course, you'd need to make sure that you have Tcl/Tk installed... But
then, you should do that yourself, too.

This way makes it easier to keep track of when you want to upgrade. By 
doing it yourself, you know exactly where the files went. When you 
upgrade after an auto install, you don't know if the distribution's 
install put the files in weird places, so that you'll have conflicting 
crap all over your drive. For instance, I installed Mandrake 6.0 once, 
and it put a ton of KDE junk in /usr/bin. What a stupid place, and what 
a clutter. With Python, if you follow the configuration file defaults 
before you make, it'll always be nice and cozy in 
/usr/local/lib/Pythonxx. Wanna upgrade? You can just move the old 
directory out of the way, and then move your homemade modules directory 
back once you've put a new version in.

-------------------------------------------------------------
I'm searching information regarding the use of pointers for linked and double linked lists.
http://www.faqts.com/knowledge-base/view.phtml/aid/3770
-------------------------------------------------------------
Fiona Czuczman
Martijn Faassen

Linked lists aren't hard:

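class Node:
    def __init__(self, next=None):
        # next defaults to None, which marks the end of the list
        self.next = next

# three nodes; the innermost Node() is the tail
linked_list = Node(Node(Node()))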

Every name in Python's a reference, so no pointers are needed. Just 
think of every name in Python as a pointer (to an object), if you like.
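
To make that a little more concrete, here is a small sketch of walking
such a list by following the references. The `value' attribute and the
iter_values helper are additions for illustration, not part of the
answer above.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value   # payload carried by this node (illustrative)
        self.next = next     # reference to the next node, or None at the tail


def iter_values(head):
    """Yield the value of every node, starting at head."""
    node = head
    while node is not None:
        yield node.value
        node = node.next     # rebinding the name: no pointer arithmetic needed


linked_list = Node("a", Node("b", Node("c")))

if __name__ == "__main__":
    print(list(iter_values(linked_list)))
```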

Doubly linked lists follow the same pattern, but have the trouble 
that they introduce circular references, which are bad for Python's 
reference counting based garbage collection scheme: two nodes that 
refer to each other never drop to a reference count of zero. You have 
to break one of the references yourself for it to work:

class Node:
    def __init__(self, prev, next):
        self.prev = prev
        self.next = next

node1 = Node(None, None)
node2 = Node(None, None)
node1.next = node2
node2.prev = node1

# and now to clean up so that refcounting works:
node2.prev = None
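
An alternative to breaking the cycle by hand, not part of the original
answer, is to keep the backward link as a weak reference, so that it
never counts towards the reference count in the first place. This is a
sketch only: it relies on the standard weakref module, which arrived in
Python releases later than the one this entry was written for, and the
`value' attribute and link helper are made up for illustration.

```python
import weakref


class Node:
    def __init__(self, value):
        self.value = value
        self.next = None     # strong reference to the following node
        self._prev = None    # weak reference to the preceding node, or None

    @property
    def prev(self):
        # Calling a weak reference returns the object, or None if it is gone.
        return self._prev() if self._prev is not None else None

    @prev.setter
    def prev(self, node):
        self._prev = weakref.ref(node) if node is not None else None


def link(first, second):
    """Join two nodes so that first comes directly before second."""
    first.next = second
    second.prev = first


node1 = Node(1)
node2 = Node(2)
link(node1, node2)
```

The trade-off is that node2.prev can turn into None if nothing else
keeps node1 alive, so something (typically the list head) has to hold a
strong reference to the first node.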

-------------------------------------------------------------
Where can I find info on combining C, Assembler, and Python?
http://www.faqts.com/knowledge-base/view.phtml/aid/3771
-------------------------------------------------------------
Fiona Czuczman
Martijn Faassen

Take a look here:

Extending and Embedding the Python Interpreter

http://www.python.org/doc/current/ext/ext.html

And here:

Python/C API

http://www.python.org/doc/current/api/api.html

These are part of the standard Python documentation. You'll have to do
the assembler calls yourself, from C.

## Edited Entries ##############################################

-------------------------------------------------------------
Where can I best learn how to parse out both HTML and Javascript tags to extract text from a page?
http://www.faqts.com/knowledge-base/view.phtml/aid/3680
-------------------------------------------------------------
Paul Allopenna, Matthew Schinckel
Python Documentation

If you want to (quickly) strip all HTML tags from a string of data, try using the 
re module:

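import re

# filename is the path of the HTML file you want to strip
file = open(filename,'r')
data = file.read()
file.close()

# '<.*?>' is non-greedy: each match stops at the first '>' after a '<'
text = re.sub('<.*?>', '', data)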

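This will also strip any javascript, but only if the page has been made 'properly' 
- that is, the javascript is within HTML comments. Note that '.' does not match a 
newline by default, so a tag or comment that spans several lines is left in place 
unless the pattern is compiled with the re.DOTALL flag.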

If you want to know how it works, read the 're' chapter in the library reference, 
as it discusses the usefulness of 'non-greedy' regular expressions.
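
If the regular expression turns out to be too blunt (scripts that are
not wrapped in comments, for example), a parser-based approach is
another option. The sketch below is not from the original answer; it
assumes a current Python 3 and its standard html.parser module, and the
names TextExtractor and extract_text are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text of a page, ignoring <script> and <style> bodies."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0   # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace left behind by the removed markup.
        return " ".join("".join(self._chunks).split())


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = "<p>Hello <b>world</b></p>\n<script>\nif (a > b) { go(); }\n</script>"
    print(extract_text(page))
```

HTML comments are dropped as well, since the parser hands them to a
separate handler that this class does not override.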

-------------------------------------------------------------
Is there a HTML search engine written in Python?
http://www.faqts.com/knowledge-base/view.phtml/aid/3105
-------------------------------------------------------------
Fiona Czuczman, Matthew Schinckel
Dale Strickland-Clark, Michal Wallace, Robert Roy, JRHoldem

If you're running it on NT, there's a free search engine you can tap 
into that comes as part of the NT 4.0 option pack.

Otherwise:

Check out http://ransacker.sourceforge.net/ .. There's an Index
class that lets you index arbitrary chunks of text.. But you'll have
to write the program that actually reads the HTML files (and strips
the HTML tags, if that's what you mean by "text content")... 

It also does ranked searches, but you'll have to wrap that, too, if
you want the output to show up on the web.

A full featured full text indexing solution is not trivial. It all
depends on what kind of queries you want to perform. If all you want
to do are queries such as "find all files which contain the word
'dog'" that can be done quite easily, probably under 200 lines of
code for a trivial solution using sgmllib and gdbm. However if you
want to do phrase searching or stem searching or wild-card searching,
then it gets really complicated in a hurry. 
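
For the simple "which files contain the word 'dog'" case, the core idea
is an inverted index: a mapping from each word to the set of files that
contain it. The following is a rough in-memory sketch only. It leaves
out the sgmllib parsing and gdbm storage mentioned above, and the
function names and sample documents are invented for illustration.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def build_index(documents):
    """Map each lowercased word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in WORD.findall(text.lower()):
            index[word].add(name)
    return index


def search(index, word):
    """Return the sorted names of the documents that contain word."""
    return sorted(index.get(word.lower(), ()))


# Made-up sample data standing in for already tag-stripped pages.
docs = {
    "pets.html": "My dog chases the cat.",
    "farm.html": "The dog guards the sheep.",
    "city.html": "No animals here.",
}
index = build_index(docs)

if __name__ == "__main__":
    print(search(index, "dog"))
```

Phrase, stem and wild-card searches need more than this (word
positions, a stemmer, and so on), which is where the complexity
described above comes from.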

Another factor is how many files you are dealing with. Indices often
run 4-8X the size of the indexed files. And do you want to dynamically
update the index, or are you happy just re-indexing the whole works
periodically? A static index is somewhat easier to build than a fully
dynamic one.

An interesting GPL'd indexing package is SWISH++
see:
http://www.best.com/~pjl/software/swish/

A good tactic might be to use this for your indexing and run the
search engine as a daemon, building a Python interface to talk to it
via Unix domain sockets, or alternatively to shell out to it and
capture and parse its output.
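
The "shell out and capture" variant might look roughly like the sketch
below on a current Python 3. It is deliberately generic: the command
line is passed in by the caller, and nothing here reflects the actual
SWISH++ options or output format, which you would need to take from its
own documentation. The name run_search is made up.

```python
import subprocess
import sys


def run_search(argv, timeout=30):
    """Run an external search command and return its non-empty output lines."""
    result = subprocess.run(
        argv,
        capture_output=True,   # collect stdout and stderr instead of printing
        text=True,             # decode the output to str
        timeout=timeout,
        check=True,            # raise CalledProcessError on a non-zero exit
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    # Stand-in command: a child Python process pretending to be the engine.
    fake_engine = [sys.executable, "-c", "print('a.html'); print('b.html')"]
    print(run_search(fake_engine))
```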

You also might want to try using Index Server/ASP combo before going to 
any third party solution...full text searching is no trivial matter and 
chances are it'll give you all the tinkering options you could want.

Additionally:

There is a really simple search engine (single word, really only works with small 
sites), available:

<http://www.chariot.net.au/~jaq/matt/search.tar.gz>

(or look on Parnassus if it's moved :-)

-------------------------------------------------------------
Can I combine a "select" call on some of my file objects with the Tkinter event loop?
http://www.faqts.com/knowledge-base/view.phtml/aid/3728
-------------------------------------------------------------
Rob Hooft, Fiona Czuczman
Grant Edwards, Russell E. Owen

Perhaps one can use file events instead of select. Here is a recent 
exchange that may be relevant -- my initial posted question followed by 
Grant Edwards' detailed and helpful reply. (Note: his snippet has some 
unix-specific bits, but one can ignore those). Also, something not 
stated in the example: the function called by the file event handler 
receives two (or possibly three) arguments:
- the socket
- the flags
- perhaps an optional user-defined third argument (this is supported by 
vanilla Tk, but I don't know if it's supported by Tkinter).

>David Beazley's excellent "Python Essential Reference" says in the 
>section on threads: "In addition, many of Python's most popular 
>extensions such as Tkinter may not work properly in a threaded 
>environment."
>
>I assume it's true, but it was quite a bombshell. I was hoping to write 
>a networked GUI client, hence:
>- read data from a socket and fill in a GUI display
>- accept input from the user and write data to the socket
>I assumed I'd use two threads, one for input, one for output. Now I 
>have no idea what to do. Any suggestions?

In Tk, you can assign read handlers to file objects.  Anytime
there is data available to be read, the handler will be called.

Just open the socket connection and assign a read-handler to
it.  Piece-o-cake.  Here's an excerpt from a program that uses
that technique to handle data from a popen2'd child process:

------------------------------------------------------------

if cmd is None:
    exceptString = 'no executable specified'
    raise exceptString, cmd

self.__returnCode = -1
self.__child = popen2.Popen3(cmd)
self.__fd = self.__child.fromchild.fileno()

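# O_NDELAY is a file status flag, so it is set with F_SETFL
# (F_SETFD would set the close-on-exec descriptor flag instead)
fcntl.fcntl(self.__fd, FCNTL.F_SETFL, FCNTL.O_NDELAY)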

Tkinter.tkinter.createfilehandler(self.__child.fromchild,
                                  Tkinter.tkinter.READABLE,
                                  self.__stdoutHandler)

------------------------------------------------------------

self.__child.fromchild      is a file object connected to the "read" end 
of a pipe.
Tkinter.tkinter.READABLE    is a constant that tells Tk what you care 
about.
self.__stdoutHandler        is a function to call when the file object 
has data available.
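
For the networked client in the original question, the same technique
applied to a socket could be sketched as follows on a current Python 3
(where the module is `tkinter'). This is an illustration under stated
assumptions, not the code from the excerpt above: make_reader and attach
are made-up names, and file handlers are only available on Unix-like
systems.

```python
import socket


def make_reader(chunks, bufsize=4096):
    """Return a callback with the (file, mask) signature Tk passes to handlers."""
    def on_readable(sock, mask):
        try:
            data = sock.recv(bufsize)
        except BlockingIOError:
            return             # nothing to read after all
        chunks.append(data)    # an empty bytes object means the peer closed
    return on_readable


def attach(root, sock, callback):
    """Ask Tk to call callback whenever sock has data to read (Unix only)."""
    import tkinter             # imported here so the reader works without Tk
    sock.setblocking(False)
    root.tk.createfilehandler(sock, tkinter.READABLE, callback)
```

With a Tk root window in hand, one would call attach(root, sock,
make_reader(chunks)) before entering root.mainloop(); the GUI stays
responsive and no second thread is needed for the reading side.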