[FAQTS] Python Knowledge Base Update -- June 15th, 2000
Fiona Czuczman
fiona at sitegnome.com
Thu Jun 15 09:49:27 EDT 2000
Hi Guys,
Below are the entries that made it into http://python.faqts.com tonight.
Cheers,
Fiona Czuczman
## New Entries #################################################
-------------------------------------------------------------
Which linux distros have Python by default?
Is there a handy list somewhere of which Linux distributions can be expected to have Tkinter installed?
http://www.faqts.com/knowledge-base/view.phtml/aid/3774
-------------------------------------------------------------
Fiona Czuczman
Thomas Weholt, John W. Baxter, William Park, Michael Ströder, François Pinard, Dana Booth
Mandrake 7.0 and 7.1 have it installed, and 7.1 has PIL included too. A
very good distro.
RedHat 6.1 and 6.2 install Python, at least in the way we install it.
Slackware 7.0 has a Python package in the D series.
S.u.S.E. comes with packages for Python, Tkinter and other handy Python
modules. Well, the person who installs it has to choose it in the
install application called YaST (series d).
[Yes, but you choose Python explicitly only for simpler profiles.
Python gets installed automatically in more sophisticated profiles.
If you tune a simple profile yourself (which is what I usually do), you
merely confirm the installation of Python once, when it turns up as a
dependency of a package you add to the profile. This is more and more
likely, as more and more packages need Python. In my last SuSE
installation, a few days ago, I did not have to explicitly select either
Python or `pygtk'.]
The new Debian potato also has packages for Python and several modules.
-----------
It's best to leave Python out at install time, and then just retrieve
the latest version from the Internet. Before you build it, uncomment the
lines pertaining to Tkinter in the Modules/Setup file; then you're
assured of having the latest version.
Of course, you'd need to make sure that you have Tcl/Tk installed... But
then, you should do that yourself, too.
This way makes it easier to keep track of when you want to upgrade. By
doing it yourself, you know exactly where the files went. When you
upgrade after an auto install, you don't know if the distribution's
install put the files in weird places, so that you'll have conflicting
crap all over your drive. For instance, I installed Mandrake 6.0 once,
and it put a ton of KDE junk in /usr/bin. What a stupid place, and what
a clutter. With Python, if you follow the configuration file defaults
before you make, it'll always be nice and cozy in
/usr/local/lib/Pythonxx. Wanna upgrade? You can just move the old
directory out of the way, and then move your homemade modules directory
back once you've put a new version in.
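To check whether an existing installation already has Tkinter before deciding to rebuild, something like the following may help. It is a minimal sketch that assumes a current Python 3, where the package is named `tkinter` and its C extension `_tkinter`; the helper name `has_module` is made up for illustration.

```python
import importlib.util


def has_module(name):
    """Return True if the named top-level module can be located."""
    return importlib.util.find_spec(name) is not None


# The pure-Python package can be present while the C extension is missing
# (for example when Python was built without Tcl/Tk), so check both.
tkinter_available = has_module("tkinter") and has_module("_tkinter")
print("Tkinter available:", tkinter_available)
```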
-------------------------------------------------------------
I'm searching for information regarding the use of pointers for linked and doubly linked lists.
http://www.faqts.com/knowledge-base/view.phtml/aid/3770
-------------------------------------------------------------
Fiona Czuczman
Martijn Faassen
Linked lists aren't hard:
class Node:
    def __init__(self, next=None):
        # next defaults to None so the last node needs no argument
        self.next = next

linked_list = Node(Node(Node()))
Every name in Python's a reference, so no pointers are needed. Just
think of every name in Python as a pointer (to an object), if you like.
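As an illustration of following those references, here is a small sketch that stores a value in each node and walks the list from head to tail. The `value` attribute and the `iter_values` helper are additions for this example only, not part of the original answer.

```python
class Node:
    """One cell of a singly linked list."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next  # reference to the following Node, or None at the end


def iter_values(head):
    """Yield each stored value, following .next until it is None."""
    node = head
    while node is not None:
        yield node.value
        node = node.next


# Build the list 1 -> 2 -> 3 from the tail inward.
linked_list = Node(1, Node(2, Node(3)))
values = list(iter_values(linked_list))
print(values)
```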
Doubly linked lists follow the same pattern, but they introduce circular
references, which are a problem for Python's reference-counting garbage
collection scheme. You have to break one of the references yourself for
it to work:
class Node:
    def __init__(self, prev, next):
        self.prev = prev
        self.next = next

node1 = Node(None, None)
node2 = Node(None, None)
node1.next = node2
node2.prev = node1
# and now to clean up so that refcounting works:
node2.prev = None
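An alternative to breaking the cycle by hand is to hold the backward link as a weak reference, so that only the forward links keep nodes alive. This is a sketch that assumes a Python version with the `weakref` module (added after this entry was written); the `value` attribute and the `link` helper are made up for illustration. Later Python versions also include a cycle collector, so this is an optimisation rather than a requirement there.

```python
import weakref


class Node:
    def __init__(self, value):
        self.value = value
        self.next = None   # strong reference forward
        self._prev = None  # weak reference backward, or None

    @property
    def prev(self):
        # Dereference the weak reference; None if unset or already collected.
        return self._prev() if self._prev is not None else None

    @prev.setter
    def prev(self, node):
        self._prev = weakref.ref(node) if node is not None else None


def link(first, second):
    """Connect two nodes in both directions without creating a strong cycle."""
    first.next = second
    second.prev = first


node1 = Node("a")
node2 = Node("b")
link(node1, node2)
```

Because the backward link is weak, the caller has to keep a reference to the head of the list for as long as the list is in use.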
-------------------------------------------------------------
Where can I find info on combining C, Assembler, and Python?
http://www.faqts.com/knowledge-base/view.phtml/aid/3771
-------------------------------------------------------------
Fiona Czuczman
Martijn Faassen
Take a look here:
Extending and Embedding the Python Interpreter
http://www.python.org/doc/current/ext/ext.html
And here:
Python/C API
http://www.python.org/doc/current/api/api.html
These are part of the standard Python documentation. You'll have to do
the assembler calls yourself, from C.
## Edited Entries ##############################################
-------------------------------------------------------------
Where can I best learn how to parse out both HTML and Javascript tags to extract text from a page?
http://www.faqts.com/knowledge-base/view.phtml/aid/3680
-------------------------------------------------------------
Paul Allopenna, Matthew Schinckel
Python Documentation
If you want to (quickly) strip all HTML tags from a string of data, try using the
re module:
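import re

# 'filename' is the path of the HTML file to strip
file = open(filename, 'r')
data = file.read()
file.close()
# non-greedy match: remove everything from each '<' to the next '>'
text = re.sub('<.*?>', '', data)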
This will also strip any javascript, but only if the page has been made 'properly'
- that is, the javascript is within HTML comments.
If you want to know how it works, read the 're' chapter in the library reference,
as it discusses the usefulness of 'non-greedy' regular expressions.
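To see why the non-greedy form matters, compare the two patterns on a short sample string (the sample text is invented for this example):

```python
import re

data = "<p>Hello, <b>world</b>!</p>"

# Greedy: '.*' runs on to the LAST '>', so the whole string is one match.
greedy = re.sub(r"<.*>", "", data)

# Non-greedy: '.*?' stops at the FIRST '>', so each tag is matched separately.
non_greedy = re.sub(r"<.*?>", "", data)

print(repr(greedy))
print(repr(non_greedy))
```

Note that `.` does not match a newline by default, so a tag or comment that spans several lines is only matched when the `re.DOTALL` flag is passed.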
-------------------------------------------------------------
Is there a HTML search engine written in Python?
http://www.faqts.com/knowledge-base/view.phtml/aid/3105
-------------------------------------------------------------
Fiona Czuczman, Matthew Schinckel
Dale Strickland-Clark, Michal Wallace, Robert Roy, JRHoldem
If you're running it on NT, there's a free search engine you can tap
into that comes as part of the NT 4.0 option pack.
Otherwise:
Check out http://ransacker.sourceforge.net/ .. There's an Index
class that lets you index arbitrary chunks of text.. But you'll have
to write the program that actually reads the HTML files (and strips
the HTML tags, if that's what you mean by "text content")...
It also does ranked searches, but you'll have to wrap that, too, if
you want the output to show up on the web.
A full featured full text indexing solution is not trivial. It all
depends on what kind of queries you want to perform. If all you want
to do are queries such as "find all files which contain the word
'dog'" that can be done quite easily, probably under 200 lines of
code for a trivial solution using sgmllib and gdbm. However if you
want to do phrase searching or stem searching or wild-card searching,
then it gets really complicated in a hurry.
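A trivial single-word index of the kind described might look like the sketch below. It is not the sgmllib/gdbm version mentioned above: it assumes a current Python 3, uses `html.parser` in place of sgmllib, and keeps the index in an in-memory dict rather than a gdbm file. The page names and contents are invented for the example.

```python
from collections import defaultdict
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collect the text between tags, ignoring the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def words_in(html):
    """Return the set of lowercase words found in the text of an HTML page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return set(re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower()))


def build_index(pages):
    """Map each word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, html in pages.items():
        for word in words_in(html):
            index[word].add(name)
    return index


# Hypothetical in-memory pages standing in for files on disk.
pages = {
    "a.html": "<html><body><h1>My dog</h1><p>A dog barks.</p></body></html>",
    "b.html": "<html><body><p>Cats sleep all day.</p></body></html>",
}
index = build_index(pages)
hits = sorted(index.get("dog", set()))
print(hits)
```

Phrase, stem, or wild-card searching would need word positions and more elaborate matching, which is where the complexity mentioned above comes in.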
Another factor is how many files you are dealing with. Indices often
run 4-8X the size of the indexed files. And do you want to dynamically
update the index, or are you happy just re-indexing the whole works
periodically? A static index is somewhat easier to build than a fully
dynamic one.
An interesting GPL'd indexing package is SWISH++; see:
http://www.best.com/~pjl/software/swish/
A good tactic might be to use this for your indexing and run the
search engine as a daemon, building a Python interface to talk to it
via Unix domain sockets, or alternatively shelling out and capturing and
parsing the return values.
You also might want to try using Index Server/ASP combo before going to
any third party solution...full text searching is no trivial matter and
chances are it'll give you all the tinkering options you could want.
Additionally:
There is a really simple search engine (single word, really only works with small
sites), available:
<http://www.chariot.net.au/~jaq/matt/search.tar.gz>
(or look on Parnassus if it's moved :-)
-------------------------------------------------------------
Can I combine a "select" call on some of my file objects with the Tkinter event loop?
http://www.faqts.com/knowledge-base/view.phtml/aid/3728
-------------------------------------------------------------
Rob Hooft, Fiona Czuczman
Grant Edwards, Russell E. Owen
Perhaps one can use file events instead of select. Here is a recent
exchange that may be relevant -- my initial posted question followed by
Grant Edwards' detailed and helpful reply. (Note: his snippet has some
Unix-specific bits, but one can ignore those). Also, something not
stated in the example: the function called by the file event handler
receives two (or possibly three) arguments:
- the socket
- the flags
- perhaps an optional user-defined third argument (this is supported by
vanilla Tk, but I don't know if it's supported by Tkinter).
>David Beazley's excellent "Python Essential Reference" says in the
>section on threads: "In addition, many of Python's most popular
>extensions such as Tkinter may not work properly in a threaded
>environment."
>
>I assume it's true, but it was quite a bombshell. I was hoping to write
>a networked GUI client, hence:
>- read data from a socket and fill in a GUI display
>- accept input from the user and write data to the socket
>I assumed I'd use two threads, one for input, one for output. Now I have
>no idea what to do. Any suggestions?
In Tk, you can assign read handlers to file objects. Anytime
there is data available to be read, the handler will be called.
Just open the socket connection and assign a read-handler to
it. Piece-o-cake. Here's an excerpt from a program that uses
that technique to handle data from a popen2'd child process:
------------------------------------------------------------
if cmd is None:
    exceptString = 'no executable specified'
    raise exceptString, cmd
self.__returnCode = -1
self.__child = popen2.Popen3(cmd)
self.__fd = self.__child.fromchild.fileno()
fcntl.fcntl(self.__fd, FCNTL.F_SETFD, FCNTL.O_NDELAY);
Tkinter.tkinter.createfilehandler(self.__child.fromchild,
                                  Tkinter.tkinter.READABLE,
                                  self.__stdoutHandler)
------------------------------------------------------------
self.__child.fromchild is a file object connected to the "read" end
of a pipe.
Tkinter.tkinter.READABLE is a constant that tells Tk what you care
about.
self.__stdoutHandler is a function to call when the file object
has data available.
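For the socket case in the original question, the same idea might be sketched as follows. This assumes a current Python 3 on Unix, where the module is spelled `tkinter` and the call is reached as `root.tk.createfilehandler(file, mask, func)`; the names `make_read_handler` and `attach` are made up for illustration. The handler is exercised directly with a socket pair so the example runs without a display.

```python
import socket


def make_read_handler(received):
    """Return a callback with the (file, mask) signature Tk file handlers use."""
    def on_readable(sock, mask):
        data = sock.recv(4096)
        if data:
            received.append(data)
    return on_readable


def attach(root, sock, handler):
    """Ask Tk to call handler(sock, mask) whenever sock is readable (Unix only)."""
    import tkinter  # imported here so the handler is usable without a display
    root.tk.createfilehandler(sock, tkinter.READABLE, handler)


# Exercise the handler directly, without a Tk event loop, using a socket pair.
received = []
handler = make_read_handler(received)
left, right = socket.socketpair()
right.sendall(b"hello")
handler(left, 0)  # the mask value is not used by this handler
left.close()
right.close()
print(received)
```

In a real client one would call `attach(root, sock, handler)` once after connecting and then run `root.mainloop()`, so that reads happen inside the Tk event loop rather than in a second thread.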