any comment on my first program plz..

Eugene Kim eugene1977 at hotmail.com
Wed Jul 17 09:21:04 EDT 2002


hi..
i finally managed to parse history.xml file and feed them into postgresql
i got some questions..

i'm using
pyxml, postgresql-python.rpm

1.when i prepare queryString for 
db.query(queryString)
it's really pain to take care of "'", ","..
is there any elegant solution?

2.queryString can't contain special characters
such as + '...and more
error message
------------
 UnicodeError: ASCII encoding error: ordinal not in range(128)
------------
any solution for this? i'd like to store exactly same string in db

3. well.. i'm just upset that many documentations are outdated and it's 
really hard to find good documentation about python
I'm trying to crawl all websites that i visit in db and query through cgi..
size of db is going to be several hundred gigs..

python has any advantage over other languages for my purpose?
i'm fairly comfortable with java but not python..
can i go with java?
thx for any comment...


#!/usr/bin/python
import string
import xml.sax.handler
import _pg
user=_pg.set_defuser('postgres')
db=_pg.connect('test','localhost')


class HistoryHandler(xml.sax.handler.ContentHandler):
    def __init__(self):
        self.inItem =0
        self.mapping={}
        self.count=0
    def startElement(self, name, attributes):
        if name=="item":
            self.url=string.replace(attributes["url"], "'", '"') # needs to 
handle more special characters... 
            self.title=string.replace(attributes["title"], "'", '"') 
            self.firsttime=attributes["first_time"]
            self.lasttime=attributes["last_time"]
            self.visits=attributes["visits"]
            tmp = "\'"
            sqlselect = 'SELECT url, visits FROM history WHERE url = \'' + 
self.url + "';"
            result = db.query(sqlselect).getresult()

            if(result):
                existUrl = str(result[0][0])
                newVisits=str(result[0][1]+ int(self.visits))
                sqlclause = 'UPDATE history SET visits =' + newVisits + ' 
WHERE url = \'' + existUrl + '\''
            else:
                sqlclause = 'INSERT INTO history VALUES (' + tmp + 
self.title + tmp +","+  tmp + self.url + tmp + "," + self.firsttime + "," + 
self.lasttime + "," + self.visits + " )"

            db.query(sqlclause)


# import pprint
parser = xml.sax.make_parser(  )
handler = HistoryHandler(  )
parser.setContentHandler(handler)
parser.parse("smallhistory.xml")
# pprint.pprint(handler.mapping)
            








More information about the Python-list mailing list