parsing in python

Gandalf gandalf at
Wed Jun 9 10:34:43 CEST 2004

Peter Sprenger wrote:

> Hello,
> I hope somebody can help me with my problem. I am writing Zope python 
> scripts that will do parsing on text for dynamic webpages: I am 
> getting a text from an oracle database that contains different tags 
> that have to be converted to an HTML expression. E.g. "<pic#>" (# is an 
> integer number) has to be converted to <img src="..."> where the image 
> data also comes from a database table.
> Since strings are immutable, is there an effective way to parse such 
> texts in Python? In the process of finding and converting the embedded 
> tags I also would like to make a word wrap on the generated HTML 
> output to increase the readability of the generated HTML source.
> Can I write an efficient parser in Python or should I extend Python 
> with a C routine that will do this task in O(n)? 

I do not know any search algorithm that can do string search in O(n). 
Do you?
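
Whatever the complexity, the tag conversion you describe can probably be 
done in plain Python with the standard re and textwrap modules. Here is a 
rough sketch, not a finished solution. The lookup_image_src function and 
the /images/ path are made up for illustration; in your case that would be 
the query against your database table. Since strings are immutable, 
re.sub simply builds a new string for you.

```python
import re
import textwrap

# Matches tags such as <pic7> and captures the integer part.
PIC_TAG = re.compile(r"<pic(\d+)>")


def lookup_image_src(pic_id):
    # Hypothetical stand-in for the database lookup; the path is invented.
    return "/images/%d.png" % pic_id


def convert_tags(text):
    """Return a new string with every <pic#> tag replaced by an <img> tag."""
    def replace(match):
        src = lookup_image_src(int(match.group(1)))
        return '<img src="%s">' % src
    return PIC_TAG.sub(replace, text)


def wrap_html(html, width=78):
    """Word-wrap the generated HTML source without splitting long words."""
    # Note: textwrap.fill treats the input as one paragraph, so existing
    # line breaks are replaced by spaces.
    return textwrap.fill(html, width=width, break_long_words=False)


if __name__ == "__main__":
    print(wrap_html(convert_tags("Look at <pic12> and then at <pic3>.")))
```

A wrapped line may break at the space inside an <img ...> tag, which is 
harmless in HTML.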

By the way, I'm almost sure that you do not need a fast program here. It 
seems you are developing an internet application.
The HTML pages you generate are...

1.) Downloaded by the client relatively slowly
2.) Read by the client even more slowly

so I think that the bottleneck will be the network bandwidth. If you are 
developing a system for your intranet, the bottleneck can be the reading 
speed of humans. Or are you so lucky that you run a site with millions of 
hits a day? In that case, I would suggest setting up a set of web 
servers. Sometimes it is better to create a load balanced server than a 
single hand-coded, optimized server. The reasons:

1.) It is extremely easy to create a load balanced web server (I'm not 
speaking about the database server, it can be a single computer)
2.) If you do load balancing, then you will have redundancy. When your 
server blows up you still have other servers alive
3.) You can develop your system in a higher level language. When there 
is a need to improve performance, you can add new servers anytime. This is 
more scalable, and of course when your site is that popular it will not be 
a problem to buy and add a new server....

These were my thoughts; you can of course write optimized C code 
just for fun. ;-)


