Efficiency of UserString
Oliver Hofmann
a2619725 at uni-koeln.de
Sun Mar 18 12:17:04 EST 2001
'lo everyone!
Got a question regarding UserString's efficiency. I am parsing text
and would like to store a word's position (sentence, absolute position)
somewhere; this would allow turning the document into a list of
words, then removing stopwords or stemming the word and still having
the information about its original position in the document.
Figured instead of creating several lists to store and update that info
I could put it into the string itself by using UserString; however
the Python library reference states:
"It should be noted that these classes are highly inefficient compared
to real string or Unicode objects; this is especially the case for
MutableString."
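To make the idea concrete, here is a minimal sketch of what I have in mind. The class name Word, its sentence/position attributes and the tiny stopword list are all made up for illustration, and the try/except import is only there so the snippet also runs on newer Pythons where UserString lives in collections.

```python
try:
    from UserString import UserString  # module layout used in this post
except ImportError:
    from collections import UserString  # assumption: newer Python location


class Word(UserString):
    """Hypothetical helper: a word that remembers where it came from."""

    def __init__(self, seq, sentence=None, position=None):
        UserString.__init__(self, seq)
        self.sentence = sentence    # index of the sentence in the document
        self.position = position    # absolute word position in the document


# Turn a (toy) sentence into a list of position-carrying words.
words = [Word(w, 0, i) for i, w in enumerate("the probe is used".split())]

# Removing stopwords keeps the original positions of the survivors.
stopwords = ["the", "is"]
kept = [w for w in words if w not in stopwords]

summary = [(str(w), w.sentence, w.position) for w in kept]
print(summary)
```

As far as I can tell, string methods that build a new object (lower(), strip() and so on) would not carry the extra attributes over in this sketch, so stemming would need to copy them by hand.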
I've tried two basic operations below but couldn't spot much of a
difference. Could someone please point me at an error I've made or
tell me which operations are slower with UserString?
----
import UserString
import profile, pstats
text = """ The compound ppp(A2p)3A3[32P]pCp is a commercially available
radioactive analogue of the 2,5 oligoadenylate series ppp(A2p)nA, n
greater than or equal to 2, commonly referred to as 2-5A. It is used
as a probe for measuring concentrations in competition radiobinding and
radioimmune assays. We have found that incubation of the probe with
extracts from HeLa, CV1, or neuroblastoma cells results in its covalent
attachment to two size classes of RNA: the first includes a major species
with a molecular weight of approximately 350,000, the second is much
smaller (40 +/- 5 nucleotides in length) and could represent tRNA
half-molecules. Ligation is to the 3 end of the probe molecule with
formation of a 3,5-phosphodiester bond. Thus, probe ligation provides a
sensitive and convenient assay for the detection not only of RNA
ligase(s) but also of ligatable RNAs (such as the putative tRNA
half-molecules) in mammalian cell extracts. """
text = ' '.join(text.split())
text2 = UserString.UserString(text)
def fun1():
    global text
    for a in range(0, 1000):
        text = ' '.join(text.split())

def fun2():
    global text
    for a in range(0, 1000):
        text.find('neuro')

def fun3():
    global text2
    for a in range(0, 1000):
        text2 = ' '.join(text2.split())

def fun4():
    global text2
    for a in range(0, 1000):
        text2.find('neuro')

def main():
    for a in range(0, 100):
        fun1()
        fun2()
        fun3()
        fun4()

profile.run('main()', 'profile.tmp')
p = pstats.Stats('profile.tmp')
p.sort_stats('cumulative').print_stats(10)
----
Here are the stats:
         404 function calls in 80.010 CPU seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   80.010   80.010 <string>:1(?)
        1    0.000    0.000   80.010   80.010 profile:0(main())
        1    0.000    0.000   80.010   80.010 test3.py:48(main)
      100   39.050    0.390   39.050    0.390 test3.py:24(fun1)
      100   39.040    0.390   39.040    0.390 test3.py:36(fun3)
      100    0.960    0.010    0.960    0.010 test3.py:42(fun4)
      100    0.960    0.010    0.960    0.010 test3.py:30(fun2)

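One thing I have not ruled out, so treat it as a guess rather than a finding: whether the object is still a UserString after the split/join round trip. A minimal check could look like the following, where sample and rejoined are made-up names and the try/except import is again only an assumption to keep it runnable on newer Pythons.

```python
try:
    from UserString import UserString  # module layout used in this post
except ImportError:
    from collections import UserString  # assumption: newer Python location

# Start from a UserString, as text2 does in the listing above.
sample = UserString("  some   spaced   text ")

# The same split/join round trip that fun3 performs on text2.
rejoined = ' '.join(sample.split())

# Record the class of the object before and after the round trip.
type_before = sample.__class__.__name__
type_after = rejoined.__class__.__name__
print("%s -> %s" % (type_before, type_after))
```

If the second name comes out as a plain str, then text2 would stop being a UserString after the first pass through fun3, and fun3 and fun4 would mostly be timing ordinary strings. That might be why the numbers are so close.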
Many thanks once again!
Oliver
--
Oliver Hofmann - University of Cologne - Department of Biochemistry
o.hofmann at smail.uni-koeln.de - setar at gmx.de - connla at thewell.com
If you care, you just get disappointed all the time. If you don't care
nothing matters so you are never upset. -- Calvin