[Tutor] text module

Scot W. Stevenson scot@possum.in-berlin.de
Sat, 31 Aug 2002 01:41:30 +0200


Hello Kyle, 

> started to write a module (called it the text module, any better
> ideas?) 

I would agree with J"o that "wc.py" might be a better name, even though you 
might end up explaining to non-Unix-people that it doesn't involve toilet 
paper =8). 

>     body = subj.readlines()

This is okay if you know that the file is going to be small and your machine 
is big, but for large files on small machines, it could be a problem: 
readlines loads the whole file into memory in one big gulp. xreadlines was 
created in Python 2.1 (I think) to avoid this problem, but if you have 
Python 2.2 or later, the really cool thing to do is to use iterators and 
simply create a loop such as:

for line in subj: 
    (etc)

which reads one line at a time as a string. You can get the number of 
characters in that line (with spaces) as 

len(line)

and the number of spaces as 

line.count(" ")

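which, subtracted from the first, gives the number of non-space characters. 
Together these should be a simpler way of calculating the character counts. 
(Note that len(line) also counts the newline at the end of the line.) 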
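
As a rough sketch of those two calls on a single line (the sample string is 
made up for illustration, and the snippet assumes a current Python 3 
interpreter rather than 2.2):

```python
# Hypothetical sample line; a line read from a file keeps its trailing newline.
line = "the quick brown fox\n"

all_chars = len(line)                  # every character, spaces and newline included
spaces = line.count(" ")               # only the space characters
non_space_chars = all_chars - spaces   # what is left once spaces are removed

print(all_chars, spaces, non_space_chars)
```

If the newline should not be counted, one option is to call 
line.rstrip("\n") before taking the length.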

wordlist = line.split(" ")

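gives you a list of words split at single spaces, and the length of that 
list is therefore (roughly) the number of words in the line. Calling 
line.split() with no argument splits on any run of whitespace instead, so 
double spaces and the trailing newline don't inflate the count. 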
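
To see how the two ways of splitting differ, here is a small sketch (the 
sample strings are invented, and it assumes Python 3, though str.split 
behaves the same way in 2.2):

```python
# Hypothetical line with a double space and a trailing newline.
line = "two  spaces and a newline\n"

by_space = line.split(" ")     # splits at every single space
by_whitespace = line.split()   # splits on any run of whitespace

print(by_space)        # contains an empty string and "newline\n"
print(by_whitespace)   # contains only the five words

# A blank line shows the difference most clearly.
blank = "\n"
print(len(blank.split(" ")), len(blank.split()))
```

For a word count, the no-argument form is probably the safer choice, since 
a blank line then contributes zero words instead of one.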

So to figure out everything in one loop at once, you could try (in Python 
2.2 or later):

========================================
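def CountAll(location):
    # Print the number of lines, words, characters and non-space characters.

    nbr_lines = nbr_words = nbr_allchars = nbr_blackchars = 0

    subj = file(location, 'r')

    # Iterating over the file object reads one line at a time (Python 2.2+).
    for line in subj:
        nbr_lines = nbr_lines + 1 
        # split() with no argument splits on any run of whitespace, so
        # blank lines and double spaces do not add phantom words.
        nbr_words = nbr_words + len(line.split())
        temp = len(line)    # includes the trailing newline
        nbr_allchars = nbr_allchars + temp
        # Only spaces are subtracted; tabs and newlines still count here.
        nbr_blackchars = nbr_blackchars + temp - line.count(" ")

    subj.close()
    print nbr_lines, nbr_words, nbr_allchars, nbr_blackchars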
=========================================
        
which, of course, is not quite what you wanted to do...but you should be 
able to adapt it to your setup quite easily. Again, this will only work 
with Python 2.2 or later, so if you have an older version, you are going 
to have to use xreadlines and such. 
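
For anyone reading this with a much newer interpreter, the same loop might 
look roughly like the sketch below. It assumes Python 3; the name count_all, 
the choice to return a tuple instead of printing, and the temporary sample 
file are my own for illustration, not part of Kyle's module.

```python
import os
import tempfile


def count_all(location):
    """Return (lines, words, all_chars, non_space_chars) for a text file."""
    nbr_lines = nbr_words = nbr_allchars = nbr_blackchars = 0

    # The with-block closes the file even if an error occurs in the loop.
    with open(location, "r") as subj:
        for line in subj:                    # one line at a time, not the whole file
            nbr_lines += 1
            nbr_words += len(line.split())   # split on any run of whitespace
            nbr_allchars += len(line)        # includes the trailing newline
            nbr_blackchars += len(line) - line.count(" ")

    return nbr_lines, nbr_words, nbr_allchars, nbr_blackchars


# Usage with a throwaway sample file (contents are made up).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as sample:
    sample.write("hello world\n\nfoo bar baz\n")

result = count_all(path)
os.remove(path)
print(result)
```

Returning the counts rather than printing them makes it easier to reuse the 
function from other code.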

Hope this helped, 
Y, Scot

-- 
 Scot W. Stevenson wrote me on Saturday, 31. Aug 2002 in Zepernick, Germany  
       on his happy little Linux system that has been up for 1774 hours       
        and has a CPU that is falling asleep at a system load of 0.30.