It's ...

Wed Jun 24 16:40:29 EDT 2009

On Wed, 2009-06-24 at 20:53 +0100, Angus Rodgers wrote:
> ... my first Python program!  So please be gentle (no fifty ton
> weights on the head!), but tell me if it's properly "Pythonic",
> or if it's a dead parrot (and if the latter, how to revive it).
> 

Yay.  Welcome to Python.

> I'm working from Beazley's /Python: Essential Reference/ (2nd
> ed. 2001), so my first newbie question is how best to find out
> what's changed from version 2.1 to version 2.5. (I've recently
> installed 2.5.4 on my creaky old Win98SE system.) I expect to 
> be buying the 4th edition when it comes out, which will be soon,
> but before then, is there a quick online way to find this out?
> 

Check here: http://docs.python.org/whatsnew/index.html

It's not designed to be newbie friendly, but it's in there.

> Having only got up to page 84 - where we can actually start to
> read stuff from the hard disk - I'm emboldened to try to learn
> to do something useful, such as removing all those annoying hard
> tab characters from my many old text files (before I cottoned on
> to using soft tabs in my text editor).
> 
> This sort of thing seems to work, in the interpreter (for an 
> ASCII text file, named 'h071.txt', in the current directory):
> 
> stop = 3   # Tab stops every 3 characters
> from types import StringType   # Is this awkwardness necessary?

Not anymore.  You can just use str for this.

> detab = lambda s : StringType.expandtabs(s, stop)  # Or use def

First, use def.  lambda is for the rare case where you want a small
function inline and would rather not bind it to a name.

Second, expandtabs is a method on string objects.  s is a string object,
so you can just use s.expandtabs(stop)

Third, I'd recommend passing your tabstops into detab with a default
argument, rather than defining it irrevocably in a global variable
(which is brittle and ugly)

def detab(s, stop=3):
    return s.expandtabs(stop)

Then you can do

    three_space_version = detab(s)
    eight_space_version = detab(s, 8)
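
One thing worth knowing is that expandtabs pads to the next tab stop
rather than inserting a fixed number of spaces.  A minimal sketch,
assuming detab does nothing beyond calling expandtabs (the sample
strings are made up for illustration):

```python
def detab(s, stop=3):
    # str.expandtabs pads each tab out to the next multiple of `stop`
    return s.expandtabs(stop)

# The amount of padding depends on the column where the tab occurs:
print(repr(detab('a\tb')))      # 'a  b'      (2 spaces, up to column 3)
print(repr(detab('ab\tc')))     # 'ab c'      (1 space, up to column 3)
print(repr(detab('abc\td')))    # 'abc   d'   (3 spaces, up to column 6)
print(repr(detab('a\tb', 8)))   # 'a       b' (7 spaces, up to column 8)
```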

> f = open('h071.txt')   # Do some stuff to f, perhaps, and then:
> f.seek(0)

f is not opened for writing, so if you do stuff to the contents of f,
you'll have to put the new version in a different variable, so f.seek(0)
doesn't help.  If you don't do stuff to it, then you're at the beginning
of the file anyway, so either way, you shouldn't need to f.seek(0).

> print ''.join(map(detab, f.xreadlines()))

Files have been iterable since Python 2.2 (which also makes xreadlines
unnecessary), so you can do the following:

for line in f:
    print detab(line),  # trailing comma: line already ends in a newline

Much prettier than running through join/map shenanigans.  This is also
the place to modify the output before passing it to detab:

for line in f:
    # do stuff to line
    print detab(line),

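Also note that you can iterate over a file more than once, but only if
you rewind it in between.  After the first loop the file is at its end,
so a second loop sees no lines until you call f.seek(0):

f = open('foo.txt')
for line in f:
    print line[:1]   # prints the first character of every line
f.seek(0)            # rewind, or the next loop has nothing to read
for line in f:
    print line[1:2]  # prints the second character (slices tolerate short lines)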
> f.close()
> 

> Obviously, to turn this into a generally useful program, I need
> to learn to write to a new file, and how to parcel up the Python
> code, and write a script to apply the "detab" function to all the
> files found by searching a Windows directory, and replace the old
> files with the new ones; but, for the guts of the program, is this
> a reasonable way to write the code to strip tabs from a text file?
> 
> For writing the output file, this seems to work in the interpreter:
> 
> g = open('temp.txt', 'w')
> g.writelines(map(detab, f.xreadlines()))
> g.close()
> 

That still builds the whole output in memory, as map returns a list.
You can use itertools.imap, which produces one result at a time, or you
can use a for loop, as above.
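
If you'd rather keep the writelines call, a generator expression
(available since 2.4) does the same job as itertools.imap here.  A rough
sketch, not the only way to do it: detab_file is a name I've made up,
and it uses try/finally instead of the with statement so that it also
runs on 2.5 without a __future__ import.

```python
def detab(s, stop=3):
    return s.expandtabs(stop)

def detab_file(src_name, dst_name, stop=3):
    # Hypothetical helper: copy src to dst, expanding tabs line by line.
    src = open(src_name)
    try:
        dst = open(dst_name, 'w')
        try:
            # The generator expression yields one detabbed line at a time,
            # so the whole output is never held in memory at once.
            dst.writelines(detab(line, stop) for line in src)
        finally:
            dst.close()
    finally:
        src.close()
```

With your file names that would be detab_file('h071.txt', 'temp.txt').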

> In practice, does this avoid creating the whole string in memory
> at one time, as is done by using ''.join()? (I'll have to read up
> on "opaque sequence objects", which have only been mentioned once
> or twice in passing - another instance perhaps being an xrange()?)
> Not that that matters much in practice (in this simple case), but
> it seems elegant to avoid creating the whole output file at once.

The terms to look for, rather than "opaque sequence objects", are
"iterators" and "generators".

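For a taste of what a generator looks like, here is a small sketch.
detab_lines is a name invented for the example; it accepts any iterable
of strings (an open file works) and hands back detabbed lines one at a
time, only as the caller asks for them.

```python
def detab_lines(lines, stop=3):
    # A generator function: the body runs lazily, one line per iteration.
    for line in lines:
        yield line.expandtabs(stop)

sample = ['a\tb\n', 'ab\tc\n']      # stands in for an open file
gen = detab_lines(sample)           # nothing has been processed yet
result = list(gen)                  # iterating pulls the lines through
print(result)                       # ['a  b\n', 'ab c\n']
```
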
> 
> OK, I'm just getting my feet wet, and I'll try not to ask too many
> silly questions!
> 
> First impressions are: (1) Python seems both elegant and practical;
> and (2) Beazley seems a pleasantly unfussy introduction for someone 
> with at least a little programming experience in other languages.
> 

Glad you're enjoying Beazley, though I would look for something more
up-to-date.  Python's come a long way since 2.1.  I'd hate for you to
miss out on all the iterators, booleans, codecs, subprocess, yield,
unified int/longs, decorators, decimals, sets, context managers and
new-style classes that have come since then.

> -- 
> Angus Rodgers

Cheers,
Cliff