how to extract columns like awk $1 $5

Roy Smith roy at panix.com
Sat Jan 8 00:19:08 EST 2005


Dan Valentine <nobody at invalid.domain> wrote:

> On Fri, 07 Jan 2005 12:15:48 -0500, Anand S Bisen wrote:
> 
> > Is there a simple way to extract words separated by a space in Python 
> > the way I do it in awk '{print $4 $5}'? I am sure there should be some 
> > way, but I don't know it.
> 
> I guess it depends on how faithfully you want to reproduce awk's behavior
> and options.
> 
> As several people have mentioned, strings have the split() method for 
> simple tokenization, but blindly indexing into the resulting sequence 
> can raise an IndexError.  Out-of-range indexes are no problem for awk;
> it just returns an empty string without complaint.
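
As a minimal sketch of the difference described above (written in Python 3 syntax; the helper name `field` is hypothetical and not part of any library), plain indexing into the result of split() raises IndexError, while a small wrapper can return an empty string the way awk does. The sketch assumes 1-based field numbers like awk's $1, $2, ... and does not try to emulate $0 or awk's other options.

```python
def field(line, n):
    """Return the n-th whitespace-separated field of line (1-based, like awk's $n).

    Hypothetical helper for illustration: returns "" when the field is missing.
    """
    words = line.split()
    if 1 <= n <= len(words):
        return words[n - 1]
    return ""


line = "alpha beta gamma delta epsilon"

# Rough equivalent of awk '{print $4 $5}': awk joins adjacent
# expressions with no separator, so the two fields are concatenated.
print(field(line, 4) + field(line, 5))

# A missing field yields an empty string instead of an exception.
print(repr(field("foo bar baz", 5)))

# Plain indexing, by contrast, raises IndexError.
try:
    "foo bar baz".split()[4]
except IndexError as exc:
    print("IndexError:", exc)
```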

It's pretty easy to create a list type which has awk-ish behavior:

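class awkList(list):
    # Like awk fields: an out-of-range index yields "" instead of raising.
    def __getitem__(self, key):
        try:
            return list.__getitem__(self, key)
        except IndexError:
            return ""

l = awkList("foo bar baz".split())
print "l[0] = ", repr(l[0])    # in range: the real item
print "l[5] = ", repr(l[5])    # out of range: empty string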

-----------

Roy-Smiths-Computer:play$ ./awk.py
l[0] =  'foo'
l[5] =  ''

Hmmm.  There's something going on here I don't understand.  The reference 
manual (3.3.5 Emulating container types) says for __getitem__(), "Note: 
for loops expect that an IndexError will be raised for illegal indexes 
to allow proper detection of the end of the sequence."  I therefore 
expected my little demo class to break for loops, but they seem to work 
fine:

>>> import awk
>>> l = awk.awkList ("foo bar baz".split())
>>> l
['foo', 'bar', 'baz']
>>> for i in l:
...     print i
... 
foo
bar
baz
>>> l[5]
''

Given that I've caught the IndexError, I'm not sure how that's working.
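
A likely explanation, offered as a sketch rather than a definitive answer: the note in the reference manual describes the older sequence protocol, which a for loop falls back on only when an object has no __iter__ method. list defines __iter__ and the subclass inherits it, so the loop never calls the overridden __getitem__. The example below (Python 3 syntax; the class name `NoIterAwk` is hypothetical) shows that a class relying on __getitem__ alone really does run past the end once IndexError is swallowed, which is why the iteration is capped with itertools.islice.

```python
from itertools import islice


class awkList(list):
    """list subclass returning "" for out-of-range indexes, as in the post."""

    def __getitem__(self, key):
        try:
            return list.__getitem__(self, key)
        except IndexError:
            return ""


class NoIterAwk:
    """Hypothetical class with only __getitem__ and no __iter__, so a for
    loop uses the fallback protocol: index 0, 1, 2, ... until IndexError."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, key):
        try:
            return self._items[key]
        except IndexError:
            return ""  # IndexError never escapes, so iteration never ends


words = "foo bar baz".split()

# The inherited list.__iter__ is used; the __getitem__ override is bypassed.
print(list(awkList(words)))

# Fallback protocol: without the islice cap this would not terminate.
print(list(islice(NoIterAwk(words), 5)))
```

So the awkList class is safe to use in for loops, while the same trick on a class that only implements __getitem__ would need some other way to signal the end of the sequence.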


