parsing csv files class

Tim Roberts timr at probo.com
Sun Dec 28 06:46:29 CET 2008


"alex goretoy" <aleksandr.goretoy at gmail.com> wrote:
>
>class parsercsvy(object):
>    """Return a line from a csv file or total amount of lines"""
>    def __init__(self,file_name=""):
>        self.func_me_color="white_on_black"
>        self.soc=stdout_colours.stdout_colors()
>        self.soc.me_him(['ENTER:',__name__],self.func_me_color)
>        self.filename = file_name
>        self.buffer = []
>        self.bufferp= []
>        if string.find(self.filename,"http") != -1:
>            resp=urllib2.urlopen(self.filename)
>            file=resp.read()
>            lfi=len(string.split(self.filename,"/"))
>            filename = "/tmp/"+string.split(self.filename,"/")[lfi-1]

Style issue:  unless you are running Python 1.x, you virtually never need
to import the "string" module.  Also, you can always refer to the last
element of a list or tuple by using [-1]:

            parts = self.filename.split( "/" )
            filename = "/tmp/" + parts[-1]    
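
As a minimal, self-contained sketch of the same idea (written for Python 3 here; the helper name local_name_for and the example URL are made up for illustration and are not part of the original class):

```python
def local_name_for(url, tmp_dir="/tmp"):
    """Build a local path from the last path component of a URL."""
    # split() is a method on every string, so no "string" module is needed.
    parts = url.split("/")
    # parts[-1] is the last element, however many parts there are.
    return tmp_dir + "/" + parts[-1]


print(local_name_for("http://example.com/data/report.csv"))
```

The [-1] index replaces the len(...)-1 arithmetic from the quoted code.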


>    def parse(self,filename,ret=0):
>        self.soc.me_him(['ENTER:',__name__],self.func_me_color)
>        i = 0
>        try:
>            reader = csv.reader(file(filename, "rb"))
>            try:
>                for row in reader:
>                    self.buffer.append(row)
>                    s,a=[],{}
>
>                    for j in range(len(self.buffer[0])):
>                        a[self.buffer[0][j]]=row[j]
>                    self.bufferp.append(a)
>                    i+=1
>                self.total = i-1

You might consider keeping the header line separate.

        reader = csv.reader(open(filename, "rb"))
        header = reader.next()
        self.buffer = list(reader)
        self.bufferp = [ dict( zip( header, line ) ) for line in self.buffer ]
        self.header = header

Also, you don't really need a separate "total" variable.  Once the header
is kept separate, it's equal to len(self.buffer).
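
Here is a small runnable sketch of the header-separate approach, written for Python 3 (so next(reader) instead of reader.next()).  The in-memory sample data is invented purely for illustration, and plain variables stand in for the instance attributes:

```python
import csv
import io

# Stand-in for a real file; the sample rows are made up for illustration.
sample = io.StringIO("name,qty\napple,3\npear,5\n")

reader = csv.reader(sample)
header = next(reader)      # first row holds the column names
buffer = list(reader)      # remaining rows, each a list of strings
# Build the dicts from buffer, since the reader is exhausted by list().
bufferp = [dict(zip(header, row)) for row in buffer]

print(header)
print(len(buffer))         # number of data rows, header excluded
print(bufferp[0])
```

If only the dictionaries are needed, the standard csv.DictReader does the header-to-dict mapping for you.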

>    def total(self):
>        """return total number of lines in csv file"""
>        self.soc.me_him(['ENTER:',__name__],self.func_me_color)
>        self.soc.me_him(['RETURN:',self.total,__name__],self.func_me_color)
>        return self.total

There's a problem here, as this was originally written.  "self.total"
starts out being a function (this one here).  But after self.parse runs,
"self.total" will be an integer, and this function is lost.  You need to
decide whether you want users to just access the self.total integer, or
force them to use the function.  In the latter case, you can change the
counter to self._total.

On the other hand, the self.total counter is unnecessary:

    def total(self):
        return len(self.buffer)
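
To see the name clash in isolation, here is a stripped-down sketch.  The class names Shadowed and Fixed and the row data are hypothetical, chosen only to demonstrate the effect:

```python
class Shadowed(object):
    def parse(self):
        # Assigning to self.total creates an instance attribute
        # that hides the method of the same name.
        self.total = 3

    def total(self):
        return self.total


class Fixed(object):
    def __init__(self):
        self.buffer = []

    def parse(self):
        self.buffer = [["a"], ["b"], ["c"]]   # made-up rows

    def total(self):
        # No separate counter to collide with the method name.
        return len(self.buffer)


s = Shadowed()
print(callable(s.total))   # a bound method before parse()
s.parse()
print(callable(s.total))   # now a plain integer

f = Fixed()
f.parse()
print(f.total())
```

After parse() runs on Shadowed, calling s.total() raises a TypeError because an int is not callable.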

>    def find_and_replace(self,li,fi,re):
>        """
>        find and replace a string inside a string, return list
>        find_and_replace(list,find,replace)
>        """
>        this=[]
>        for l in li:
>#            found_index=string.find(l,fi)
>            this.append(l.replace(fi,re))
>        return this

    def find_and_replace(self,li,fi,re):
        return [l.replace(fi,re) for l in li]

I'm not sure why this is a member of the class; it doesn't use any of the
members.
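
One possible shape for it as a plain module-level function, sketched here with hypothetical, more descriptive parameter names (the original "re" parameter also happens to share its name with the standard re module):

```python
def find_and_replace(items, find, replace):
    """Return a new list with `find` replaced by `replace` in each string."""
    return [item.replace(find, replace) for item in items]


print(find_and_replace(["a-b", "c-d", "ef"], "-", "_"))
```

The input list is left unchanged, and strings without a match are copied through as-is.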
-- 
Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.


