[Tutor] Reading from files problem

Scott SA pydev at rscorp.ab.ca
Mon Apr 20 11:03:09 CEST 2009


On Apr 20, 2009, at 12:59 AM, Alan Gauld wrote:

> You might want to store the data in a dictionary keyed by ID number?

I had thought of suggesting this, but it appeared that the OP was  
going to re-iterate the file each time he wished to query the CSV.

That may have been a bad assumption on my part; I had envisioned
pickling a dict, and that just got too complicated.

> test = [float(n) for n in lines[11:14]]
> hwgrades = sum(hw)

The composite of this would be:
	sum([float(n) for n in lines[11:14]])

... which, I agree, is easier on the eyes/brain than the  
reduce(lambda:...) example I gave.
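
For comparison, here is a small sketch of the two approaches side by side. The sample values in 'lines' are made up for illustration, and note that on Python 3 reduce has to be imported from functools.

```python
from functools import reduce  # a built-in on Python 2; lives in functools on Python 3

# Made-up sample row: only positions 11 to 13 matter for this example.
lines = ["0"] * 11 + ["80", "90.5", "70"]

# sum() over a list comprehension
total = sum([float(n) for n in lines[11:14]])

# the same result with reduce() and a lambda
total_reduce = reduce(lambda a, b: a + b, [float(n) for n in lines[11:14]])

print(total)
print(total_reduce)
```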

sum is also on <http://docs.python.org/library/functions.html> along
with range and other built-ins.

Chris: wrapping the for-loop in square brackets makes it a list
comprehension, described here (section 5.1.4):
	<http://docs.python.org/tutorial/datastructures.html>

> Thats all fine for reading one stiudent, but you overwrite the data  
> each time through the loop! This also looks like an obvious use for  
> a class so I'd create a Student class to hold all the data
> (You could create methods to do the totals/averages too, plus add a  
> __str__  method to print the student data in the format required-  
> I'll leave that as an excercise for the reader!))

This part is actually the reason I've replied; everything before this
was 'just along the way'.

Classes are a big subject when starting out; here are the main docs:
	<http://docs.python.org/tutorial/classes.html>

Also, check out 'dive into python' and others for help in getting a  
handle on that.

I figured that the Student class proposed probably needed an example  
to get over the initial hurdle.

	class Student(object):
		def __init__(self):
			pass

In its most basic form, this is pretty much the 'hello world' for  
classes.

> So I'd change the structure to be like this(pseudo code)
>
> students = dict()  # empty dict
> for line in gradesfile:
>   line = line.split(',')
>   s = Student()

This step creates an instance of the class. Just for the moment, think  
of it as a fancy variable -- how python will store and reference the  
live data. In the end, you would need a class-instance for each and  
every student (line of the file).

>   s.id = line[0]

And this adds an 'id' attribute to the instance (s), not to the class itself.

Pre-defined in the class, this would look like:

	class Student(object):
		def __init__(self):
			self.id = None

When the instance is created, the id has None as its value (or  
anything you wanted). The "self" reference means the instance of the  
class itself, more on that in a moment.

Still accessed the same as above:
	s.id = n
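
As a small sketch of what that means in practice (the ID value is made up), each instance starts with id set to None and keeps its own copy once you assign to it:

```python
class Student(object):
    def __init__(self):
        self.id = None      # default value until a real ID is assigned

s = Student()
before = s.id               # still the default at this point
s.id = "1001"               # hypothetical ID value

t = Student()               # a second instance gets its own, separate id

print(before)
print(s.id)
print(t.id)
```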

>   s.lastname = line[1]
>   etc....
>   s.hwtotal = sum(hw)
>   etc....
>   students[s.id] = s

As mentioned, id, lastname, hwtotal, etc. become attributes of the
instance. There is nothing terribly magical about them: they are
actually stored in a dictionary (i.e. s.__dict__) and long-hand access
would be: s.__dict__['id']
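
A quick way to see this for yourself (the attribute values are made up):

```python
class Student(object):
    pass

s = Student()
s.id = "1001"            # hypothetical values, for illustration only
s.lastname = "Smith"

# Attributes set on the instance live in its __dict__
print(s.__dict__)

# Long-hand and normal access return the same object
same = s.__dict__['id'] == s.id
print(same)
```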

So, the next step to this would be to actually use the class to do the  
heavy lifting. This is what Alan is talking about a bit further down.

	class Student(object):
		def __init__(self, csv_data):
			# csv_data is one raw line of the file, as a string
			csv_list = csv_data.split(',')

			self.id = csv_list[0]
			...
			self.hwgrades = self._list_floats(csv_list[4:10])

		def _list_floats(self, str_list):
			# convert a list of numeric strings to floats
			return [float(n) for n in str_list]

		def hw_grade_total(self):
			return sum(self.hwgrades)

The two methods are part of the business-logic of the class. Notice
that internally they are accessed through 'self'. This is very
important: it is how python knows which instance's data to work with.
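
Here is a version of that class you can actually run, as a sketch only. The column layout (id in column 0, last name in column 1, homework grades in columns 4 to 9) and the sample line are assumptions on my part, so adjust them to the real file.

```python
class Student(object):
    def __init__(self, csv_data):
        # csv_data is one raw line from the file; strip the trailing newline
        csv_list = csv_data.strip().split(',')

        self.id = csv_list[0]
        self.lastname = csv_list[1]
        # assumed layout: homework grades sit in columns 4 to 9
        self.hwgrades = self._list_floats(csv_list[4:10])

    def _list_floats(self, str_list):
        # convert a list of numeric strings to floats
        return [float(n) for n in str_list]

    def hw_grade_total(self):
        return sum(self.hwgrades)

# made-up sample line, for illustration only
sample = "1001,Smith,John,A,10,9,8,7,6,5\n"
s = Student(sample)
print(s.hw_grade_total())
```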

Assuming you're not using the CSV library and don't already have the
row/line from the file as a list (i.e. each line is still a string):
	
	student_dict = {}
	for student_data in grades_file:
		s = Student(student_data)
		student_dict[s.id] = s

So, when python creates the class instance, it calls the __init__
method. Since you've passed it a line of student data, it processes
that line at the same time. In this example, by the way, creating a
Student without passing any data will cause an error. You might also
need to verify that each line has the correct number of fields,
otherwise an error could be raised while the line is being processed.
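
One possible way to do that check, as a rough sketch. The helper name parse_line and the expected count of 14 fields are my own assumptions, not something taken from the actual file.

```python
EXPECTED_FIELDS = 14   # assumption: change this to match the real file layout

def parse_line(line, expected=EXPECTED_FIELDS):
    """Return the list of fields, or raise ValueError if the count is off."""
    fields = line.strip().split(',')
    if len(fields) != expected:
        raise ValueError("expected %d fields, got %d" % (expected, len(fields)))
    return fields

good = ",".join(["x"] * 14)    # made-up line with the right number of fields
bad = "1001,Smith"             # made-up line that is too short

fields = parse_line(good)

try:
    parse_line(bad)
    bad_rejected = False
except ValueError:
    bad_rejected = True

print(len(fields))
print(bad_rejected)
```

Inside the loop you could catch the ValueError and skip or report the bad line instead of letting the whole import stop.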

Accessing the grade total method is like this:
	
	grade_total = s.hw_grade_total()

Or, if you want the raw list of floats:
	
	grade_list = s.hwgrades

I still contend that all of this is ideal for a database, like SQLite,  
which would allow for searching by name as well as ID, etc. It is the  
persistence of data that motivates this perspective. So what I would  
do is make a class Students, with a file-import method using the CSV  
lib which then did the work of putting all the data into a database,  
bypassing a Student class (until there was a valid reason for one).   
Once the data is loaded in, it can be referenced without
re-interpreting the CSV file, again through methods in a Students class.
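
To give a rough idea of what I mean, here is a minimal sketch using the csv and sqlite3 modules from the standard library. The table layout, the method names (import_csv, by_lastname), the column positions and the sample data are all assumptions of mine for illustration, and it uses an in-memory database rather than a file.

```python
import csv
import io
import sqlite3

class Students(object):
    def __init__(self, db_path=":memory:"):
        # ":memory:" keeps the example self-contained; pass a filename to persist
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS students "
            "(id TEXT PRIMARY KEY, lastname TEXT, hwtotal REAL)")

    def import_csv(self, csv_file):
        # assumed layout: id, lastname, two other fields, then six homework grades
        for row in csv.reader(csv_file):
            hwtotal = sum(float(n) for n in row[4:10])
            self.conn.execute(
                "INSERT OR REPLACE INTO students VALUES (?, ?, ?)",
                (row[0], row[1], hwtotal))
        self.conn.commit()

    def by_lastname(self, lastname):
        cur = self.conn.execute(
            "SELECT id, lastname, hwtotal FROM students WHERE lastname = ?",
            (lastname,))
        return cur.fetchall()

# made-up data standing in for the real grades file
sample = io.StringIO(u"1001,Smith,John,A,10,9,8,7,6,5\n"
                     u"1002,Jones,Ann,B,5,5,5,5,5,5\n")

students = Students()
students.import_csv(sample)
rows = students.by_lastname("Jones")
print(rows)
```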

I hope this helps,

Scott

PS. My email is acting up, did my prev. message actually make it to  
the list?



