[Tutor] Reading information from a text file into a list of lists (WinXP/py2.6.2)

Dave Angel davea at ieee.org
Mon Oct 26 03:38:24 CET 2009

Katt wrote:
> Hello all,
> Currently I am working on a program that reads text from a text file.
> I would like it to place the information into a list and inside the
> information would have sublists of information.
> The text file looks like this:
> "Old Test","2009_10_20"
> "Current Test","2009_10_25"
> "Future Test","2009_11_01"
> I am trying to get the list of lists to look like the following after 
> the information is read from the text file:
> important_dates = [["Old Test","2009_10_20"],["Current Test","2009_10_25"],
> ["Future Test","2009_11_01"]]
> What I currently have is:
> def read_important_dates():
>    print "\nReading text file into program: important.txt"
>    text_file = open("important.txt", "r")
>    dates = text_file.readlines()
>    text_file.close()
>    print dates #
> read_important_dates()
> When it gets to print dates I see the following:
> [ ' "Old Test","2009_10_20"\n',  ' "Current Test",2009_10_25"\n',
> ' "Future Test","2009_11_01" ' ]
> Does it look this way because I am using the "print"?  Or do I still 
> need to add the inner [ ] and strip out the \n and '?
> If I add brackets to the text file I am reading from my output looks 
> like this:
> <snip>
We have to be clear about this. You want an actual list containing
sublists, not just something that prints out with brackets to mimic
that, right? If you showed us the whole assignment question, it would
probably be easier to help.

One thing at a time. Please don't try adding brackets to the data
file. You're not trying to create brackets; you're trying to create a
list of lists. The brackets are just the way Python shows you what
you've got when you print a list directly. Similarly, the single
quotes you see in that printed output are just markers showing that
you've got strings. It's what's between the quotes that is the contents
of each string.

First, you need to get confident about the relationship between what's
really in your variables and what gets printed when you print them in
various ways. Then, once you're sure you can recognize these, you can
actually debug why you're not getting what you want.

If you have:

    mystring = "This is a string"
    print mystring
    print repr(mystring)

what you get is:

    This is a string
    'This is a string'

So you can see the string has no quotes within it, but you see quotes
around it when you ask Python to display its repr(). Now let's put a
newline into the middle of the string:

    mystring = "This is line 1\nAnd line 2"
    print mystring
    print repr(mystring)

The output is:

    This is line 1
    And line 2
    'This is line 1\nAnd line 2'

So we see there's a control character in the middle of the string,
called a newline, which causes our console to switch to the next line.
But the repr() form shows us the quotes, the backslash and the n.
There's not really a backslash or an n at that point in the string,
just the single newline character, but repr() uses that escape
sequence to show us exactly what's there. The goal of repr() is to
approximate what you would have to type in source code to produce the
same data.
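A quick way to convince yourself that the backslash-n in the repr() output is a single character is a tiny sketch like this one (it uses print() with a single argument, which behaves the same in Python 2.6 and Python 3):

```python
mystring = "This is line 1\nAnd line 2"

# The escape sequence \n in source code becomes ONE newline character.
print(len("\n"))          # 1
print("\\" in mystring)   # False: no backslash is stored in the string
print(repr(mystring))     # 'This is line 1\nAnd line 2'
```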

When we print a list as a single item, we'll see brackets around the
whole thing and commas between the items. There are no brackets
anywhere in the list; that's just how Python displays the repr() of the
list. And when an item in the list is a string, it'll have quotes
around it that are not part of the string. Any control characters are
shown as backslash escape sequences, just like with repr() of a single
string.
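The same idea applies to a list. In this small sketch (the list contents are made up for illustration), the brackets, commas and quotes appear only in the printed representation, never inside the items themselves:

```python
mylist = ["Old Test", "2009_10_20\n"]

# Printing the whole list shows its repr(): brackets, commas, quotes, \n escapes.
print(mylist)          # ['Old Test', '2009_10_20\n']

# The items themselves contain none of those markers.
print(mylist[0])       # Old Test
print(len(mylist[0]))  # 8
print(len(mylist))     # 2
```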

It can be useful to loop through the list printing individual items of 
the list.  It can also be useful to use len() on the list to see how 
many items are in it.
     for item in mylist:
         print item
     print len(mylist)

Now let's look at the output you showed:

[ ' "Old Test","2009_10_20"\n',  ' "Current Test",2009_10_25"\n',
' "Future Test","2009_11_01" ' ]

We could also do print len(dates) and see 3. This is what readlines()
produces: it reads the raw data from the file and returns a list with
one string per line. It uses the newline character to decide where
each line ends, and keeps the newline with the data. Your file happens
not to end with a newline, so we only see two of them.
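To see this without touching your real important.txt, here is a small sketch that writes the three sample lines to a temporary file (with no newline after the last one, as in your file) and reads them back with readlines(). The temporary file and the SAMPLE variable are illustrative assumptions, not part of your assignment; the code runs under Python 2.6 and Python 3.

```python
import os
import tempfile

# Illustrative copy of the sample data, with no newline after the last line.
SAMPLE = ('"Old Test","2009_10_20"\n'
          '"Current Test","2009_10_25"\n'
          '"Future Test","2009_11_01"')

# Write the sample to a temporary file so the demo is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w") as out_file:
    out_file.write(SAMPLE)

# Read it back the same way the original program does.
with open(path, "r") as text_file:
    dates = text_file.readlines()
os.remove(path)

print(len(dates))       # 3
for item in dates:
    print(repr(item))   # repr() makes each trailing \n visible
```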

readlines() doesn't know how to do anything more with the data in each
line; it only understands one level of parsing. So you have to write
some form of loop that visits each item of the list and creates the
sublist you said your teacher wanted.
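One possible shape for that loop, as a sketch only: the per-line work is handed to a function you would write yourself. make_sublist is a hypothetical name, and the stand-in version below just wraps each stripped line in a one-item list, so only the outer loop is really shown.

```python
def build_list_of_lists(lines, make_sublist):
    """Visit each line and collect the sublist built from it."""
    result = []
    for line in lines:
        result.append(make_sublist(line))
    return result

def make_sublist(line):
    # Hypothetical placeholder: a real version would split the line into
    # its fields and clean each one up.
    return [line.strip()]

lines = ['"Old Test","2009_10_20"\n', '"Future Test","2009_11_01"']
nested = build_list_of_lists(lines, make_sublist)
print(len(nested))   # 2: one sublist per line
```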

The logical way to divide up each line would be to use the split()
method of strings. If you tell it to split on commas, it will work on
the particular file you've got. Watch out, though: somebody may slip a
comma inside one of those quoted items in the data file and expect you
*not* to split on that one. Again, it'd be good to have a clear problem
statement, including the expectations for the input file, among other
things.
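Here is split(",") applied to one of your lines, plus a made-up line with a comma inside the quotes, to show the pitfall mentioned above. Notice that the quotes and the newline are still attached to the pieces:

```python
line = '"Old Test","2009_10_20"\n'
fields = line.split(",")
print(fields)                   # ['"Old Test"', '"2009_10_20"\n']

# Hypothetical line with a comma inside a quoted item: split() cuts it too.
tricky = '"Test, part 2","2009_10_20"\n'
print(len(tricky.split(",")))   # 3, not 2
```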

Now you have the additional problem of getting rid of the newlines and 
the quotes in the input data.  The desired result you showed has no 
quotes, no commas, and no newlines anywhere in the list of lists.

strip() is the usual way of getting rid of leading and trailing
characters, such as the newline left by readlines() or the quotes and
spaces left around each piece after split().
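A short sketch of strip() cleaning up one field. With no argument, strip() removes leading and trailing whitespace, newline included; with an argument such as '"' it removes those characters from both ends instead:

```python
field = ' "2009_10_20"\n'   # leading space and trailing newline, as in the output above

no_whitespace = field.strip()       # removes the space and the \n
clean = no_whitespace.strip('"')    # then removes the surrounding quotes

print(repr(no_whitespace))  # '"2009_10_20"'
print(repr(clean))          # '2009_10_20'
```

The order matters here: calling strip('"') first would leave this field unchanged, because its first and last characters are a space and a newline, not quotes.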

So write some loops, try to divvy up the data, and we can guide you
as you learn. As always, when you want help with an error, copy and
paste the entire error message (the full traceback) to show us.

One more question: are you permitted to use some of the fancier modules
of the Python library, such as csv? If I were your teacher, I'd
explicitly prevent that type of solution, as you need to know how to do
things "by hand" before learning some of the more complex tools.
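For comparison only, in case your teacher does allow it: the standard library's csv module already understands quoted fields, including commas inside the quotes. The lines below are made-up sample input; csv.reader() accepts any iterable of strings, such as the list returned by readlines() or an open file.

```python
import csv

lines = ['"Old Test","2009_10_20"\n',
         '"Test, part 2","2009_11_01"\n']

# Each row comes back as a list of fields, with the quotes removed and the
# comma inside "Test, part 2" left alone.
rows = list(csv.reader(lines))
print(rows)
```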

