[Tutor] Reading information from a text file into a list of lists (WinXP/py2.6.2)
Dave Angel
davea at ieee.org
Mon Oct 26 03:38:24 CET 2009
Katt wrote:
> Hello all,
>
> Currently I am working on a program that reads text from a text file.
> I would like it to place the information into a list and inside the
> information would have sublists of information.
>
> The text file looks like this:
>
> "Old Test","2009_10_20"
> "Current Test","2009_10_25"
> "Future Test","2009_11_01"
>
> I am trying to get the list of lists to look like the following after
> the information is read from the text file:
>
> important_dates = [["Old Test","2009_10_20"],["Current
> Test","2009_10_25"],["Future Test","2009_11_01"]]
>
> What I currently have is:
>
> def read_important_dates():
>     print "\nReading text file into program: important.txt"
>     text_file = open("important.txt", "r")
>     dates = text_file.readlines()
>     text_file.close()
>     print dates #
> read_important_dates()
>
> When it gets to print dates I see the following:
>
> [ ' "Old Test","2009_10_20"\n', ' "Current Test",2009_10_25"\n', '
> "Future Test","2009_11_01" ' ]
>
> Does it look this way because I am using the "print"? Or do I still
> need to add the inner [ ] and strip out the \n and '?
> If I add brackets to the text file I am reading from my output looks
> like this:
> <snip>
>
We have to be clear about this. You want an actual list containing
sublists. You don't just want something which seems to print out with
brackets to mimic that? If you showed us the whole assignment question,
it would probably be easier to help.

One thing at a time. Please don't try adding brackets to the data
file. You're not trying to create brackets, you're trying to create a
list of lists. The brackets are just the way Python shows you what
you've got when you print out a list directly. Similarly, the
single quotes you see in that printed output are just markers showing
that you've got strings. What's between the quotes is the contents
of the string.

First, you need to get confident about the relationship between what's
really in variables, and what gets printed out when you print in various
ways. Then, once you're sure of your ability to recognize
these, you can actually debug why you're not getting what you want.

If you have

    mystring = "This is a string"
    print mystring
    print repr(mystring)

what you get is:

    This is a string
    'This is a string'

So you can see the string has no quotes within it, but you might see
quotes around it when you ask Python to display it indirectly. Now
let's put a newline into the middle of the string:

    mystring = "This is line 1\nAnd line 2"
    print mystring
    print repr(mystring)

The output is:

    This is line 1
    And line 2
    'This is line 1\nAnd line 2'

So we see there's a control character in the middle of the string, which
causes our console to switch to the next line (it's called a newline).
But the repr() form shows us the quotes and the backslash and the n.
There's not really a backslash or an n at that point in the string, but
repr() uses that form to show us exactly what's happening there. The
goal of repr() is to approximate what you might have to do in source
code to produce the same data.

When we print a list, as a single item, we'll see brackets around the
whole thing, and commas between the items. There are no brackets
anywhere in the list; that's just Python's way of showing the repr() of
the list. And when an item in the list is a string, it'll have quotes
around it that are not part of the string. And any control characters
are represented by the backslash notation. It's just like with repr()
of a single string.

It can be useful to loop through the list printing individual items of
the list. It can also be useful to use len() on the list to see how
many items are in it.

    for item in mylist:
        print item
    print len(mylist)

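To make the difference concrete, here is a small sketch. The list contents are made up for illustration (they are not from your file), and print is written with parentheses so the same lines should run under both Python 2.6 and Python 3.

```python
# Hypothetical sample data, shaped like the strings readlines() returns.
mylist = ['first line\n', 'second line']

# repr() of each item: the surrounding quotes and the \n escape are only
# the way Python displays the string, not characters stored in it.
shown = [repr(item) for item in mylist]
for text in shown:
    print(text)

# The real contents: len() counts the newline as a single character.
print(len(mylist))     # number of items in the list
print(len(mylist[0]))  # 10 characters of text plus 1 newline
```

Comparing the repr() lines with the lengths is a quick way to convince yourself which characters are really stored in each string.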
Now let's look at the output you showed:

    [ ' "Old Test","2009_10_20"\n', ' "Current Test",2009_10_25"\n', '
    "Future Test","2009_11_01" ' ]

We could also do print len(dates) and see a 3. This is what readlines()
produces. It reads the raw data from the file, and produces a list, one
string per line. It uses newline to decide where each line ends, and
keeps the newlines with the data. It happens you didn't have one at the
end of the file, so we only see two of them.

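If you want to check that behaviour yourself, the following sketch may help. It assumes the three lines from your message as the file contents, with no newline after the last one, and it writes them to a temporary file (my choice, just so the example is self-contained) instead of important.txt.

```python
import os
import tempfile

# Sample data taken from the question; no newline after the last line.
data = ('"Old Test","2009_10_20"\n'
        '"Current Test","2009_10_25"\n'
        '"Future Test","2009_11_01"')

# Write the data to a temporary file so nothing else is needed to run this.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
text_file = open(path, "w")
text_file.write(data)
text_file.close()

# Read it back the same way the original function does.
text_file = open(path, "r")
dates = text_file.readlines()
text_file.close()
os.remove(path)

print(len(dates))  # one string per line of the file

# Count how many of the strings still end with a newline.
newline_count = len([line for line in dates if line.endswith("\n")])
print(newline_count)
```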
readlines() doesn't know how to do anything more with the data in each
line. It only understands one level of parsing. So you have to write
some form of loop, which visits each item of the list, creating the
sublist you said your teacher wanted.

The logical way to divide up each line would be to use the split()
method of strings. If you tell it to split on commas, it would work on
the particular file you've got. You have to watch a bit, since somebody
may slip in a comma inside one of those quoted items in the data file,
and expect you *not* to split based on that one. Again, it'd be good to
have a clear problem statement, containing the expectations of the input
file, among other things.

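Here is a rough sketch of both cases. The first line is from your file; the second, with a comma inside the quoted name, is a made-up example of the kind of input that would trip up a plain split.

```python
# A line as readlines() would return it (taken from the question).
line = '"Old Test","2009_10_20"\n'
fields = line.split(",")
print(len(fields))  # two pieces, still carrying their quotes and the newline

# Hypothetical line with a comma inside the quoted name.
tricky = '"Test, revised","2009_10_20"\n'
tricky_fields = tricky.split(",")
print(len(tricky_fields))  # the embedded comma produces an extra piece
```

Notice that split() only divides the string. The quote characters and the trailing newline are still in the pieces.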
Now you have the additional problem of getting rid of the newlines and
the quotes in the input data. The desired result you showed has no
quotes, commas, or newlines inside any of the strings in the list of
lists. strip() is the usual way of getting rid of leading and trailing
delimiters, after readlines() or after split().

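As a small illustration on a single piece of a line (the variable names here are mine, not part of your program): strip() with no argument removes leading and trailing whitespace, including the newline, and strip('"') removes quote characters from both ends.

```python
# One field as it might look after splitting a line on the comma.
raw = '"2009_10_20"\n'

no_newline = raw.strip()       # drops whitespace, including the newline
clean = no_newline.strip('"')  # drops the double quotes at both ends
print(clean)

# The two calls can be chained; a leading space is removed as well.
name = ' "Old Test"'.strip().strip('"')
print(name)
```

strip() only touches the ends of the string, so the space inside "Old Test" is left alone.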
So write some loops, and try to divvy up the data, and we can guide you
as you learn. As always, when you want help with an error, use copy and
paste to show us the entire thing.

One more question. Are you permitted to use some of the fancier modules
of the Python library, such as csv? If I were your teacher, I'd
explicitly prevent that type of solution, as you need to know how to do
things "by hand" before learning some of the more complex tools.

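For comparison only, and only if such modules are allowed, this is roughly what the csv route could look like. It is a sketch under the assumption that the data is ordinary comma-separated text with double-quoted fields. The list of lines stands in for what readlines() returned, and the last line with an embedded comma is a made-up example.

```python
import csv

# Lines as readlines() would return them (contents from the question).
lines = ['"Old Test","2009_10_20"\n',
         '"Current Test","2009_10_25"\n',
         '"Future Test","2009_11_01"']

# csv.reader accepts any iterable of strings and yields one list per line,
# with the quotes and the line endings already removed.
important_dates = [row for row in csv.reader(lines)]
print(important_dates)

# Hypothetical line with a comma inside a quoted field: it stays one field.
tricky_rows = [row for row in csv.reader(['"Test, revised","2009_10_20"\n'])]
print(tricky_rows)
```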
DaveA