[Tutor] just what does read() return?

Fri Oct 1 02:01:48 CEST 2010

On Fri, 1 Oct 2010 08:32:40 am Alex Hall wrote:

> I fully expected to see txt be an array of strings since I figured
> self.original would have been split on one or more new lines. It
> turns out, though, that I get this instead:
> ['l\nvx vy z\nvx vy z']

There's no need to call str() on something that already is a string. 
Admittedly it doesn't do much harm, but it is confusing for the person 
reading, who may be fooled into thinking that perhaps the argument 
wasn't a string in the first place.

The string split method doesn't interpret its argument as a regular 
expression. r'\n+' has no special meaning here. It's just three literal 
characters: backslash, the letter n, and the plus sign. split() tries to 
split on that substring, and since your data doesn't include that 
combination anywhere, it returns a list containing a single item:

>>> "abcde".split("ZZZ")
['abcde']
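
If you do want to split on runs of one or more newlines, the re module 
is the place to look, since re.split() does treat its first argument as 
a pattern. A minimal sketch, assuming sample text shaped like yours (the 
variable names and the extra blank line in the data are mine, purely for 
illustration):

```python
import re

# Made-up sample data; the blank line is there to show what "+" buys you.
text = "l\nvx vy z\n\nvx vy z"

# str.split() looks for the literal four characters \ n + and finds none,
# so the whole string comes back as the only item.
literal = text.split(r"\n+")

# re.split() interprets the same argument as a regular expression:
# one or more newline characters.
by_regex = re.split(r"\n+", text)

print(literal)
print(by_regex)
```

The first print shows a one-item list holding the untouched string; the 
second shows ['l', 'vx vy z', 'vx vy z'].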

> How is it that txt is not an array of the lines in the file, but
> instead still holds \n characters? I thought the manual said read()
> returns a string:

It does return a string. It is a string including the newline 
characters.
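
To get a list of lines out of that one big string, str.splitlines() is 
probably the simplest tool. A small sketch, using io.StringIO as a 
stand-in for your real file so that it runs on its own (the contents are 
an assumption based on the output you posted):

```python
import io

# io.StringIO stands in for a file opened with open(); the contents are
# invented to match the data shown in the question.
f = io.StringIO("l\nvx vy z\nvx vy z\n")

data = f.read()            # one string, newlines included
lines = data.splitlines()  # a list of lines, newlines removed

print(type(data))
print(lines)
```

Here data is a single str and lines is ['l', 'vx vy z', 'vx vy z'].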

[...]
> I know I can use f.readline(), and I was doing that before and it all
> worked fine. However, I saw that I was reading the file twice and, in
> the interest of good practice if I ever have this sort of project
> with a huge file, I thought I would try to be more efficient and read
> it once.

You think that keeping a huge file in memory *all the time* is more 
efficient? It's the other way around -- when dealing with *small* files 
you can afford to keep them in memory. When dealing with huge files, you 
need to re-write your program to deal with the file a piece at a time. 
(This is often a good strategy for small files as well, but it is 
essential for huge ones.)
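
As a rough illustration of "a piece at a time": a file object can be 
iterated over directly, which hands you one line per loop and never 
holds the whole file in memory. The function name and the job it does 
(counting whitespace-separated fields) are hypothetical, chosen only to 
give the loop something to do, and io.StringIO again stands in for a 
real file:

```python
import io

def count_fields(lines):
    """Return the total number of whitespace-separated fields.

    `lines` can be any iterable of strings, including an open file,
    so only one line is held in memory at a time.
    """
    total = 0
    for line in lines:
        total += len(line.split())
    return total

# Stand-in for: with open("somefile.txt") as sample: ...
sample = io.StringIO("l\nvx vy z\nvx vy z\n")
total = count_fields(sample)
print(total)
```

With the sample data above this prints 7. The same function would work 
unchanged on a multi-gigabyte file, because it never asks for more than 
the current line.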

Of course, "small" and "huge" are relative to the technology of the day. 
I remember when 1MB was huge. These days, huge would mean gigabytes. 
Small would be anything under a few tens of megabytes.

-- 
Steven D'Aprano