A question about yield
Chris Rebert
clp2 at rebertia.com
Sun Nov 7 13:14:12 EST 2010
On Sun, Nov 7, 2010 at 9:56 AM, chad <cdalten at gmail.com> wrote:
> On Nov 7, 9:47 am, Chris Rebert <c... at rebertia.com> wrote:
>> On Sun, Nov 7, 2010 at 9:34 AM, chad <cdal... at gmail.com> wrote:
>> <snip>
>> > #!/usr/local/bin/python
>>
>> > import sys
>>
>> > def construct_set(data):
>> >     for line in data:
>> >         lines = line.splitlines()
>> >         for curline in lines:
>> >             if curline.strip():
>> >                 key = curline.split(' ')
>> >                 value = int(key[0])
>> >                 yield value
>>
>> > def approximate(first, second):
>> >     midpoint = (first + second) / 2
>> >     return midpoint
>>
>> > def format(input):
>> > prev = 0
>> > value = int(input)
>>
>> > with open("/home/cdalten/oakland/freq") as f:
>> > for next in construct_set(f):
>> > if value > prev:
>> > current = prev
>> > prev = next
>>
>> > middle = approximate(current, prev)
>> > if middle < prev and value > middle:
>> > return prev
>> > elif value > current and current < middle:
>> > return current
>> <snip>
>> > The question is about the construct_set() function.
>> <snip>
>> > I have it yield on 'value' instead of 'curline'. Will the program
>> > still read the input file named freq line by line even though I don't
>> > have it yielding on 'curline'? Or since I have it yield on 'value',
>> > will it read the entire input file into memory at once?
>>
>> The former. The yield has no effect at all on how the file is read.
>> The "for line in data:" iteration over the file object is what makes
>> Python read from the file line-by-line. Incidentally, the use of
>> splitlines() is pointless; you're already getting single lines from
>> the file object by iterating over it, so splitlines() will always
>> return a single-element list.
>
> But what happens if the input file is say 250MB? Will all 250MB be
> loaded into memory at once?
No. As I said, the file will be read one line at a time, on an
as-needed basis; which is to say, "line-by-line".
> Just curious, because I thought maybe
> using something like 'yield curline' would prevent this scenario.
Using "for line in data:" is what prevents that scenario.
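
To illustrate, here is a minimal sketch of construct_set() with the
redundant splitlines() call dropped. The io.StringIO object and its
sample contents are stand-ins I made up so the snippet runs without
your freq file; a real file object is iterated the same way.

```python
import io

def construct_set(data):
    """Yield the leading integer of each non-blank line."""
    for line in data:        # the file-like object hands back one line per iteration
        if line.strip():     # skip blank lines
            yield int(line.split(' ')[0])

# io.StringIO stands in for the real file (an assumption for this demo).
sample = io.StringIO("10 a\n\n20 b\n30 c\n")
values = construct_set(sample)

first = next(values)             # consumes only the first line
remaining_text = sample.read()   # everything after it is still unread
```

After the single next() call, only the first line has been consumed;
the rest of the sample data is still sitting unread in the object.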
The "yield" is only relevant to how the file is read insofar as the
alternative to yield-ing would be to return a list, which would
necessitate going through the entire file in one continuous go and
then returning a very large list; but even then, the file's content
would still be read line-by-line, not all at once as one humongous
string.
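
A rough sketch of the two alternatives, for comparison. The function
names values_lazy and values_eager and the sample text are made up
for illustration, and io.StringIO again stands in for the real file.

```python
import io

def values_lazy(data):
    """Generator: hands back one value at a time, as the caller asks for it."""
    for line in data:
        if line.strip():
            yield int(line.split(' ')[0])

def values_eager(data):
    """List version: walks the whole input before returning anything."""
    result = []
    for line in data:        # still reads one line per iteration
        if line.strip():
            result.append(int(line.split(' ')[0]))
    return result            # every value is held in memory at once

text = "10 a\n20 b\n30 c\n"  # hypothetical sample data
lazy = values_lazy(io.StringIO(text))
eager = values_eager(io.StringIO(text))
```

Both versions iterate over the input one line at a time; the only
difference is whether the results pile up in a list before the caller
sees any of them.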
Cheers,
Chris
--
http://blog.rebertia.com