A question about yield

Sun Nov 7 13:14:12 EST 2010

On Sun, Nov 7, 2010 at 9:56 AM, chad <cdalten at gmail.com> wrote:
> On Nov 7, 9:47 am, Chris Rebert <c... at rebertia.com> wrote:
>> On Sun, Nov 7, 2010 at 9:34 AM, chad <cdal... at gmail.com> wrote:
>> <snip>
>> > #!/usr/local/bin/python
>>
>> > import sys
>>
>> > def construct_set(data):
>> >    for line in data:
>> >        lines = line.splitlines()
>> >        for curline in lines:
>> >            if curline.strip():
>> >                key = curline.split(' ')
>> >                value = int(key[0])
>> >                yield value
>>
>> > def approximate(first, second):
>> >    midpoint = (first + second) / 2
>> >    return midpoint
>>
>> > def format(input):
>> >    prev = 0
>> >    value = int(input)
>>
>> >    with open("/home/cdalten/oakland/freq") as f:
>> >        for next in construct_set(f):
>> >            if value > prev:
>> >                current = prev
>> >                prev = next
>>
>> >        middle = approximate(current, prev)
>> >        if middle < prev and value > middle:
>> >            return prev
>> >        elif value > current and current < middle:
>> >            return current
>> <snip>
>> > The question is about the construct_set() function.
>> <snip>
>> > I have it yield on 'value' instead of 'curline'. Will the program
>> > still read the input file named freq line by line even though I don't
>> > have it yielding on 'curline'? Or since I have it yield on 'value',
>> > will it read the entire input file into memory at once?
>>
>> The former. The yield has no effect at all on how the file is read.
>> The "for line in data:" iteration over the file object is what makes
>> Python read from the file line-by-line. Incidentally, the use of
>> splitlines() is pointless; you're already getting single lines from
>> the file object by iterating over it, so splitlines() will always
>> return a single-element list.
>
> But what happens if the input file is say 250MB? Will all 250MB be
> loaded into memory at once?

No. As I said, the file will be read one line at a time, on an
as-needed basis; which is to say, "line-by-line".
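
A minimal sketch of that as-needed behaviour, under some assumptions:
CountingLines is a hypothetical stand-in for the file object (it only
counts how many lines have been handed out), and construct_set() is a
simplified version of yours with the splitlines() call dropped. A real
file object may buffer a chunk ahead internally, but it still hands
lines to the loop one at a time in the same way.

```python
def construct_set(data):
    # Simplified version of the original generator (no splitlines()).
    for line in data:
        if line.strip():
            yield int(line.split(' ')[0])


class CountingLines(object):
    """Hypothetical stand-in for a file object; counts lines handed out."""

    def __init__(self, lines):
        self.lines = lines
        self.consumed = 0

    def __iter__(self):
        for line in self.lines:
            self.consumed += 1
            yield line


source = CountingLines(["10 a\n", "\n", "20 b\n", "30 c\n"])
gen = construct_set(source)

first = next(gen)                        # runs the generator to its first yield
consumed_after_first = source.consumed   # only one line has been pulled so far

second = next(gen)                       # skips the blank line, yields from the third
consumed_after_second = source.consumed  # the fourth line is still untouched
```

Each next() call pulls only as many lines as it takes to reach the
next yield, no matter which variable is being yielded.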

> Just curious, because I thought maybe
> using something like 'yield curline' would prevent this scenario.

Using "for line in data:" is what prevents that scenario.
The "yield" is only relevant to how the file is read insofar as the
alternative to yield-ing would be to return a list, which would
necessitate going through the entire file in one continuous go and then
returning a very large list; but even then, the file's content would
still be read line-by-line, not all at once as one humongous
string.
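
To make the comparison concrete, here is a rough sketch of the two
alternatives. The names values_lazy() and values_list() are made up for
illustration, and the temporary file merely stands in for your freq
file; neither is from your program.

```python
import os
import tempfile


def values_lazy(path):
    # Generator: at any moment holds one line and one value.
    with open(path) as f:
        for line in f:
            if line.strip():
                yield int(line.split()[0])


def values_list(path):
    # List-returning alternative: still reads line by line, but
    # accumulates every value before returning anything.
    result = []
    with open(path) as f:
        for line in f:
            if line.strip():
                result.append(int(line.split()[0]))
    return result


# Small throwaway input file (an assumption standing in for "freq").
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("10 a\n20 b\n\n30 c\n")
    demo_path = tmp.name

lazy_total = sum(values_lazy(demo_path))  # values consumed one at a time
eager = values_list(demo_path)            # whole result held in memory at once
os.remove(demo_path)
```

Both functions read the file the same way; the difference is only in
how much of the result exists in memory at once.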

Cheers,
Chris
--
http://blog.rebertia.com