[Tutor] Newbie Here -- Averaging & Adding Madness Over a Given (x) Range?!?!
Michael J. McConachie
michael at redhat.com
Fri Feb 15 15:45:05 CET 2013
@ Steven,
Thank you for the answers. I appreciate your understanding, and
patience; I understand that it was confusing (unintentionally) and
probably irritating to any of the seasoned tutor list members.
Your examples helped greatly, and were the push I needed. Happy Friday,
and thanks again,
Mike
On 02/14/2013 05:48 PM, Steven D'Aprano wrote:
> On 15/02/13 07:55, Michael McConachie wrote:
>
>> Essentially:
>>
>> 1. I have a list of numbers that already exist in a file. I
>> generate this file by parsing info from logs.
>> 2. Each line contains an integer on it (corresponding to the number
>> of milliseconds that it takes to complete a certain repeated task).
>> 3. There are over a million entries in this file, one per line; at
>> any given time it can be just a few thousand, or more than a million.
>>
>> Example:
>> -------
>> 173
>> 1685
>> 1152
>> 253
>> 1623
>
>
> A million entries sounds like a lot to you or me, but to your
> computer, it's not. When you start talking tens or hundreds of
> millions, that's possibly a lot.
>
> Do you know how to read those numbers into a Python list? Here is the
> "baby step" way to do so:
>
>
> data = [] # Start with an empty list.
> f = open("filename") # Obviously you have to use the actual file name.
> for line in f: # Read the file one line at a time.
>     num = int(line) # Convert each line into an integer (whole number)
>     data.append(num) # and append it to the end of the list.
> f.close() # Close the file when done.
>
>
> Here's a more concise way to do it:
>
> with open("filename") as f:
>     data = [int(line) for line in f]
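
A minimal sketch of the same idea that also tolerates blank lines, which int() would otherwise reject. The helper name read_integers and the temporary sample file are assumptions made only so the example runs on its own; they are not from the original post.

```python
import os
import tempfile


def read_integers(path):
    """Return the integers found in a file, one per line, skipping blank lines."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]


# Write a small sample file (with one blank line) so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("173\n1685\n1152\n\n253\n1623\n")
    sample_path = tmp.name

data = read_integers(sample_path)
os.remove(sample_path)  # Clean up the sample file.
print(data)
```

int() ignores the trailing newline on each line, so no explicit strip is needed before converting.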
>
>
>
> Once you have that list of numbers, you can sum the whole lot:
>
> sum(data)
>
>
> or just a range of the items:
>
> sum(data[:100]) # The first 100 items.
>
> sum(data[100:200]) # The second 100 items.
>
> sum(data[-50:]) # The last 50 items.
>
> sum(data[1000:]) # Item 1001 to the end. (See below.)
>
> sum(data[5:99:3]) # Every third item, starting at index 5 and ending
>                   # at index 98.
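
To see these slices on something small enough to check by hand, here is a sketch using a made-up ten-item list (the values are illustrative, not from the log file):

```python
data = list(range(1, 11))        # [1, 2, 3, ..., 10]

first_three = sum(data[:3])      # 1 + 2 + 3
next_three = sum(data[3:6])      # 4 + 5 + 6
last_two = sum(data[-2:])        # 9 + 10
from_index_7 = sum(data[7:])     # 8 + 9 + 10
every_third = sum(data[1:9:3])   # indexes 1, 4, 7 -> 2 + 5 + 8

print(first_three, next_three, last_two, from_index_7, every_third)
```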
>
>
>
> This is called "slicing", and it is perhaps the most powerful and
> useful technique that Python gives you for dealing with lists. The
> rules, though, are not necessarily the most intuitive.
>
>
> A slice is either a pair of numbers separated with a colon, inside the
> square brackets:
>
> data[start:end]
>
> or a triple:
>
> data[start:end:step]
>
> Any of these three numbers can be left out. The default values are:
>
> start=0
> end=length of the sequence being sliced
> step=1
>
> They can also be negative. If start or end are negative, they are
> interpreted as "from the end" rather than "from the beginning".
>
> Item positions are counted from 0, which will be very familiar to C
> programmers. The start index is included in the slice, the end
> position is excluded.
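
A short sketch of the defaults and of negative values, again on a made-up five-item list. Each assert holds, so running the block silently confirms the rules described above:

```python
data = [10, 20, 30, 40, 50]

# Leaving a number out uses its default: start=0, end=len(data), step=1.
assert data[:] == data[0:len(data):1]
assert data[:2] == data[0:2]
assert data[3:] == data[3:len(data)]

# The start index is included, the end index is excluded.
assert data[1:3] == [20, 30]

# Negative start or end counts from the end of the list.
assert data[-2:] == [40, 50]
assert data[:-2] == [10, 20, 30]
assert data[-4:-1] == [20, 30, 40]

# A negative step walks backwards; with no start or end it reverses the list.
reversed_data = data[::-1]
assert reversed_data == [50, 40, 30, 20, 10]
```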
>
> The model that you should think of is to imagine the sequence of items
> labelled with their index, starting from zero, and with a vertical
> line *between* each position. Here is a sequence of 26 items, showing
> the index in the first line and the value in the second:
>
>
> |0|1|2|3|4|5|6|7|8|9| ... |25|
> |a|b|c|d|e|f|g|h|i|j| ... |z |
>
> When you take a slice, the cut is always made at the vertical line to
> the left of each index. So, if the above is called "letters", we have:
>
> letters[0:4] # returns "abcd"
>
> letters[2:8] # returns "cdefgh"
>
> letters[2:8:2] # returns "ceg"
>
> letters[-3:] # returns "xyz"
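
These four results can be checked directly. The sketch below assumes "letters" is the lowercase alphabet as a string, taken from the standard string module:

```python
import string

letters = string.ascii_lowercase   # "abcdefghijklmnopqrstuvwxyz"

print(letters[0:4])      # abcd
print(letters[2:8])      # cdefgh
print(letters[2:8:2])    # ceg
print(letters[-3:])      # xyz

# With step 1 and both indexes in range, a slice has end - start items.
slice_length = len(letters[2:8])
print(slice_length)      # 6
```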
>
>
>
>> Eventually what I'll need to do is:
>>
>> 1. Index the file and/or count the lines, as to identify each line's
>> positional relevance so that it can average any range of numbers that
>> are sequential; one to one another.
>
>
> No need. Python already does that, automatically, when you read the
> data into a list.
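
As a small illustration, using the five sample values from the original post: each item's position in the list is its zero-based line number in the file, and enumerate() pairs the two up if you want to see them side by side.

```python
data = [173, 1685, 1152, 253, 1623]

# enumerate() yields (index, value) pairs; indexes start at 0.
for index, value in enumerate(data):
    print(index, value)

fourth_line = data[3]    # The value from the fourth line of the file.
line_count = len(data)   # The number of lines that were read.
```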
>
>
>
>> 2. Calculate the difference between any given (x) range. In order
>> to be able to ask the program to average every 5, 10, 100, 1,000, or
>> 10,000 etc. --> until completion. This includes the need to deal
>> with stray remainders at the end of the file that aren't divisible by
>> that initial requested range.
>
> I don't quite understand you here. First you say "difference", then
> you say "average". Can you show a sample of data, say, 10 values, and
> the sorts of typical calculations you want to perform, with the
> answers you expect to get?
>
>
> For example, here are 10 numbers:
>
>
> 103, 104, 105, 109, 111, 112, 115, 120, 123, 128
>
>
> Here are the running averages of 3 values:
>
> (103+104+105)/3
>
> (104+105+109)/3
>
> (105+109+111)/3
>
> (109+111+112)/3
>
> (111+112+115)/3
>
> (112+115+120)/3
>
> (115+120+123)/3
>
> (120+123+128)/3
>
>
> Is that what you mean? If so, then Python can deal with this
> trivially, using slicing. With your data stored in list "data", as
> above, I can say:
>
>
> for i in range(len(data) - 2): # The last full window starts at index len(data)-3.
>     print(sum(data[i:i+3]))
>
>
> to print the running sums taking three items at a time.
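
To get running averages rather than running sums, divide each sum by the window size. A minimal sketch on the ten numbers above; the names window and running_averages are just illustrative:

```python
data = [103, 104, 105, 109, 111, 112, 115, 120, 123, 128]
window = 3

# One average per full window; there are len(data) - window + 1 of them.
running_averages = [
    sum(data[i:i + window]) / float(window)
    for i in range(len(data) - window + 1)
]

for average in running_averages:
    print(round(average, 2))
```

float() is used so that the division is not truncated under Python 2; under Python 3 a plain "/" would behave the same way.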
>
>
>
> The rest of your post just confuses me. Until you explain exactly what
> calculations you are trying to perform, I can't tell you how to
> perform them :-)
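
One possible reading of the original request is "average each consecutive block of N values, and average whatever is left over at the end as a shorter block". That interpretation is an assumption, not something confirmed in the thread. Under it, a sketch with a hypothetical helper named chunk_averages might look like this:

```python
def chunk_averages(data, size):
    """Average consecutive, non-overlapping blocks of `size` items.

    The final block may be shorter than `size` if len(data) is not a
    multiple of it; it is averaged over the items it actually has.
    """
    averages = []
    for start in range(0, len(data), size):
        chunk = data[start:start + size]  # Slicing past the end is safe.
        averages.append(sum(chunk) / float(len(chunk)))
    return averages


data = [103, 104, 105, 109, 111, 112, 115, 120, 123, 128]
result = chunk_averages(data, 4)
print(result)  # Two blocks of 4, then a remainder block of 2.
```

This works because a slice that runs past the end of the list simply stops at the end instead of raising an error, so the stray remainder needs no special case.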
>
>
>
>