[Tutor] Replacing a value in a list
Alan Gauld
alan.gauld at yahoo.co.uk
Mon Aug 16 15:28:06 EDT 2021
On 16/08/2021 18:15, nzbz xx wrote:
> I have the codes as such now but it doesn't yield the same number of input
> and output e.g. given the raw data is [-999,-999, 3, 4, -999 ], the output
> generated is [3, 3, 4]. I'm not sure where the problem lies when I've
> already done a for-loop and it should give me the same number of elements
> in the list.
>
> def clean_data(dataset):
>
>     data_cleaned_count = 0
>     clean_list = []
>
>     # Identifying all valid data in the dataset
>     for data in range(len(dataset)):
>         if dataset[data] != -999:
>             clean_list.append(dataset[data])
First, you shouldn't need the index.
Just use the data items directly:

    for data in dataset:
        if data != -999:
            clean_list.append(data)

Much more Pythonic.
However, it's also wrong! It builds clean_list without any
elements where the -999 occurs, which is why your output is
shorter than your input. You need the special-case processing
for the -999 markers embedded inside the loop. And you probably
need to use enumerate() to get the index for accessing the
earlier or later elements.
Something like:

    for index, item in enumerate(dataset):
        if item != -999:
            clean_list.append(item)
        else:
            # count consecutive markers starting here
            n = 1
            while index+n < len(dataset) and dataset[index+n] == -999:
                n += 1
            if n == 1:
                pass  # do something with item
            elif n == 2:
                pass  # do another thing
            # etc...
            # item must hold the replacement value by now, not -999
            clean_list.append(item)
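
To make that concrete, here is a minimal sketch of one way the loop could
be completed. It is not the only reading of the requirements: it assumes
each -999 takes the value of its nearest valid neighbour, that an exact tie
is resolved by averaging the two neighbours, and that a gap at either end
of the list copies the only neighbour it has. The marker parameter and the
tie rule are my assumptions, not something stated in the original problem.

```python
MARKER = -999  # sentinel for missing data, as used in the thread


def clean_data(dataset, marker=MARKER):
    """Return a new list, the same length as dataset, with markers replaced."""
    clean_list = []
    for index, item in enumerate(dataset):
        if item != marker:
            clean_list.append(item)
            continue

        # Walk left and right to the nearest valid values.
        left = index - 1
        while left >= 0 and dataset[left] == marker:
            left -= 1
        right = index + 1
        while right < len(dataset) and dataset[right] == marker:
            right += 1

        has_left = left >= 0
        has_right = right < len(dataset)
        if has_left and has_right:
            d_left, d_right = index - left, right - index
            if d_left < d_right:
                value = dataset[left]
            elif d_right < d_left:
                value = dataset[right]
            else:
                # Assumption: equidistant neighbours are averaged.
                value = (dataset[left] + dataset[right]) / 2
        elif has_left:
            value = dataset[left]
        elif has_right:
            value = dataset[right]
        else:
            value = marker  # no valid data anywhere, leave it alone
        clean_list.append(value)
    return clean_list
```

Because exactly one value is appended per input item, the output always
has the same number of elements as the input.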
>     # Replacing missing data
>     for md in range(len(dataset)):
>         if dataset[md] == -999:
>
>             consecutive_invalid_count = 0
>             for i in range(len(dataset)-1):
>                 if dataset[i] == dataset[i+1]:
>                     consecutive_invalid_count +=1
This loop looks wrong to me, although I haven't worked
through it in detail. It looks like you are iterating over
almost all of dataset every time you hit a marker, and the
test counts any pair of adjacent equal values, not just
runs of the -999 marker.
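
A small experiment shows the difference. This is only an illustration:
count_adjacent_equal() mirrors the inner loop quoted above, while
count_marker_run() is a hypothetical alternative that counts only the
markers starting at one position.

```python
def count_adjacent_equal(dataset):
    """Mirror of the quoted inner loop: counts every equal adjacent pair."""
    count = 0
    for i in range(len(dataset) - 1):
        if dataset[i] == dataset[i + 1]:
            count += 1
    return count


def count_marker_run(dataset, start, marker=-999):
    """Count consecutive markers beginning at index start."""
    n = 0
    while start + n < len(dataset) and dataset[start + n] == marker:
        n += 1
    return n


data = [3, 3, 3, -999, 4]
print(count_adjacent_equal(data))  # 2, from the repeated 3s
print(count_marker_run(data, 3))   # 1, the single -999
```

The first count is driven entirely by the repeated valid values, so any
index arithmetic built on it will be off.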
>             start = dataset.index(-999)
>             end = start + consecutive_invalid_count
>
>             left_idx = start-1 # Finding the adjacent valid data
>             right_idx = end + 1
>
>             # Locating the nearest valid data
>             if abs(md - left_idx) > abs(md - right_idx) or md >= len(dataset)-1:
>                 clean_list.insert(md,dataset[left_idx] )
>                 data_cleaned_count += 1
>
>             if abs(md - left_idx) < abs(md - right_idx) or md == 0:
>                 clean_list.insert(md, dataset[right_idx])
>                 data_cleaned_count += 1
>
> On Sat, Aug 14, 2021 at 8:10 PM Alan Gauld via Tutor <tutor at python.org>
> wrote:
>
>> On 14/08/2021 05:23, nzbz xx wrote:
>>> Assuming that when there are consecutive missing values, they should be
>>> replaced with adjacent valid values e.g [1,2,-999,-999,5] should give
>>> [1,2,2,5,5]. And given [1,-999,-999,-999,5], the middle missing value
>>> would take the average of index 1 & 3. This should get an output of
>>> [1, 2 , 2, 3.5, 5]. How should it be coded for it to solve from the outer
>>> first to the inner elements?
>>
>> That still leaves the question of what happens when there are 4 or 5
>> blank pieces? For 4 you can duplicate the average, but what about 5?
>> Is there a point at which you decide the data is too damaged to continue
>> and throw an error?
>>
>> As for coding it I'd write a small helper function to find the
>> start/stop indices of the blanks (or the start and length if you prefer)
>>
>> Something like:
>>
>> def findBlanks(seq,blank=-999):
>>     start = seq.index(blank)
>>     end = start+1
>>     # stop at the end of seq so a trailing gap doesn't raise IndexError
>>     while end < len(seq) and seq[end] == blank:
>>         end+=1
>>     return start,end
>>
>> For the simple case of 3 blanks you can do
>>
>> # end is the index just past the last blank
>> if start != 0:
>>     seq[start] = seq[start-1]
>> if end != len(seq):
>>     seq[end-1] = seq[end]
>>
>> gapsize = end-start
>> if gapsize >= 3:
>>     seq[start+1]=(seq[start]+seq[end-1])/2
>>
>> Now what you do if end-start>3 is up to you.
>> And what you do if the gap is at the start or
>> end of seq is also up to you...
>>
>> It's all in the specification.
>> What is supposed to happen? Once you know the
>> complete algorithm the code should practically
>> write itself.
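
The findBlanks() helper quoted above only reports the first gap. One
possible extension, sketched here rather than offered as the definitive
version, is a generator that reports every gap in a single pass. The name
find_all_blanks is my own, and it assumes the same convention that end is
the index just past the last blank.

```python
def find_all_blanks(seq, blank=-999):
    """Yield (start, end) for each run of blanks; end is one past the run."""
    i = 0
    while i < len(seq):
        if seq[i] == blank:
            start = i
            # Advance to the first non-blank item or the end of seq.
            while i < len(seq) and seq[i] == blank:
                i += 1
            yield start, i
        else:
            i += 1


for start, end in find_all_blanks([-999, -999, 3, 4, -999]):
    print(start, end, "gap size:", end - start)
```

With the gaps located up front, the fill rules for each gap size can be
applied one run at a time without rescanning the list.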
>>
>> --
>> Alan G
>> Author of the Learn to Program web site
>> http://www.alan-g.me.uk/
>> http://www.amazon.com/author/alan_gauld
>> Follow my photo-blog on Flickr at:
>> http://www.flickr.com/photos/alangauldphotos