Column-Specific Conditions and Column-Specific Substitution Values

Hi everyone,

A beginner's question on how to perform some data substitution efficiently. I have a panel dataset, in other words x individuals observed over a certain time span, with one column per individual. For each column, I need to substitute a certain value whenever a certain condition is satisfied. Both the condition and the value to be substituted are individual-specific. I can handle the individual-specific condition, but I cannot find a way to handle the individual-specific substitute value without using a for loop. Frankly, given the size of the dataset, a for loop is perfectly acceptable in terms of run time, but it would still be nice to learn a more efficient way to do this, since it is a task I implement often.

Thanks in advance,
Cristiano

    import numpy as np
    from copy import deepcopy

    Data = np.array([[0, 4, 0],
                     [2, 5, 7],
                     [2, 5, 6]])
    EditedData = deepcopy(Data)
    Condition = np.array([0, 5, 6])         # individual-specific condition
    SubstituteData = np.array([1, 10, 100])

    # The logic here:
    # if the value of any observation for the 1st individual is 0, substitute 1,
    # the 2nd individual is 5, substitute 10
    # the 3rd individual is 6, substitute 100

    # This wouldn't be a problem if SubstituteData were not individual-specific,
    # e.g. EditedData[Data == Condition] = 555
    # As SubstituteData is individual-specific, I need to use a for loop
    for i in range(np.shape(EditedData)[1]):
        TempData = EditedData[:, i]  # I introduce TempData to increase readability
        TempData[TempData == Condition[i]] = SubstituteData[i]
        EditedData[:, i] = TempData

    print(EditedData)

Cristiano Fini wrote:
    import numpy as np
    from copy import deepcopy

    Data = np.array([[0, 4, 0],
                     [2, 5, 7],
                     [2, 5, 6]])
    EditedData = deepcopy(Data)

Data.copy() will do for non-object arrays like this.
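
To illustrate the remark above, here is a minimal sketch comparing the two ways of copying. The names `via_deepcopy` and `via_copy` are only illustrative. For a plain integer array like this one, both give an independent array; the distinction matters for object arrays, where `ndarray.copy()` copies only the references to the contained Python objects.

```python
import numpy as np
from copy import deepcopy

Data = np.array([[0, 4, 0],
                 [2, 5, 7],
                 [2, 5, 6]])

via_deepcopy = deepcopy(Data)  # works, but more general than needed here
via_copy = Data.copy()         # sufficient for a numeric (non-object) array

# Modifying the copy leaves the original untouched.
via_copy[0, 0] = 99
print(Data[0, 0])                         # still 0
print(np.shares_memory(Data, via_copy))   # False: separate buffers
```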

    Condition = np.array([0, 5, 6])         # individual-specific condition
    SubstituteData = np.array([1, 10, 100])

    # The logic here:
    # if the value of any observation for the 1st individual is 0, substitute 1,
    # the 2nd individual is 5, substitute 10
    # the 3rd individual is 6, substitute 100

    # This wouldn't be a problem if SubstituteData were not individual-specific,
    # e.g. EditedData[Data == Condition] = 555
    # As SubstituteData is individual-specific, I need to use a for loop
    for i in range(np.shape(EditedData)[1]):
        TempData = EditedData[:, i]  # I introduce TempData to increase readability
        TempData[TempData == Condition[i]] = SubstituteData[i]
        EditedData[:, i] = TempData

How about

    should_replace = (Data == Condition[np.newaxis, :])

Then, for instance,

    EditedData = Data * (~should_replace) + SubstituteData[np.newaxis, :] * should_replace

although a copy-and-modification in EditedData might be possible as well...

Dag Sverre
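
A runnable sketch of the mask idea, using the data from the original post. It assumes the mask should be True where an entry equals its column's condition. The arithmetic blend is shown first, followed by one possible copy-and-modify variant; the use of `np.broadcast_to` and the names `blended` and `substitutes` are illustrative choices, not something proposed in the thread.

```python
import numpy as np

Data = np.array([[0, 4, 0],
                 [2, 5, 7],
                 [2, 5, 6]])
Condition = np.array([0, 5, 6])
SubstituteData = np.array([1, 10, 100])

# Broadcasting compares column j of Data with Condition[j].
should_replace = (Data == Condition[np.newaxis, :])

# Arithmetic blend: keep Data where the mask is False, substitute where True.
blended = Data * (~should_replace) + SubstituteData[np.newaxis, :] * should_replace

# Copy-and-modify: expand the substitutes to Data's shape, then assign by mask.
EditedData = Data.copy()
substitutes = np.broadcast_to(SubstituteData, Data.shape)
EditedData[should_replace] = substitutes[should_replace]

print(EditedData)
# [[  1   4   0]
#  [  2  10   7]
#  [  2  10 100]]
```

The arithmetic form only makes sense for numeric data, whereas the masked assignment works for any dtype and leaves the untouched entries bit-for-bit identical.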

Cristiano Fini wrote:
    # This wouldn't be a problem if SubstituteData were not individual-specific,
    # e.g. EditedData[Data == Condition] = 555
    # As SubstituteData is individual-specific, I need to use a for loop
    for i in range(np.shape(EditedData)[1]):
        TempData = EditedData[:, i]  # I introduce TempData to increase readability
        TempData[TempData == Condition[i]] = SubstituteData[i]
        EditedData[:, i] = TempData

    print(EditedData)

Instead of the loop, you could use:

    EditedData = np.choose(Data == Condition, (Data, SubstituteData))

Warren
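
As a quick check, the sketch below runs the `np.choose` one-liner against the original loop on the data from the first post. The `np.where` line is an addition not mentioned in the thread; it is included only because it expresses the same selection with the arguments in "if true / if false" order. The names `looped`, `chosen` and `selected` are illustrative.

```python
import numpy as np

Data = np.array([[0, 4, 0],
                 [2, 5, 7],
                 [2, 5, 6]])
Condition = np.array([0, 5, 6])
SubstituteData = np.array([1, 10, 100])

# Reference result: the column-by-column loop from the original post.
looped = Data.copy()
for i in range(Data.shape[1]):
    column = looped[:, i]  # a view, so the assignment below edits looped in place
    column[column == Condition[i]] = SubstituteData[i]

# np.choose picks from choice 0 (Data) where the mask is False
# and from choice 1 (SubstituteData, broadcast across rows) where it is True.
chosen = np.choose(Data == Condition, (Data, SubstituteData))

# The same selection written with np.where.
selected = np.where(Data == Condition, SubstituteData, Data)

print(np.array_equal(looped, chosen), np.array_equal(looped, selected))
```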
Participants (3): Cristiano Fini, Dag Sverre Seljebotn, Warren Weckesser