Alternate Method To Avoid Loop In Pandas Dataframe
I have the following dataframe: table2 = pd.DataFrame({ 'Product Type': ['A', 'B', 'C', 'D'], 'State_1_Value': [10, 11, 12, 13], 'State_2_Value': [20, 21, 22, 2
Solution 1:
I was able to accomplish this without any loops.
On my 10k x 200 table it ran in 3 minutes instead of the previous 2 hours.
Unfortunately I now need to run it on a 10k x 4k table, where it hits a MemoryError, but that may be outside the scope of this question. The code I used:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Product Type': ['A', 'B', 'C', 'D'],
    'State_1_Value': [10, 11, 12, 13],
    'State_2_Value': [20, 21, 22, 23],
    'State_3_Value': [30, 31, 32, 33],
    'State_4_Value': [40, 41, 42, 43],
    'State_5_Value': [50, 51, 52, 53],
    'State_6_Value': [60, 61, 62, 63],
    'Lower_Bound': [-1, 1, .5, 5],
    'Upper_Bound': [1, 2, .625, 15],
    'sim_1': [0, 0, .61, 7],
    'sim_2': [1, 1.5, .7, 9],
})

# Position of each simulated value on a 1..6 scale (1 = Lower_Bound, 6 = Upper_Bound).
# The last two columns are the sim_* columns.
sims = df.iloc[:, -2:]
width = df['Upper_Bound'] - df['Lower_Bound']
buckets = sims.sub(df['Lower_Bound'], axis=0).div(width, axis=0) * 5 + 1

# Neighbouring state numbers, clipped so low stays in 1..5 and high in 2..6.
low = buckets.astype(int).clip(1, 5)
high = (buckets.astype(int) + 1).clip(2, 6)

# 'Product Type' sits in column 0 of this array, so state number k is column k.
lookup = df.filter(regex="State|Type").values
rows = np.arange(len(df))[:, None]
low_value = pd.DataFrame(lookup[rows, low.values])
high_value = pd.DataFrame(lookup[rows, high.values])

# Linear interpolation between the two neighbouring state values.
df1 = (high_value - low_value).mul((buckets - low).values) + low_value
df1['Product Type'] = df['Product Type']
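
For readers who want the interpolation spelled out, here is a minimal sketch of the same calculation done on plain NumPy float arrays. The function name interpolate_states and its parameters sim_cols and n_states are hypothetical, chosen only for illustration. It assumes the State_k_Value, Lower_Bound and Upper_Bound column names from the example above. Working on a numeric array instead of an object array that also holds 'Product Type' may reduce memory use on wide tables, but that has not been measured here.

```python
import numpy as np
import pandas as pd

def interpolate_states(df, sim_cols, n_states=6):
    """Linearly interpolate between State_k_Value columns for each simulated value."""
    # Numeric-only array of state values, shape (rows, n_states).
    state_cols = [f"State_{k}_Value" for k in range(1, n_states + 1)]
    states = df[state_cols].to_numpy(dtype=float)
    lower = df["Lower_Bound"].to_numpy(dtype=float)[:, None]
    upper = df["Upper_Bound"].to_numpy(dtype=float)[:, None]
    sims = df[sim_cols].to_numpy(dtype=float)

    # 0-based fractional position: 0 at Lower_Bound, n_states - 1 at Upper_Bound.
    pos = (sims - lower) / (upper - lower) * (n_states - 1)
    # Index of the left-hand state, clipped so that left + 1 is always a valid column.
    left = np.clip(np.floor(pos).astype(int), 0, n_states - 2)
    # Outside [0, 1] when a value lies beyond the bounds, which extrapolates.
    frac = pos - left

    rows = np.arange(len(df))[:, None]
    lo_val = states[rows, left]
    hi_val = states[rows, left + 1]
    return pd.DataFrame(lo_val + (hi_val - lo_val) * frac,
                        columns=sim_cols, index=df.index)

df = pd.DataFrame({
    'Product Type': ['A', 'B', 'C', 'D'],
    'State_1_Value': [10, 11, 12, 13],
    'State_2_Value': [20, 21, 22, 23],
    'State_3_Value': [30, 31, 32, 33],
    'State_4_Value': [40, 41, 42, 43],
    'State_5_Value': [50, 51, 52, 53],
    'State_6_Value': [60, 61, 62, 63],
    'Lower_Bound': [-1, 1, .5, 5],
    'Upper_Bound': [1, 2, .625, 15],
    'sim_1': [0, 0, .61, 7],
    'sim_2': [1, 1.5, .7, 9],
})

result = interpolate_states(df, ['sim_1', 'sim_2'])
# Product A, sim_1 = 0 is halfway between the bounds -1 and 1,
# so it lands midway between State_3 (30) and State_4 (40), i.e. 35.
print(result)
```

For a very wide table, one option would be to call such a function on a few hundred sim columns at a time and concatenate the pieces, so that only one slice of intermediate arrays is alive at once.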