Looped Regression Model In Python/sklearn
I'm trying to systematically regress several different dependent variables (countries) on the same set of inputs/independent variables, and I want to do this in a loop.
Solution 1:
Simply iterate through the column names and pass each name into a defined function. You can even wrap the process in a dictionary comprehension and pass the result to the DataFrame constructor, which returns a dataframe of predicted values (same shape as the original dataframe):
X = pd.DataFrame(...)
countries = pd.DataFrame(...)

def reg_proc(label):
    # Fit one regression for the given country column and return its predictions
    y = countries[label]
    regressor = LinearRegression()
    regressor.fit(X, y)
    y_pred = regressor.predict(X)
    return y_pred

pred_df = pd.DataFrame({lab: reg_proc(lab) for lab in countries.columns},
                       columns=countries.columns)
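If you also need the fitted coefficients and not only the predictions, one possible variant of the loop above keeps each fitted model in a dictionary. This is a minimal sketch under assumptions: the data is made up, and the helper name `fit_models` and the column labels `A` and `B` are hypothetical, not part of the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in data: 3 inputs and 2 "countries"
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(30, 3)),
                 columns=["Input1", "Input2", "Input3"])
countries = pd.DataFrame(rng.normal(size=(30, 2)), columns=["A", "B"])

def fit_models(X, targets):
    """Fit one LinearRegression per column of targets; return {label: model}."""
    return {lab: LinearRegression().fit(X, targets[lab])
            for lab in targets.columns}

models = fit_models(X, countries)

# One row per country, one column per input, plus the intercept
coef_df = pd.DataFrame({lab: m.coef_ for lab, m in models.items()},
                       index=X.columns).T
coef_df["intercept"] = [m.intercept_ for m in models.values()]

# Predictions can still be rebuilt from the stored models
pred_df = pd.DataFrame({lab: m.predict(X) for lab, m in models.items()},
                       index=X.index)
print(coef_df)
```

Keeping the models around avoids refitting when you later want predictions on new inputs or want to compare coefficients across countries.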
To demonstrate with random, seeded data, where the tools dataframe below stands in for your countries:
Data
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

np.random.seed(7172018)

# Dependent variables: one column per "country"
tools = pd.DataFrame({'pandas': np.random.uniform(0, 1000, 50),
                      'r': np.random.uniform(0, 1000, 50),
                      'julia': np.random.uniform(0, 1000, 50),
                      'sas': np.random.uniform(0, 1000, 50),
                      'spss': np.random.uniform(0, 1000, 50),
                      'stata': np.random.uniform(0, 1000, 50)},
                     columns=['pandas', 'r', 'julia', 'sas', 'spss', 'stata'])

# Independent variables shared by every regression
X = pd.DataFrame({'Input1': np.random.randn(50) * 10,
                  'Input2': np.random.randn(50) * 10,
                  'Input3': np.random.randn(50) * 10,
                  'Input4': np.random.randn(50) * 10})
Model
def reg_proc(label):
    # Fit one regression for the given column and return its predictions
    y = tools[label]
    regressor = LinearRegression()
    regressor.fit(X, y)
    y_pred = regressor.predict(X)
    return y_pred

pred_df = pd.DataFrame({lab: reg_proc(lab) for lab in tools.columns},
                       columns=tools.columns)

print(pred_df.head(10))
#        pandas           r       julia         sas        spss       stata
# 0  547.631679  576.025733  682.390046  507.767567  246.020799  557.648181
# 1  577.334819  575.992992  280.579234  506.014191  443.044139  396.044620
# 2  430.494827  576.211105  541.096721  441.997575  386.309627  558.472179
# 3  440.662962  524.582054  406.849303  420.017656  508.701222  393.678200
# 4  588.993442  472.414081  453.815978  479.208183  389.744062  424.507541
# 5  520.215513  489.447248  670.708618  459.375294  314.008988  516.235188
# 6  515.266625  459.292370  477.485995  436.398180  446.777292  398.826234
# 7  423.930650  414.069751  629.444118  378.059735  448.760240  449.062734
# 8  549.769034  406.531405  653.557937  441.425445  348.725447  456.089921
# 9  396.826924  399.327683  717.285415  361.235709  444.830491  429.967976
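As an alternative to looping, scikit-learn's LinearRegression accepts a two-dimensional y (one column per target), so all the regressions can be fit in a single call. Because ordinary least squares treats each target independently, this should give the same predictions as the per-column loop. The sketch below uses its own made-up data, and the names `Y`, `pred_multi` and `pred_loop` are illustrative assumptions, not from the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in data: 4 shared inputs, 3 dependent variables
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(50, 4)),
                 columns=[f"Input{i}" for i in range(1, 5)])
Y = pd.DataFrame(rng.uniform(0, 1000, size=(50, 3)),
                 columns=["pandas", "r", "julia"])

# One multi-output fit: y has shape (n_samples, n_targets)
multi = LinearRegression().fit(X, Y)
pred_multi = pd.DataFrame(multi.predict(X), columns=Y.columns, index=Y.index)

# The looped version, for comparison
pred_loop = pd.DataFrame(
    {lab: LinearRegression().fit(X, Y[lab]).predict(X) for lab in Y.columns},
    index=Y.index)

# Both approaches should agree up to floating-point tolerance
print(np.allclose(pred_multi.values, pred_loop.values))
```

The loop remains the more flexible choice if each dependent variable eventually needs its own preprocessing or a different estimator.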