
Looped Regression Model In Python/sklearn

I'm trying to systematically regress a couple of different dependent variables (countries) on the same set of inputs/independent variables, and I want to do this in a loop.

Solution 1:

Simply iterate through the column names and pass each name into a function that fits the model. You can wrap the process in a dictionary comprehension and pass the result to the DataFrame constructor, which returns a dataframe of predicted values (same shape as the original dataframe):

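import pandas as pd
from sklearn.linear_model import LinearRegression

X = pd.DataFrame(...)          # independent variables (inputs)
countries = pd.DataFrame(...)  # one column per dependent variable (country)

def reg_proc(label):
    # Select the dependent variable for this country
    y = countries[label]

    # Fit a separate linear regression on the shared inputs
    regressor = LinearRegression()
    regressor.fit(X, y)

    # Return the in-sample predictions
    y_pred = regressor.predict(X)
    return y_pred

# One column of predictions per country
pred_df = pd.DataFrame({lab: reg_proc(lab) for lab in countries.columns},
                       columns=countries.columns)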
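If the fitted models are needed rather than just the predictions, the same loop can return the regressor instead. The following is a minimal sketch under stated assumptions: the data is made up, and the names fit_proc, models, and coef_df are hypothetical, not part of the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in data: 3 inputs and 2 "country" columns (illustrative names)
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(30, 3)),
                 columns=['Input1', 'Input2', 'Input3'])
countries = pd.DataFrame({'A': rng.uniform(0, 100, 30),
                          'B': rng.uniform(0, 100, 30)})

def fit_proc(label):
    # Fit one regression per dependent variable and return the fitted model
    return LinearRegression().fit(X, countries[label])

# Keep every fitted model, keyed by column name
models = {lab: fit_proc(lab) for lab in countries.columns}

# One row per country, one column per input, plus the intercept
coef_df = pd.DataFrame({lab: m.coef_ for lab, m in models.items()},
                       index=X.columns).T
coef_df['intercept'] = [models[lab].intercept_ for lab in coef_df.index]

print(coef_df)
```

Keeping the models in a dictionary means each one can later be reused, for example with models['A'].predict(new_X) on new inputs that have the same columns as X.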

To demonstrate with random, seeded data, where the tools dataframe below stands in for your countries:

Data

import numpy as np
import pandas as pd

from sklearn.linear_model import LinearRegression

np.random.seed(7172018)
tools = pd.DataFrame({'pandas': np.random.uniform(0,1000,50),
                      'r': np.random.uniform(0,1000,50),
                      'julia': np.random.uniform(0,1000,50),
                      'sas': np.random.uniform(0,1000,50),
                      'spss': np.random.uniform(0,1000,50),
                      'stata': np.random.uniform(0,1000,50)
                     },  
                     columns=['pandas', 'r', 'julia', 'sas', 'spss', 'stata'])

X = pd.DataFrame({'Input1': np.random.randn(50)*10,
                  'Input2': np.random.randn(50)*10,
                  'Input3': np.random.randn(50)*10,
                  'Input4': np.random.randn(50)*10})

Model

def reg_proc(label):
    # Dependent variable: one column of the tools dataframe
    y = tools[label]

    regressor = LinearRegression()
    regressor.fit(X, y)

    # In-sample predictions for this column
    y_pred = regressor.predict(X)
    return y_pred

pred_df = pd.DataFrame({lab: reg_proc(lab) for lab in tools.columns}, 
                       columns = tools.columns)

print(pred_df.head(10))

#        pandas           r       julia         sas        spss       stata
# 0  547.631679  576.025733  682.390046  507.767567  246.020799  557.648181
# 1  577.334819  575.992992  280.579234  506.014191  443.044139  396.044620
# 2  430.494827  576.211105  541.096721  441.997575  386.309627  558.472179
# 3  440.662962  524.582054  406.849303  420.017656  508.701222  393.678200
# 4  588.993442  472.414081  453.815978  479.208183  389.744062  424.507541
# 5  520.215513  489.447248  670.708618  459.375294  314.008988  516.235188
# 6  515.266625  459.292370  477.485995  436.398180  446.777292  398.826234
# 7  423.930650  414.069751  629.444118  378.059735  448.760240  449.062734
# 8  549.769034  406.531405  653.557937  441.425445  348.725447  456.089921
# 9  396.826924  399.327683  717.285415  361.235709  444.830491  429.967976
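As a side note, LinearRegression also accepts a two-dimensional y (one column per target), so for plain ordinary least squares a single fit should give the same predictions as the loop. This is an alternative to the looped approach, not what the answer above does. A minimal sketch on hypothetical random data (the names looped and multi are illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in data, shaped like the demonstration above
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(50, 4)) * 10,
                 columns=['Input1', 'Input2', 'Input3', 'Input4'])
tools = pd.DataFrame(rng.uniform(0, 1000, size=(50, 3)),
                     columns=['pandas', 'r', 'julia'])

# Looped version: one model per dependent variable
looped = pd.DataFrame({lab: LinearRegression().fit(X, tools[lab]).predict(X)
                       for lab in tools.columns})

# Single multi-target fit: all dependent variables at once
multi = pd.DataFrame(LinearRegression().fit(X, tools).predict(X),
                     columns=tools.columns)

# Compare the two sets of predictions within floating-point tolerance
print(np.allclose(looped, multi))
```

The loop is still the more flexible pattern when each dependent variable needs its own preprocessing, its own subset of rows, or a different estimator.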
