Run Regression Analysis On Multiple Subsets Of Pandas Columns Efficiently
I could have asked a shorter question that focuses only on the core problem here, which is list permutations. But the reason I'm bringing statsmodels and pandas into the q
Solution 1:
Based on the help I got here, I've been able to put together a function that takes all columns in a pandas dataframe, defines a dependent variable, and returns all unique combinations of the remaining variables. The result differs a bit from the desired result as defined above but makes more sense for practical use, I think. I'm still hoping that others will be able to post even better solutions.
Here it is:
# Imports
import pandas as pd
import numpy as np
import itertools
# A dataframe with random numbers
np.random.seed(123)
rows = 12
listVars= ['y','x1', 'x2', 'x3']
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df_1 = pd.DataFrame(np.random.randint(100,150,size=(rows, len(listVars))), columns=listVars)
df_1 = df_1.set_index(rng)
# The function
def StepWise(columns, dependent):
    """ Takes the columns of a pandas dataframe, defines a dependent variable
    and returns all unique combinations of the remaining (independent) variables.
    """
    # Work on a copy so the caller's list is not modified
    independent = columns.copy()
    independent.remove(dependent)
    lst2 = []
    # Collect the combinations of every size, from 1 up to all independent variables
    for i in range(1, len(independent) + 1):
        #print(list(itertools.combinations(independent, i)))
        elem = list(itertools.combinations(independent, i))
        lst2.extend(elem)
    # Convert each tuple to a list and pair it with the dependent variable
    combosIndependent = [list(elem) for elem in lst2]
    combosAll = [[dependent, other] for other in combosIndependent]
    return combosAll
lExec = StepWise(columns = list(df_1), dependent = 'y')
print(lExec)
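For comparison, here is a more compact sketch of the same idea built on itertools.chain. The function name predictor_subsets is hypothetical and not part of the answer above. It assumes the same nested output shape, where each entry pairs the dependent variable with a list of independent variables. With n independent variables there are 2**n - 1 non-empty subsets, so three predictors give seven entries.

```python
from itertools import chain, combinations


def predictor_subsets(columns, dependent):
    """Return [dependent, [predictors...]] for every non-empty predictor subset.

    Hypothetical helper; mirrors the output shape of StepWise above.
    """
    independent = [c for c in columns if c != dependent]
    # Chain the combinations of size 1, 2, ..., len(independent) into one stream
    subsets = chain.from_iterable(
        combinations(independent, k) for k in range(1, len(independent) + 1)
    )
    return [[dependent, list(s)] for s in subsets]


combos = predictor_subsets(['y', 'x1', 'x2', 'x3'], 'y')
print(len(combos))   # 7
print(combos[0])     # ['y', ['x1']]
print(combos[-1])    # ['y', ['x1', 'x2', 'x3']]
```

The subset count doubles with every extra predictor, so an exhaustive search like this is only practical for a modest number of columns.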
If you combine this with snippet 3 above, you can easily store the results of multiple regression analyses on a specified dependent variable in a pandas data frame.
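Snippet 3 is not reproduced here, so the following is only a minimal sketch of how that combination could look. As a stand-in for a statsmodels fit, it uses ordinary least squares through numpy.linalg.lstsq with an intercept column. The helper name fit_subset and the result columns predictors, n_predictors and r_squared are assumptions made for illustration.

```python
import itertools

import numpy as np
import pandas as pd

# Same kind of random test data as in the answer above
np.random.seed(123)
rows = 12
cols = ['y', 'x1', 'x2', 'x3']
df = pd.DataFrame(np.random.randint(100, 150, size=(rows, len(cols))), columns=cols)


def fit_subset(data, dependent, predictors):
    """Fit y ~ 1 + predictors by least squares and return one result row (hypothetical helper)."""
    y = data[dependent].to_numpy(dtype=float)
    # Design matrix: a column of ones for the intercept, then the chosen predictors
    X = np.column_stack([np.ones(len(data)), data[predictors].to_numpy(dtype=float)])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coefs
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return {
        'predictors': ', '.join(predictors),
        'n_predictors': len(predictors),
        'r_squared': 1.0 - ss_res / ss_tot,
    }


independent = [c for c in cols if c != 'y']
records = [
    fit_subset(df, 'y', list(combo))
    for k in range(1, len(independent) + 1)
    for combo in itertools.combinations(independent, k)
]

# One row per predictor subset, ready for sorting or filtering
results = pd.DataFrame(records)
print(results.sort_values('r_squared', ascending=False))
```

Collecting plain dictionaries first and building the data frame once at the end avoids repeatedly appending rows inside the loop. To use statsmodels instead, the body of fit_subset is the only part that would need to change.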