Deduplicating Code In Slightly Different Functions
Solution 1:
A common function that takes an extra parameter controlling where retrieved_tests is computed would work too, e.g.:
    def fmeasure_kfold_generic(array, nfolds, mode):
        ret = []
        # Kfold1 and kfold2 both have this outer loop
        for train_index, test_index in KFold(len(array), nfolds):
            correlation = analyze(array[train_index])
            # Retrieved tests is calculated outside the build loop in kfold2
            if mode == 2:
                retrieved_tests = _sum_tests(correlation)
            for build in array[test_index]:  # <- All functions have this loop
                # Retrieved tests is calculated inside the build loop in kfold1
                if mode == 1:
                    retrieved_tests = get_tests(set(build['modules']), correlation)
                relevant_tests = set(build['tests'])
                fval = calc_f(relevant_tests, retrieved_tests)
                if fval is not None:
                    ret.append(fval)
        return ret
Solution 2:
One way is to write each inner loop as its own function, and then write the outer loop as a separate function that receives one of them as an argument. This is close to how sorting functions work: they receive the function that should be used to compare two elements.
Of course, the hard part is working out exactly what all the functions have in common, which is not always simple.
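As a rough sketch of this idea (not the asker's actual code), the outer loop below receives the per-fold work as a function argument, much as `sorted` receives a `key` function. The names `simple_folds`, `run_folds`, `score_by_mean`, and `score_by_max` are hypothetical and exist only for illustration, and the fold splitting is a simple stand-in rather than a real k-fold library.

```python
def simple_folds(n_items, n_folds):
    """Yield (train_indices, test_indices) pairs; a stand-in for a real k-fold splitter."""
    indices = list(range(n_items))
    for fold in range(n_folds):
        test = indices[fold::n_folds]
        train = [i for i in indices if i not in test]
        yield train, test


def run_folds(items, n_folds, inner):
    """Shared outer loop; `inner` supplies the part that differs between variants."""
    results = []
    for train_idx, test_idx in simple_folds(len(items), n_folds):
        train = [items[i] for i in train_idx]
        test = [items[i] for i in test_idx]
        results.extend(inner(train, test))
    return results


def score_by_mean(train, test):
    """One inner loop: compare each test item with the training mean."""
    mean = sum(train) / len(train)
    return [item - mean for item in test]


def score_by_max(train, test):
    """Another inner loop: compare each test item with the training maximum."""
    top = max(train)
    return [item - top for item in test]


data = [1, 2, 3, 4]
print(run_folds(data, 2, score_by_mean))  # [-2.0, 0.0, 0.0, 2.0]
print(run_folds(data, 2, score_by_max))   # [-3, -1, -1, 1]
```

The outer loop is written once, and each variant shrinks to the few lines that actually differ.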
Solution 3:
The typical solution would be to identify the stages of the algorithm and use the Template Method design pattern, with the stages that differ implemented in subclasses. I do not understand your code at all, but I assume there would be methods like makeGlobalRetrievedTests() and makeIndividualRetrievedTests()?
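A minimal sketch of what the Template Method pattern could look like here, assuming the algorithm splits into a once-per-fold stage and a once-per-build stage. The class and method names (`FoldMeasure`, `prepare_fold`, `retrieved_for`, and the two subclasses) are hypothetical, the scoring is a simple precision-like ratio rather than the asker's F-measure, and the toy data stands in for the real builds.

```python
from abc import ABC, abstractmethod


class FoldMeasure(ABC):
    """Template method: run() fixes the loop structure, subclasses fill in the stages."""

    def run(self, folds):
        results = []
        for train, test in folds:
            context = self.prepare_fold(train)  # stage that runs once per fold
            for build in test:
                retrieved = self.retrieved_for(build, context)  # stage per build
                if retrieved:  # skip builds for which nothing was retrieved
                    hits = len(set(build) & retrieved)
                    results.append(hits / len(retrieved))
        return results

    @abstractmethod
    def prepare_fold(self, train):
        ...

    @abstractmethod
    def retrieved_for(self, build, context):
        ...


class GlobalRetrieved(FoldMeasure):
    """Computes the retrieved set once per fold and reuses it for every build."""

    def prepare_fold(self, train):
        return set().union(*train)

    def retrieved_for(self, build, context):
        return context


class IndividualRetrieved(FoldMeasure):
    """Computes the retrieved set separately for each build."""

    def prepare_fold(self, train):
        return train

    def retrieved_for(self, build, context):
        # Keep only training builds that share at least one item with this build.
        related = [t for t in context if set(t) & set(build)]
        return set().union(*related)


# One fold: two training builds, two test builds (toy data).
folds = [([["a", "b"], ["c"]], [["a"], ["d"]])]
print(GlobalRetrieved().run(folds))
print(IndividualRetrieved().run(folds))
```

The trade-off compared with passing functions around is more ceremony (a class per variant) in exchange for named, documented extension points.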
Solution 4:
I'd approach the problem inside-out, by factoring out the innermost loop. This works well with a 'functional' style (as in 'functional programming'). It seems to me that if you generalize fmeasure_all a bit, you could implement all three functions in terms of it. Something like:
    def fmeasure(builds, calcFn, retrieveFn):
        ret = []
        for build in builds:
            relevant = set(build['tests'])
            # retrieveFn decides how the retrieved tests are obtained for this build
            fval = calcFn(relevant, retrieveFn(build))
            if fval is not None:
                ret.append(fval)
        return ret
This allows you to define:
    def fmeasure_kfold1(array, nfolds):
        ret = []
        # Kfold1 and kfold2 both have this outer loop
        for train_index, test_index in KFold(len(array), nfolds):
            correlation = analyze(array[train_index])
            ret += fmeasure(array[test_index], calc_f,
                            lambda build: get_tests(set(build['modules']), correlation))
        return ret

    def fmeasure_kfold2(array, nfolds):
        ret = []
        # Kfold1 and kfold2 both have this outer loop
        for train_index, test_index in KFold(len(array), nfolds):
            correlation = analyze(array[train_index])
            # Retrieved tests is calculated outside the build loop in kfold2
            retrieved_tests = _sum_tests(correlation)
            ret += fmeasure(array[test_index], calc_f, lambda _: retrieved_tests)
        return ret

    def fmeasure_all(array):
        return fmeasure(array,
                        lambda relevant, _: calc_f2(relevant),
                        lambda x: x)
By now, fmeasure_kfold1 and fmeasure_kfold2 look awfully similar. They mostly differ in how fmeasure is called, so we can implement a generic fmeasure_kfoldn function that centralizes the iteration and the collection of results:
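    def fmeasure_kfoldn(array, nfolds, measure):
        # `measure` is called once per fold with the test builds and the
        # correlation computed from the training builds. (The name avoids
        # shadowing the built-in `callable`.)
        ret = []
        for train_index, test_index in KFold(len(array), nfolds):
            correlation = analyze(array[train_index])
            ret += measure(array[test_index], correlation)
        return ret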
This allows defining fmeasure_kfold1 and fmeasure_kfold2 very easily:
    def fmeasure_kfold1(array, nfolds):
        def measure(builds, correlation):
            return fmeasure(builds, calc_f,
                            lambda build: get_tests(set(build['modules']), correlation))
        return fmeasure_kfoldn(array, nfolds, measure)

    def fmeasure_kfold2(array, nfolds):
        def measure(builds, correlation):
            retrieved_tests = _sum_tests(correlation)
            return fmeasure(builds, calc_f, lambda _: retrieved_tests)
        return fmeasure_kfoldn(array, nfolds, measure)