
Deduplicating Code In Slightly Different Functions

I have two very similar loops, and these two contain an inner loop that is very similar to a third loop (eh... :) ). Illustrated with code it looks close to this: # First function

Solution 1:

Having a common function that takes an extra parameter that controls where to compute retrieved_tests would work too.

e.g.

def fmeasure_kfold_generic(array, nfolds, mode):
    ret = []

    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])

        # Retrieved tests is calculated outside the build loop in kfold2
        if mode == 2:
            retrieved_tests = _sum_tests(correlation)

        for build in array[test_index]:  # <- All functions have this loop
            # Retrieved tests is calculated inside the build loop in kfold1
            if mode == 1:
                retrieved_tests = get_tests(set(build['modules']), correlation)

            relevant_tests = set(build['tests'])
            fval = calc_f(relevant_tests, retrieved_tests)
            if fval is not None:
                ret.append(fval)

    return ret

Solution 2:

One way is to write the inner loops each as a function, and then have the outer loop as a separate function that receives the others as an argument. This is something close to what is done in sorting functions (that receive the function that should be used to compare two elements).

Of course, the hard part is to find what exactly is the common part between all functions, which is not always simple.
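As a rough illustration of that idea, here is a minimal sketch with toy data. None of the names come from the question: simple_folds, run_folds, score_per_item and score_once_per_fold are hypothetical, and simple_folds is only a toy replacement for KFold. The outer loop is written once and the per-fold work is passed in as a function, much like a key function is passed to sorted().

```python
def simple_folds(n, nfolds):
    """Yield (train_indices, test_indices); a toy replacement for KFold."""
    indices = list(range(n))
    for k in range(nfolds):
        test = indices[k::nfolds]
        train = [i for i in indices if i % nfolds != k]
        yield train, test


def run_folds(items, nfolds, train_fn, score_fold_fn):
    """Outer loop shared by every variant; the per-fold work is passed in."""
    results = []
    for train_idx, test_idx in simple_folds(len(items), nfolds):
        model = train_fn([items[i] for i in train_idx])
        results.extend(score_fold_fn([items[i] for i in test_idx], model))
    return results


def score_per_item(test_items, model):
    # Variant 1: derive a value from each item inside the inner loop.
    return [item - model for item in test_items]


def score_once_per_fold(test_items, model):
    # Variant 2: compute a value once per fold and reuse it for every item.
    shared = model * 2
    return [shared for _ in test_items]


def mean(values):
    return sum(values) / len(values)


data = [1, 2, 3, 4]
print(run_folds(data, 2, mean, score_per_item))       # [-2.0, 0.0, 0.0, 2.0]
print(run_folds(data, 2, mean, score_once_per_fold))  # [6.0, 6.0, 4.0, 4.0]
```

The two variants differ only in the function handed to run_folds, which is where the "slightly different" part of each original function would go.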

Solution 3:

A typical solution would be to identify the parts of the algorithm and use the Template Method design pattern, where the different stages are implemented in subclasses. I do not understand your code at all, but I assume there would be methods like makeGlobalRetrievedTests() and makeIndividualRetrievedTests()?
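A minimal sketch of what that could look like in Python, on toy data. Everything here is an assumption for illustration: the class names are hypothetical, the method names are just snake_case versions of the ones guessed above, set(train) stands in for analyze(), and len(retrieved) stands in for the real F-measure calculation.

```python
from abc import ABC, abstractmethod


class RetrievalMeasure(ABC):
    """Template method: measure() fixes the loops, subclasses supply the steps."""

    def measure(self, folds):
        results = []
        for train, test in folds:
            correlation = set(train)  # toy stand-in for analyze()
            global_tests = self.make_global_retrieved_tests(correlation)
            for build in test:
                retrieved = self.make_individual_retrieved_tests(
                    build, correlation, global_tests)
                results.append(len(retrieved))  # toy stand-in for calc_f()
        return results

    def make_global_retrieved_tests(self, correlation):
        # Optional hook: by default nothing is computed per fold.
        return None

    @abstractmethod
    def make_individual_retrieved_tests(self, build, correlation, global_tests):
        ...


class PerBuildMeasure(RetrievalMeasure):
    # Retrieved tests are computed inside the build loop.
    def make_individual_retrieved_tests(self, build, correlation, global_tests):
        return correlation & build


class PerFoldMeasure(RetrievalMeasure):
    # Retrieved tests are computed once per fold and reused for every build.
    def make_global_retrieved_tests(self, correlation):
        return set(correlation)

    def make_individual_retrieved_tests(self, build, correlation, global_tests):
        return global_tests


folds = [(["a", "b"], [{"a"}, {"b", "c"}]), (["c"], [{"c"}])]
print(PerBuildMeasure().measure(folds))  # [1, 1, 1]
print(PerFoldMeasure().measure(folds))   # [2, 2, 1]
```

The base class owns both loops, so each subclass only states where the retrieved tests come from.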

Solution 4:

I'd approach the problem inside-out: by factoring out the innermost loop. This works well with a 'functional' style (as well as 'functional programming'). It seems to me that if you generalize fmeasure_all a bit you could implement all three functions in terms of that. Something like

def fmeasure(builds, calcFn, retrieveFn):
    ret = []
    for build in builds:
        relevant = set(build['tests'])
        fval = calcFn(relevant, retrieveFn(build))
        if fval is not None:
            ret.append(fval)

    return ret

This allows you to define:

def fmeasure_kfold1(array, nfolds):
    ret = []

    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])

        ret += fmeasure(array[test_index], calc_f,
                        lambda build: get_tests(set(build['modules']), correlation))

    return ret


def fmeasure_kfold2(array, nfolds):
    ret = []

    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])

        # Retrieved tests is calculated outside the build loop in kfold2
        retrieved_tests = _sum_tests(correlation)

        ret += fmeasure(array[test_index], calc_f, lambda _: retrieved_tests)

    return ret


def fmeasure_all(array):
    return fmeasure(array,
                    lambda relevant, _: calc_f2(relevant),
                    lambda x: x)

By now, fmeasure_kfold1 and fmeasure_kfold2 look awfully similar. They mostly differ in how fmeasure is called, so we can implement a generic fmeasure_kfoldn function which centralizes the iteration and the collection of the results:

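def fmeasure_kfoldn(array, nfolds, measure):
    # 'measure' is any function taking (builds, correlation) and returning a list;
    # the name avoids shadowing the built-in callable()
    ret = []
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])
        ret += measure(array[test_index], correlation)
    return ret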

This allows defining fmeasure_kfold1 and fmeasure_kfold2 very easily:

def fmeasure_kfold1(array, nfolds):
    def measure(builds, correlation):
        return fmeasure(builds, calc_f, lambda build: get_tests(set(build['modules']), correlation))
    return fmeasure_kfoldn(array, nfolds, measure)


def fmeasure_kfold2(array, nfolds):
    def measure(builds, correlation):
        retrieved_tests = _sum_tests(correlation)
        return fmeasure(builds, calc_f, lambda _: retrieved_tests)
    return fmeasure_kfoldn(array, nfolds, measure)
