
How To Calculate Counts And Frequencies For Pairs In List Of Lists?

Bases refers to A, T, G and C.

sample = [['CGG','ATT'],['GCGC','TAAA']]
# Note on fragility of data: each element can be made up of only 2 of the 4 bases.
# [['CGG' ==> Only C

Solution 1:

You are not really using Counter any differently from a plain dict. Try something like the following approach:

>>> sample = [['CGG','ATT'],['GCGC','TAAA']]
>>> from collections import Counter
>>> base_counts = [[Counter(base) for base in sub] for sub in sample]
>>> base_counts
[[Counter({'G': 2, 'C': 1}), Counter({'T': 2, 'A': 1})], [Counter({'G': 2, 'C': 2}), Counter({'A': 3, 'T': 1})]]

Now you can continue with a functional approach using nested comprehensions to transform your data*:

>>> base_freqs = [[{k: v/len(base) for k, v in count.items()} for count, base in zip(counts, bases)]
...               for counts, bases in zip(base_counts, sample)]
>>> base_freqs
[[{'G': 0.6666666666666666, 'C': 0.3333333333333333}, {'A': 0.3333333333333333, 'T': 0.6666666666666666}], [{'G': 0.5, 'C': 0.5}, {'A': 0.75, 'T': 0.25}]]
>>> 

*Note, some people do not like big, nested comprehensions like that. I think it's fine as long as you are sticking to functional constructs and not mutating data structures inside your comprehensions. I actually find it very expressive. Others disagree vehemently. You can always unfold that code into nested for-loops.
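
For readers who prefer explicit loops, the frequency calculation can be unfolded roughly as follows. This is only a sketch: it divides each count by the length of the string it was counted from, and the names `group_freqs`, `freqs`, `letter` and `n` are illustrative rather than taken from the answer above.

```python
from collections import Counter

sample = [['CGG', 'ATT'], ['GCGC', 'TAAA']]

base_freqs = []
for bases in sample:                  # e.g. ['CGG', 'ATT']
    group_freqs = []
    for base in bases:                # e.g. 'CGG'
        count = Counter(base)         # letter -> number of occurrences
        total = len(base)             # number of characters that were counted
        freqs = {}
        for letter, n in count.items():
            freqs[letter] = n / total
        group_freqs.append(freqs)
    base_freqs.append(group_freqs)

print(base_freqs)
```

The result has the same shape as the comprehension version: one list per inner list of `sample`, holding one dict of frequencies per string.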

Anyway, you can then do the same thing with the pairs. First:

>>> pairs = [list(zip(*bases)) for bases in sample]
>>> pairs
[[('C', 'A'), ('G', 'T'), ('G', 'T')], [('G', 'T'), ('C', 'A'), ('G', 'A'), ('C', 'A')]]
>>> pair_counts = [Counter(base_pair) for base_pair in pairs]
>>> pair_counts
[Counter({('G', 'T'): 2, ('C', 'A'): 1}), Counter({('C', 'A'): 2, ('G', 'T'): 1, ('G', 'A'): 1})]
>>> 
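
One thing to keep in mind: `zip` stops at the shortest input, so if the strings in a pair ever differ in length, the extra bases are silently dropped. Below is a minimal sketch of a guard, assuming you would rather fail loudly in that case. `paired_bases` is a hypothetical helper name, not something from the answer above.

```python
def paired_bases(bases):
    """Zip the sequences in `bases` position by position, refusing unequal lengths."""
    lengths = {len(seq) for seq in bases}
    if len(lengths) > 1:
        raise ValueError(f"sequences differ in length: {bases!r}")
    return list(zip(*bases))

sample = [['CGG', 'ATT'], ['GCGC', 'TAAA']]
pairs = [paired_bases(bases) for bases in sample]
print(pairs)
```

On Python 3.10 and later, `zip(*bases, strict=True)` performs an equivalent check without the helper.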

Here it is easier to skip the comprehension and use a plain loop, so that total is computed only once per counter:

>>> pair_freq = []
>>> for count in pair_counts:
...   total = sum(count.values())
...   pair_freq.append({k:c/total for k,c in count.items()})
...
>>> pair_freq
[{('C', 'A'): 0.3333333333333333, ('G', 'T'): 0.6666666666666666}, {('G', 'T'): 0.25, ('C', 'A'): 0.5, ('G', 'A'): 0.25}]
>>> 
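
If you need this more than once, the pair steps can be folded into a small function. The following is a sketch under the same assumptions as the session above (each inner list holds strings of equal length). `pair_frequencies` is a hypothetical name chosen for illustration.

```python
from collections import Counter

def pair_frequencies(bases):
    """Return {pair: relative frequency} for the position-wise pairs of `bases`."""
    count = Counter(zip(*bases))      # Counter accepts any iterable of hashable items
    total = sum(count.values())       # computed once per inner list
    return {pair: n / total for pair, n in count.items()}

sample = [['CGG', 'ATT'], ['GCGC', 'TAAA']]
pair_freq = [pair_frequencies(bases) for bases in sample]
print(pair_freq)
```

Keeping `total` inside the function avoids recomputing it for every key while still letting the outer step be a simple comprehension.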
