
Fixing The Code In Python To Change A Text File

I have a big text file like this small example:

chr1 37091 37122 D00645:305:CCVLRANXX:1:1104:21074:48301 0 -
chr1 37091 37122 D00645:305:CCVLRANXX:1:1

Solution 1:

You can use Counter like this:

from collections import Counter

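content = []
# open in text mode so each line is a str, and close the file automatically
with open('infile.txt', 'r') as infile:
    for line in infile:
        # append only first 3 columns as one line string
        content.append('  '.join(line.split()[:3]))

# Counter is a dict subclass mapping each string to its count
c = Counter(content)

# most_common() with no argument returns every (item, count) pair,
# ordered from most to least frequent
elements = c.most_common()

with open('outfile.txt', 'w') as f:
    for item, freq in elements:
        f.write('{}\t{}\n'.format(item, freq))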

Solution 2:

You can also use pandas and your solution will be really easy:

Just read the big txt file in a pandas dataframe like:

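import pandas as pd

# header=None: no header row; sep=r'\s+' assumes whitespace-separated columns
df = pd.read_csv('infile.txt', sep=r'\s+', header=None)
df.groupby([0, 1, 2]).size()  # one count per unique first-three-column group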

This should give you:

chr1 37091 37122     17
     74325 74356      1
     93529 93560      2

Let me know if this helps.

Solution 3:

You can use a regular dictionary with your target comparison lines as keys:

infile = 'infile.txt'
content = {}

with open(infile, 'r') as fin:
    for line in fin:
        temp = line.split()
        if temp[1]+temp[2] not in content:
            content[temp[1]+temp[2]] = [1, temp[0:3]]
        else:
            content[temp[1]+temp[2]][0] += 1

with open('outfile.txt', 'w') as fout:
    for key, value in content.items():
        for entry in value[1]:
            fout.write(entry + ' ')
        fout.write(str(value[0]) + '\n')

The key is a concatenated second and third column. The value is a list - first element is the counter and second element is a list of values from your input file you want to save to the output. The if checks if there is already an entry with given key - if yes, it increments the counter, if not - it creates a new list with counter set to 1 and the appropriate values as the list part.

Note that for consistency the program uses the recommended with open in both cases. It also doesn't read the txt file in binary mode.
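One caveat with concatenating the second and third columns as plain strings: different pairs can produce the same key (for example '12' + '3' and '1' + '23' both give '123'), and the chromosome name is not part of the key at all. Below is a minimal sketch of a variant that keys on a tuple of the first three columns instead. The function name count_by_tuple_key and the sample lines are hypothetical, used only for illustration.

```python
def count_by_tuple_key(lines):
    """Count lines by their first three whitespace-separated columns."""
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # assumption: blank or short lines carry no region
        key = tuple(fields[:3])  # tuples are hashable, so they work as dict keys
        counts[key] = counts.get(key, 0) + 1
    return counts


# Made-up sample lines, not taken from the real input file.
# The first two would collide under the string key '123'.
sample = [
    "chr1 12 3 readA 0 -",
    "chr1 1 23 readB 0 +",
    "chr1 12 3 readC 0 -",
]
counts = count_by_tuple_key(sample)
for (chrom, start, end), n in counts.items():
    print(chrom, start, end, n)
```

With a tuple key the three columns stay separate, so the output loop can unpack them directly instead of storing them a second time in the value.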

Solution 4:

Here's one way to do it:

with open('infile.txt', 'r') as file:
    data = [i.split() for i in file.readlines()]

results = {}
for i in data:
    # use .setdefault to set counter as 0, increment at each match.
    results.setdefault('\t'.join(i[:3]), 0)
    results['\t'.join(i[:3])] += 1

# results
# {'chr1\t37091\t37122': 17,
#  'chr1\t54832\t54863': 1,
#  'chr1\t74307\t74338': 1,
#  'chr1\t74325\t74356': 1,
#  'chr1\t93529\t93560': 2}

# Output the results with a generator expression, one line per key
with open('outfile.txt', 'w') as file:
    file.writelines('\t'.join((k, str(v))) + '\n' for k, v in results.items())

Or, just use Counter:

from collections import Counter

with open('infile.txt', 'r') as file:
    data = ['\t'.join(i.split()[:3]) for i in file.readlines()]

with open('outfile.txt', 'w') as file:
    # writelines does not add line endings, so append '\n' to each record
    file.writelines('\t'.join((k, str(v))) + '\n' for k, v in Counter(data).items())

# Counter(data).items()

# dict_items([('chr1\t37091\t37122', 17),
#             ('chr1\t54832\t54863', 1), 
#             ('chr1\t74307\t74338', 1), 
#             ('chr1\t74325\t74356', 1),
#             ('chr1\t93529\t93560', 2)])

In either case we group the first three "columns" into a key, then use that key to count the number of times it occurred in your data.
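Since the input is described as a big file, the same grouping can also be done without loading every line into a list first. This is a rough sketch, assuming whitespace-separated columns. count_regions is a hypothetical helper name, and the sample lines are made up.

```python
from collections import Counter


def count_regions(lines):
    """Return a Counter keyed by the tab-joined first three columns."""
    # Counter consumes the generator one line at a time
    return Counter(
        '\t'.join(line.split()[:3])
        for line in lines
        if line.strip()  # ignore blank lines
    )


# Made-up sample lines standing in for the real file
sample = [
    "chr1 37091 37122 read1 0 -\n",
    "chr1 37091 37122 read2 0 -\n",
    "\n",
    "chr1 93529 93560 read3 0 +\n",
]
counts = count_regions(sample)

# One tab-separated record per key, most frequent first
output_lines = ['{}\t{}\n'.format(key, n) for key, n in counts.most_common()]
```

With a real file, an open file object can be passed in place of the sample list, because iterating over a file yields one line at a time.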
