Find Duplicate Words In Two Files

August 04, 2023

I have two text files and need to find the words that appear in both of them. Is there a more concise way than this code? file1 = set(line.strip() for line in open('/home/user1/file1.txt')

Solution 1:

You can write more concise code, but more importantly you don't need to create two sets. set.intersection accepts any iterable, so only one set has to be built, which lets your code handle larger data sets and run faster:

with open('/home/user1/file1.txt') as f1, open('/home/user1/file2.txt') as f2:
    # Build one set from file1, then stream the lines of file2 through intersection()
    for line in set(map(str.rstrip, f1)).intersection(map(str.rstrip, f2)):
        print(line)

For Python 2, use itertools.imap so the lines are still processed lazily:

from itertools import imap
with open('/home/user1/file1.txt') as f1, open('/home/user1/file2.txt') as f2:
    # imap is the lazy Python 2 equivalent of Python 3's map
    for line in set(imap(str.rstrip, f1)).intersection(imap(str.rstrip, f2)):
        print(line)

Only one full set of lines is built, from file1. intersection then iterates over the iterable passed in, i.e. the str.rstrip-ped lines of file2, and keeps the ones already in that set, as opposed to creating two full sets of lines first and then intersecting them.
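As a minimal sketch of that idea, the snippet below uses in-memory io.StringIO objects in place of the real files so it can be run anywhere. The helper name common_lines and the sample contents are made up for illustration; they are not part of the original answer.

```python
import io


def common_lines(f1, f2):
    """Return the set of right-stripped lines found in both file-like objects."""
    # Only the lines of f1 are collected into a set up front; the lines of f2
    # are fed to intersection() as a lazy iterable.
    return set(map(str.rstrip, f1)).intersection(map(str.rstrip, f2))


# Hypothetical file contents, standing in for file1.txt and file2.txt.
file1 = io.StringIO("apple\nbanana\ncherry\n")
file2 = io.StringIO("banana\ncherry\ndate\n")

result = common_lines(file1, file2)
print(sorted(result))
```

Passing a map object rather than a second set is what avoids holding every line of the second file in memory at once.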

Solution 2:

Even shorter:

with open('/home/user/file1.txt') as file1, open('/home/user/file2.txt') as file2:
    # split() breaks each file on whitespace, so this compares words rather than lines
    print("".join(word + "\n" for word in set(file1.read().split()) & set(file2.read().split())))
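Note that this answer splits on whitespace and therefore compares individual words, while the other answers compare whole lines. The two can give different results when a line holds more than one word. The sketch below illustrates the difference on made-up sample text; the function names common_words and common_whole_lines are hypothetical.

```python
import io

# Hypothetical contents: the same words, but arranged differently on the first line.
TEXT1 = "red green\nblue\n"
TEXT2 = "green red\nblue\n"


def common_words(f1, f2):
    """Words (whitespace-separated tokens) that occur in both files."""
    return set(f1.read().split()) & set(f2.read().split())


def common_whole_lines(f1, f2):
    """Non-empty stripped lines that occur in both files."""
    first = set(line.strip() for line in f1)
    return {line for line in first.intersection(l.strip() for l in f2) if line}


words = common_words(io.StringIO(TEXT1), io.StringIO(TEXT2))
lines = common_whole_lines(io.StringIO(TEXT1), io.StringIO(TEXT2))

print(sorted(words))  # every word is shared
print(sorted(lines))  # only the identical line is shared
```

Which variant is appropriate depends on whether each file holds one word per line or free-form text.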

Solution 3:

This is one line shorter and closes both files after use:

with open('/home/user1/file1.txt') as file1, open('/home/user1/file2.txt') as file2:
    for line in set(line.strip() for line in file1) & set(line.strip() for line in file2):
        if line:  # skip the empty string produced by blank lines
            print(line)

Variation with only one set:

with open('/home/user1/file1.txt') as file1, open('/home/user1/file2.txt') as file2:
    for line in set(line.strip() for line in file1).intersection(line.strip() for line in file2):
        if line:  # skip the empty string produced by blank lines
            print(line)
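The if line check matters because blank lines strip down to the empty string, which would otherwise count as a shared entry. Here is a small runnable sketch of the single-set variation wrapped in a function; the name shared_nonblank_lines, the temporary files, and their contents are assumptions added for the demonstration.

```python
import os
import tempfile


def shared_nonblank_lines(path1, path2):
    """Return the sorted non-empty lines that appear in both files."""
    with open(path1) as file1, open(path2) as file2:
        first = set(line.strip() for line in file1)
        common = first.intersection(line.strip() for line in file2)
    # Blank lines become "" after strip(); drop them from the result.
    return sorted(line for line in common if line)


# Write two small throwaway files so the example does not depend on /home/user1.
with tempfile.TemporaryDirectory() as tmp:
    p1 = os.path.join(tmp, "file1.txt")
    p2 = os.path.join(tmp, "file2.txt")
    with open(p1, "w") as f:
        f.write("alpha\n\nbeta\n")
    with open(p2, "w") as f:
        f.write("beta\n\ngamma\n")
    shared = shared_nonblank_lines(p1, p2)

print(shared)
```

Both sample files contain a blank line, yet only the real shared entry is reported.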
