
Identify Ids With Similar Address

I have data in a CSV file that contains some IDs, their corresponding addresses, and the matching similarity percentage of one address with the others. I want to identify the IDs whi

Solution 1:

I wrote some code that builds a list pairing each "Search" string with its matching 'Cust_Id' values.

The code is:

import pandas as pd


def duplicates(lst, item):
    """Return every index at which item occurs in lst."""
    return [i for i, x in enumerate(lst) if x == item]


# Create the DataFrame
data = {'Cust_Id': ['1 ', '2', '3', '4', '5', '6'],
        'Match Ratio': [[('ABC', 100)], [('DEF', 100)], [('DEF', 100)],
                        [('ABC', 100)], [('PQR', 100)], [('DEF', 100)]],
        'Search': ['ABC', 'DEF', 'XYZ', 'PQR', 'TUV', 'LMN']
        }
df = pd.DataFrame(data)
print(df)

# Take the matched string from the first tuple of each 'Match Ratio' entry
matches = [x[0][0] for x in df['Match Ratio'].tolist()]

found = []
for s in df['Search']:
    if s in matches:
        # Positions of every row whose match equals this search string
        index = duplicates(matches, s)
        cust_ids = [[df['Cust_Id'][i]] for i in index]
        found.append([s, cust_ids])
print(found)

Dataframe output

  Cust_Id   Match Ratio Search
0      1   [(ABC, 100)]    ABC
1       2  [(DEF, 100)]    DEF
2       3  [(DEF, 100)]    XYZ
3       4  [(ABC, 100)]    PQR
4       5  [(PQR, 100)]    TUV
5       6  [(DEF, 100)]    LMN

Found List output

[['ABC', [['1 '], ['4']]], ['DEF', [['2'], ['3'], ['6']]], ['PQR', [['5']]]]
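
The same grouping can probably be done more compactly by letting pandas group the rows instead of looking up indexes by hand. The following is only a sketch under a few assumptions: it reuses the sample data from above, the function name `group_ids_by_match` is made up for illustration, and it strips the stray trailing space from `'1 '` and returns a flat dict rather than the nested lists shown above.

```python
import pandas as pd


def group_ids_by_match(df):
    """Map each matched Search string to the Cust_Ids that matched it.

    Hypothetical helper, not part of the original answer.
    """
    # Matched string from the first (match, score) tuple of every row.
    best = df['Match Ratio'].map(lambda m: m[0][0])
    # Assumption: stray whitespace in the IDs (e.g. '1 ') is unwanted.
    ids = df['Cust_Id'].str.strip()
    # Collect the IDs that share the same matched string.
    grouped = ids.groupby(best).apply(list).to_dict()
    # Keep only Search strings that were matched, in their original order.
    return {s: grouped[s] for s in df['Search'] if s in grouped}


# Same sample data as in the answer above.
data = {'Cust_Id': ['1 ', '2', '3', '4', '5', '6'],
        'Match Ratio': [[('ABC', 100)], [('DEF', 100)], [('DEF', 100)],
                        [('ABC', 100)], [('PQR', 100)], [('DEF', 100)]],
        'Search': ['ABC', 'DEF', 'XYZ', 'PQR', 'TUV', 'LMN']}
df = pd.DataFrame(data)

result = group_ids_by_match(df)
print(result)
```

A dict keyed by the search string makes it easy to look up one address directly, for example `result['DEF']`, and any group with more than one ID is a set of IDs sharing a similar address.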

Hope you got what you were looking for :)
