Identify Ids With Similar Address
I have data in a CSV file which basically has some IDs, their corresponding addresses, and the matching similarity percentage of one address with another. I want to identify the IDs whi
Solution 1:
I have written some code which gives a list containing each "Search" string and its corresponding matching 'Cust_Id' values.
The code is:
import pandas as pd


def duplicates(lst, item):
    """Return every index at which item occurs in lst."""
    return [i for i, x in enumerate(lst) if x == item]


# Creating the data frame
data = {'Cust_Id': ['1 ', '2', '3', '4', '5', '6'],
        'Match Ratio': [[('ABC', 100)], [('DEF', 100)], [('DEF', 100)],
                        [('ABC', 100)], [('PQR', 100)], [('DEF', 100)]],
        'Search': ['ABC', 'DEF', 'XYZ', 'PQR', 'TUV', 'LMN']
        }
df = pd.DataFrame(data)
print(df)

# Creating a list of the first value of each Match Ratio tuple
matches = df['Match Ratio'].tolist()
matches = [x[0][0] for x in matches]

found = []
for s in df['Search']:
    data_list = []
    if s in matches:
        # Row positions whose matched string equals this search string
        index = duplicates(matches, s)
        # Each matching Cust_Id is wrapped in its own single-element list
        Cust_Id = [[df['Cust_Id'][i]] for i in index]
        data_list.append(s)
        data_list.append(Cust_Id)
        found.append(data_list)
print(found)
DataFrame output
  Cust_Id   Match Ratio Search
0       1  [(ABC, 100)]    ABC
1       2  [(DEF, 100)]    DEF
2       3  [(DEF, 100)]    XYZ
3       4  [(ABC, 100)]    PQR
4       5  [(PQR, 100)]    TUV
5       6  [(DEF, 100)]    LMN
Found List output
[['ABC', [['1 '], ['4']]], ['DEF', [['2'], ['3'], ['6']]], ['PQR', [['5']]]]
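If the data gets large, rescanning the whole match list for every search string may become slow. One possible alternative is to group the IDs in a single pass. This is only a rough sketch and not part of the code above: the helper name group_ids_by_match, the min_score parameter, and the stripping of whitespace from Cust_Id are all assumptions made for illustration. It uses plain lists so it runs without pandas.

```python
from collections import defaultdict


def group_ids_by_match(cust_ids, match_ratios, searches, min_score=0):
    """Hypothetical helper: map each search string to the Cust_Ids whose
    first match tuple names that string, reading each row only once."""
    ids_by_match = defaultdict(list)
    for cust_id, candidates in zip(cust_ids, match_ratios):
        matched_string, score = candidates[0]  # first (string, score) tuple
        if score >= min_score:
            # Assumption: stray whitespace in an ID (e.g. '1 ') is unwanted
            ids_by_match[matched_string].append(cust_id.strip())
    # Keep only strings that occur in the Search values, in that order
    return {s: ids_by_match[s] for s in searches if s in ids_by_match}


# Same sample values as in the data frame above
cust_ids = ['1 ', '2', '3', '4', '5', '6']
match_ratios = [[('ABC', 100)], [('DEF', 100)], [('DEF', 100)],
                [('ABC', 100)], [('PQR', 100)], [('DEF', 100)]]
searches = ['ABC', 'DEF', 'XYZ', 'PQR', 'TUV', 'LMN']

result = group_ids_by_match(cust_ids, match_ratios, searches)
print(result)
# {'ABC': ['1', '4'], 'DEF': ['2', '3', '6'], 'PQR': ['5']}
```

With a data frame, the columns can be passed directly, e.g. group_ids_by_match(df['Cust_Id'], df['Match Ratio'], df['Search']). Raising min_score would drop rows whose similarity percentage is below that value.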
Hope you got what you were looking for :)