Pandas Auto-renaming Same Headers

I'm using Python 3.7 with pandas. I have successfully loaded my CSV file and placed the headers in a list:

csv_file = pandas.read_csv(file, encoding='ISO-8859-1')
headers = [line.up

Solution 1:

read_csv has an option called mangle_dupe_cols, which is True by default (duplicated columns are renamed X, X.1, ..., X.N). However, this option is not really meant to be set to False. Note that it was deprecated in pandas 1.5 and removed in pandas 2.0, so it only applies to older versions.

As the pandas documentation warns, "Passing in False will cause data to be overwritten if there are duplicate names in the columns."

Source: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
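To see the default renaming in action, here is a minimal sketch. The CSV text and its values are made up for illustration, and io.StringIO stands in for the real file:

```python
import io

import pandas as pd

# Hypothetical CSV content with the header "ADID" repeated three times
csv_text = "ADID,FIRST NAME,ADID,ADID\n1,Ann,2,3\n"

# read_csv de-duplicates repeated header names by appending .1, .2, ...
df = pd.read_csv(io.StringIO(csv_text))
columns = list(df.columns)
print(columns)  # ['ADID', 'FIRST NAME', 'ADID.1', 'ADID.2']
```

The data in each column is preserved; only the labels are changed so that every column can be addressed unambiguously.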

Solution 2:

Does your CSV file have multiple headers called "ADID"?

That's not going to work. Headers need to be unique. Otherwise, if you refer to column "ADID", how would pandas know whether you mean ADID, ADID.1, or ADID.2?
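The ambiguity can be illustrated with a small sketch. The frame below is built by hand with made-up values; pandas does accept the duplicate labels here, but selecting by that label no longer gives a single column:

```python
import pandas as pd

# Hypothetical one-row frame with the label "ADID" used three times
df = pd.DataFrame(
    [[1, "Ann", 2, 3]],
    columns=["ADID", "FIRST NAME", "ADID", "ADID"],
)

# With duplicate labels, df["ADID"] returns every matching column
selected = df["ADID"]
print(type(selected).__name__)  # DataFrame, not Series
print(selected.shape)           # (1, 3)
```

Code that expects df["ADID"] to be a Series will then behave differently, which is one reason unique headers are usually preferred.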

Solution 3:

It is possible, but not recommended.

You can use str.replace with the regex (\.\d+)$:

\. matches the character . literally
\d+ matches one or more digits (\d is equivalent to [0-9]; + is a greedy quantifier that matches between one and unlimited times)
$ asserts position at the end of the string


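import pandas as pd

c = ['ADID', 'FIRST NAME', 'LAST NAME', 'FULL NAME',
     'ADID.1', 'ADID.2', 'ROLE 2', 'GROUP', 'DIVISION', 'TEAM', 'COMPANY']
df = pd.DataFrame(columns=c)

# Strip a trailing ".<digits>" suffix; regex=True is required in pandas 2.0+,
# where str.replace treats the pattern as a literal string by default
df.columns = df.columns.str.replace(r'(\.\d+)$', '', regex=True)
print(df)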
Empty DataFrame
Columns: [ADID, FIRST NAME, LAST NAME, FULL NAME, 
          ADID, ADID, ROLE 2, GROUP, DIVISION, TEAM, COMPANY]
Index: []
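To check what the pattern does and does not touch, the same regex can be tried on plain strings with Python's re module. This is only a sketch; the names in the list are invented test cases, not columns from the original file:

```python
import re

# Same pattern as above: a literal dot plus digits at the end of the string
pattern = re.compile(r"\.\d+$")

# Hypothetical column names used only to probe the pattern
names = ["ADID", "ADID.1", "ADID.12", "ROLE 2", "v1.5 beta"]
stripped = [pattern.sub("", name) for name in names]
print(stripped)  # ['ADID', 'ADID', 'ADID', 'ROLE 2', 'v1.5 beta']
```

Note that the pattern cannot tell a suffix added by pandas from a name that genuinely ends in a dot and digits, so a real column called something like "SCORE.1" would be renamed too.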
