How To Clean Data So That The Correct Arrival Code Is There For The City Pair?
How to clean data so that the correct arrival code is there for the city pair? From the picture, the CSV is like column 1: City Pair (Departure - Arrival), column 2 is meant to be
Solution 1:
As an example, and using the cleaner looking data from your other question...
Given:
..from your other question.
Try:
import pandas as pd
import numpy as np
import math
from math import sin, cos, sqrt, atan2, radians
def get_distance(in_lat1, in_lon1, in_lat2, in_lon2):
# approximate radius of earth in km
R = 6373.0
lat1 = radians(in_lat1)
lon1 = radians(in_lon1)
lat2 = radians(in_lat2)
lon2 = radians(in_lon2)
dlon = lon2 - lon1
dlat = lat2 - lat1
a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
c = 2 * atan2(sqrt(a), sqrt(1 - a))
distance = R * c
return distance
df = pd.DataFrame({'Normalised City Pair': {0: 'London, United Kingdom - New York, United States',
1: 'Johannesburg, South Africa - London, United Kingdom',
2: 'London, United Kingdom - New York, United States',
3: 'Johannesburg, South Africa - London, United Kingdom',
4: 'London, United Kingdom - Singapore, Singapore'},
'Departure Code': {0: 'LHR', 1: 'JNB', 2: 'LHR', 3: 'JNB', 4: 'SIN'},
'Arrival Code': {0: 'JFK', 1: 'LHR', 2: 'JFK', 3: 'LHR', 4: 'LHR'},
'Departure_lat': {0: 51.5, 1: -26.1, 2: 51.5, 3: -26.1, 4: 1.3},
'Departure_lon': {0: -0.45, 1: 28.23, 2: -0.45, 3: 28.23, 4: 103.98},
'Arrival_lat': {0: 40.64, 1: 51.47, 2: 40.64, 3: 51.47, 4: 51.47},
'Arrival_lon': {0: -73.79, 1: -0.45, 2: -73.79, 3: -0.45, 4: -0.45}})
df_airports = pd.read_csv('https://ourairports.com/data/airports.csv')
df_airports = df_airports[['name', 'iata_code']].copy()
df_airports = df_airports[df_airports['iata_code'].notna()].reset_index(drop=True)
# df_airports.query('iata_code == "CDG" | iata_code == "LHR"')
df['Distance'] = df.apply(lambda x: get_distance(x['Departure_lat'], x['Departure_lon'], x['Arrival_lat'], x['Arrival_lon']), axis=1)
#df[['ap_dep', 'ap_arr']] = df['Normalised City Pair'].str.split(' - ', expand=True)
df_airports = df_airports.sort_values('name')
df_airports = df_airports.drop_duplicates(subset ='iata_code', keep='first')
df['dep_ap_name'] = df['Departure Code'].map(df_airports.set_index('iata_code')['name'])
df['arr_ap_name'] = df['Arrival Code'].map(df_airports.set_index('iata_code')['name'])
Output:
If the frame now has too many columns and you want something cleaner and to reorder the columns try final_df = df[['a', 'b', 'c', 'd']]
where abc are the columns and ordering you would like.
Post a Comment for "How To Clean Data So That The Correct Arrival Code Is There For The City Pair?"