How To Clean Data So That The Correct Arrival Code Is There For The City Pair?

December 27, 2023 Post a Comment

How to clean data so that the correct arrival code is there for the city pair? From the picture, the CSV is like column 1: City Pair (Departure - Arrival), column 2 is meant to be

Solution 1:

As an example, and using the cleaner looking data from your other question...

Given:

..from your other question.

Try:

import pandas as pd
import numpy as np
import math
from math import sin, cos, sqrt, atan2, radians

def get_distance(in_lat1, in_lon1, in_lat2, in_lon2):
    # approximate radius of earth in km
    R = 6373.0

    lat1 = radians(in_lat1)
    lon1 = radians(in_lon1)
    lat2 = radians(in_lat2)
    lon2 = radians(in_lon2)

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))

    distance = R * c

    return distance

df = pd.DataFrame({'Normalised City Pair': {0: 'London, United Kingdom - New York, United States',
  1: 'Johannesburg, South Africa - London, United Kingdom',
  2: 'London, United Kingdom - New York, United States',
  3: 'Johannesburg, South Africa - London, United Kingdom',
  4: 'London, United Kingdom - Singapore, Singapore'},
 'Departure Code': {0: 'LHR', 1: 'JNB', 2: 'LHR', 3: 'JNB', 4: 'SIN'},
 'Arrival Code': {0: 'JFK', 1: 'LHR', 2: 'JFK', 3: 'LHR', 4: 'LHR'},
 'Departure_lat': {0: 51.5, 1: -26.1, 2: 51.5, 3: -26.1, 4: 1.3},
 'Departure_lon': {0: -0.45, 1: 28.23, 2: -0.45, 3: 28.23, 4: 103.98},
 'Arrival_lat': {0: 40.64, 1: 51.47, 2: 40.64, 3: 51.47, 4: 51.47},
 'Arrival_lon': {0: -73.79, 1: -0.45, 2: -73.79, 3: -0.45, 4: -0.45}})

df_airports = pd.read_csv('https://ourairports.com/data/airports.csv')
df_airports = df_airports[['name', 'iata_code']].copy()
df_airports = df_airports[df_airports['iata_code'].notna()].reset_index(drop=True)
# df_airports.query('iata_code == "CDG" | iata_code == "LHR"')

df['Distance'] = df.apply(lambda x: get_distance(x['Departure_lat'], x['Departure_lon'], x['Arrival_lat'], x['Arrival_lon']), axis=1)

#df[['ap_dep', 'ap_arr']] = df['Normalised City Pair'].str.split(' - ', expand=True)

df_airports = df_airports.sort_values('name')

df_airports = df_airports.drop_duplicates(subset ='iata_code', keep='first')

df['dep_ap_name'] = df['Departure Code'].map(df_airports.set_index('iata_code')['name'])
df['arr_ap_name'] = df['Arrival Code'].map(df_airports.set_index('iata_code')['name'])

Output:

If the frame now has too many columns and you want something cleaner and to reorder the columns try final_df = df[['a', 'b', 'c', 'd']] where abc are the columns and ordering you would like.

Python College

How To Clean Data So That The Correct Arrival Code Is There For The City Pair?

Solution 1:

Post a Comment for "How To Clean Data So That The Correct Arrival Code Is There For The City Pair?"