
Python Dataframe Check If A Value In A Column Dataframe Is Within A Range Of Values Reported In Another Dataframe

Apologies if the problem is trivial, but as a Python newbie I wasn't able to find the right solution. I have two dataframes and I need to add a column to the first dataframe that is tru

Solution 1:

first_df['output'] = (second_df.code2_start <= first_df.code2) & (second_df.code2_end >= first_df.code2)

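This works because a comparison such as second_df.code2_start <= first_df.code2 returns a boolean Series, computed row by row.

Combining two of these boolean Series with & (element-wise logical AND) gives a Series that is True where both inputs are True and False otherwise. Because the rows are paired by index label, both DataFrames must share the same index; pandas raises an error when you compare Series whose labels differ.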

Here's an example:

>>> import pandas as pd
>>> a = pd.DataFrame([{1:2,2:4,3:6},{1:3,2:6,3:9},{1:4,2:8,3:10}])
>>> a['output'] = (a[2] <= a[3]) & (a[2] >= a[1])
>>> a
   1  2   3  output
0  2  4   6    True
1  3  6   9    True
2  4  8  10    True
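A minimal sketch of the row-wise range check across two DataFrames, using made-up values for the code2, code2_start and code2_end columns named in the question. It assumes both DataFrames share the same default index, so row i of one is compared with row i of the other. A value is in range when start <= value <= end; Series.between, which is inclusive on both ends by default, builds the same mask.

```python
import pandas as pd

# Hypothetical data: row i of first_df is checked against row i of second_df,
# so both DataFrames must share the same index.
first_df = pd.DataFrame({'code2': [5, 10, 20]})
second_df = pd.DataFrame({'code2_start': [3, 7, 5], 'code2_end': [6, 15, 15]})

# Element-wise: start <= value AND value <= end
first_df['output'] = (second_df.code2_start <= first_df.code2) & (first_df.code2 <= second_df.code2_end)

# Series.between is inclusive on both ends by default, so it yields the same mask
same = first_df.code2.between(second_df.code2_start, second_df.code2_end)

print(first_df['output'].tolist())  # [True, True, False]
print(same.tolist())                # [True, True, False]
```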

EDIT:

So based on your updated question and my new interpretation of your problem, I would do something like this:

import pandas as pd

# Define some data to work with
df_1 = pd.DataFrame([{'c1':1,'c2':5},{'c1':1,'c2':10},{'c1':1,'c2':20},{'c1':2,'c2':8}])
df_2 = pd.DataFrame([{'c1':1,'start':3,'end':6},{'c1':1,'start':7,'end':15},{'c1':2,'start':5,'end':15}])

# Function checks if c2 value is within any range matching c1 value
def checkRange(x, code_range):
    idx = code_range.c1 == x.c1
    code_range = code_range.loc[idx]
    check = (code_range.start <= x.c2) & (code_range.end >= x.c2)
    return check.any()

# Apply the checkRange function to each row of the DataFrame
df_1['output'] = df_1.apply(lambda x: checkRange(x, df_2), axis=1)

What I do here is define a function called checkRange that takes two inputs: x, a single row of df_1, and code_range, the entire df_2 DataFrame. It first finds the rows of code_range that have the same c1 value as the given row (x.c1) and discards the non-matching rows. This is done in the first two lines:

idx = code_range.c1 == x.c1
code_range = code_range.loc[idx]
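To see what these two lines do on their own, here is the filtering step run against the same df_2 data, for a row whose c1 is 1. The variable c1_value is a hypothetical stand-in for x.c1; the mask is True for every df_2 row with a matching c1, and .loc keeps only those rows.

```python
import pandas as pd

df_2 = pd.DataFrame([{'c1': 1, 'start': 3, 'end': 6},
                     {'c1': 1, 'start': 7, 'end': 15},
                     {'c1': 2, 'start': 5, 'end': 15}])

c1_value = 1  # hypothetical stand-in for x.c1 of the current df_1 row

# Boolean mask: True for each df_2 row whose c1 matches
idx = df_2.c1 == c1_value
# Keep only the matching rows (the two ranges with c1 == 1)
matching = df_2.loc[idx]

print(idx.tolist())             # [True, True, False]
print(matching.index.tolist())  # [0, 1]
```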

Next, we build a boolean Series that tells us, for each remaining range, whether x.c2 falls within it (both ends inclusive):

check = (code_range.start <= x.c2) & (code_range.end >= x.c2)
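As a small illustration, assume the filtered code_range holds the two c1 == 1 ranges, 3 to 6 and 7 to 15, and that x.c2 is 10 (c2_value below is a hypothetical stand-in). Comparing a Series with a scalar broadcasts the scalar, so the result has one boolean per range.

```python
import pandas as pd

# The two ranges left after filtering on c1 == 1
code_range = pd.DataFrame({'start': [3, 7], 'end': [6, 15]})

c2_value = 10  # hypothetical stand-in for x.c2

# One boolean per range: is c2_value inside [start, end]?
check = (code_range.start <= c2_value) & (code_range.end >= c2_value)
print(check.tolist())  # [False, True]
```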

Finally, since we only care whether x.c2 falls within at least one of the ranges, we return check.any(). Calling any() on a boolean Series returns True if at least one of its values is True, and False otherwise, including when the Series is empty because no row of df_2 shares the row's c1 value.
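The three cases checkRange can hit are easy to try directly. This sketch uses hand-built boolean Series in place of the check result; the empty case corresponds to a df_1 row whose c1 has no match in df_2.

```python
import pandas as pd

# At least one True: the value falls in at least one range
some_true = pd.Series([False, True]).any()
# No True values: the value is outside every range
none_true = pd.Series([False, False]).any()
# Empty Series: no range shared the row's c1
empty_result = pd.Series([], dtype=bool).any()

print(some_true, none_true, empty_result)  # True False False
```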

To call the checkRange function on each row of df_1, we use apply(). The lambda expression passes checkRange both the row and df_2. axis=1 means the function is called once per row of the DataFrame (instead of once per column).
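Two variations, offered as a sketch on the same sample data. First, apply() also accepts an args tuple of extra positional arguments, which can replace the lambda. Second, a vectorized alternative (a different technique from the answer above): merge df_1 with df_2 on c1 so each row is paired with every range sharing its c1, test all pairs at once, then group by the original df_1 row. The column name 'index' is simply what reset_index() produces for an unnamed index.

```python
import pandas as pd

df_1 = pd.DataFrame([{'c1': 1, 'c2': 5}, {'c1': 1, 'c2': 10},
                     {'c1': 1, 'c2': 20}, {'c1': 2, 'c2': 8}])
df_2 = pd.DataFrame([{'c1': 1, 'start': 3, 'end': 6},
                     {'c1': 1, 'start': 7, 'end': 15},
                     {'c1': 2, 'start': 5, 'end': 15}])


def checkRange(x, code_range):
    # Same logic as the function above: filter on c1, then test the ranges
    code_range = code_range.loc[code_range.c1 == x.c1]
    return ((code_range.start <= x.c2) & (code_range.end >= x.c2)).any()


# Option 1: pass df_2 through apply's args parameter instead of a lambda
via_apply = df_1.apply(checkRange, axis=1, args=(df_2,))

# Option 2: pair each df_1 row with every df_2 row sharing its c1.
# how='left' keeps unmatched rows; their NaN bounds compare as False.
pairs = df_1.reset_index().merge(df_2, on='c1', how='left')
pairs['hit'] = (pairs.start <= pairs.c2) & (pairs.c2 <= pairs.end)
# Collapse back to one result per original df_1 row
via_merge = pairs.groupby('index')['hit'].any()

print(via_apply.tolist())  # [True, True, False, True]
print(via_merge.tolist())  # [True, True, False, True]
```

The merge version avoids calling a Python function once per row, but it materializes one row per matching (df_1, df_2) pair, so memory grows with the number of ranges per c1 value. Since via_merge is indexed by the original df_1 index, it can be assigned directly with df_1['output'] = via_merge.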

