Python Dataframe Check If A Value In A Column Dataframe Is Within A Range Of Values Reported In Another Dataframe
Solution 1:
first_df['output'] = (second_df.code2_start <= first_df.code2) & (second_df.code2_end >= first_df.code2)
This works because a comparison such as second_df.code2_start <= first_df.code2 returns a boolean Series. If you then perform a logical AND (&) on two of these boolean Series, you get a Series which is True where both Series were True and False otherwise. Note that the comparison is done row by row, so both DataFrames need to have the same index.
Here's an example:
>>> import pandas as pd
>>> a = pd.DataFrame([{1:2,2:4,3:6},{1:3,2:6,3:9},{1:4,2:8,3:10}])
>>> a['output'] = (a[2] <= a[3]) & (a[2] >= a[1])
>>> a
   1  2   3  output
0  2  4   6    True
1  3  6   9    True
2  4  8  10    True
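The same check applied across two separate frames can be sketched as follows. The column names come from the snippet at the top of this answer; the values are made up for illustration, and the sketch assumes both frames share the same default index.

```python
import pandas as pd

# Hypothetical data: both frames use the default index 0..2,
# so row i of first_df is compared with row i of second_df.
first_df = pd.DataFrame({'code2': [5, 10, 20]})
second_df = pd.DataFrame({'code2_start': [3, 12, 15],
                          'code2_end': [6, 15, 25]})

lower_ok = second_df.code2_start <= first_df.code2   # start <= value
upper_ok = second_df.code2_end >= first_df.code2     # end >= value

# Element-wise AND: True only where both bounds hold
first_df['output'] = lower_ok & upper_ok
print(first_df)   # output column: True, False, True
```

Row 1 is False because 10 lies below the start of its range (12).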
EDIT:
So based on your updated question and my new interpretation of your problem, I would do something like this:
import pandas as pd
# Define some data to work with
df_1 = pd.DataFrame([{'c1':1,'c2':5},{'c1':1,'c2':10},{'c1':1,'c2':20},{'c1':2,'c2':8}])
df_2 = pd.DataFrame([{'c1':1,'start':3,'end':6},{'c1':1,'start':7,'end':15},{'c1':2,'start':5,'end':15}])
# Function checks if c2 value is within any range matching c1 value
def checkRange(x, code_range):
    idx = code_range.c1 == x.c1
    code_range = code_range.loc[idx]
    check = (code_range.start <= x.c2) & (code_range.end >= x.c2)
    return check.any()
# Apply the checkRange function to each row of the DataFrame
df_1['output'] = df_1.apply(lambda x: checkRange(x, df_2), axis=1)
What I do here is define a function called checkRange which takes as input x, a single row of df_1, and code_range, the entire df_2 DataFrame. It first finds the rows of code_range which have the same c1 value as the given row, x.c1. Then the non-matching rows are discarded. This is done in the first 2 lines:
idx = code_range.c1 == x.c1
code_range = code_range.loc[idx]
Next, we get a boolean Series which tells us if x.c2 falls within any of the ranges given in the reduced code_range DataFrame:
check = (code_range.start <= x.c2) & (code_range.end >= x.c2)
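To make the intermediate values concrete, here is a minimal sketch that runs these steps by hand for one row. The Series x stands in for the second row of df_1 (c1=1, c2=10); building it directly is an assumption made so the snippet runs on its own.

```python
import pandas as pd

df_2 = pd.DataFrame([{'c1': 1, 'start': 3, 'end': 6},
                     {'c1': 1, 'start': 7, 'end': 15},
                     {'c1': 2, 'start': 5, 'end': 15}])

# Stand-in for a single row of df_1, as apply(..., axis=1) would pass it
x = pd.Series({'c1': 1, 'c2': 10})

idx = df_2.c1 == x.c1          # True, True, False
code_range = df_2.loc[idx]     # keeps rows 0 and 1 only
check = (code_range.start <= x.c2) & (code_range.end >= x.c2)

print(check.tolist())          # [False, True]: 10 is in 7..15 but not in 3..6
```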
Finally, since we only care that x.c2 falls within one of the ranges, we return the value of check.any(). When we call any() on a boolean Series, it will return True if any of the values in the Series are True.
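A quick illustration of any() on small hand-made Series. The empty case is worth noting: it is what checkRange sees when no row of df_2 has a matching c1, and the result is False. The variable names here are only for illustration.

```python
import pandas as pd

some_hit = pd.Series([False, True, False])   # one range matched
no_hit = pd.Series([False, False])           # ranges exist, none matched
no_ranges = pd.Series([], dtype=bool)        # no rows with a matching c1

print(some_hit.any())    # True
print(no_hit.any())      # False
print(no_ranges.any())   # False
```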
To call the checkRange function on each row of df_1, we can use apply(). I define a lambda expression in order to send the checkRange function the row as well as df_2. axis=1 means that the function will be called on each row (instead of each column) of the DataFrame.
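As a variation on the lambda, apply() also accepts an args tuple whose contents are passed to the function after the row. The sketch below uses the same sample data and a condensed checkRange, and shows the result for that data; it is an equivalent spelling, not a different method.

```python
import pandas as pd

df_1 = pd.DataFrame([{'c1': 1, 'c2': 5}, {'c1': 1, 'c2': 10},
                     {'c1': 1, 'c2': 20}, {'c1': 2, 'c2': 8}])
df_2 = pd.DataFrame([{'c1': 1, 'start': 3, 'end': 6},
                     {'c1': 1, 'start': 7, 'end': 15},
                     {'c1': 2, 'start': 5, 'end': 15}])

def checkRange(x, code_range):
    # Keep only the ranges whose c1 matches this row
    same_c1 = code_range.loc[code_range.c1 == x.c1]
    # True if c2 falls inside at least one of those ranges
    return ((same_c1.start <= x.c2) & (same_c1.end >= x.c2)).any()

# args=(df_2,) is forwarded to checkRange as its second argument
df_1['output'] = df_1.apply(checkRange, axis=1, args=(df_2,))
print(df_1)   # output column: True, True, False, True
```

Only the third row is False: for c1=1 the ranges are 3..6 and 7..15, and 20 falls in neither.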