How To Calculate Volume Weighted Average Price (vwap) Using A Pandas Dataframe With Ask And Bid Price?
Solution 1:
You can use np.where to give you the price from the correct column (bid or ask) depending on the value in the trade column. Note that this gives you the bid price when no trade occurs, but because this is then multiplied by a NaN trade size it won't matter. I also forward filled the VWAP.
import numpy as np

volume = df['trade_size']
# Pick the ask price for ask trades, otherwise the bid price
price = np.where(df['trade'].eq('ask'), df['ask'], df['bid'])
# Cumulative traded value over cumulative volume, carried forward over no-trade rows
df = df.assign(VWAP=((volume * price).cumsum() / volume.cumsum()).ffill())

>>> df
                          time  bid_size     bid     ask  ask_size  trade  trade_size  phase    VWAP
0   2019-01-07 07:45:01.064515       495  152.52  152.54        19    NaN         NaN   OPEN     NaN
1   2019-01-07 07:45:01.110072        31  152.53  152.54        19    NaN         NaN   OPEN     NaN
2   2019-01-07 07:45:01.116596        32  152.53  152.54        19    NaN         NaN   OPEN     NaN
3   2019-01-07 07:45:01.116860        32  152.53  152.54        21    NaN         NaN   OPEN     NaN
4   2019-01-07 07:45:01.116905        34  152.53  152.54        21    NaN         NaN   OPEN     NaN
5   2019-01-07 07:45:01.116982        34  152.53  152.54        31    NaN         NaN   OPEN     NaN
6   2019-01-07 07:45:01.147901        38  152.53  152.54        31    NaN         NaN   OPEN     NaN
7   2019-01-07 07:45:01.189971        38  152.53  152.54        31    ask        15.0   OPEN  152.54
8   2019-01-07 07:45:01.189971        38  152.53  152.54        16    NaN         NaN   OPEN  152.54
9   2019-01-07 07:45:01.190766        37  152.53  152.54        16    NaN         NaN   OPEN  152.54
10  2019-01-07 07:45:01.190856        37  152.53  152.54        15    NaN         NaN   OPEN  152.54
11  2019-01-07 07:45:01.190856        37  152.53  152.54        16    ask         1.0   OPEN  152.54
12  2019-01-07 07:45:01.193938        37  152.53  152.55       108    NaN         NaN   OPEN  152.54
13  2019-01-07 07:45:01.193938        37  152.53  152.54        15    ask        15.0   OPEN  152.54
14  2019-01-07 07:45:01.194326         2  152.54  152.55       108    NaN         NaN   OPEN  152.54
15  2019-01-07 07:45:01.194453         2  152.54  152.55        97    NaN         NaN   OPEN  152.54
16  2019-01-07 07:45:01.194479         6  152.54  152.55        97    NaN         NaN   OPEN  152.54
17  2019-01-07 07:45:01.194507        19  152.54  152.55        97    NaN         NaN   OPEN  152.54
18  2019-01-07 07:45:01.194532        19  152.54  152.55        77    NaN         NaN   OPEN  152.54
19  2019-01-07 07:45:01.194598        19  152.54  152.55        79    NaN         NaN   OPEN  152.54
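Since every trade in the sample data is an ask at 152.54, the output above doesn't show the average moving. Here is a minimal, self-contained sketch of the same np.where approach on made-up data (the prices and sizes are hypothetical, not the OP's) where trades hit both sides:

```python
import numpy as np
import pandas as pd

# Hypothetical toy quotes: rows 0 and 2 have no trade.
df = pd.DataFrame({
    "bid":        [10.0, 10.0, 10.0, 11.0, 11.0],
    "ask":        [12.0, 12.0, 12.0, 13.0, 13.0],
    "trade":      [None, "ask", None, "bid", "ask"],
    "trade_size": [np.nan, 10.0, np.nan, 30.0, 10.0],
})

volume = df["trade_size"]
# Ask price for ask trades, bid price otherwise (including no-trade rows).
price = np.where(df["trade"].eq("ask"), df["ask"], df["bid"])
# Running traded value divided by running volume, forward filled.
vwap = ((volume * price).cumsum() / volume.cumsum()).ffill()
print(vwap)
```

The first row stays NaN because nothing has traded yet; after that the value should move from 12.0 to 11.25 to 11.6 as the bid and ask trades come in.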
Solution 2:
Here is one possible approach.
Append a VMAP column full of NaNs:
df['VMAP'] = np.nan
Calculate VMAP (based on the equation provided by the OP) and assign values based on ask or bid, as required by the OP:
for trade in ['ask', 'bid']:
    # Find indexes of `ask` or `bid` trades
    bid_idx = df[df.trade == trade].index
    # Slice DF based on `ask` or `bid`, using indexes
    df.loc[bid_idx, 'VMAP'] = (
        (df.loc[bid_idx, 'trade_size'] * df.loc[bid_idx, trade]).cumsum()
        /
        (df.loc[bid_idx, 'trade_size']).cumsum()
    )
print(df.iloc[:, 1:])
               time  bid_size     bid     ask  ask_size  trade  trade_size  phase    VMAP
0   07:45:01.064515       495  152.52  152.54        19    NaN         NaN   OPEN     NaN
1   07:45:01.110072        31  152.53  152.54        19    NaN         NaN   OPEN     NaN
2   07:45:01.116596        32  152.53  152.54        19    NaN         NaN   OPEN     NaN
3   07:45:01.116860        32  152.53  152.54        21    NaN         NaN   OPEN     NaN
4   07:45:01.116905        34  152.53  152.54        21    NaN         NaN   OPEN     NaN
5   07:45:01.116982        34  152.53  152.54        31    NaN         NaN   OPEN     NaN
6   07:45:01.147901        38  152.53  152.54        31    NaN         NaN   OPEN     NaN
7   07:45:01.189971        38  152.53  152.54        31    ask        15.0   OPEN  152.54
8   07:45:01.189971        38  152.53  152.54        16    NaN         NaN   OPEN     NaN
9   07:45:01.190766        37  152.53  152.54        16    NaN         NaN   OPEN     NaN
10  07:45:01.190856        37  152.53  152.54        15    NaN         NaN   OPEN     NaN
11  07:45:01.190856        37  152.53  152.54        16    ask         1.0   OPEN  152.54
12  07:45:01.193938        37  152.53  152.55       108    NaN         NaN   OPEN     NaN
13  07:45:01.193938        37  152.53  152.54        15    ask        15.0   OPEN  152.54
14  07:45:01.194326         2  152.54  152.55       108    NaN         NaN   OPEN     NaN
15  07:45:01.194453         2  152.54  152.55        97    NaN         NaN   OPEN     NaN
16  07:45:01.194479         6  152.54  152.55        97    NaN         NaN   OPEN     NaN
17  07:45:01.194507        19  152.54  152.55        97    NaN         NaN   OPEN     NaN
18  07:45:01.194532        19  152.54  152.55        77    NaN         NaN   OPEN     NaN
19  07:45:01.194598        19  152.54  152.55        79    NaN         NaN   OPEN     NaN
EDIT
As @edinho correctly indicated, the VMAP is the same as the trade_price column.
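To see what the loop does when both sides actually trade, here is a small sketch on made-up data (hypothetical values, not the OP's). Note that the cumulative sums are taken separately for ask rows and bid rows, so each side gets its own running average instead of one VWAP across all trades:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two ask trades, one bid trade, one quote-only row.
df = pd.DataFrame({
    "bid":        [10.0, 10.0, 11.0, 11.0],
    "ask":        [12.0, 12.0, 13.0, 13.0],
    "trade":      ["ask", None, "bid", "ask"],
    "trade_size": [10.0, np.nan, 30.0, 10.0],
})

df["VMAP"] = np.nan
for side in ["ask", "bid"]:
    # Index labels of the rows that traded on this side
    idx = df.index[df["trade"] == side]
    value = df.loc[idx, "trade_size"] * df.loc[idx, side]
    # Running average over this side's trades only
    df.loc[idx, "VMAP"] = value.cumsum() / df.loc[idx, "trade_size"].cumsum()

print(df[["trade", "trade_size", "VMAP"]])
```

With these numbers the ask rows should give 12.0 and then 12.5, while the single bid row gives 11.0, which is why the result matched trade_price on the OP's sample where every trade was an ask at the same price.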
Solution 3:
Ok, here it is
df['trade_price'] = df.apply(lambda x: x['bid'] if x['trade'] == 'bid' else x['ask'], axis=1)
df['vwap'] = (df['trade_price'] * df['trade_size']).cumsum() / df['trade_size'].fillna(0).cumsum()
The first line:
It saves the trade_price in a new column, so it is easier to retrieve it later.
If you want, you can delete this line and make a function (maybe it is easier to read). But I prefer to see the intermediary results.
Q: Why does it have values even when there is no trade?
A: Because of the way the lambda is written. The else branch picks up the ask price for every row that is not a bid trade. But it won't make a difference, because of the next step.
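If you would rather go the function route mentioned above and leave no-trade rows empty, something like this should work. It is only a sketch: the helper name pick_trade_price and the toy data are my own, not from the question.

```python
import numpy as np
import pandas as pd

def pick_trade_price(row):
    """Return the quote on the traded side, or NaN when the row has no trade."""
    if row["trade"] == "bid":
        return row["bid"]
    if row["trade"] == "ask":
        return row["ask"]
    return np.nan

# Hypothetical toy data: first row has no trade.
df = pd.DataFrame({
    "bid":        [10.0, 10.0, 11.0],
    "ask":        [12.0, 12.0, 13.0],
    "trade":      [None, "ask", "bid"],
    "trade_size": [np.nan, 10.0, 30.0],
})

df["trade_price"] = df.apply(pick_trade_price, axis=1)
print(df)
```

The VWAP comes out the same either way, since a no-trade row contributes a NaN traded value regardless of which price sits next to it.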
Second line:
Here the real calculation takes place.
The first part calculates the total traded value (price times size) up to that moment (as you said, using cumulative sums makes life easier).
The second part calculates the total volume traded up to that moment (again, cumulative sums).
If you want, you can break this line and make more intermediary columns.
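For example, the second line could be split up roughly like this. The intermediate column names (traded_value, cum_value, cum_volume) and the toy numbers are just my picks for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: trade_price is already filled in, first row has no trade.
df = pd.DataFrame({
    "trade_price": [12.0, 12.0, 11.0, 13.0],
    "trade_size":  [np.nan, 10.0, 30.0, 10.0],
})

df["traded_value"] = df["trade_price"] * df["trade_size"]  # price * size per row
df["cum_value"] = df["traded_value"].cumsum()              # first part of the formula
df["cum_volume"] = df["trade_size"].fillna(0).cumsum()     # second part of the formula
df["vwap"] = df["cum_value"] / df["cum_volume"]
print(df)
```

Having cum_value and cum_volume as separate columns makes it easy to check each half of the division by eye.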
Q: Why the fillna(0)?
A: So the total volume doesn't get NaNs and you don't get a division error.
Q: Why so many NaNs in the vwap column?
A: Because of the lines that don't have a trade. You can fill them with 0s, but it would be better to keep the 'no trade' information.
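A quick way to see both effects on a made-up series (hypothetical values). The last line uses ffill to carry the previous value forward, which is another option besides filling with 0s and is my suggestion, not something the code above does:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows 0 and 2 have no trade.
size = pd.Series([np.nan, 10.0, np.nan, 30.0])
price = pd.Series([12.0, 12.0, 12.0, 11.0])

cum_volume_raw = size.cumsum()               # NaN on the no-trade rows
cum_volume_filled = size.fillna(0).cumsum()  # a number on every row

# The numerator is still NaN on no-trade rows, so vwap is NaN there too.
vwap = (price * size).cumsum() / cum_volume_filled
vwap_carried = vwap.ffill()                  # carry the last known VWAP forward
print(pd.DataFrame({"raw": cum_volume_raw, "filled": cum_volume_filled,
                    "vwap": vwap, "carried": vwap_carried}))
```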
P.S.: You may get a wrong result, because this treats volume and price as if every trade were in the same direction. You could try flipping a sign to make the volume behave the way you expect (for instance, making the ask price negative).
And this is the code's output:
    trade_price    vwap
1        152.54     NaN
2        152.54     NaN
3        152.54     NaN
4        152.54     NaN
5        152.54     NaN
6        152.54     NaN
7        152.54     NaN
8        152.54  152.54
9        152.54     NaN
10       152.54     NaN
11       152.54     NaN
12       152.54  152.54
13       152.55     NaN
14       152.54  152.54
15       152.55     NaN
16       152.55     NaN
17       152.55     NaN
18       152.55     NaN
19       152.55     NaN
20       152.55     NaN