How To Calculate Volume Weighted Average Price (VWAP) Using A Pandas DataFrame With Ask And Bid Price?
Solution 1:
You can use np.where to pick the price from the correct column (bid or ask) depending on the value in the trade column. Note that this returns the bid price when no trade occurs, but because that price is then multiplied by a NaN trade size, it does not affect the result. I also forward-filled the VWAP.
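As a quick check of that claim, here is a minimal sketch on a small invented frame (the four rows and their values are made up for illustration and are not the question's data). It shows that the bid price selected on no-trade rows drops out, because the NaN trade size turns the product into NaN and cumsum skips NaN by default.

```python
import numpy as np
import pandas as pd

# Invented toy data: two quote-only rows and two trades, one per side.
df = pd.DataFrame({
    "bid": [10.0, 10.0, 11.0, 11.0],
    "ask": [12.0, 12.0, 13.0, 13.0],
    "trade": [np.nan, "ask", np.nan, "bid"],
    "trade_size": [np.nan, 2.0, np.nan, 6.0],
})

# Ask price for ask trades, bid price otherwise (including no-trade rows).
price = np.where(df["trade"].eq("ask"), df["ask"], df["bid"])

# No-trade rows have a NaN size, so their product is NaN; cumsum skips
# NaN by default, so those rows add nothing to the running totals.
traded_value = (df["trade_size"] * price).cumsum()
traded_volume = df["trade_size"].cumsum()

# Forward fill carries the last known VWAP across no-trade rows.
vwap = (traded_value / traded_volume).ffill()
print(vwap.tolist())  # [nan, 12.0, 12.0, 11.25]
```

The last value is (2 * 12 + 6 * 11) / (2 + 6) = 11.25, so only the two traded rows contribute.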
import numpy as np

volume = df['trade_size']
price = np.where(df['trade'].eq('ask'), df['ask'], df['bid'])
df = df.assign(VWAP=((volume * price).cumsum() / volume.cumsum()).ffill())

>>> df
                          time  bid_size     bid     ask  ask_size  trade  trade_size  phase    VWAP
 0  2019-01-07 07:45:01.064515       495  152.52  152.54        19    NaN         NaN   OPEN     NaN
 1  2019-01-07 07:45:01.110072        31  152.53  152.54        19    NaN         NaN   OPEN     NaN
 2  2019-01-07 07:45:01.116596        32  152.53  152.54        19    NaN         NaN   OPEN     NaN
 3  2019-01-07 07:45:01.116860        32  152.53  152.54        21    NaN         NaN   OPEN     NaN
 4  2019-01-07 07:45:01.116905        34  152.53  152.54        21    NaN         NaN   OPEN     NaN
 5  2019-01-07 07:45:01.116982        34  152.53  152.54        31    NaN         NaN   OPEN     NaN
 6  2019-01-07 07:45:01.147901        38  152.53  152.54        31    NaN         NaN   OPEN     NaN
 7  2019-01-07 07:45:01.189971        38  152.53  152.54        31    ask        15.0   OPEN  152.54
 8  2019-01-07 07:45:01.189971        38  152.53  152.54        16    NaN         NaN   OPEN  152.54
 9  2019-01-07 07:45:01.190766        37  152.53  152.54        16    NaN         NaN   OPEN  152.54
10  2019-01-07 07:45:01.190856        37  152.53  152.54        15    NaN         NaN   OPEN  152.54
11  2019-01-07 07:45:01.190856        37  152.53  152.54        16    ask         1.0   OPEN  152.54
12  2019-01-07 07:45:01.193938        37  152.53  152.55       108    NaN         NaN   OPEN  152.54
13  2019-01-07 07:45:01.193938        37  152.53  152.54        15    ask        15.0   OPEN  152.54
14  2019-01-07 07:45:01.194326         2  152.54  152.55       108    NaN         NaN   OPEN  152.54
15  2019-01-07 07:45:01.194453         2  152.54  152.55        97    NaN         NaN   OPEN  152.54
16  2019-01-07 07:45:01.194479         6  152.54  152.55        97    NaN         NaN   OPEN  152.54
17  2019-01-07 07:45:01.194507        19  152.54  152.55        97    NaN         NaN   OPEN  152.54
18  2019-01-07 07:45:01.194532        19  152.54  152.55        77    NaN         NaN   OPEN  152.54
19  2019-01-07 07:45:01.194598        19  152.54  152.55        79    NaN         NaN   OPEN  152.54
Solution 2:
Here is one possible approach.
Append a VMAP column (which will hold the VWAP) full of NaNs:
df['VMAP'] = np.nan
Calculate the VWAP (based on the equation provided by the OP) and assign values based on ask or bid, as required by the OP:
for trade in ['ask', 'bid']:
    # Find the indexes of the `ask` or `bid` trades
    trade_idx = df[df.trade == trade].index
    # Slice the DataFrame using those indexes; the column named
    # after the side ('ask' or 'bid') holds the trade price
    df.loc[trade_idx, 'VMAP'] = (
        (df.loc[trade_idx, 'trade_size'] * df.loc[trade_idx, trade]).cumsum()
        /
        (df.loc[trade_idx, 'trade_size']).cumsum()
    )
print(df.iloc[:, 1:])
               time  bid_size     bid     ask  ask_size  trade  trade_size  phase    VMAP
 0  07:45:01.064515       495  152.52  152.54        19    NaN         NaN   OPEN     NaN
 1  07:45:01.110072        31  152.53  152.54        19    NaN         NaN   OPEN     NaN
 2  07:45:01.116596        32  152.53  152.54        19    NaN         NaN   OPEN     NaN
 3  07:45:01.116860        32  152.53  152.54        21    NaN         NaN   OPEN     NaN
 4  07:45:01.116905        34  152.53  152.54        21    NaN         NaN   OPEN     NaN
 5  07:45:01.116982        34  152.53  152.54        31    NaN         NaN   OPEN     NaN
 6  07:45:01.147901        38  152.53  152.54        31    NaN         NaN   OPEN     NaN
 7  07:45:01.189971        38  152.53  152.54        31    ask        15.0   OPEN  152.54
 8  07:45:01.189971        38  152.53  152.54        16    NaN         NaN   OPEN     NaN
 9  07:45:01.190766        37  152.53  152.54        16    NaN         NaN   OPEN     NaN
10  07:45:01.190856        37  152.53  152.54        15    NaN         NaN   OPEN     NaN
11  07:45:01.190856        37  152.53  152.54        16    ask         1.0   OPEN  152.54
12  07:45:01.193938        37  152.53  152.55       108    NaN         NaN   OPEN     NaN
13  07:45:01.193938        37  152.53  152.54        15    ask        15.0   OPEN  152.54
14  07:45:01.194326         2  152.54  152.55       108    NaN         NaN   OPEN     NaN
15  07:45:01.194453         2  152.54  152.55        97    NaN         NaN   OPEN     NaN
16  07:45:01.194479         6  152.54  152.55        97    NaN         NaN   OPEN     NaN
17  07:45:01.194507        19  152.54  152.55        97    NaN         NaN   OPEN     NaN
18  07:45:01.194532        19  152.54  152.55        77    NaN         NaN   OPEN     NaN
19  07:45:01.194598        19  152.54  152.55        79    NaN         NaN   OPEN     NaN
EDIT
As @edinho correctly indicated, the VMAP column is the same as the trade_price column.
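In the question's data every trade is on the ask side at 152.54, so the loop's behavior is hard to see. The following is a minimal sketch of the same per-side loop on a small invented frame (the rows and values are assumptions made for illustration), where prices differ and both sides trade. It suggests that each side gets its own running average, computed only over that side's trades.

```python
import numpy as np
import pandas as pd

# Invented toy data (not the question's data): two ask trades,
# one bid trade, and one quote-only row.
df = pd.DataFrame({
    "bid": [10.0, 10.0, 11.0, 11.0],
    "ask": [12.0, 12.0, 14.0, 14.0],
    "trade": ["ask", "bid", "ask", np.nan],
    "trade_size": [1.0, 5.0, 3.0, np.nan],
})

df["VMAP"] = np.nan
for side in ["ask", "bid"]:
    # Index labels of the rows that traded on this side.
    idx = df.index[df["trade"] == side]
    size = df.loc[idx, "trade_size"]
    # The column named after the side holds the price for these trades.
    df.loc[idx, "VMAP"] = (size * df.loc[idx, side]).cumsum() / size.cumsum()

print(df["VMAP"].tolist())  # [12.0, 10.0, 13.5, nan]
```

Row 2 is (1 * 12 + 3 * 14) / 4 = 13.5, using the ask trades only. A single running VWAP over both sides would instead give (12 + 50 + 42) / 9, roughly 11.56, at that row, so it is worth deciding which of the two definitions you actually want.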
Solution 3:
OK, here it is:
df['trade_price'] = df.apply(lambda x: x['bid'] if x['trade'] == 'bid' else x['ask'], axis=1)
df['vwap'] = (df['trade_price'] * df['trade_size']).cumsum() / df['trade_size'].fillna(0).cumsum()
The first line:
It saves the trade price in a new trade_price column, so it is easier to retrieve later.
If you want, you can delete this line and write a function instead (maybe that is easier to read), but I prefer to see the intermediate results.
Q: Why does it have values even when there is no trade?
A: Because of the way the lambda is written: the else branch returns the ask price. It won't make a difference, though, because of the next step.
The second line:
Here the real calculation takes place.
The first part calculates the total value traded (price times volume) up to that moment (as you said, using cumulative sums makes life easier).
The second part calculates the total volume traded up to that moment (again, a cumulative sum).
If you want, you can break this line up and create more intermediate columns.
Q: Why the fillna(0)?
A: So the total volume doesn't contain NaNs and you don't get a division error.
Q: Why so many NaNs in the vwap column?
A: Because of the rows that have no trade. You can fill them with 0s, but it would be better to keep the 'no trade' information.
P.S.: You may get a wrong result, because it treats volume and price the same way regardless of trade direction. You could try inverting a sign to account for direction in the way you expect (for instance, by making the ask price negative).
This code outputs:
    trade_price    vwap
 1       152.54     NaN
 2       152.54     NaN
 3       152.54     NaN
 4       152.54     NaN
 5       152.54     NaN
 6       152.54     NaN
 7       152.54     NaN
 8       152.54  152.54
 9       152.54     NaN
10       152.54     NaN
11       152.54     NaN
12       152.54  152.54
13       152.55     NaN
14       152.54  152.54
15       152.55     NaN
16       152.55     NaN
17       152.55     NaN
18       152.55     NaN
19       152.55     NaN
20       152.55     NaN
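The suggestion in this answer to break the second line into intermediate columns could look like the sketch below. It runs on a small invented frame rather than the question's data, and the column names traded_value, cum_value and cum_volume are hypothetical names chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Invented toy data: two quote-only rows and two trades, one per side.
df = pd.DataFrame({
    "bid": [10.0, 10.0, 11.0, 11.0],
    "ask": [12.0, 12.0, 13.0, 13.0],
    "trade": [np.nan, "ask", np.nan, "bid"],
    "trade_size": [np.nan, 2.0, np.nan, 6.0],
})

# Step 1: bid price for bid trades, ask price otherwise (as in the answer).
df["trade_price"] = df.apply(
    lambda x: x["bid"] if x["trade"] == "bid" else x["ask"], axis=1
)

# Step 2, split up (column names below are hypothetical):
df["traded_value"] = df["trade_price"] * df["trade_size"]  # NaN when no trade
df["cum_value"] = df["traded_value"].cumsum()              # NaN rows stay NaN
df["cum_volume"] = df["trade_size"].fillna(0).cumsum()     # defined on every row
df["vwap"] = df["cum_value"] / df["cum_volume"]

print(df[["trade_price", "cum_value", "cum_volume", "vwap"]])
```

With the columns separated, it is easier to see that the NaNs in vwap come from cum_value on no-trade rows, while cum_volume is defined everywhere thanks to fillna(0).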