
Python - ETFs Daily Data Web Scraping

I'm trying to web scrape some daily info for different ETFs. I found that https://www.marketwatch.com/ has accurate info. The most relevant info is the open price, outstanding s

Solution 1:

If you use the package investpy you don't have to use web scraping to get the required data. investpy allows you to get daily ETF data. It also helps you to find an ETF by its ISIN (International Securities Identification Number):

import investpy
investpy.search_etfs(by="isin", value="my_isin")

And that's the way you get the data:

investpy.get_etf_recent_data(etf=etf_name, country="my_country")

Solution 2:

Yes, I agree that Beautiful Soup is a good approach. Here is some Python code which uses the Beautiful Soup library to extract the intraday price from the IVV fund page:

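import requests
from bs4 import BeautifulSoup

# A timeout keeps the script from hanging if the site does not respond.
r = requests.get("https://www.marketwatch.com/investing/fund/ivv", timeout=10)
html = r.text

soup = BeautifulSoup(html, "html.parser")

# soup.h1 is None when the page has no <h1>, so check before reading .string.
if soup.h1 is not None and soup.h1.string == "Pardon Our Interruption...":
    print("They detected we are a bot. We hit a captcha.")
else:
    # The price sits in a <bg-quote> tag inside <h3 class="intraday__price">.
    price = soup.find("h3", class_="intraday__price").find("bg-quote").string
    print(price)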

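The fact that the price changes frequently is not a problem. Beautiful Soup locates the value by tag name and class (here the h3 with class intraday__price and the bg-quote tag inside it), and those stay the same while the number changes. If the site redesigns the page, the selectors will need updating.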
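To try the selector logic without hitting the site, here is a minimal sketch that parses an inline HTML string. SAMPLE_HTML is a made-up fragment that only mimics the structure the code above expects (it is not copied from the live page), and extract_price is a hypothetical helper name. It returns None instead of raising when the expected tags are missing, which is what happens on a captcha page.

```python
from typing import Optional

from bs4 import BeautifulSoup

# Made-up fragment that mimics the tags the selector looks for;
# it is an assumption for illustration, not the real page markup.
SAMPLE_HTML = """
<html><body>
  <h1>Sample Fund</h1>
  <h3 class="intraday__price">
    <sup>$</sup>
    <bg-quote class="value">412.35</bg-quote>
  </h3>
</body></html>
"""


def extract_price(html: str) -> Optional[str]:
    """Return the intraday price text, or None if the expected tags are absent."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("h3", class_="intraday__price")
    if container is None:
        return None
    quote = container.find("bg-quote")
    if quote is None:
        return None
    return quote.get_text(strip=True)


print(extract_price(SAMPLE_HTML))
print(extract_price("<h1>Pardon Our Interruption...</h1>"))
```

The first call prints 412.35 and the second prints None, so the same function can be used to tell a normal page from a blocked one.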

Your main challenge is that the website can detect that you are not using a web browser, and it will serve a captcha to your Python script. So you will need to find a way around this. Also, I recommend checking the legality of scraping and whether it violates their terms of service.
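One small, automatable part of that check is reading the site's robots.txt. It is neither the terms of service nor legal advice, but it does state which paths the site asks crawlers to avoid. Below is a rough sketch using the standard library's urllib.robotparser. The rules in SAMPLE_ROBOTS_TXT, the example.com URLs, and the "my-etf-script" user agent are all made up for illustration; the real file would have to be fetched from the site itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS_TXT, "my-etf-script", "https://example.com/investing/fund/ivv"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "my-etf-script", "https://example.com/private/data"))
```

With these sample rules the first URL is allowed and the second is not. To check a live site, RobotFileParser also offers set_url() and read() to download the file instead of parsing a string.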

You can learn more about Beautiful Soup here:

https://www.crummy.com/software/BeautifulSoup/bs4/doc/

