
How Can I Get Past A Password-Protected ASPX Website In Order To Parse It Programmatically?

My problem is that the action attribute of the form tag is not a URL, so I can't use the Python requests module to submit the login and then parse the HTML behind it. The url

Solution 1:

It's difficult to do this with simple web scraping tools: after you simulate the login, you have to capture the authentication tokens the site hands back (such as the session cookie) and re-send them with every later request. For anything but trivial web scraping, it is worth your time to use Selenium. Since your question was generic, I'll give an overview. Let me know if you need specifics.
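
For comparison, here is roughly what the "capture and re-send" route involves with only the standard library. On an ASP.NET WebForms login page the POST usually has to echo back hidden fields such as __VIEWSTATE and __EVENTVALIDATION, a relative or empty action is resolved against the page's own URL, and the session cookie must be replayed afterwards. This is a minimal sketch, not a drop-in solution: it assumes the field names from the Selenium code in this answer (txtLoginUserID, txtLoginPassword, btnLogin), the helper names are hypothetical, and it ignores checkboxes, selects and JavaScript-driven postbacks.

```python
from html.parser import HTMLParser
from http.cookiejar import CookieJar
from urllib.parse import urlencode, urljoin
from urllib.request import HTTPCookieProcessor, build_opener


class LoginFormParser(HTMLParser):
    """Collects the first <form>'s action plus its input fields and submit buttons."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}   # text, password and hidden inputs (e.g. __VIEWSTATE)
        self.buttons = {}  # submit-type inputs; only the "clicked" one gets posted
        self._in_form = False
        self._seen_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._seen_form:
            self._in_form = self._seen_form = True
            self.action = attrs.get("action") or ""
        elif tag == "input" and self._in_form and attrs.get("name"):
            kind = (attrs.get("type") or "text").lower()
            target = self.buttons if kind in ("submit", "button", "image") else self.fields
            target[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def build_login_request(page_url, html, username, password,
                        user_field="txtLoginUserID",
                        password_field="txtLoginPassword",
                        button_field="btnLogin"):
    """Return (post_url, urlencoded_body) for the login form found in html."""
    parser = LoginFormParser()
    parser.feed(html)
    # A relative action such as "./Login.aspx" (or an empty one) is resolved
    # against the URL the form was loaded from.
    post_url = urljoin(page_url, parser.action or "")
    fields = dict(parser.fields)  # keeps __VIEWSTATE, __EVENTVALIDATION, ...
    fields[user_field] = username
    fields[password_field] = password
    # WebForms uses the posted button name/value to decide which click handler runs.
    fields[button_field] = parser.buttons.get(button_field, "")
    return post_url, urlencode(fields).encode()


def login(page_url, username, password):
    """Log in and return an opener that re-sends the session cookie on every request."""
    opener = build_opener(HTTPCookieProcessor(CookieJar()))
    with opener.open(page_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    post_url, body = build_login_request(page_url, html, username, password)
    with opener.open(post_url, data=body) as resp:  # passing data makes this a POST
        resp.read()
    return opener
```

After `opener = login(...)`, calling `opener.open(...)` on a protected page sends the stored cookies automatically. This is exactly the bookkeeping that Selenium does for you, which is why it is usually the easier path.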

  1. Create a connection to the website using Selenium (I've had the best luck with geckodriver, the Firefox driver).
  2. Navigate to the login page
  3. Use Selenium to simulate a user entering the username and password, then simulate pressing 'OK' (or whatever the login button is called).
  4. Navigate to where you want to go next, either by simulating clicks on links or by loading the URL directly. Because you are still in the same logged-in browser session, the site will let you in, just as if you had logged in by hand.
  5. Pass the page's HTML (driver.page_source) to Beautiful Soup and scrape away.
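
Steps 4 and 5 can be sketched as a small helper that keeps reusing the same driver. It assumes, hypothetically, an ASP.NET Forms Authentication site that bounces unauthenticated requests back to its login page, so landing on that page is treated as an expired or failed session. `fetch_after_login` and the default `login_path` are illustrative names, not part of Selenium; the helper only touches the WebDriver members `get()`, `current_url` and `page_source`.

```python
from urllib.parse import urlparse


def fetch_after_login(driver, urls, login_path="/login.aspx"):
    """Visit each URL in the already-logged-in browser and return {url: html}.

    login_path is a hypothetical path for the site's login page.
    """
    pages = {}
    for url in urls:
        driver.get(url)  # same browser session, so the auth cookie goes along
        # Being redirected back to the login page means the session is not valid.
        landed_on = urlparse(driver.current_url).path.lower()
        if landed_on.endswith(login_path.lower()):
            raise RuntimeError(f"redirected to the login page while fetching {url}")
        pages[url] = driver.page_source
    return pages
```

Each returned HTML string can then go straight to Beautiful Soup, for example `BeautifulSoup(pages[url], "html.parser")`.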

Here is an example from my code; I hope it helps.

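import logging
import platform
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

logger = logging.getLogger(__name__)

# login_info (user_id, host, username, password) and headless come from elsewhere in my code.
logger.debug(f"{login_info.user_id}: Logging In")

options = webdriver.FirefoxOptions()
if "Linux" in platform.system():
    path = "path/to/geckodriver"
else:
    path = "path/to/geckodriver.exe"
if headless:
    options.add_argument("-headless")
# Selenium 4 takes the driver path through a Service object and the options via options=
driver = webdriver.Firefox(service=Service(executable_path=path), options=options)
driver.implicitly_wait(10)  # wait up to 10 s for elements to appear

driver.get(login_info.host)
# find_element_by_name() was removed in Selenium 4; use find_element(By.NAME, ...)
uname_box = driver.find_element(By.NAME, "txtLoginUserID")
pw_box = driver.find_element(By.NAME, "txtLoginPassword")
login_btn = driver.find_element(By.NAME, "btnLogin")

uname_box.send_keys(login_info.username)
pw_box.send_keys(login_info.password)
time.sleep(1)
login_btn.click()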
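
If you would rather not drive the browser for every page once it is logged in, one option is to copy its cookies (on ASP.NET sites typically ASP.NET_SessionId and/or .ASPXAUTH) into a plain HTTP request. This is a minimal standard-library sketch, assuming `driver` is the logged-in WebDriver from the code above. `cookie_header` and `fetch_with_browser_cookies` are hypothetical helpers, and sending every cookie without checking its domain or path is a simplification that only holds when they all came from the one site.

```python
from urllib.request import Request, urlopen


def cookie_header(selenium_cookies):
    """Turn driver.get_cookies() output (a list of dicts) into a Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def fetch_with_browser_cookies(url, selenium_cookies, user_agent=None):
    """Fetch url with the browser's cookies; simplified: ignores cookie domain/path."""
    headers = {"Cookie": cookie_header(selenium_cookies)}
    if user_agent:
        headers["User-Agent"] = user_agent  # some sites tie the session to the browser's UA
    with urlopen(Request(url, headers=headers)) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Usage would look something like `fetch_with_browser_cookies(page_url, driver.get_cookies(), driver.execute_script("return navigator.userAgent"))`, where `page_url` is a placeholder for the protected page you want.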
