
How To Scrape Latitude Longitude In Beautiful Soup

I am fairly new to BeautifulSoup4 and am having trouble extracting latitude and longitude values from the HTML response returned by the code below. url = 'http://cinematreasures.org/thea

Solution 1:

Here is my approach:

import requests
import demjson
from bs4 import BeautifulSoup

url = 'http://cinematreasures.org/theaters/united-states?page=1'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')

# Return (longitude, latitude) from the decoded dict
to_plain_coord = lambda d: (d['point']['lng'], d['point']['lat'])
# Grab theater coords if the `data` attribute exists
coords = [
    to_plain_coord(demjson.decode(t.attrs['data']))
    for t in soup.select('.theater')
    if 'data' in t.attrs
]

print(coords)

I don't use any string manipulation. Instead, I load the JSON from the data attribute. Unfortunately, it isn't quite valid JSON here, so I use the demjson library to parse it.

pip install demjson
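
If adding a dependency is not desirable, one possible alternative is to quote the bare keys with a regular expression and then pass the result to the standard json module. This is only a sketch under assumptions: the sample string is a made-up stand-in for a data attribute value (the real attribute may contain other fields or quoting styles), and quote_bare_keys is a hypothetical helper, not part of demjson or BeautifulSoup.

```python
import json
import re

# Hypothetical sample of a `data` attribute value; the real page may differ.
sample = "{id: 1234, point: {lng: -73.9857, lat: 40.758}}"

def quote_bare_keys(text):
    """Wrap unquoted object keys in double quotes so json.loads accepts them.

    Assumes keys are plain identifiers and that string values never contain
    text that looks like `, key:`; anything fancier needs a real parser.
    """
    return re.sub(r'([{,]\s*)([A-Za-z_]\w*)\s*:', r'\1"\2":', text)

parsed = json.loads(quote_bare_keys(sample))
# Same (longitude, latitude) order as to_plain_coord above
coord = (parsed['point']['lng'], parsed['point']['lat'])
print(coord)
```

This swaps demjson's lenient parser for a regex rewrite followed by strict parsing, so it will fail loudly on input it does not understand (for example, single-quoted strings) rather than guess.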

Solution 2:

Okay, so you grab all the <tr>s correctly; now we just need to get the data attribute from each of them.

import re
import requests
from bs4 import BeautifulSoup

url = 'http://cinematreasures.org/theaters/united-states?page=1'
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")
theaters = soup.find_all("tr", class_="theater")
# Keep only rows that actually carry a data attribute
data = [t.get('data') for t in theaters if t.get('data')]
print(data)

Unfortunately, this gives you a list of strings rather than the dictionaries you might have hoped for. We can use a regular expression on each data string to pull out the lat and lng values and build a dict from them (thanks RootTwo):

coords = []
for d in data:
    # findall returns (key, value) pairs; the values stay as strings
    c = dict(re.findall(r'(lat|lng):\s*(-?\d{1,3}\.\d+)', d))
    coords.append(c)
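
The regular expression leaves the captured numbers as strings. A minimal sketch of converting them to floats follows. The sample strings are invented stand-ins for the data attribute values, and parse_coord is a hypothetical helper name, so treat it as an illustration rather than a drop-in fix.

```python
import re

# Invented examples of `data` attribute strings; real values may differ.
data = [
    "{id: 1, point: {lng: -73.9857, lat: 40.758}}",
    "{id: 2, point: {lng: -118.3287, lat: 34.1016}}",
]

# Same pattern as above, compiled once for reuse
COORD_RE = re.compile(r'(lat|lng):\s*(-?\d{1,3}\.\d+)')

def parse_coord(text):
    """Return (lat, lng) as floats, or None if either value is missing."""
    found = dict(COORD_RE.findall(text))
    if 'lat' not in found or 'lng' not in found:
        return None
    return float(found['lat']), float(found['lng'])

# Skip any string that did not yield both values
coords = [c for c in map(parse_coord, data) if c is not None]
print(coords)
```

Returning None for incomplete matches keeps a single malformed row from raising a KeyError partway through the list.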

Solution 3:

If you're expecting only a single result, do:

print(links[0])
