Error In Regex Formulation For Web Scraping In Python
I am trying to scrape some information from a website. I need 8 fields of information. I can get 5 of them, but 3 fields always come back empty. I think there is some mi
Solution 1:
import re
patFinderAddress = re.compile(r'<td><spanclass="label">Address:</span></td>.*?</td>')
patFinderPhone = re.compile(r'<td><spanclass="label">Phone:</span>\s*</td><td>\s*^\s*.*\s*^\s*.*<br>', re.M)
patFinderFax = re.compile(r'<td><spanclass="label">FAX:</span>\s*</td><td>\s*^\s*.*\s*^\s*.*</td>', re.M)
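Here are some regexes that work with your data. The last two weren't working because the data spans multiple lines, which is why these versions use \s* and the re.M flag to step across the line breaks. The first didn't work because the pattern itself was wrong.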
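To illustrate the multi-line point, here is a small sketch. The HTML snippet is a made-up stand-in, not your actual page, and the capture groups are added only so the value can be pulled out. It shows that "." stops at a newline by default, while \s* together with re.M (or, alternatively, re.S) can get across the line breaks.

```python
import re

# Hypothetical sample, not the asker's real page: the phone value sits on
# a line of its own after the label cell.
html = (
    '<td><span class="label">Phone:</span></td><td>\n'
    '  555-0100\n'
    '  <br>'
)

# "." does not match a newline by default, so this pattern finds nothing.
single_line = re.compile(r'Phone:</span></td><td>.*<br>')

# "\s*" does match newlines, and re.M lets "^" match at the start of each line.
anchored = re.compile(r'Phone:</span></td><td>\s*^\s*(.*)\s*<br>', re.M)

# Alternative: re.S (DOTALL) lets "." match newlines as well.
dotall = re.compile(r'Phone:</span></td><td>(.*?)<br>', re.S)

no_match = single_line.search(html)
phone_anchored = anchored.search(html).group(1).strip()
phone_dotall = dotall.search(html).group(1).strip()

print(no_match, phone_anchored, phone_dotall)
```

Note that re.M only changes where "^" and "$" match. It does not make "." cross line breaks, which is why the patterns still need \s* between the lines.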
But for HTML parsing, use an HTML parser. It is far more robust and gives you the data you want rather than this eyesore of HTML strings.
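As a rough sketch of the parser route, here is one way it could look with the standard library's html.parser. The table layout in sample_html is an assumption based on the label/value cells in the regexes above, and LabelTableParser is a name made up for this example. Third-party parsers such as Beautiful Soup or lxml are common choices too and usually need less code.

```python
from html.parser import HTMLParser


class LabelTableParser(HTMLParser):
    """Collect label -> value pairs from two-cell table rows (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._cells = []      # text of each <td> in the current row
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag == "td":
            self._in_td = True
            self._cells.append("")
        elif tag == "br" and self._in_td:
            # Treat a line break inside a cell as a space.
            self._cells[-1] += " "

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and len(self._cells) == 2:
            # Collapse runs of whitespace, including newlines, in both cells.
            label, value = (" ".join(c.split()) for c in self._cells)
            self.fields[label.rstrip(":")] = value

    def handle_data(self, data):
        if self._in_td:
            self._cells[-1] += data


# Assumed markup for illustration only; the real page may differ.
sample_html = """
<table>
  <tr><td><span class="label">Address:</span></td><td>1 Example Street<br>Springfield</td></tr>
  <tr><td><span class="label">Phone:</span></td><td>
      555-0100
      <br></td></tr>
  <tr><td><span class="label">FAX:</span></td><td>
      555-0199
  </td></tr>
</table>
"""

parser = LabelTableParser()
parser.feed(sample_html)
fields = parser.fields
print(fields)
```

Because the parser works on tags rather than raw text, values that wrap across several lines need no special handling.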