Following Siblings Selenium Python With Conditions

August 09, 2024

I'm trying to collect the following siblings up to a certain sibling, but I still can't figure out how to do it. I tried to locate the siblings before and after it by class name, but I got wr

Solution 1:

Given your expected output, why not extract the text from all span elements, since they are already in document order? For example, with lxml:

# tree is the parsed document (built with lxml.html.fromstring further down)
data = tree.xpath("//span/text()")
print(*data, sep="\n")

Output:

2 August 2020
1
2
3
4
15 August 2020
5
6

If you really want to use loops and build a dictionary, here is a proposal. First, the data:

data = """<div class="MainClass"><div class="InfoClass"><div class="left-wrap"><span class="date">2 August 2020</span></div></div><div class="DataClass"><em class="Code"><span>1</span></em></div><div class="DataClass"><em class="Code"><span>2</span></em></div><div class="DataClass"><em class="Code"><span>3</span></em></div><div class="DataClass"><em class="Code"><span>4</span></em></div><div class="InfoClass"><div class="left-wrap"><span class="date">15 August 2020</span></div></div><div class="DataClass"><em class="Code"><span>5</span></em></div><div class="DataClass"><em class="Code"><span>6</span></em></div></div>"""

Then, the code :

import lxml.html
tree = lxml.html.fromstring(data)

dates = [el.text for el in tree.xpath("//span[@class='date']")]
print(dates)

dc=[]
for els in dates:
    lists=[el.text for el in tree.xpath("//div[span[text()='"+els+"']]/../following-sibling::div[@class='DataClass']//span[preceding::span[@class='date'][1][.='"+els+"']]")]
    dc.append(lists)

print(dc)

dictionary = dict(zip(dates,dc))
print(dictionary)

Comments:

First, the dates are extracted into a list. Then, for each date, everything relies on the following XPath (probably the one you were looking for) to get the matching DataClass values:

//div[span[text()='"+els+"']]/../following-sibling::div[@class='DataClass']//span[preceding::span[@class='date'][1][.='"+els+"']]

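Each +els+ is replaced by one of the dates fetched in the first step. The XPath starts from the InfoClass div holding that date, takes its following DataClass siblings, and keeps only the spans whose nearest preceding date span is that same date, so the selection stops at the next date.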
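The string concatenation can be easier to read when it is wrapped in a small helper. A minimal sketch, not part of the original answer: xpath_for_date is a hypothetical name, and the helper assumes the date text contains no single quote, since the value is placed inside a single-quoted XPath literal.

```python
def xpath_for_date(date):
    """Build the XPath that selects the data spans belonging to one date."""
    # Assumption: the date never contains a single quote, because it is
    # embedded in a single-quoted XPath string literal.
    if "'" in date:
        raise ValueError("date text must not contain a single quote")
    return (
        f"//div[span[text()='{date}']]/../following-sibling::div[@class='DataClass']"
        f"//span[preceding::span[@class='date'][1][.='{date}']]"
    )


query = xpath_for_date("2 August 2020")
print(query)
```

The resulting string can be passed to tree.xpath in place of the inline concatenation.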

Finally, the dictionary is built from the two lists. This code is written for lxml. Replace tree.xpath with the Selenium equivalent (driver.find_elements_by_xpath, or driver.find_elements(By.XPATH, ...) in Selenium 4) to make it work with a browser.

Output (dates, DataClass values, dictionary):

['2 August 2020', '15 August 2020']
[['1', '2', '3', '4'], ['5', '6']]
{'2 August 2020': ['1', '2', '3', '4'], '15 August 2020': ['5', '6']}

EDIT: If you need to print the dictionary, you can use:

for keys, values in dictionary.items():
    print(keys)
    print(*values,sep='\n')

Output as requested:

2 August 2020
1
2
3
4
15 August 2020
5
6
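As an alternative to running one XPath query per date, the same grouping can be sketched as a single pass over the children of MainClass, remembering the most recent date. This is a different technique from the XPath above, shown with the standard-library xml.etree.ElementTree instead of lxml; it assumes the markup is well-formed XML, and group_by_date is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Same structure as the data above; assumed to be well-formed XML.
markup = (
    '<div class="MainClass">'
    '<div class="InfoClass"><div class="left-wrap">'
    '<span class="date">2 August 2020</span></div></div>'
    '<div class="DataClass"><em class="Code"><span>1</span></em></div>'
    '<div class="DataClass"><em class="Code"><span>2</span></em></div>'
    '<div class="DataClass"><em class="Code"><span>3</span></em></div>'
    '<div class="DataClass"><em class="Code"><span>4</span></em></div>'
    '<div class="InfoClass"><div class="left-wrap">'
    '<span class="date">15 August 2020</span></div></div>'
    '<div class="DataClass"><em class="Code"><span>5</span></em></div>'
    '<div class="DataClass"><em class="Code"><span>6</span></em></div>'
    '</div>'
)


def group_by_date(root):
    """Walk the direct children in order and attach each value to the last date seen."""
    grouped = {}
    current = None
    for child in root:
        css_class = child.get("class")
        if css_class == "InfoClass":
            # A new date starts a new group.
            current = child.find(".//span[@class='date']").text
            grouped[current] = []
        elif css_class == "DataClass" and current is not None:
            # Every DataClass sibling belongs to the most recent date.
            grouped[current].append(child.find(".//span").text)
    return grouped


result = group_by_date(ET.fromstring(markup))
print(result)
```

The loop visits each sibling once, so the cost does not grow with the number of dates.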

Solution 2:

You can use the same simple code as in the previous question, but collect the values in a list in case the .Code text is not unique. This also works if 2 August 2020 and 15 August 2020 contain the same code.

codes = list()
for e in driver.find_elements_by_class_name('Code'):
    code = e.text
    date = e.find_element_by_xpath("(./preceding::span[@class='date'])[last()]").text
    codes.append({"date": date, "code": code})

for c in codes:
    print(f'date: {c["date"]}, code: {c["code"]}')

The output:

date: 2 August 2020, code: 1
date: 2 August 2020, code: 2
date: 2 August 2020, code: 3
date: 2 August 2020, code: 4
date: 15 August 2020, code: 5
date: 15 August 2020, code: 6

If you want a dict with the date as the key and the list of codes as the value:

codes = dict()
for e in driver.find_elements_by_class_name('Code'):
    code = e.text
    date = e.find_element_by_xpath("(./preceding::span[@class='date'])[last()]").text
    if date in codes:
        codes[date].append(code)
    else:
        codes.update({date: [code]})

for k, v in codes.items():
    print(f'{k} : {v}')

With output:

2 August 2020 : ['1', '2', '3', '4']
15 August 2020 : ['5', '6']
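The if/else branch can also be written with collections.defaultdict. A minimal sketch of that variant: the pairs list is hard-coded here to stand in for the (date, code) values the Selenium loop would read, and group_codes is a hypothetical helper name.

```python
from collections import defaultdict

# Stand-in for the (date, code) pairs read from the page in the loop above.
pairs = [
    ("2 August 2020", "1"),
    ("2 August 2020", "2"),
    ("2 August 2020", "3"),
    ("2 August 2020", "4"),
    ("15 August 2020", "5"),
    ("15 August 2020", "6"),
]


def group_codes(pairs):
    """Collect codes under their date; a missing key starts as an empty list."""
    grouped = defaultdict(list)
    for date, code in pairs:
        grouped[date].append(code)
    return dict(grouped)


codes = group_codes(pairs)
for k, v in codes.items():
    print(f"{k} : {v}")
```

Duplicate codes are kept, because each value is appended to a list rather than used as a key.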

Solution 3:

I have found a way to display the text you want.

mainClassText = driver.find_element_by_xpath("//div[@class='MainClass']").text
print(mainClassText)

If you want, you can also turn this into a list.

mainClassTextList = mainClassText.split("\n")
for ele in mainClassTextList:
    print(ele)

Both versions print:

2 August 2020
1
2
3
4
15 August 2020
5
6
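Because the text comes back as one flat list of lines, the dates and their values still have to be grouped if a per-date structure is needed. A minimal sketch of one way to do that, assuming every date line matches the format %d %B %Y (for example 2 August 2020) under an English locale. The names is_date and group_lines are hypothetical, and the sample text is the output shown above rather than a live page.

```python
from datetime import datetime


def is_date(line, fmt="%d %B %Y"):
    """Return True if the line parses as a date in the assumed format."""
    try:
        datetime.strptime(line, fmt)
        return True
    except ValueError:
        return False


def group_lines(lines):
    """Attach every non-date line to the most recent date line before it."""
    grouped = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if is_date(line):
            current = line
            grouped.setdefault(current, [])
        elif current is not None:
            grouped[current].append(line)
    return grouped


# Stand-in for mainClassText as returned by Selenium.
text = "2 August 2020\n1\n2\n3\n4\n15 August 2020\n5\n6"
grouped = group_lines(text.split("\n"))
print(grouped)
```

Lines that appear before the first date are ignored, since there is no date to attach them to.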

Solution 4:

All the divs containing the dates and the data are at the same level under the MainClass div, so we can get the desired result using one generic XPath for all the spans containing a date or data.

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://bilalzamel.htmlsave.net/")

mainClass = driver.find_elements_by_xpath("//div[@class='MainClass']//span")
for mc in mainClass:
    kDate = mc.text
    print(kDate)
