XPath notes

Author: stoat  Published: 2021-06-11  Category: Tech

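1. The browser's "Copy XPath" feature is quite limited: the paths it generates are often tied to element positions (e.g. div[2]), so they break when the page layout changes, and they cannot express conditions such as class or text matching.
2. Among sibling nodes that carry several classes, select the nodes that have either of two classes: html.xpath('.//div[contains(@class,"c-container") or contains(@class,"b-container")]/text()')
3. Among sibling nodes that carry several classes, keep the nodes that have one class but exclude those that also have another: html.xpath('.//div[contains(@class,"c-container") and not(contains(@class,"b-container"))]/text()')
4. Output the value of a tag's attribute: html.xpath('.//div[@id="first"]/a/@title')
5. Filter by text content and output an attribute of the match: html.xpath('.//div[text()="水电费"]/@class')
6. Get all the text of the current node, including the text of its child nodes: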

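# html is the tree built with etree.HTML(html_str) in the full code
contents = html.xpath('//div[@id="third"]//h4')
lst = []
for e in contents:
    # string(.) concatenates the text of e and all of its descendants
    lst.append(e.xpath('string(.)').replace('\n', '').strip())
print(lst)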
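To illustrate item 1, here is a minimal sketch (assuming lxml is installed) of why position-based paths are fragile. The two page versions and both XPath strings are made up for illustration: inserting a banner div before the target shifts the positional index, while a hand-written path anchored on the id still finds the right node.

```python
from lxml import etree

# Two made-up versions of one page; v2 adds a banner div before the target
PAGE_V1 = '<html><body><div><div id="first">target</div></div></body></html>'
PAGE_V2 = ('<html><body><div><div class="banner">ad</div>'
           '<div id="first">target</div></div></body></html>')

# Full, position-based path of the kind a browser's "copy full XPath" produces
POSITIONAL = '/html/body/div/div[1]/text()'
# Hand-written path anchored on the id attribute
BY_ID = '//div[@id="first"]/text()'


def extract(page):
    """Return (positional result, id-based result) for one page version."""
    root = etree.HTML(page)
    return root.xpath(POSITIONAL), root.xpath(BY_ID)


print(extract(PAGE_V1))  # both paths find the target
print(extract(PAGE_V2))  # the positional path now lands on the banner
```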
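One caveat for items 2 and 3: contains() is a plain substring test, so contains(@class,"one") also matches class="onex ...". The sketch below, assuming lxml, uses a common XPath 1.0 idiom that pads the normalized class list with spaces so only whole class tokens match. The snippet is trimmed from the sample HTML in the full code, and has_class is a hypothetical helper written for this example, not part of lxml.

```python
from lxml import etree

# Trimmed from the sample page: several siblings sharing overlapping classes
SNIPPET = """
<div id="first">
  <div class="onex c-container b-container">abc</div>
  <div class="onex c-container">def</div>
  <div class="onex b-container">ghi</div>
  <div class="one">都市</div>
</div>
"""
root = etree.HTML(SNIPPET)


def has_class(name):
    # Hypothetical helper: builds an XPath 1.0 predicate that matches
    # `name` only as a whole token inside the class attribute
    return f'contains(concat(" ", normalize-space(@class), " "), " {name} ")'


# Item 2: either class
either = root.xpath(f'//div[{has_class("c-container")} or {has_class("b-container")}]/text()')
# Item 3: first class present, second class absent
only_c = root.xpath(f'//div[{has_class("c-container")} and not({has_class("b-container")})]/text()')
# Substring matching also catches every "onex" div
naive_one = root.xpath('//div[contains(@class, "one")]/text()')
# Token matching finds only the div whose class list contains "one" itself
exact_one = root.xpath(f'//div[{has_class("one")}]/text()')

print(either, only_c, naive_one, exact_one)
```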
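For items 4 and 5, a short sketch assuming lxml. In the sample page the <a> under div#first has no title attribute, so this made-up snippet gives it one; the last query is an extra illustration of combining an attribute test and a text test in one predicate.

```python
from lxml import etree

# Made-up snippet: unlike the sample page, this <a> carries a title attribute
SNIPPET = """
<div id="first"><a title="first link" href="/a">A</a></div>
<div id="second">
  <div class="three">水电费</div>
  <div class="four">三顿饭黑客技术</div>
</div>
"""
root = etree.HTML(SNIPPET)

# Item 4: a path ending in @attr returns the attribute values as strings
titles = root.xpath('//div[@id="first"]/a/@title')
# Item 5: the predicate tests the text node (exact, whitespace-sensitive match),
# then @class returns the class of the matching div
classes = root.xpath('//div[text()="水电费"]/@class')
# Both kinds of test can be combined in one predicate
both = root.xpath('//div[@class="four" and contains(text(), "黑客")]/text()')

print(titles, classes, both)
```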
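For item 6, a sketch (assuming lxml) comparing three ways to read the text under a node. The snippet mirrors the h4 in the sample page but adds a line break inside the span, which is an assumption made to show the difference: //text() returns separate pieces, string() joins them, and the XPath function normalize-space() both trims and collapses internal whitespace, which .replace('\n', '').strip() does not.

```python
from lxml import etree

# Modeled on the sample's h4, with a line break added inside the span
SNIPPET = '<div id="third"><h4><span><em>ema</em>\n   h4 title</span></h4></div>'
root = etree.HTML(SNIPPET)

# //text() returns every descendant text node separately
pieces = root.xpath('//div[@id="third"]//h4//text()')
# string() joins all descendant text of the first matching node
joined = root.xpath('string(//div[@id="third"]//h4)')
# normalize-space() trims the ends and collapses inner runs of whitespace
tidy = [h4.xpath('normalize-space(.)') for h4 in root.xpath('//div[@id="third"]//h4')]
# The replace/strip approach from item 6 keeps the inner spaces
replaced = joined.replace('\n', '').strip()

print(pieces, repr(joined), tidy, repr(replaced))
```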

Full code:

html_str = """ <body> <div class="container"><div id="first"><div class="onex c-container b-container">abc</div><br /><div class="onex c-container">def</div><br /><div class="onex b-container">ghi</div><br /><div class="one">都市</div><br /><div class="two">德玛西亚</div><br /><div class="two">王牌对王牌</div><br /> <a><div class="spex">特殊位置</div><br /> <div class="spex"><div class="spe">特殊位置</div></div></a></div><br /><div id="second"><div class="three">水电费</div><br /><div class="three">说的话房间不开封</div><br /><div class="four">三顿饭黑客技术</div></div><br /><div id="third"><div class="three">水电费</div><br /><div class="three">说的话房间开封</div><br /><div class="three"><h4 title="this is title"><span><em>ema</em> h4 title</span></h4></div></div></div> </body> """

from lxml import etree

# Parse the HTML string into an element tree
html = etree.HTML(html_str)

# divs whose class attribute contains "c-container" OR "b-container" (substring test)
print(html.xpath('.//div[contains(@class,"c-container") or contains(@class,"b-container")]/text()'))
# divs whose class attribute contains both substrings
print(html.xpath('.//div[contains(@class,"c-container") and contains(@class,"b-container")]/text()'))
# exact match on the whole class attribute
print(html.xpath('.//div[@class="one"]/text()'))
# direct div children of div#first only
print(html.xpath('.//div[@id="first"]/div/text()'))
# descendant divs of div#first at any depth
print(html.xpath('.//div[@id="first"]//div/text()'))
print(html.xpath('.//div[@id="first"]//div[@class="spe"]/text()'))
print(html.xpath('.//div[@id="second"]/div[@class="three"]/text()'))
# every div with class="three" anywhere in the document
print(html.xpath('.//div[@class="three"]/text()'))
# class of the div(s) whose text is exactly 水电费
print(html.xpath('.//div[text()="水电费"]/@class'))
# every text node under the h4, as a list
print(html.xpath('.//div[@id="third"]//h4//text()'))
print(html.xpath('.//div[@id="third"]//h4/@title'))
print(html.xpath('.//div[@id="third"]//span/text()'))
# all text under the first matching h4, joined into one string
print(html.xpath('string(//div[@id="third"]//h4)'))

xx = html.xpath('.//div[@id="first"]')[0]
# Caution: '//' starts at the document root even when called on xx, so this
# prints text from every div in the page; './/div/text()' stays inside xx
print(xx.xpath('//div/text()'))

contents = html.xpath('//div[@id="third"]//h4')
lst = []
for e in contents:
    lst.append(e.xpath('string(.)').replace('\n', '').strip())
print(lst)
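The xx.xpath('//div/text()') call in the full code is a common trap: a path starting with // is absolute, even when it is evaluated on an element. A minimal sketch, assuming lxml and a made-up two-div snippet, contrasting it with the relative form .//:

```python
from lxml import etree

# Made-up snippet with one nested div inside and one outside div#first
SNIPPET = '<div id="first"><div>inside</div></div><div id="second"><div>outside</div></div>'
root = etree.HTML(SNIPPET)
first = root.xpath('//div[@id="first"]')[0]

# "//" restarts at the document root, so it sees every div in the page
absolute = first.xpath('//div/text()')
# ".//" searches only the descendants of `first`
relative = first.xpath('.//div/text()')

print(absolute, relative)
```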

Original post: xpath笔记 (XPath notes) by 雪鼬博客

Tags: xpath
