Say you have HTML similar to the following:
<div style="background-image: url('https://some.domain/image')"></div>
and you want to extract https://some.domain/image
using XPath. With XPath 2.0, you can select the URL with something like
substring-before(substring-after(//div/@style, "background-image: url("), ")")
but, when using XPath 1.0, this fails — I think it’s due to nested functions not being supported in XPath 1.0, but I have been unable to find documentation to confirm that. Is there a way to accomplish this using XPath 1.0?
A quick Google search suggests the same expression you already have.
If the code you quoted is verbatim what you tried, it seems like you need to account for the parentheses and possibly a single or double quote, depending on the source CSS. The example source you gave uses a single quote.
substring-before(substring-after(//div/@style, "background-image: url("), ")")
Should be (notice the extra ' relating to url('...url')):
substring-before(substring-after(//div/@style, "background-image: url('"), "')")
But I don’t think that would cause XPath to fail… it would just extract the wrong value.
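To illustrate why mismatched delimiters change the result rather than raise an error, here is a minimal Python sketch. It does not run a real XPath engine: `substring_after` and `substring_before` are hypothetical stand-ins I wrote to mimic how the XPath 1.0 spec describes `substring-after` and `substring-before` (both return an empty string when the delimiter is not found). The `style` string is copied from the example HTML in the question.

```python
def substring_after(s: str, t: str) -> str:
    """Stand-in for XPath substring-after: text following the first t, or ""."""
    i = s.find(t)
    return "" if i == -1 else s[i + len(t):]


def substring_before(s: str, t: str) -> str:
    """Stand-in for XPath substring-before: text preceding the first t, or ""."""
    i = s.find(t)
    return "" if i == -1 else s[:i]


# String value of //div/@style in the example HTML
style = "background-image: url('https://some.domain/image')"

# Delimiters without the single quote: the quotes stay in the result
without_quotes = substring_before(
    substring_after(style, "background-image: url("), ")"
)

# Delimiters including the single quote: only the URL remains
with_quotes = substring_before(
    substring_after(style, "background-image: url('"), "')"
)

# A delimiter that never occurs (here a misspelled property name)
# gives an empty string instead of an error
misspelled = substring_before(
    substring_after(style, "backgound-image: url("), ")"
)

print(repr(without_quotes))
print(repr(with_quotes))
print(repr(misspelled))
```

Under these assumptions, the nested call only ever returns a different string, so a hard failure in the real tool probably has some other cause.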
Edit:
Further reading suggests XPath 1.0 does have limited functionality. But, like you, I can’t find anything concrete.
If the code you have quoted is verbatim what you have tried, seems like you need to extract the parentheses and possibly a single or double quote, depending on the source css. […] But I don’t think that would cause xpath to fail… It would just extract the wrong value
Ah, yeah, that wasn’t intentional — my bad! But, yeah, that would just result in the wrong value being returned, not the expression failing.
Asking just because I’m curious… why are you using xpath?
Also, is this for a website you control or for someone else’s website?
If you’re rendering the page (in a browser, e2e test-runner, spider bot, etc…), have you considered running some js on the page to get the image? Something like:
const imagePath = document.getElementById('exampleIdOnElement').style.backgroundImage
Asking just because I’m curious… why are you using xpath?
I’m using a service called FreshRSS that automatically fetches RSS feeds. It has a feature that allows you to create custom feeds for sites by scraping the HTML with user specified XPath expressions.
I know that this isn’t exactly “web development”, but it uses webdev tools, and I wasn’t entirely sure where else to post this.
If you’re rendering the page (in a browser, e2e test-runner, spider bot, etc…), have you considered running some js on the page to get the image? Something like:
const imagePath = document.getElementById('exampleIdOnElement').style.backgroundImage
JS is, unfortunately, not possible here. I can only use XPath expressions.