XPath

XPath is a query language for selecting nodes from an XML or HTML document, and Tritium uses it for its selectors. A node is a tag or element within the document, for example a <div> or an <a>.

Take this bit of Tritium, which removes every image node in the document:

$(".//img") {
  remove()
}

The string in quotes (within the “$()”) is XPath: .//img. Breaking down the components: the “.” refers to the currently selected node, and the “//” tells XPath to search every descendant of that node (children, grandchildren, great-grandchildren, etc.). Together, “.//img” matches every image element anywhere beneath the current node.
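
The same expression can be tried outside Tritium. The sketch below is only an illustration, assuming Python's standard xml.etree.ElementTree module (which implements a subset of XPath) and a made-up sample document; it is not how Tritium itself evaluates the selector.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, not taken from the original page.
doc = ET.fromstring(
    "<body>"
    "<img src='a.png'/>"
    "<div><p>text</p><img src='b.png'/></div>"
    "</body>"
)

# ".//img": start at the current node (".") and search all descendants ("//").
images = doc.findall(".//img")

# ElementTree elements have no parent pointer, so build a child -> parent map
# before removing each image from its parent.
parent_of = {child: parent for parent in doc.iter() for child in parent}
for img in images:
    parent_of[img].remove(img)

remaining = doc.findall(".//img")
```

Both images are found, the direct child of <body> and the one nested inside the <div>, and none remain after removal.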

Common XPath expressions

XPath allows for a variety of expressions to locate content within an XML or HTML document. There are several great external resources for learning XPath listed at the bottom of the page. Here are some common expressions:

/                   - selects from the root node
./                  - selects the children of the current node
../                 - selects the children of the *parent* of the current node
.//                 - selects *all* descendants (children, grandchildren, etc.) of the current node
body/div[1]         - selects the first div child of the body
div[@class='one']   - selects divs whose class attribute is exactly 'one'
./span | ./script   - selects all span and script children of the current node
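
A few of these expressions can be checked with a short Python sketch. It assumes the standard xml.etree.ElementTree module and a made-up document; the <html> wrapper and the class values are illustrative only. ElementTree supports only part of XPath (it has no “|” union operator, for instance), so the last expression in the list is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup; the <html> wrapper lets "body/div[1]" resolve.
html = ET.fromstring(
    "<html><body>"
    "<div class='one'><span>a</span></div>"
    "<div class='two'><script>b</script></div>"
    "</body></html>"
)
body = html.find("body")

first_div = html.find("body/div[1]")           # first div child of body (1-based)
class_one = body.findall("div[@class='one']")  # divs whose class is exactly 'one'
children = body.findall("./*")                 # direct children only
descendants = body.findall(".//*")             # children, grandchildren, ...
```

Here “./*” returns the two <div>s, while “.//*” also returns the <span> and <script> nested inside them.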

“Current Node”

The expressions above rely on the relationships between nodes, so they are usually described relative to the “current node”: the node most recently selected.

$("./div") {
  $("")
}

In the Tritium above, we have selected a <div> and we are about to write some XPath to select another node. The “current node” in this case is the <div>. Any XPath will need to be relative to this node.

Examples

Take a look at the following HTML. The XPath examples that follow use it to explain the relationships between nodes.

<body>
  <div>
    <span>
      <div></div>
    </span>
  </div>
  <span>
  </span>
</body>

Start with some simple Tritium:

$("/body") {
  
}

The single forward slash “/” starts the selection at the root of the document, so “/body” selects the root <body> element. Anything written inside the curly braces runs within the <body>; that is to say, the <body> is the current node.

Comparing “./” and “.//”

First, look at “./”:

$("/body") {
  $("./div")
}

The “./” means “search through the children of the current node.” The current node is the <body>, and there is only one child <div>: so this is the element that will be selected. The <div> within the <span> will not be selected.

Next, look at “.//”:

$("/body") {
  $(".//div")
}

The “.//” means “search through the children, grandchildren, great-grandchildren of the current node”. The current node is the <body>, and all the <div>s within the <body> will be selected: this means both the direct <div> child and the <div> within the <span> will be selected.
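
The difference can be reproduced with a small Python sketch, assuming the standard xml.etree.ElementTree module. The document is the sample HTML from this section; the id attributes are an addition made here only so the two <div>s can be told apart.

```python
import xml.etree.ElementTree as ET

# The sample document; the id attributes are added for illustration only.
body = ET.fromstring(
    "<body>"
    "<div id='outer'><span><div id='inner'/></span></div>"
    "<span/>"
    "</body>"
)

# "./div" looks only at direct children of the current node (<body>).
direct = [d.get("id") for d in body.findall("./div")]

# ".//div" looks at every descendant of the current node.
nested = [d.get("id") for d in body.findall(".//div")]
```

With <body> as the current node, “./div” finds only the outer <div>, while “.//div” finds both.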

Comparing “./” and “../”

Starting out from a different “current node”, the following Tritium selects the <div> child of the <body>:

$("/body/div") {

}

Using “./”:

$("/body/div") {
  $("./span")
}

As we have seen, the “./” selects the direct children of the current node: in this case, the <span> inside the <div> (the one that itself contains the nested <div>).

Using “../”:

$("/body/div") {
  $("../span")
}

The “../” searches the parent of the current node. The current node is the <div>, and its parent is the <body>. Therefore, the XPath will search the <body> for children (/) that are <span>s. This will return the <span> sibling of the <div>.
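
A rough Python equivalent, assuming the standard xml.etree.ElementTree module, is sketched below. ElementTree cannot climb above the element a search starts from, so the sketch starts at <body> and steps into the <div> within the path itself rather than making the <div> the current node. The id attributes are added here only to label the two <span>s.

```python
import xml.etree.ElementTree as ET

# The sample document; the id attributes are added for illustration only.
body = ET.fromstring(
    "<body>"
    "<div><span id='child'><div/></span></div>"
    "<span id='sibling'/>"
    "</body>"
)

# Equivalent of "./span" with the <div> as current node: its span children.
child = [s.get("id") for s in body.findall("./div/span")]

# Equivalent of "../span" with the <div> as current node: step up to the
# parent (<body>), then select its span children.
sibling = [s.get("id") for s in body.findall("./div/../span")]
```

The first query returns the <span> inside the <div>; the second returns the <span> that sits next to the <div> under <body>.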

XPath and CSS

XPath is similar in concept to CSS selectors; however, its syntax is quite different.

Below is a brief side-by-side comparison of the different syntaxes, adapted from this article.

Goal                      CSS3                XPath
----                      ----                -----
All Elements              *                   //*
All P Elements            p                   //p
All Child Elements        p > *               //p/*
Element By ID             #foo                //*[@id='foo']
Element By Class          .foo                //*[contains(@class,'foo')]
Element With Attribute    *[title]            //*[@title]
First Child of All P      p > *:first-child   //p/*[1]
All P with an A child     Not possible        //p[a]
Next Element              p + *               //p/following-sibling::*[1]
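
Some of the XPath column can be tried with the Python sketch below, assuming the standard xml.etree.ElementTree module and a made-up document. ElementTree rejects absolute paths on an element, so each expression is written with a leading “.”; it also lacks contains() and the following-sibling axis, so those rows are not covered.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, not taken from the original page.
doc = ET.fromstring(
    "<body>"
    "<p id='foo'><a href='#'>link</a></p>"
    "<p title='note'>plain</p>"
    "<div title='other'/>"
    "</body>"
)

all_p = doc.findall(".//p")               # All P Elements
by_id = doc.findall(".//*[@id='foo']")    # Element By ID
with_title = doc.findall(".//*[@title]")  # Element With Attribute
p_with_a = doc.findall(".//p[a]")         # All P with an A child
```

The last query returns only the first <p>, because it is the only one with an <a> child.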

Resources

Online Testers