Justkez.com Ruby, Geospatial, Data-viz and life.


Nokogiri and XPath Partial Attribute Matching

Written on 18 Oct 2011 by Kester Dobson

Nokogiri is a hugely powerful XML (and thus HTML) parser for Ruby. I use it for consuming pretty much anything with HTML in it, and indirectly through the excellent Feedzirra when processing feeds.

It also has great XPath support and makes partial attribute matching a breeze.

For example, to find all links to Amazon.com in an HTML document:

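require 'open-uri'
require 'nokogiri'

# URI.open is provided by open-uri; a bare open() no longer fetches URLs on Ruby 3.0+
doc = Nokogiri::HTML(URI.open('http://mydomain.com').read)
# contains() matches the given text anywhere inside the href attribute
amazon_links = doc.xpath("//a[contains(@href, 'www.amazon.com')]")

You can then iterate through amazon_links to do any additional filtering.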

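Nokogiri also supports starts-with in addition to the contains above. ends-with is an XPath 2.0 function, so it isn't available in the XPath 1.0 engine Nokogiri relies on, but the same test can be built from substring and string-length.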

There is also an excellent snippet on Stack Overflow about partially matching node content values, so you could just as easily do a partial match on the link anchor text in the example above.
