101 Ideas for JSONP – Idea #4: Scraping XML with XPath, Part 1

It's been a while since the last installment of JSONP. Lots of good stuff has happened in the meantime to make our lives easier, but some things are still brutish, ugly and complex. One of the nice things in the more recent versions of the JDK is the ever-improving support for XML processing. Long gone are the days when you had to hunt around for libraries and hope that your version of the JDK worked with your third-party libs. There are still reasons, like performance and extended features, to pick an alternate library, but if you just want to add a little bit of XML processing to your apps, things couldn't be easier.

One of the parts of the Java XML APIs that I find somewhat cumbersome is namespace processing. It would be nice if namespaces could be discovered dynamically, but it seems we need to lob in a NamespaceContext to make the machinery happy. So, on to one of my favorite utility classes: MappedNamespaceContext.

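import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

// Reusable NamespaceContext backed by a simple prefix-to-URI map.
public class MappedNamespaceContext implements NamespaceContext {

	private Map<String, String> uriMap = new HashMap<String, String>();

	public void addUri(String prefix, String uri) {
		uriMap.put(prefix, uri);
	}

	public String getNamespaceURI(String prefix) {
		// The NamespaceContext contract calls for IllegalArgumentException on a null prefix.
		if (prefix == null) throw new IllegalArgumentException("Null prefix");
		else if (uriMap.containsKey(prefix)) return uriMap.get(prefix);
		else if ("xml".equals(prefix)) return XMLConstants.XML_NS_URI;
		return XMLConstants.NULL_NS_URI;
	}

	// Only prefix-to-URI lookups are supported; reverse lookups are left unimplemented.
	public String getPrefix(String namespaceURI) {
		throw new UnsupportedOperationException();
	}

	public Iterator<String> getPrefixes(String namespaceURI) {
		throw new UnsupportedOperationException();
	}

}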

This way I don't have to implement NamespaceContext anew each time I process a different XML source. How to use this beastie?

// parse the doc
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(is);

// set up the namespaces for Yahoo! Weather RSS
MappedNamespaceContext ctx = new MappedNamespaceContext();
ctx.addUri("yweather", "http://xml.weather.yahoo.com/ns/rss/1.0");
ctx.addUri("geo", "http://www.w3.org/2003/01/geo/wgs84_pos#");

// get the result
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
xpath.setNamespaceContext(ctx);
XPathExpression expr = xpath.compile(xpathExpression);
NodeList result = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
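
The snippet above leaves the input stream and the XPath expression to the caller. To see the pieces working together, here is a minimal self-contained sketch. It assumes a tiny hand-written sample document (SAMPLE_XML is made up for illustration and is not real Yahoo! Weather output), and the class name XPathNamespaceDemo, the helper conditionText and the expression //yweather:condition are likewise just illustrative choices. A compact copy of MappedNamespaceContext is nested inside so the sketch compiles on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class XPathNamespaceDemo {

    // Compact copy of the utility class so this sketch stands alone.
    static class MappedNamespaceContext implements NamespaceContext {
        private final Map<String, String> uriMap = new HashMap<String, String>();

        public void addUri(String prefix, String uri) {
            uriMap.put(prefix, uri);
        }

        public String getNamespaceURI(String prefix) {
            if (prefix == null) throw new IllegalArgumentException("Null prefix");
            if (uriMap.containsKey(prefix)) return uriMap.get(prefix);
            if ("xml".equals(prefix)) return XMLConstants.XML_NS_URI;
            return XMLConstants.NULL_NS_URI;
        }

        public String getPrefix(String namespaceURI) {
            throw new UnsupportedOperationException();
        }

        public Iterator<String> getPrefixes(String namespaceURI) {
            throw new UnsupportedOperationException();
        }
    }

    // Hypothetical sample document, hand-written for this sketch.
    static final String SAMPLE_XML =
        "<rss xmlns:yweather=\"http://xml.weather.yahoo.com/ns/rss/1.0\">"
        + "<channel><item>"
        + "<yweather:condition text=\"Sunny\" temp=\"72\"/>"
        + "</item></channel></rss>";

    // Returns the text attribute of the first yweather:condition element, or null if none matches.
    static String conditionText(String xml) throws Exception {
        // parse the doc with namespace awareness switched on
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        DocumentBuilder builder = domFactory.newDocumentBuilder();
        InputStream is = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        Document doc = builder.parse(is);

        // map the prefix used in the XPath expression to its namespace URI
        MappedNamespaceContext ctx = new MappedNamespaceContext();
        ctx.addUri("yweather", "http://xml.weather.yahoo.com/ns/rss/1.0");

        // evaluate the expression and pull out the first match
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(ctx);
        NodeList result = (NodeList) xpath.compile("//yweather:condition")
                .evaluate(doc, XPathConstants.NODESET);
        if (result.getLength() == 0) return null;
        return ((Element) result.item(0)).getAttribute("text");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(conditionText(SAMPLE_XML));
    }
}
```

Run against the sample document, this should print Sunny. Note that the prefix registered in the context only has to match the one used in the XPath expression; what ties it to the document is the namespace URI, so the document itself could use a different prefix for the same namespace.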

Next time, we will actually put this thing to work to scrape Yahoo! Weather and mash it up with Google Maps.


