agile-ajax

101 Ideas for JSONP - Idea #4: Scraping XML with XPath, Part 1

It's been a while since the last installment of JSONP. Lots of good stuff has happened in the meantime to make our lives easier, but some things are still brutish, ugly and complex. One of the nice things in the more recent versions of the JDK is the ever-improving support for XML processing. Long gone are the days when you had to hunt around for libraries and hope that your version of the JDK worked with your third-party libs. There are still reasons, like performance and extended features, to pick an alternate library, but if you just want to add a little bit of XML processing to your apps, things couldn't be easier.

One of the parts of the Java XML APIs that I find somewhat cumbersome is namespace processing. It would be nice if namespaces could be discovered dynamically, but it seems we need to lob in a NamespaceContext to make the machinery happy. So, on to one of my favorite utility classes: MappedNamespaceContext.

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

public class MappedNamespaceContext implements NamespaceContext {

    private final Map<String, String> uriMap = new HashMap<String, String>();

    public void addUri(String prefix, String uri) {
        uriMap.put(prefix, uri);
    }

    public String getNamespaceURI(String prefix) {
        // The NamespaceContext contract asks for IllegalArgumentException on a null prefix.
        if (prefix == null) throw new IllegalArgumentException("Null prefix");
        if (uriMap.containsKey(prefix)) return uriMap.get(prefix);
        if ("xml".equals(prefix)) return XMLConstants.XML_NS_URI;
        return XMLConstants.NULL_NS_URI;
    }

    // Reverse lookups (URI to prefix) are not implemented.
    public String getPrefix(String namespaceURI) {
        throw new UnsupportedOperationException();
    }

    public Iterator<String> getPrefixes(String namespaceURI) {
        throw new UnsupportedOperationException();
    }
}

This way I don't have to implement NamespaceContext anew each time I process a different XML source. How to use this beastie?

// parse the doc
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(is);

// setup the namespaces for Yahoo! Weather RSS
MappedNamespaceContext ctx = new MappedNamespaceContext();
ctx.addUri("yweather", "http://xml.weather.yahoo.com/ns/rss/1.0");
ctx.addUri("geo", "http://www.w3.org/2003/01/geo/wgs84_pos#");

// get the result
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
xpath.setNamespaceContext(ctx);
XPathExpression expr = xpath.compile(xpathExpression);
NodeList result = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
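To try the three steps above end to end without a network call, here is a minimal sketch that runs them against an inline XML string. The sample feed, the class name XPathNamespaceDemo and the helper conditionAttribute are made up for illustration; the real Yahoo! Weather feed may look different. The namespace context is repeated as a nested class only so the sketch compiles on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class XPathNamespaceDemo {

    // Hypothetical sample feed, invented for this sketch; not real Yahoo! Weather output.
    static final String SAMPLE_XML =
        "<rss xmlns:yweather=\"http://xml.weather.yahoo.com/ns/rss/1.0\">"
        + "<channel><item>"
        + "<yweather:condition text=\"Sunny\" temp=\"72\"/>"
        + "</item></channel></rss>";

    // Same prefix-to-URI mapping idea as MappedNamespaceContext, nested here for self-containment.
    static class MappedNamespaceContext implements NamespaceContext {
        private final Map<String, String> uriMap = new HashMap<String, String>();

        void addUri(String prefix, String uri) {
            uriMap.put(prefix, uri);
        }

        public String getNamespaceURI(String prefix) {
            if (prefix == null) throw new IllegalArgumentException("Null prefix");
            if (uriMap.containsKey(prefix)) return uriMap.get(prefix);
            if ("xml".equals(prefix)) return XMLConstants.XML_NS_URI;
            return XMLConstants.NULL_NS_URI;
        }

        public String getPrefix(String namespaceURI) {
            throw new UnsupportedOperationException();
        }

        public Iterator<String> getPrefixes(String namespaceURI) {
            throw new UnsupportedOperationException();
        }
    }

    // Hypothetical helper: returns the named attribute of the first yweather:condition
    // element in the given XML, or null if no such element is found.
    static String conditionAttribute(String xml, String attribute) throws Exception {
        // parse the doc; namespace awareness is needed so elements carry their namespace URIs
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        DocumentBuilder builder = domFactory.newDocumentBuilder();
        InputStream is = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        Document doc = builder.parse(is);

        // map the prefix used in the XPath expression to the namespace URI
        MappedNamespaceContext ctx = new MappedNamespaceContext();
        ctx.addUri("yweather", "http://xml.weather.yahoo.com/ns/rss/1.0");

        // evaluate the expression and read the attribute off the first match
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(ctx);
        NodeList result = (NodeList) xpath.compile("//yweather:condition")
                .evaluate(doc, XPathConstants.NODESET);
        if (result.getLength() == 0) return null;
        return ((Element) result.item(0)).getAttribute(attribute);
    }

    public static void main(String[] args) throws Exception {
        // prints: Sunny, 72
        System.out.println(conditionAttribute(SAMPLE_XML, "text")
                + ", " + conditionAttribute(SAMPLE_XML, "temp"));
    }
}
```

Note that the prefix in the XPath expression only has to match what was registered with addUri; it is the namespace URI, not the prefix used in the document, that decides whether a node matches.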

Next time, we will actually put this thing to work to scrape Yahoo! Weather and mash it up with Google Maps.
