Using Sarissa to Scrape HTML with Ajax
I've been playing around with more disruptive technology lately: Bookmarklets. What is a Bookmarklet? That's where you bookmark a hunk of JavaScript rather than a URL. It allows you to execute JavaScript within the context of the currently loaded page.
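To make that concrete, here is a minimal sketch of how a hunk of JavaScript can be packed into something you can bookmark. The helper name `makeBookmarklet` is made up for this illustration; the only fixed part is the `javascript:` URL scheme that browsers recognize in a bookmark.

```javascript
// Hypothetical helper (not from any library): turn a JavaScript snippet
// into a "javascript:" URL that can be saved as a bookmark.
function makeBookmarklet(source) {
  // Wrap the snippet in a function so its variables do not leak into the
  // page's globals. The leading "void" makes the whole expression evaluate
  // to undefined, so the browser does not replace the page with a result.
  var wrapped = "void (function(){" + source + "})();";
  // Encode characters (spaces, braces, semicolons) that are unsafe in a URL.
  return "javascript:" + encodeURIComponent(wrapped);
}

// Example: a bookmarklet that shows the title of whatever page is loaded.
var href = makeBookmarklet("alert(document.title);");
```

Saving the resulting `href` string as a bookmark's address and clicking it runs the snippet in the context of the current page.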
That makes for a different sort of BJAX -- instead of Browser extensions and Ajax, we have Bookmarklets and Ajax. Why disruptive? They let you manipulate a third party's website or application. I'll be publishing some cool Bookmarklet stuff in a few days -- an example that will make use of the dynamic script tag. If an online service, like Yahoo, provides JSON Web services that can be loaded as scripts, you can use their own services to manipulate their own site. For example, you could add weather information to the Yahoo movie showtimes page by means of a Bookmarklet and Yahoo's weather Web service. That way you never forget to bring an umbrella to the movies when it's raining.
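Until that example is published, here is a rough sketch of the dynamic script tag idea, under a few assumptions: the function name `loadScriptService`, the `callback` query parameter, and the example.com URL are all placeholders for illustration, not any real service's API. The document is passed in as `doc` so the function does not depend on a global.

```javascript
// Sketch of the dynamic script tag technique: add a <script> element whose
// src points at a JSON service that wraps its output in a callback call.
// ASSUMPTION: the service accepts a "callback" query parameter.
function loadScriptService(doc, serviceUrl, callbackName) {
  var separator = serviceUrl.indexOf("?") === -1 ? "?" : "&";
  var script = doc.createElement("script");
  script.type = "text/javascript";
  script.src = serviceUrl + separator +
    "callback=" + encodeURIComponent(callbackName);
  // Appending the element makes the browser fetch and execute the script,
  // regardless of which site serves it.
  doc.getElementsByTagName("head")[0].appendChild(script);
  return script;
}

// In a Bookmarklet this might be used like so (placeholder URL):
//   window.showWeather = function (data) { /* update the page */ };
//   loadScriptService(document, "http://example.com/weather?zip=10001", "showWeather");
```

The point of the design is that script tags are not bound by the same-site restriction that applies to XMLHttpRequest, which is why a cooperative service makes this so easy.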
But what if your target site isn't as friendly or as accommodating as Yahoo? Well, these sites do generally still provide structured data over HTTP -- their HTML webpages, dynamic or otherwise. Once you've used your Bookmarklet to insert the necessary JavaScript into the third-party webpage, you can use a simple XMLHttpRequest to download other HTML from the same site and, after appropriate processing, insert it into the target page. It's that "appropriate processing" that can be a bit difficult. Websites generally don't return their HTML with a Content-Type of text/xml, and thus you won't find a nicely parsed XML DOM sitting in responseXML. You'll have to pry the data out of responseText instead.
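The download step can be sketched along these lines. The function name `fetchPageText` and its callback arguments are invented for this illustration, and the optional `XhrCtor` parameter exists only so the function can be exercised outside a browser; in a Bookmarklet you would leave it off.

```javascript
// Fetch a page from the current site and hand back its raw text.
// ASSUMPTION: XhrCtor defaults to the browser's XMLHttpRequest; older
// Internet Explorer versions would need an ActiveX fallback (not shown).
function fetchPageText(url, onSuccess, onError, XhrCtor) {
  var Ctor = XhrCtor || XMLHttpRequest;
  var xhr = new Ctor();
  xhr.open("GET", url, true); // true = asynchronous
  xhr.onreadystatechange = function () {
    if (xhr.readyState !== 4) { return; } // 4 means the request is complete
    if (xhr.status === 200) {
      // responseXML is typically empty for text/html, so use the text.
      onSuccess(xhr.responseText);
    } else if (onError) {
      onError(xhr.status);
    }
  };
  xhr.send(null);
}

// In a Bookmarklet (placeholder path):
//   fetchPageText("/some/other/page", function (html) { /* process html */ });
```

Because the Bookmarklet's code runs inside the third-party page, a request like this to that page's own site is allowed by the browser.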
There are enough cross-browser differences and sticky wickets to make this kind of XML processing unpleasant. Enter Sarissa, a very handy cross-browser JavaScript library for XML processing. It allows you to do things like the following:
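// xhr is a completed XMLHttpRequest; elem is the element on the current page to fill
var serializer = new XMLSerializer();
// Parse the downloaded markup into an XML DOM
var doc = (new DOMParser()).parseFromString(xhr.responseText, "text/xml");
// Grab the first div in the downloaded page
var content = doc.getElementsByTagName("div")[0];
// Turn it back into markup and drop it into the current page
elem.innerHTML = serializer.serializeToString(content);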
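XPath can also be used here to select elements, obviously. Keep in mind that parsing as text/xml only succeeds when the downloaded markup is well-formed XML (XHTML, say); sloppier HTML will give you a parse error instead of a usable DOM. No, nothing earth-shattering here, but enough convenience for XML processing to allow you to focus on writing functionality rather than cross-browser support.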
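As a rough illustration of the XPath route, the sketch below uses the standard DOM `evaluate` call rather than any Sarissa-specific helper, so treat it as an assumption about how you might wire this up, not as Sarissa's API. Internet Explorer does not provide `evaluate` natively. The name `selectFirst` and the `showtimes` id are made up for the example.

```javascript
// Numeric value of XPathResult.FIRST_ORDERED_NODE_TYPE in the DOM XPath API.
var FIRST_ORDERED_NODE_TYPE = 9;

// Return the first node in `doc` that matches the XPath expression,
// or null when nothing matches. `doc` is a parsed XML document.
function selectFirst(doc, xpath) {
  // Arguments: expression, context node, namespace resolver, result type,
  // and an existing result object to reuse (none here).
  var result = doc.evaluate(xpath, doc, null, FIRST_ORDERED_NODE_TYPE, null);
  return result.singleNodeValue;
}

// With the parsed document from the snippet above (hypothetical id):
//   var content = selectFirst(doc, "//div[@id='showtimes']");
```

One caveat: with no namespace resolver, unprefixed names such as `div` only match elements that are in no namespace, so a page that declares the XHTML namespace would need a resolver and prefixed names.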
Topics: Javascript Libraries
Comments: 1 so far

hi, can u provide some working example!
it will be very useful.
Comment by sagar, Wednesday, November 29, 2006 @ 1:32 am