Again – AJAX and Screen Scraping

My review on Monday of the Sprajax AJAX security scanner left me thinking about the limitations of old-school crawlers and spiders. To spider an AJAX application, you would need to model the behavior of a browser comprehensively. That means more than embedding Rhino or SpiderMonkey in your scraper: you would pretty much have to embed the whole browser and use it to identify every valid event that an XHTML document will accept at a given point.
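To make the "identify valid events" problem concrete, here is a rough Python sketch that lists only the inline handlers (attributes such as onclick) found in a chunk of markup. The names InlineEventFinder and find_inline_events are made up for illustration. Note what it cannot do: handlers attached from script at runtime never show up in the markup, which is exactly why a real AJAX spider would need a full browser behind it.

```python
from html.parser import HTMLParser


class InlineEventFinder(HTMLParser):
    """Collects (tag, attribute, handler source) for inline event handlers."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Crude heuristic: treat any attribute starting with "on"
            # (onclick, onkeyup, ...) as an event handler.
            if name.startswith("on"):
                self.events.append((tag, name, value))


def find_inline_events(markup):
    """Return the inline handlers found in a string of (X)HTML."""
    finder = InlineEventFinder()
    finder.feed(markup)
    return finder.events
```

Feeding it a snippet with a link carrying onclick and an input carrying onkeyup returns one tuple per handler, in document order.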

The complex application state of something like an AJAX word processor could make it difficult to know when to stop spidering, that is, to tell when the URL graph has cycled back on itself.
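One way to think about the stopping problem is to fingerprint each application state and refuse to revisit a fingerprint you have already seen. The sketch below is only an illustration under some big assumptions: that a state can be serialized to a string (say, the DOM), that equal strings mean equal states, and that the hypothetical callables events_for and fire stand in for whatever the embedded browser would provide. A hard cap on the number of states is there because an editor-like application may never cycle at all.

```python
import hashlib
from collections import deque


def fingerprint(dom_text):
    """Hash a serialized state so that identical states compare equal."""
    return hashlib.sha256(dom_text.encode("utf-8")).hexdigest()


def crawl_states(start, events_for, fire, max_states=1000):
    """Breadth-first walk over application states.

    start      -- serialized initial state (assumed to be a string)
    events_for -- hypothetical callable: state -> list of events to try
    fire       -- hypothetical callable: (state, event) -> next state
    max_states -- give up after this many distinct states
    """
    seen = {fingerprint(start)}
    queue = deque([start])
    order = [start]
    while queue:
        state = queue.popleft()
        for event in events_for(state):
            if len(seen) >= max_states:
                return order  # cap reached; the app may have no cycle
            next_state = fire(state, event)
            key = fingerprint(next_state)
            if key in seen:
                continue  # the graph has cycled back to a known state
            seen.add(key)
            queue.append(next_state)
            order.append(next_state)
    return order
```

In practice the hard part is the fingerprint: a timestamp or cursor position in the DOM would make every state look new, so a real crawler would have to normalize the state before hashing it.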

There is at least one open source tool that lets you record and run tests in the Firefox browser: Selenium. This terrific post by Grig Gheorghiu on his Agile Testing blog explains some of the subtleties of using Selenium with AJAX. The tool may be a good place to start building AJAX-capable spiders and crawlers.


Comments: 2 so far

  1. You can also check out SWExplorerAutomation (SWEA) (http://webiussoft.com). SWEA fully supports AJAX web applications.

    Comment by Alex, Wednesday, January 31, 2007 @ 11:41 am

  2. Thanks guys that was extremely helpful!

    sandeep verma
    (http://sandeepverma.wordpress.com)

    Comment by sandeep verma, Monday, May 25, 2009 @ 11:10 pm
