Pathfinder Blog
Topic Archive: Search

Product Watch - Nusearch.com and Zohoshow

 

Two new AJAX-flavored products launched today: Nusearch.com, a search engine with AJAX whizbang, and Zohoshow, the last piece in the Zoho Office Suite, a replacement for MS Office that includes Zohowriter.

As of this writing, it looks like Nusearch, with all its AJAX innovations, has succumbed to the first-day onslaught. I'll try to update this post when I manage to get through.

As for Zohoshow, it's basic, but it does allow you to import existing PowerPoint files. Don't expect to do any charts or graphs or animations, though.

zohoshow.jpg

I've said before that presentations are just about the last thing for which I would want to depend on network connectivity. How many conference rooms have you been in that didn't give you access to the outside world?

Update 1: Nusearch came back up. Yes, there's some AJAX there, but in my opinion it's rather annoying. The search engine will let you preview its cached version of a page just by mousing over the page's entry. It's so twitchy, though, because the table rows are close together. Also, the preview is really no better than a frame or iframe. Snooze. From a pure search perspective, it doesn't seem to do a very good job of sifting and ranking. The search results have a somewhat spammy feel to them.

nusearch.jpg

 

I like dzone's little popup screenshot much better.

dzone.jpg

Wanted: A Javascript Source Code Search Engine

 

I use quite a few search engines in my day-to-day work. I expect you do too. There's one that I use quite a bit to see what's happening in the Open Source world. No, it's not Freshmeat. It's Koders.

Koders.com allows you to search the source code of thousands of Open Source projects. It allows you to filter by language (and license). That's really very handy when you're looking for coding examples. It also lets you see when and where someone is using a particular Javascript library in their project. Looking for Dojo and Scriptaculous, it is no surprise that Webwork and Rails show up.

Now endless hours of fun can be had with the Koders search engine, but what I'd really like is a search engine that allows me to search all of the Javascript on the web. It's out there, it's readable, it's searchable. Surely Google or Yahoo or whoever already slurps in the Javascript in their crawls. Why not add a new Google engine for searching Javascript?

To anyone looking for examples of Javascript or AJAX, the utility of such a search engine is self-evident. It might make corporate IT managers a little nervous to have all of their logic flapping in the breeze. Let that be a lesson to not put your business logic in the UI.

If anyone knows of such a search engine or of any hacks or tricks to make existing search engines give up their secrets, please let me know. Passing xmlhttprequest filetype:js into Google gives a less than satisfying result.

Update 1: Building such a search engine shouldn't be all that hard. The pieces are already out there. See the Heritrix web crawler that the Internet Archive uses for its work.
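
To make the idea a little more concrete, here is a minimal sketch in Python of the indexing half of such an engine. It assumes a crawler has already fetched the pages; the names ScriptExtractor and build_index are made up for this illustration and have nothing to do with Heritrix or any existing search engine. It pulls the script out of each page and builds a small inverted index from identifier to page URL.

```python
from collections import defaultdict
from html.parser import HTMLParser
import re

# Rough approximation of a Javascript identifier (assumption: ASCII only).
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


class ScriptExtractor(HTMLParser):
    """Hypothetical helper: collects script URLs and inline script bodies."""

    def __init__(self):
        super().__init__()
        self.external = []    # values of <script src="...">
        self.inline = []      # text found between <script> and </script>
        self._buffer = None   # a list while inside a <script> element

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.external.append(src)
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            body = "".join(self._buffer).strip()
            if body:
                self.inline.append(body)
            self._buffer = None


def build_index(pages):
    """Map each lowercased identifier to the set of page URLs that use it.

    `pages` maps a page URL to its HTML text. Only inline scripts are
    indexed here; fetching the external files is left to the crawler.
    """
    index = defaultdict(set)
    for url, html in pages.items():
        parser = ScriptExtractor()
        parser.feed(html)
        parser.close()
        for source in parser.inline:
            for token in IDENTIFIER.findall(source):
                index[token.lower()].add(url)
    return index
```

Looking up index["xmlhttprequest"] would then list every crawled page whose inline script mentions XMLHttpRequest. A real engine would also need ranking, de-duplication of shared libraries and handling of minified code, none of which is attempted here.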

Again - Advertising and AJAX

Eric Picard comments on the impact of AJAX on online advertising. He covers the basics as far as what the difficulties are:

"AJAX stands for asynchronous JavaScript and XML, and it's the asynchronous part that causes the problems. In Web pages created using AJAX, content is loaded asynchronously with no browser refresh. This causes big problems with counting page views and impressions: if a page never reloads, it's difficult to define what a page view is. And if the page is never reloaded, deciding when it's appropriate to refresh or load new advertising is left to the developer's discretion. Similar issues exist around software applications that include advertising."

That doesn't seem all that problematic, right? We can use AJAX to asynchronously load new ads. Further, we can drive advertising content off of user activity. Well, there's more to it than that. According to Eric, if your applications don't comply with industry standards, you will fail your impression audit, which will adversely affect your ability to charge for online ads.

His solution? Avoid time-based ad refreshing:

"I've talked a lot with people about time-based page refreshing and what's appropriate. This isn't a new issue and is why the language exists in the current IAB counting guidelines. But it's never been a simple one to deal with. My general guidance is to avoid any kind of time-based advertising refreshes if at all possible."

Event-based ad updating can be OK, it seems, if it corresponds to traditional webapp page loads. Should we root for a rapid update of the standards? Won't AJAX make the ads more intrusive, just like popups used to (and still do, in some cases)? Or can we hope that people will use the new technology to provide us with more pertinent offers?

Again - AJAX and Screen Scraping

My review on Monday of the Sprajax AJAX security scanner left me thinking about the limitations of the old-school crawlers and spiders. In order to spider an AJAX application you would need to comprehensively model the behavior of a browser. That's more than just embedding Rhino or SpiderMonkey into your scraper. You'd pretty much have to embed the whole browser and use it to identify all valid events that an XHTML document will accept at a given point.

The complex application state of something like an AJAX word processor could make it difficult to know when to stop spidering, i.e. to know when our URL graph has cycled back on itself.

There is at least one open source tool that allows one to record and run tests in the Firefox browser: Selenium. This terrific post by Grig Gheorghiu on his Agile Testing blog explains some of the subtleties of using Selenium with AJAX. The tool may be a good place to start to build AJAX-capable spiders and crawlers.
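
One way to think about the stopping problem is to treat every distinct rendering of the page as a node in a graph and to refuse to revisit a rendering we have already seen. The Python sketch below is only an illustration under heavy assumptions: render, events_for and apply_event are hypothetical callbacks that a real spider would have to implement on top of an embedded browser, and hashing the whole DOM text is the crudest possible notion of "same state".

```python
from collections import deque
import hashlib


def fingerprint(dom_text):
    """Reduce a rendered DOM (as text) to a short comparable key."""
    return hashlib.sha1(dom_text.encode("utf-8")).hexdigest()


def crawl_states(initial_state, render, events_for, apply_event, max_states=1000):
    """Breadth-first walk over application states.

    render(state)             -> DOM text for that state
    events_for(state)         -> iterable of events the page accepts
    apply_event(state, event) -> the state reached by firing the event

    A state whose rendering has been seen before is not explored again,
    which is what stops the walk once the graph cycles back on itself.
    max_states is a safety cap for applications that never repeat.
    """
    seen = {fingerprint(render(initial_state))}
    visited = [initial_state]
    queue = deque([initial_state])
    while queue:
        state = queue.popleft()
        for event in events_for(state):
            next_state = apply_event(state, event)
            key = fingerprint(render(next_state))
            if key in seen:
                continue  # cycled back to a rendering we already know
            if len(seen) >= max_states:
                return visited
            seen.add(key)
            visited.append(next_state)
            queue.append(next_state)
    return visited
```

For a toy three-tab widget whose only event is "next", the walk visits each tab once and then stops. For something like a word processor, where nearly every keystroke yields a new rendering, only the max_states cap ends the crawl, which is exactly the difficulty described above.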

AJAX, Google and Screen Scraping

The ability to implement applications that are not page oriented, i.e. where a single page is updated using DHTML, CSS, Javascript and XHR, complicates matters for search engines (not to mention screen scrapers). This is an area that is still evolving, but Backbase (backbase.com) already has some ideas. They've published a whitepaper on the topic, entitled "Designing Rich Internet Applications For Search Engine Accessibility." They propose three methods of making your Ajax app searchable:

  • Lightweight Indexing: no structural changes are made to your site; existing tags such as meta, title and h1 are leveraged.
  • Extra Link Strategy: extra links are placed on the site, which search bots can follow and thereby index the whole site.
  • Secondary Site Strategy: a secondary site is created, which is fully accessible to the search engine.

Lots of work, possibly, if you want to make your single page app searchable.

Dan Klyn has a different take on the matter, citing ROR (XML site description format), Google Base and Google Sitemaps as options.
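
As a rough illustration of the sitemap option, the Python sketch below writes a minimal sitemap listing the crawlable pages of a secondary site. The function name build_sitemap and the example.com URLs are made up for this example, and the namespace is the one from the sitemaps.org protocol, which may differ from what Google accepted when this was written.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for address in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = address
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical static pages that mirror the states of a single page app.
print(build_sitemap([
    "http://example.com/static/products.html",
    "http://example.com/static/contact.html",
]))
```

The sitemap only tells the crawler where to look. Each listed URL still has to return real HTML, which is why this pairs naturally with the secondary site strategy.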

Ajax bookmarks tie into this as well, for when you search for a particular term on Google, you'd like to be able to navigate there directly. This article from OnJava.com suggests using the Really Simple History framework.

I suspect that we'll have some evolving standards around bookmarking, so it's probably too soon to put all of your search and bookmark eggs in one basket. Now, what will the people who live off of screen scraping do? Their task has just become a little bit more difficult.
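
The usual trick behind Ajax bookmarks is to keep the application state in the URL fragment (the part after the #), which can change without reloading the page. The Python sketch below shows only that encoding and decoding step. It is not the API of the history framework mentioned above, and bookmark_url and restore_state are names invented for this illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def bookmark_url(page_url, state):
    """Return page_url with `state` (a dict of strings) stored in the fragment."""
    parts = urlsplit(page_url)
    # Sort the keys so the same state always yields the same bookmark.
    fragment = urlencode(sorted(state.items()))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, fragment))


def restore_state(url):
    """Recover the state dict from a bookmarked URL's fragment."""
    return dict(parse_qsl(urlsplit(url).fragment))


url = bookmark_url("http://example.com/app", {"view": "results", "q": "ajax search"})
print(url)                 # http://example.com/app#q=ajax+search&view=results
print(restore_state(url))  # {'q': 'ajax search', 'view': 'results'}
```

Since browsers do not send the fragment to the server, a search engine fetching that URL sees only the base page, so bookmarkable state does not by itself make the app searchable.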
