Jon Udell writes that adding a little structure to HTML content yields a knowledge management payoff. Good Web searching that exploits structured content is not yet available, he argues, because there are no easy-to-use writing tools for creating well-formed XML content, so there aren't many pools that can be plumbed with XML-aware search technology. Udell believes that by implementing content-aware search against existing repositories, you can show people the tangible benefits of more expressive content. Mixing tags with free-text search can bring the promise of XML that much closer to reality.
In the early days of XML, smart search was often cited as a key benefit. Instead of just trawling for single-celled keywords in an ocean of undifferentiated text, the story went, we'd navigate islands of structure looking for more evolved creatures. While that vision has not materialized, a middle course between simple full-text search on one side and unwieldy tagging schemes and brittle ontologies on the other is beginning to emerge. The current trend of tagging things (Flickr photos, del.icio.us and Furl URLs) shows that people will add structure to content when the conditions are right. The prerequisites are:
- First, tagging must be easy.
- Second, it must deliver both instant gratification and longer-term value to the person doing the tagging.
- Third and most important, it must occur in a shared context so that network effects can kick in.
Udell adds that some tags are implicitly woven into the fabric of our content, like the tag for the recently concluded Demo event in Arizona. (Blogosphere coverage of events will, in future, depend on the organisers picking a tag and promoting it.)
Udell writes about his own experience of using Mark Logic's XQuery-based XML database, Content Interaction Server, into which he pumps the RSS feeds of all the blogs that he reads. He ran a query that combines free-text search for items containing the strings "Demo" or "Demo@15" with structured search for items that contain links to demo.com. It yielded a nice list of Demo-related items that he couldn't have built any other way. The service works by converting the HTML content of his feeds into well-formed XHTML, storing it in the Mark Logic database, and then using the XQuery engine to perform hybrid free-text and structured searches. Although the vocabulary of XHTML is not very rich, certain elements, notably links, carry a latent semantic payload. The next area of research is indexing these ad hoc syntaxes and extending the work collaboratively to cover the whole blogosphere.
Search is receiving a fair amount of attention and good research, from IBM's WebFountain to individual efforts like the one above. With convergence technology pushing expectations, search will undergo radical transformations from what we see today.
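The hybrid query Udell describes, free-text matching on "Demo" or "Demo@15" combined with a structured test for links to demo.com, can be roughly sketched in a few lines of Python. This is only an illustration of the idea, not his XQuery or anything from Mark Logic's product: the sample posts, the function names, and the choice to combine the two conditions with OR are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical feed items, already converted to well-formed XHTML.
ITEMS = {
    "post-1": '<div xmlns="http://www.w3.org/1999/xhtml"><p>Notes from Demo@15, '
              'see <a href="http://www.demo.com/">the site</a>.</p></div>',
    "post-2": '<div xmlns="http://www.w3.org/1999/xhtml"><p>A demo of my new '
              'script.</p></div>',
    "post-3": '<div xmlns="http://www.w3.org/1999/xhtml"><p>Great products this '
              'week: <a href="http://demo.com/products">list</a>.</p></div>',
    "post-4": '<div xmlns="http://www.w3.org/1999/xhtml"><p>Unrelated '
              '<a href="http://example.org/">link</a>.</p></div>',
}


def text_matches(root, terms):
    """Free-text part: case-sensitive substring match over all text nodes."""
    text = "".join(root.itertext())
    return any(term in text for term in terms)


def links_to(root, domain):
    """Structured part: does any <a href> point at the domain or a subdomain?"""
    for anchor in root.iter(f"{{{XHTML_NS}}}a"):
        host = urlparse(anchor.get("href", "")).hostname or ""
        if host == domain or host.endswith("." + domain):
            return True
    return False


def hybrid_search(items, terms, domain):
    """Return ids of items that match the text terms OR link to the domain.

    Combining with OR is an assumption of this sketch.
    """
    hits = []
    for item_id, xhtml in items.items():
        root = ET.fromstring(xhtml)
        if text_matches(root, terms) or links_to(root, domain):
            hits.append(item_id)
    return hits


if __name__ == "__main__":
    print(hybrid_search(ITEMS, ["Demo", "Demo@15"], "demo.com"))
```

On the sample data this selects post-1 (text match) and post-3 (link match only), and skips post-2, whose lowercase "demo" is just an ordinary word. The link test is what free-text search alone cannot do, which is the "latent semantic payload" of XHTML links in action.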