Scatter/Gather and Iterative Query Refinement

[HEAR96] [PIRO95] [RAO]

Description

This technique allows the user to repeatedly sift information using successive queries that eventually pinpoint the data of interest.

A browsing paradigm...which allows a user to rapidly assess the general contents of a very large collection by scanning through a hierarchical representation that acts like a dynamic table of contents. Initially the system scatters, or clusters, the collection into a small number of document groups, and presents short summaries of the groups to the user. These summaries consist of two types of information: topical titles (titles of documents close to the cluster centroid) and typical terms (terms of importance in the cluster). Based on these summaries, the user selects one or more of the groups for further study. The selected groups are gathered, or unioned, together to form a subcollection. The system then reapplies clustering to scatter the new subcollection into a small number of document groups, which are again presented to the user. With each successive iteration the clusters become smaller, and their contents more refined. The user may, at any time, switch to a more focused search method. [RAO]
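The scatter, summarize, and gather cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Xerox implementation: documents are plain bag-of-words term counts, plain k-means with cosine similarity (seeded from evenly spaced documents) stands in for the system's own clustering, "terms of importance" are assumed to be the terms with the highest average count in a cluster, and the function names `vectorize`, `scatter`, `summarize`, and `gather` are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one document (assumption: no stemming or stop words)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Average term weights over a group of documents."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: w / len(vectors) for t, w in total.items()})


def scatter(docs, k, iterations=5):
    """Scatter: cluster document ids into at most k groups with a few k-means passes
    (a stand-in; the original system's clustering algorithm is not reproduced here)."""
    ids = list(docs)
    step = max(1, len(ids) // k)
    centers = [docs[i] for i in ids[::step][:k]]  # evenly spaced seed documents
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for i in ids:
            best = max(range(len(centers)), key=lambda c: cosine(docs[i], centers[c]))
            groups[best].append(i)
        groups = [g for g in groups if g]  # drop clusters that attracted no documents
        centers = [centroid([docs[i] for i in g]) for g in groups]
    return groups


def summarize(docs, titles, group, n_terms=3, n_titles=1):
    """Cluster summary: heaviest centroid terms plus titles of documents nearest the centroid."""
    c = centroid([docs[i] for i in group])
    terms = [t for t, _ in c.most_common(n_terms)]
    nearest = sorted(group, key=lambda i: cosine(docs[i], c), reverse=True)[:n_titles]
    return terms, [titles[i] for i in nearest]


def gather(groups, selected):
    """Gather: union the user's selected groups into a subcollection of document ids."""
    return sorted({i for s in selected for i in groups[s]})


if __name__ == "__main__":
    titles = ["jaguar cat habitat", "cat lion habitat", "jaguar car engine",
              "car engine speed", "lion cat prey", "engine speed race"]
    docs = {i: vectorize(t) for i, t in enumerate(titles)}
    groups = scatter(docs, k=2)                      # first scatter of the whole collection
    for g in groups:
        print(summarize(docs, titles, g))           # summaries the user scans
    sub = gather(groups, selected=[0])              # the user picks group 0
    print(scatter({i: docs[i] for i in sub}, k=2))  # re-scatter the subcollection
```

With these six toy documents the first scatter separates the animal documents from the car documents; gathering the animal group and scattering it again yields smaller groups, which mirrors the refinement the quotation describes.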
The document clustering algorithm is optimized for speed, to encourage interaction, rather than to guarantee accuracy. [HEAR96]
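One hypothetical way to favour speed over accuracy is to skip iterative refinement entirely: pick k seed documents at random and assign every document to its most similar seed in a single pass. The sketch below assumes bag-of-words vectors and cosine similarity; the name `fast_scatter` is invented for illustration, and this is not the algorithm described in [HEAR96].

```python
import math
import random
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def fast_scatter(vectors, k, seed=0):
    """Hypothetical one-pass clustering: sample k seed documents, assign every
    document to its most similar seed once, and never iterate. Cost is about
    k * n similarity computations, but the grouping depends on the seeds drawn."""
    rng = random.Random(seed)
    seeds = rng.sample(range(len(vectors)), k)
    groups = {s: [] for s in seeds}
    for i, v in enumerate(vectors):
        best = max(seeds, key=lambda s: cosine(v, vectors[s]))
        groups[best].append(i)
    return [g for g in groups.values() if g]


if __name__ == "__main__":
    texts = ["jaguar cat habitat", "cat lion habitat", "jaguar car engine",
             "car engine speed", "lion cat prey", "engine speed race"]
    vectors = [Counter(t.split()) for t in texts]
    print(fast_scatter(vectors, k=2, seed=1))
```

Because the seeds are random, runs with different seeds can group the same documents differently; that loss of accuracy is the price of needing only one pass over the collection.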
IQR (iterative query refinement) is really from Xerox, as is Scatter/Gather.
...emphasizes the participation of the user in a cycle of query formulation, presentation of search results, and query reformulation. Since the focus is on query repair, the information presented is typically not document descriptions, but rather intermediate information indicating relationships between the query and the retrieved documents. [RAO]
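As a rough illustration of the "intermediate information" mentioned above, the sketch below reports, for each retrieved document, which query terms it matched, plus how many documents each term hit. It assumes whitespace tokenization and any-term (OR) matching; the function `query_feedback` is hypothetical.

```python
def query_feedback(query_terms, docs):
    """Hypothetical IQR display data: for each document matching at least one
    query term, list the terms it matched; also count matching documents per term."""
    matches = {}
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        hit = [t for t in query_terms if t in words]
        if hit:
            matches[doc_id] = hit
    term_hits = {t: sum(t in hit for hit in matches.values()) for t in query_terms}
    return matches, term_hits


if __name__ == "__main__":
    docs = {0: "jaguar cat habitat", 1: "cat lion habitat", 2: "jaguar car engine",
            3: "car engine speed", 4: "lion cat prey", 5: "engine speed race"}
    matches, term_hits = query_feedback(["jaguar", "cat"], docs)
    print(matches)    # which query terms each retrieved document matched
    print(term_hits)  # how many retrieved documents each term accounts for
```

A user who sees that a document matched only `jaguar` (here, a car document) can repair the query by adding or dropping terms and running it again.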

Advantages

Disadvantages


[Figure: Scatter/Gather and Iterative Query Refinement]


This page is maintained by Mark Brautigam.
Last updated 1 March 1997