Scatter/Gather and Iterative Query Refinement
[HEAR96]
[PIRO95]
[RAO]
Description
This technique allows the user to repeatedly sift information using successive queries
that eventually pinpoint the data of interest.
A browsing paradigm...which allows a user to rapidly assess the general
contents of a very large collection by scanning through a hierarchical
representation that acts like a dynamic table of contents. Initially the system
scatters, or clusters, the collection into a small number of document groups,
and presents short summaries of the groups to the user. These summaries
consist of two types of information: topical titles (titles of documents close
to the cluster centroid) and typical terms (terms of importance in the cluster).
Based on these summaries, the user selects one or more of the groups for
further study. The selected groups are gathered, or unioned, together to form
a subcollection. The system then reapplies clustering to scatter the new
subcollection into a small number of document groups, which are again
presented to the user. With each successive iteration the clusters become
smaller, and their contents more refined. The user may, at any time, switch to a
more focused search method. [RAO]
The document clustering algorithm is optimized for speed, to encourage
interaction, rather than to guarantee accuracy. [HEAR96]
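The scatter, summarize, select, and gather cycle described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Xerox implementation: the function names (scatter, summarize, gather), the bag-of-words vectors, the cosine similarity, and the k-means-style loop seeded with the first k documents are all assumptions chosen to keep the example small. Documents are plain strings, and k is assumed to be at least 1.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one document (no stemming or stop list)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(docs):
    """Summed term counts of a group; cosine ignores the missing 1/n scale."""
    total = Counter()
    for doc in docs:
        total.update(vectorize(doc))
    return total


def scatter(docs, k, rounds=5):
    """Cluster docs into at most k groups (hypothetical k-means-style loop)."""
    centers = [vectorize(d) for d in docs[:k]]  # assumption: first k docs as seeds
    groups = []
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for doc in docs:
            vec = vectorize(doc)
            best = max(range(len(centers)), key=lambda i: cosine(vec, centers[i]))
            groups[best].append(doc)
        # Recompute each center; keep the old one if a group came up empty.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return [g for g in groups if g]


def summarize(group, n_terms=3):
    """Summary of one group: its most frequent terms and the document nearest the centroid."""
    center = centroid(group)
    terms = [term for term, _ in center.most_common(n_terms)]
    title = max(group, key=lambda d: cosine(vectorize(d), center))
    return {"terms": terms, "title": title, "size": len(group)}


def gather(groups):
    """Union the selected groups into one subcollection, dropping duplicates."""
    seen, merged = set(), []
    for group in groups:
        for doc in group:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


if __name__ == "__main__":
    collection = [
        "cat dog pet", "stock market trade", "dog pet food",
        "market trade price", "cat pet food", "stock price trade",
    ]
    groups = scatter(collection, k=2)
    for g in groups:
        print(summarize(g))
    # The user picks one group; it is gathered and scattered again.
    for g in scatter(gather(groups[:1]), k=2):
        print(summarize(g))
```

Each pass works on a smaller subcollection, so the groups shrink with every iteration. A simple, fast clustering step like this one trades accuracy for quick response, in the spirit of the speed-over-accuracy remark above.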
Iterative query refinement (IQR), like Scatter/Gather, comes from Xerox.
...emphasizes the participation of the user in a cycle of query formulation,
presentation of search results, and query reformulation. Since the focus is on
query repair, the information presented is typically not document descriptions,
but rather intermediate information indicating relationships between the query
and the retrieved documents. [RAO]
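One way to picture the "intermediate information" mentioned in the quote is a per-term hit count returned alongside the matching documents, so the user can see which part of a query needs repair. The sketch below is an assumption made for illustration, not the interface described in [RAO]: the function name query_feedback, the whitespace tokenization, and the match-all-terms rule are all hypothetical.

```python
def query_feedback(query, docs):
    """Return (hits per query term, documents containing every query term)."""
    terms = query.lower().split()
    doc_words = [set(d.lower().split()) for d in docs]
    # Intermediate information: how many documents contain each term on its own.
    per_term = {t: sum(t in words for words in doc_words) for t in terms}
    # Final result: documents that contain all of the query terms.
    matches = [d for d, words in zip(docs, doc_words)
               if all(t in words for t in terms)]
    return per_term, matches


if __name__ == "__main__":
    collection = ["cat dog pet", "stock market trade", "cat pet food"]
    # First formulation: each term matches something, but never together.
    print(query_feedback("pet market", collection))
    # Reformulated query after inspecting the per-term counts.
    print(query_feedback("pet cat", collection))
```

When a query returns nothing, the per-term counts show whether one term is at fault or whether the terms simply never co-occur, which is the kind of cue the user needs in order to reformulate.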
Advantages
Disadvantages
This page maintained by Mark Brautigam
Last updated 1 March 1997