LVis - Digital Library Visualizer

Katy Borner and Andrew Dillon

This item is not the definitive copy. Please use the following citation when referencing this material: Borner, K., Dillon, A. and Dolinsky, M (2000) LVis: Visualization of artworks in virtual reality. Visualization 2000 Conference, London.

ABSTRACT

LVis is a joint project aiming at the 2-D and 3-D visualization of search results derived from digital libraries (DL). This paper gives an overview of the intent of the project, the data mining techniques applied, the visualization metaphors used, and the usability studies undertaken.

Keywords: browsing, information visualization, immersive environments, usability studies

INTRODUCTION

The LVis (Digital Library Visualizer) project aims to aid users' navigation and comprehension of large document sets retrieved from digital libraries. More concretely, it provides a prototypical interface to the DIDO Image Data Bank. Image search results are displayed visually in the form of data crystals (see Figure 1).

Figure 1: Data Crystals

The paper starts with an introduction to the DIDO Image Data Bank, followed by an overview of the system. Subsequently, we introduce Latent Semantic Analysis (LSA), which is used to automatically extract salient semantic structures and/or co-citation networks between documents. The LSA output feeds into clustering algorithms that group images into classes that share semantically similar descriptors. A modified Boltzmann algorithm (Alexander et al., 1995) is used to lay out images in a three-dimensional crystal structure. The general interface metaphor and the user interface are explained, and a sample scenario is provided. Finally, we discuss the usability studies undertaken and their first results. The paper concludes with a discussion of the approach and an outlook.

The Dido Image Bank

The current implementation of LVis displays search results from the Dido Image Bank, provided by the Department of the History of Art, Indiana University. Dido stores about 9,500 digitized images from the Fine Arts Slide Library collection of over 320,000 images. The Dido database permits convenient access to and use of images for teaching and research purposes at Indiana University via a Web interface at http://www.dlib.indiana.edu/collections/dido/. Each image in Dido is stored together with its thumbnail representation and a textual description; see Figure 2 for a typical data set.

Bosch.Garden of Earthly Delights.Det.,
BOSCH, HIERONYMOUS,
Garden of Earthly Delights.Details.Couple
in a bubble,
after 1500, o/p 220x195 cm,
Madrid, Prado.2823, DeTolnay.233.11/3/76,
RB.PTG.NTH
Figure 2: DIDO image data

For demonstration purposes, the search results for three predefined search keys, "Paintings by Bosch", "African", and "Chinese Paintings", containing 11, 17, and 671 images respectively, were selected as test data sets for visualization.

Overview

The general system architecture is displayed in Figure 3. It can be divided into data analysis, done in a preprocessing step, and the actual display of a search result at retrieval time.

During the preprocessing step, a larger data set from Dido is analyzed using a computationally expensive mathematical technique called Latent Semantic Analysis. The resulting document-document similarity matrix reflects the semantic structure between images. At retrieval time, it is used to organize a list of matching images semantically by means of clustering techniques and an appropriate spatial layout algorithm. This is explained in detail in the following sections.

Figure 3: System Architecture

Semantic Organization of Search Results

In order to display search results spatially according to their semantic relationships, the semantic structure has to be extracted, images have to be clustered according to the extracted semantic structure, and the images have to be laid out spatially.

Extracting Semantic Structure: Latent Semantic Analysis (LSA) (Landauer et al., 1998) is used to automatically extract salient semantic structures between images based on their textual representation. LSA makes it possible to induce and represent aspects of the meaning of words. In order to do this, a representative sample of documents (image representations) is converted to a word-by-document matrix in which each cell indicates the frequency with which each word (rows) occurs in each document (columns).
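As a minimal sketch of the matrix construction just described, the following Python fragment counts word frequencies in a handful of image descriptions. The tokenizer, the function names, and the sample descriptions are illustrative assumptions and are not taken from the LVis implementation.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lower-case runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def word_by_document_matrix(docs):
    """Return (vocabulary, matrix); matrix[w][d] is the frequency
    of word w (row) in document d (column)."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    # Counter returns 0 for words that do not occur in a document.
    matrix = [[c[word] for c in counts] for word in vocab]
    return vocab, matrix


if __name__ == "__main__":
    descriptions = [
        "Bosch, Garden of Earthly Delights",
        "Bosch, Garden detail",
    ]
    vocab, matrix = word_by_document_matrix(descriptions)
    for word, row in zip(vocab, matrix):
        print(word, row)
```

Each row of the result corresponds to one word and each column to one image description, which is the orientation the text assumes.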

LSA has demonstrated improved performance over traditional vector space techniques by modeling word relationships using a reduced approximation of the term-by-document matrix, for both the column and the row space, computed by singular value decomposition. It overcomes the problems of synonymy (variability in human word choice) and polysemy (the same word often has different meanings) by automatically organizing documents into a semantic structure more appropriate for information retrieval.
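The reduced approximation can be illustrated with a small, self-contained sketch. It assumes that documents are compared by the cosine of their coordinates in the k-dimensional latent space, and it obtains the singular values and right singular vectors from the eigenpairs of the document Gram matrix by power iteration. This stands in for a real SVD package such as SVDPACK and is suitable only for toy matrices; all function names are hypothetical.

```python
import math


def matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def top_eigenpairs(sym, k, iters=500):
    """Top-k eigenpairs of a symmetric positive semi-definite matrix,
    found by power iteration with deflation (toy-sized inputs only)."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        vec = [1.0 / (i + 1) for i in range(n)]  # fixed, deterministic start
        for _ in range(iters):
            nxt = matvec(work, vec)
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:
                break
            vec = [x / norm for x in nxt]
        lam = sum(a * b for a, b in zip(vec, matvec(work, vec)))
        pairs.append((lam, vec))
        # Deflate: remove the component that was just found.
        for i in range(n):
            for j in range(n):
                work[i][j] -= lam * vec[i] * vec[j]
    return pairs


def lsa_document_similarity(term_doc, k):
    """Cosine similarity between documents in a rank-k latent space.
    term_doc[w][d] is the frequency of word w in document d."""
    n_docs = len(term_doc[0])
    # Gram matrix A^T A; its eigenvalues are the squared singular values of A.
    gram = [[sum(row[i] * row[j] for row in term_doc) for j in range(n_docs)]
            for i in range(n_docs)]
    pairs = top_eigenpairs(gram, k)
    # Document d has coordinates sigma_i * v_i[d] for i = 1..k.
    coords = [[math.sqrt(max(lam, 0.0)) * vec[d] for lam, vec in pairs]
              for d in range(n_docs)]

    def cosine(a, b):
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

    return [[cosine(coords[i], coords[j]) for j in range(n_docs)]
            for i in range(n_docs)]
```

With k smaller than the number of documents, descriptions that share vocabulary collapse onto nearby directions in the latent space, so their cosine similarity approaches 1 even when individual word counts differ.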

Clustering of Search Results: At retrieval time, the result of a database query is hierarchically organized, based on the LSA output. Nearest-neighbor-based, agglomerative, hierarchical, unsupervised conceptual clustering is applied to create a hierarchy of clusters grouping documents of similar semantic structure. Clustering starts with a set of singleton clusters, each containing a single document. The two clusters most similar are merged to form a new cluster that covers both. This process is repeated for each of the remaining clusters. At termination, a uniform, binary hierarchy of document clusters is produced. The partition showing the highest within-cluster similarity and lowest between-cluster similarity is selected for data visualization. Details can be found elsewhere (Borner, 2000-SPIE).
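The merge loop and the partition selection can be sketched as follows. The sketch assumes single-link (nearest-neighbor) similarity between clusters and scores a partition by mean within-cluster similarity minus mean between-cluster similarity. That scoring function and all names are assumptions made for illustration; the criterion actually used in LVis is described in (Borner, 2000-SPIE).

```python
from itertools import combinations


def single_link(sim, a, b):
    # Nearest-neighbor similarity between two clusters of document indices.
    return max(sim[i][j] for i in a for j in b)


def agglomerate(sim):
    """Return every partition produced while merging, from all
    singletons down to one cluster containing everything."""
    clusters = [[i] for i in range(len(sim))]
    partitions = [[c[:] for c in clusters]]
    while len(clusters) > 1:
        # Pick the two most similar clusters and merge them.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: single_link(sim, clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        partitions.append([c[:] for c in clusters])
    return partitions


def partition_score(sim, partition):
    # Assumed criterion: mean within-cluster minus mean between-cluster similarity.
    label = {i: k for k, cluster in enumerate(partition) for i in cluster}
    within, between = [], []
    for i, j in combinations(range(len(sim)), 2):
        (within if label[i] == label[j] else between).append(sim[i][j])
    if not within or not between:
        return float("-inf")  # all singletons, or one big cluster
    return sum(within) / len(within) - sum(between) / len(between)


def best_partition(sim):
    return max(agglomerate(sim), key=lambda p: partition_score(sim, p))
```

Calling `best_partition` on a document-document similarity matrix returns a list of clusters, each a list of document indices, that could then be handed to the layout step.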

Spatial Layout: Rather than being a static visualization of data, the interface is self-organizing and highly interactive. Data are displayed in an initially random configuration, which sorts itself out into a more-or-less acceptable display via a modified Boltzmann algorithm (Alexander et al., 1995). The algorithm works by computing attraction and repulsion forces among nodes. In our application, the nodes represent images, which are attracted to other nodes (images) to which they have a similarity link and repelled by nodes (images) to which there is no link. Attraction and repulsion forces are computed based on the underlying document-document similarity matrix.
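The attraction and repulsion idea can be illustrated with a generic force-directed layout in three dimensions. This is not the modified Boltzmann algorithm used in LVis. The link threshold, the force laws, the step size, and the function name are all assumptions chosen to keep the sketch short and stable.

```python
import math
import random


def force_layout(sim, threshold=0.5, steps=200, step_size=0.05, seed=0):
    """Place one 3-D point per document. Pairs with similarity at or
    above the (assumed) threshold count as linked and attract each
    other; all other pairs repel."""
    rng = random.Random(seed)
    n = len(sim)
    # Initially random configuration.
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]
    for _ in range(steps):
        forces = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                delta = [pos[j][k] - pos[i][k] for k in range(3)]
                dist = math.sqrt(sum(d * d for d in delta)) or 1e-9
                if sim[i][j] >= threshold:
                    # Spring-like attraction that grows with distance.
                    magnitude = sim[i][j] * dist
                else:
                    # Repulsion that fades with distance.
                    magnitude = -(1.0 - sim[i][j]) / (dist * dist + 0.01)
                for k in range(3):
                    forces[i][k] += magnitude * delta[k] / dist
        for i in range(n):
            for k in range(3):
                pos[i][k] += step_size * forces[i][k]
    return pos
```

After enough steps, linked images end up close together while unlinked images drift apart, which gives the clustered spatial arrangement the text describes.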

Interface

Two interfaces have been implemented for LVis: a 2-D Java applet that can be used on a desktop computer (see Borner, 2000-DL, for details) and a 3-D immersive CAVE environment. Both interfaces give users access to three levels of detail: (1) they provide an overview of document clusters and their relations; (2) they show how images belonging to the same cluster relate to one another; and (3) they give more detailed information about an image, such as its description or its full-size version. This paper describes the underlying interface metaphors and the user interface for the 3-D immersive CAVE environment.

Metaphors: The LVis CAVE environment presents a landscape with subject categories from the digital databank presented in sculptural forms. The notion of "headings" is extrapolated into the creation of "heads." These heads, or 3-D icons, represent subject categories. Participants travel through the environment and confront the categorical headings. In order to explore the heading further, participants must "get inside its head." Once this confrontation occurs and the head is entered, the environment is transformed to a new one displaying the image database for that heading or category.

Figures: CAVE main environment and Portal representations of Africa, Bosch, and China.

User Interface: Participants enter a virtual display theater that stages the digital library as a cyberspace Easter Island. LVis is composed of a main environment that presents gateways or portals to specific subject categories established by predefined searches. The gateways are seen as heads or sculptural forms, each representing one category. The first CAVE iteration of LVis premieres the artist Hieronymus Bosch and the regions China and Africa. After exploring the island, participants "head" into a chosen category by navigating inside the sculptural form. This transitions the participants to a new environment, which is composed of the Dido databank images available for that category. These images, or slides from the library, are presented in crystalline structures.

Each crystal represents a set of images with semantically similar image descriptions. Images in each crystal are arranged so that categorically related and similar images are in close proximity to one another. The formations of the crystalline structures depend on the size of the resultant data set from the original subject search. For example, the search for the category or subject "China" produces a search result of 671 images! The relationships of these images generate an environment with multiple crystals to explore. These configurations and relationships create various crystal structures as well. Crystals may grow from one point of origin (see Figure 1). Crystals may grow in parallel rows with images next to one another.

Participants can explore the crystalline structures of datasets by navigating this new environment and gaining viewing vantage points. Participants may select images of interest in order to display a larger and clearer version. If the larger version is not satisfactory, it can be returned to its previous iconic presentation. Images that are of interest may be exhibited together and collected as a separate, individually chosen grouping. It has been suggested to us that image searches would be more helpful if the participant could actually see the iconic image enlarge to its true-to-life size. This cannot happen in a real-life slide library, but the manipulation and display of information at multiple sizes is possible in the CAVE. This size feature of image exploration can be exploited in a virtual environment.

Usability studies

Comparison of text-based and 2-D desktop interface and 3-D immersive CAVE interface

Error rates and completion times for a range of different tasks

Learning curves (2-D visualization & Wand)

Free sorting tasks

Navigation 3-D immersive VE

Discussion and Outlook

Next project steps: Visualization of art slides in real size and organized spatially to be used as course material in art history classes. Connection of LVis to online resources such as image catalogs, music libraries, and educational CAVE projects.

Colored, full-size versions of Figures 1 and 4 are accessible at http://ella.slis.indian

ACKNOWLEDGMENTS

We would like to thank the students who are involved in this project: Andrew J. Clune, Ryan Schnizlein, Ho Sang Cheon, Kevin Kowalew, Jose Montalvo, Sumayya A. Ansari, and Tyler Waite. We are grateful to Eileen Fry from Indiana University for their insightful comments on this research as well as ongoing discussions concerning the Dido Image Bank. Dave Pape of the Electronic Visualization Laboratory, UIC, is the software architect for XP, the underlying CAVE application. SVDPACK (Berry et al., 1993) by M. Berry was used for computing the singular value decomposition. The research is supported by a High Performance Network Applications grant of Indiana University, Bloomington.

References

Alexander, Garcia, and Alder. Simulation of the Consistent Boltzmann Equation for Hard Spheres and Its Extension to Dense Gases, Lecture Notes in Physics, Springer Verlag, 1995.

Katy Borner: Visible Threads: A smart VR interface to digital libraries. Proceedings of IST/SPIE's 12th Annual International Symposium: Electronic Imaging 2000, Visual Data Exploration and Analysis, San Jose, CA, 23-28 January 2000. http://ella.slis.indiana.edu/~katy/SPIE00/

Katy Borner: Extracting and visualizing semantic structures in retrieval results for browsing. Accepted for ACM Digital Libraries, San Antonio, Texas, June 2-7, 2000. http://ella.slis.indiana.edu/~katy/DL00/

Katy Borner: Searching for the perfect match: A comparison of sorting results for image data by human subjects and by Latent Semantic Indexing techniques. Submitted to Information Visualization (IV2000). London, UK, July 19-21, 2000.

Berry, M. et al. SVDPACKC (Version 1.0) User's Guide, University of Tennessee Tech. Report CS-93-194, 1993 (Revised October 1996). See also http://www.netlib.org/svdpack/index.html.

Cruz-Neira, C., Sandin, D. J. and DeFanti, T. A. Surround-screen projection-based virtual reality: The design and implementation of the CAVE, in J. T. Kajiya (ed.), Computer Graphics (Proceedings of SIGGRAPH 93), Vol. 27, Springer Verlag, pp. 135-142, 1993.

Dolinsky, M., "Virtual Environment as Rebus" Consciousness Reframed 19-23 July 1998. Newport, University of Wales College Center for the Advanced Inquiry in the Interactive Arts. Consciousness Reframed, second annual International CAiiA Research Conference 1998.

Dolinsky, M., (1997) Creating art through virtual environments. Computer Graphics 31(4): p.34-35,82 New York: ACM Press.

The Indiana University Department of the History of Art Dido Image Bank, http://www.dlib.indiana.edu/collections/dido/

Landauer, T. K., Foltz, P. W., & Laham, D. Introduction to Latent Semantic Analysis. Discourse Processes, 25, 259-284, 1998.

Andrew Dillon: Spatial semantics and individual differences in the perception of shape in information space. JASIS, (in press).