The Role of Genre in Shaping our Understanding of Digital Documents

Misha Walker Vaughan and Andrew Dillon

This item is not the definitive copy. Please use the following citation when referencing this material: Vaughan, M. and Dillon, A. (1998) The role of genre in shaping our understanding of digital documents. Proc. 61st Annual Meeting of the American Society for Information Science. Medford NJ: Information Today Inc., 559-566.


Interacting with documents in the digital domain is challenging many of our notions about discourse and its boundaries. Hyperlinked documents on the World Wide Web defy easy categorization and evaluation - making the role and value of digital documents difficult to assess. Most importantly, in such fluid and complex environments it is difficult to understand the nature of the interaction between users and information resources.

This paper argues that notions such as navigation are limiting our understanding of these complex information spaces. Instead, what is needed is a broader framework of analysis that can embrace these concepts, and incorporate extended issues relating to shared understanding, relevance, and style. In the present paper we explore the utility of the intersection of genre theory and cognitive psychology in providing a meaningful framework for analysis and design purposes. In so doing we report the results of our latest research into the elements of genre that influence users of digital documents and provide examples of the usefulness of this analysis in web-based environments.

Keywords: human-computer interaction, genre, navigation, information shape, digital documents, hypermedia


Interacting with documents in the digital domain is challenging many of our notions about discourse and its boundaries. Hyperlinked documents on the World Wide Web defy easy categorization and evaluation - making the role and value of digital documents difficult to assess (Nunberg, 1997). Simple comparisons of Web-based information and paper documents are ill-specified at best. There is little point in proposing a homogeneous class of information termed 'digital' or 'web-based' since information types vary substantially in terms of purpose, intended audience and form, independent of content. Similarly, a simple view of document types that is based only on media fails to appreciate the significant gains in our understanding of digital document design that emerged from a decade of research in hypermedia design. That research has clearly emphasized the need to consider the task of the user in determining the usability of any information resource.

Most importantly, in such fluid and complex environments as the Web it is difficult to understand the nature of the interaction between users and information resources. Access can be rapid, and links might cross boundaries of location, subject matter, content form, style and interactivity in a way that has no real equivalent in the paper world.

The navigation metaphor has been extensively applied by researchers and evaluators in an attempt to explain and design for interaction. While the view of users as navigating within information spaces has utility, few research studies have adequately demonstrated how well this metaphor can inform design (Dillon & Vaughan, in press). Users of a hypermedia system are largely static, unlike navigators in physical environments. Furthermore, the physical navigation framework fails to tackle the issue of semantic space, which Dillon et al. (1993) suggested to be the more important of the senses of space employed in hypermedia. Certainly from the user's perspective, semantic space has immediate resonance. Arguments must be followed, information has sequence and structure, and while the mapping between physical and semantic space might be tight in restricted domains such as menu hierarchies, this coupling can be very loose in extended, detailed documents incorporating multiple media.

All metaphors extend only so far and, in our view, the metaphor of navigation in hypermedia serves to limit our analyses of interaction. As we evaluate usability, a concern with movement through the information space is important. But thinking always of navigation focuses on physical behaviors that are reflections of semantic processes, not the processes themselves. Thus, we are considering alternatives.


Genre theory and cognitive psychology, when extended to the digital domain, offer a potentially richer and more complete understanding of user interaction with digital documents. Genre theorists have struggled with the problems of defining and understanding genres as text, as speech, as performance, as music and so on, and research in this domain has shed light on the utility of genre characteristics in conveying meaning and shaping user responses (see e.g., the extension of the newspaper genre for digital use in Watters & Shepherd, 1997). Cognitive psychologists have pursued genre from an 'outside-in' perspective, seeking to understand how individuals create and maintain mental representations of genre. Constructs such as schematic representations and mental models may provide tools with which to capture an understanding of genre in a form most relevant to users, and perhaps design. It is the marriage of these perspectives that the present authors view as most useful at this time for digital library design.

Genre Theory

In the genre literature there appear to be three approaches to defining the term:

  • as an orienting framework and set of interpretive procedures
  • as a classificatory category or scheme for the analysis of texts
  • as convention associated with the production of genre instances

The first definition views genres as social constructions and attempts to understand them as the products of social and communicative practice. Bauman (1992) reports that modern day folklorists conceive of genres as 'orienting frameworks' that are:

    1) systemic, emphasizing dimensions of interrelationship that organize communicative production and reception

    2) open-ended, viewing genres as flexible and negotiable orienting frameworks; and

    3) practice-centered, focusing on discursive practice in the conduct of social life (p. 57)

From this approach, genres are studied as products of discourse communities and analyzed for how they interact with and maintain these communities over time. At the heart of this approach is an attempt to understand how the regularities in forms of expression are produced and reproduced in a social context.

The second definition typically attempts to apply an analytic framework to the analysis of specific verbal or written forms of expression. It is most often concerned with proposing an underlying set of explanations for the regularity found in verbal and written genres. Within folklore studies, Ben-Amos (1976) describes one group of scholars who seek to apply sociohistorical explanations and thus sociohistorical frameworks to the understanding of regularity in storytelling. Hasan (1996) also argues for defining genre as a framework of analysis. She applies the framework to understanding the role and meaning of the nursery tale. Berger (1992), from the popular culture tradition, defines genres as an objective-emotive framework of analysis for texts. He applies this framework to understanding the appeal of television and movie scripts. Across these definitions of genre is the desire to understand and explain the development of regularity in forms of communication.

The third definition relates to the production of genre instances. Here the focus is on creating locally defined categorization schemes for genres. Within popular culture studies, Cawelti (1976) defines genres as "a means of generalizing the characteristics of a large group of individual works from certain combinations of cultural materials and archetypal story patterns" (p. 7). Grounded in the study of oral histories, Tonkin (1992) argues that genres are "the different conventions of discourse through which speakers tell history and listeners understand them" (p. 2). Berger (1992), who also viewed genre as a framework of analysis, defines genres as the conventions shared by one text with other texts of the same class. Hasan (1996), who like Berger argued for genre as a framework of analysis, suggests that a genre is the set of specific structural rules for a certain text type. Consistent across this group of definitions is the desire to understand and describe the regularity in forms of expression from a locally constituted perspective.

Genre can be all of these things: social practice, classificatory categories, and the conventions of form and structure. In fact, two definitions of genre which have emerged from rhetoric's studies of scholarly and disciplinary communication seek to combine these different aspects into a more complete definition. Berkenkotter and Huckin (1995) argue that genres are dynamic rhetorical forms derived from communicative practices and whose conventions signal a discourse community's norms and epistemology. In this definition they capture the social, the classificatory, and the conventions of genre. Swales (1990) argues for a definition of genre which states that genres are a class of communicative events which share some set of communicative purposes, as defined by a discourse community, and which share some restraints on allowable content, positioning, and form - again a merging of the three components.

These definitions of genre can usefully be extended to the consideration of digital documents. Almost any study of digital documents is concerned with a study of the regularity, or lack of regularity, found therein. Librarians concerned with problems of classification, information scientists exploring the effectiveness of query structures, and designers looking for better navigational structures and metaphors are all seeking to better understand and produce regularity in digital documents. Genre theorists have been concerned with the same problem, only in forms of expression which have traditionally been non-digital. Despite the differences, linkages between the two can be made. One can argue for an approach to digital documents which embraces all three definitions of genre, that is, viewing regularity in digital documents: as the product and process of communicative practice within a social context, as the application of a theoretical framework for understanding that regularity, and as the locally constructed understanding of the rules and conventions for a particular form.

Cognitive Psychology

Genre theory provides a more complete conceptualization of the problem of how we understand the regularity in digital documents. This conceptualization requires focusing not so much on the artifact but on the user. To understand the significance and meaning of rules, practices, and conventions associated with a genre one cannot look simply to the artifact; one must look first to the user. Cognitive psychology's research on mental representations of structure is one avenue for pursuing an understanding of digital documents that is grounded in user conceptions of regularity. Research on mental representations of structure focuses on the problem of how people organize their knowledge about the world, be it experiences, the written word, stories and so on. Two theoretical constructs invoked in the analysis of this problem are most relevant to our concern with digital documents: schema theory and mental models.

Fundamental to schema theory is the assumption that "people's knowledge is organized; when we know something about a given domain our knowledge does not consist of a list of unconnected facts, but coheres in a specifiable way" (Mandler, 1984, p. ix). A schema is a form of mental representation which maintains information in a hierarchically organized structure containing a set of units describing generalized knowledge about some domain (Mandler, 1984). Schemata describe how long term memory for different types of events, objects, and activities is organized. Mandler (1984) proposed one model of schematic processing which is closely allied with the notion of genre - schemata for stories. Stories were found to have a high degree of structural regularity when dissected by story grammars (e.g. Colby, 1973), and this regularity was found to be reflected in our knowledge about stories in general (Mandler, 1984). Mandler defines a story schema as: a mental structure consisting of sets of expectations about the way in which stories proceed. Schemata for stories have been shown to affect how we process information about stories in both recall (e.g. Mandler, 1978; Mandler & DeForest, 1979; Thorndyke, 1977), and encoding (Haberlandt, 1980; Haberlandt, Berian, & Sandson, 1980; Mandler, 1984; Mandler & Goodman, 1982). Schema theory suggests an explanation and description for how our knowledge of regular digital documents may be stored and organized: as a hierarchically organized structure containing sets of units describing genre knowledge about some digital domain. However, schema theory does not detail how one acquires this type of mental structure in the first place, nor does it suggest how we deal with documents that have a high degree of irregularity.

Johnson-Laird's (1983) theory of mental models proposes a different understanding of information processing. At its core a mental model "plays a direct representational role since it is analogous to the structure of the corresponding state of affairs in the world - as we perceive or conceive it" (p. 156). This model is built up as a person hears or reads a series of words, which are represented mentally; the sentences formed by the words are then parsed into propositional representations, and one of these representations is operated on by a procedural semantics, which outputs information in the form of a mental model. Mental model theory can be extended to argue for how schemata may be acquired, and thus how schemata for digital domains may be acquired. According to Johnson-Laird a user builds up a mental model, say of a discourse, from the bottom up. If that same user were to then encounter a similarly structured discourse on repeated occasions he or she would build up a series of mental models that would begin to exhibit some regularity. One could argue that this is where a schema is derived from, via regularities identified across mental models over time. One can also extend Johnson-Laird's mental models to the problem of irregular discourse types. According to mental model theory, when people encounter a text, familiar or not, they begin processing the text from the bottom up by developing a series of inferences about the text as they read. From the propositions and their general level of coherence and plausibility they create a mental model of the text. No previous knowledge of this text type is required, and thus the problem of how people process irregular text types is solved. A schema, then, is not a necessary condition for comprehension, although it certainly seems to aid it. So texts that have differing structures are still comprehensible via mental models.

Johnson-Laird's notion of mental models suggests how we may build up a mental representation of a digital domain's regularity and eventually evolve that knowledge into a schematic representation. If the domain lacks a high degree of regularity, it seems reasonable that users will never quite achieve schematic knowledge for a digital space.

van Dijk and Kintsch's (1983) theory of strategic discourse processing provides a model of mental representations of structure which contains the best elements of both mental models and schema theory, applied to the specific problem of how we understand discourse. The reader begins by constructing a textbase built up directly from the discourse. The textbase must cohere both at the local level of a set of sentences (i.e. facts are consistent) and at the global level regarding the meaning, or gist, of a text. At the global level, the reader employs a set of strategies that help reduce propositions to a single macroproposition, providing a higher level and hierarchical form of organization to a text. Based on the creation of a macrostructure, the reader is also able to establish a situation model in memory which is "the cognitive representation of the events, actions, persons, and in general the situation, a text is about" (p. 11). Finally, van Dijk and Kintsch argue that many discourse types may possess a higher level of mental representation than macrostructures, in the form of superstructures. For those discourse types which exhibit a high degree of regularity, a superstructure organizes the macropropositions. This strategy states that a reader "will try to activate a relevant superstructure from semantic memory as soon as the context or the type of text suggests a first cue" (p. 16). From then on the superstructure may be used to more efficiently process the text in a top-down fashion.

Cognitive psychology's research into mental representations of structure provides an explanation and understanding of where knowledge about genres comes from, how we store it, and how we are able to use it when interacting with text. These theories can be extended to propose that interacting with documents in the digital domain is a similar process. Users begin by building up a mental model of an information space based on their interaction with a set of its digital documents. With repeated exposure, and if the set of digital documents maintains some sort of regularity, the user will begin to develop schematic knowledge of that regularity. This highly organized, abstract knowledge about digital documents would take the form of van Dijk and Kintsch's superstructural knowledge. Such an organized, high-level mental representation of digital documents would aid in recall, encoding, and comprehension of the documents, as suggested by schema theory.


Extending these theoretical perspectives to practice is the current challenge we face. Below we will demonstrate one example of gaining access to users' conceptions of text and its regularities. This example is drawn from a digital presentation of academic journal articles, a known quantity against which users' conceptions and actual teachings of genre rules, conventions, and norms can be compared.

In this study, experts were asked to categorize a set of individually presented paragraphs according to where in an academic journal article they belonged: introduction, methods, results, or discussion. Sixteen cards were presented, with half of the cards containing all original information and half missing some cueing elements. As part of this study, the subjects were asked to provide a verbal protocol and rationale for their decision. Four cards from the cued set (i.e. containing all original information), one from each category, were selected based on their having the highest degree of agreement between subjects on categorization (see Table 1). The verbal protocols were subjected to a "how, why, what" content analysis. This is a form of analysis proposed by Dillon and McKnight (1990) for understanding users' conceptions of texts. The "how, why, what" analysis was used to understand users' conceptions of the different components of a journal article. In this case, how refers to determining how the text was read by the users. Why refers to determining what reason, purpose, or importance users ascribed to each section. What refers to determining what content is most important, expected, or significant for each section.
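The selection step described above (keeping, for each category, the card on which subjects agreed most) can be illustrated with a minimal sketch. This is not the authors' procedure or data: the card identifiers, the responses, and the choice of "share of subjects giving the most common answer" as the agreement measure are all assumptions made for illustration.

```python
from collections import Counter

def agreement(labels):
    """Return (most common category, share of subjects who chose it) for one card."""
    counts = Counter(labels)
    top_category, top_count = counts.most_common(1)[0]
    return top_category, top_count / len(labels)

def pick_best_cards(responses):
    """For each category, keep the card with the highest between-subject agreement."""
    best = {}
    for card_id, labels in responses.items():
        category, score = agreement(labels)
        if category not in best or score > best[category][1]:
            best[category] = (card_id, score)
    return best

# Hypothetical responses: card id -> category chosen by each subject.
responses = {
    "card_01": ["introduction", "introduction", "introduction", "discussion"],
    "card_02": ["introduction", "introduction", "introduction", "introduction"],
    "card_03": ["methods", "methods", "methods", "methods"],
    "card_04": ["results", "results", "methods", "results"],
    "card_05": ["discussion", "discussion", "discussion", "introduction"],
}
best_cards = pick_best_cards(responses)
```

In this sketch a card is assigned to whichever category most subjects chose, so `best_cards` holds one card per category actually observed in the responses.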

Table 1. Introduction, Methods, Results, and Discussion Paragraphs


Introduction

Considerable controversy exists concerning the relationship between the testosterone and aggression in human males. Although elevated testosterone levels have long been theoretically associated with overt aggression, there is a lack of corroborating evidence for this position. The results of numerous studies using self-report measures of aggression have been equivocal (Gladue, 1990; Hucker & Bain, 1990). Whereas some modest correlations have been found between endogenous testosterone levels and self-report measures of aggression (Christiansen & Knussmann, 1987; Gladue, 1991; Gray, Jackson, & McKinlay, 1991), self-report measures of aggression tend to correlate poorly, or not at all, with measures of overt, physical aggressive behavior (e.g. Meyer-Bahlberg, Nat, Boon, Sharma, & Edward, 1974).

Methods

After providing a baseline sample, each subject was escorted to the experimental cubicle, seated in front of the subject task board, and the second saliva specimen was collected. During this 10 to 15-min saliva collection period, a concentric-ring electrode was attached to the subject's nondominant wrist. The experimenter then excused himself, ostensibly to prepare the "opponent" for the experiment. When the saliva collection period was over, this second sample was retrieved and stored.

Results

The expectations of treatment success held by subjects at the end of their first treatment session were assessed with a 2 x 2 (Hypnosis/Placebo x 1/4 sessions) factorial ANOVA. The main effect for treatment was highly significant, F (1, 71) = 56.61, p < .001, and indicated that, at the end of their first treatment, hypnotic subjects, M = 4.50, SD = 1.23, expected greater treatment success than placebo subjects, M = 2.50, SD = 1.06.

Discussion

Time and level of provocation were confounded in the present study. It could therefore be argued that the observed provocation effects were due to the passage of time, rather than the opponent's escalating shock settings. Previous studies, however, have demonstrated that subjects generally do not escalate their shock settings across blocks unless the opponent increases the level of provocation (e.g., Epstein & Taylor, 1967). Thus, it is reasonable to conclude that the changes in aggressive behavior observed in this study were influenced by provocation.

Table 2 presents the summarized concepts derived from the verbal protocols for what content is important for each part of a journal article and the purpose for each component of the article, as articulated by a group of its expert users. The 'how' part of that analysis revealed that subjects rarely read the whole paragraph. A common set of reading techniques was found for deciding on the categorization of a paragraph:

  • reading serially until subjects began encountering keywords that indicated its categorization, such as "controversies exist" for introduction, and "confounded in the present study" for discussion

  • reading serially until subjects began to get an understanding of the purpose of the paragraph and the nature of the argument (if there was one), such as the interpretation of results for discussion

  • skimming only for keywords, such as "saliva collection" for methods and "ANOVA" for results
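The keyword-driven reading techniques above can be loosely illustrated in code. The following is a minimal sketch, not a model of what the experts actually did: it scans a paragraph for the earliest cue phrase and reports the section that cue suggests. The cue phrases come from the protocols quoted in this paper, except "controversy exists", which is an assumed singular variant added so that the Table 1 introduction paragraph matches; the function name `first_cue` and the matching rule are also assumptions.

```python
# Cue phrases per section. All lowercase, because matching is done on lowercased text.
CUES = {
    "introduction": ["controversy exists",  # assumed variant of the quoted cue
                     "controversies exist",
                     "lack of corroborating evidence"],
    "methods": ["saliva collection"],
    "results": ["anova"],
    "discussion": ["confounded in the present study"],
}

def first_cue(paragraph, cues=CUES):
    """Return (section, cue) for the cue that appears earliest in the text, or None."""
    text = paragraph.lower()
    best = None  # (position, section, phrase)
    for section, phrases in cues.items():
        for phrase in phrases:
            pos = text.find(phrase)
            if pos != -1 and (best is None or pos < best[0]):
                best = (pos, section, phrase)
    return None if best is None else (best[1], best[2])

intro_hit = first_cue("Considerable controversy exists concerning the relationship.")
methods_hit = first_cue("During this 10 to 15-min saliva collection period, an electrode was attached.")
results_hit = first_cue("Expectations were assessed with a 2 x 2 factorial ANOVA.")
discussion_hit = first_cue("Time and level of provocation were confounded in the present study.")
no_hit = first_cue("The experimenter then excused himself.")
```

A lookup of this kind captures only the surface cues. It does not capture the second technique reported above, in which readers infer the purpose of the paragraph and the nature of its argument.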

Table 2. Expert User Conceptions of Journal Article Components

Introduction

What (important content):
- Summaries of previous findings in the literature
- No reference to own findings
- Presence of citations/references
- Description of the relationship between two concepts or variables
- Words or phrases that indicate a certain tone, e.g. "controversies exist", "lack of corroborating evidence"

Why (purpose):
- Trying to set up author's study
- Trying to make an argument
- Setting up past studies as a foil for presenting a new study or new question
- Setting up the background
- Justifying a current or future study

Methods

What (important content):
- Detailed description of what the subjects did
- Description, step-by-step, of what they did in the experiment

Why (purpose):
- To explain the procedures used in the study

Results

What (important content):
- Provides specific statistics such as ANOVAs, F values and p values
- Presence of numbers
- No general discussion of the numbers

Why (purpose):
- To display and report on the results
- To report on the type of analyses that were done

Discussion

What (important content):
- Discussion of the confounding variables/factors
- At a general level of discussion
- General discussion of findings and possible conclusions
- Reference to previous literature
- Discussion of problems with the study

Why (purpose):
- To justify the study's results
- To admit mistakes
- To interpret the results
- To provide alternative explanations of causation
- To argue for implications of what was observed

In these data we see that experts are very quickly able to infer relative position in the narrative form of an isolated paragraph presented out of context. They can do this without reference to a complete picture of the article in question. As such, they seem to be manifesting knowledge of this genre of scientific communication. This knowledge relates to the specific elements within the text that can be seen quickly (statistical details, key phrases denoting discursive tone, procedural details of the study, etc.) as well as inferential knowledge based on the motivation of the author. Dillon and Schaap (1996) invoked the term 'shape' to refer to the manner in which experts could quickly exploit the perceptual cues in this document form to determine location. These protocols suggest (although the analysis is not yet complete) that as well as the perceptual cues, or perhaps where the perceptual cues are insufficient, expert users infer location based on assumptions of the author's intent. They can do this because they are familiar with the form and thus can exploit known regularities of occurrence to predict location in the extended narrative. This also explains why Dillon and Schaap found experts were able to perform accurately even when perceptual cues for location were removed. Continued analysis of these data will examine the time differences in such task performance to test the relevant cognitive costs involved.

These data can be extended to a web-based context to inform the design of web-based and other scholarly electronic content. As a first example, these summary data could be given to a web designer charged with producing a web-based scholarly journal. The data would clearly communicate the nature and form of this type of scientific communication. It would demonstrate the conventions, rules, norms, and argumentative functions of each of a journal's sections. With this understanding, a designer could then implement a web-based version of a scholarly journal and then test the design with these summary data as a benchmark. If the test subjects did not produce the same set of conclusions regarding the importance of certain cues nor a clear indication of comprehending an article's flow, then the design may not be reproducing the necessary genre conventions. As a second example, these summary data could be given to a web designer asked to experiment with some new forms of electronic presentation that break the rules of scholarly print presentation. One cannot break the rules without knowing what they are in the first place. Armed with this knowledge, the designer could explore different presentation options, navigation schemes, and layouts that built on and extended beyond, perhaps even violated, users' expectations of the existing journal genre. In the same way as for the scholarly web site, this digital space could be tested against the existing benchmark of users' conceptions to determine how the design is actually producing a different interactional experience for the user.
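One way to make the benchmark comparison concrete is to score how closely the cues reported by test subjects match the expert benchmark for each section. The sketch below is only an illustration under stated assumptions: the Jaccard overlap is our choice of measure and is not one proposed in this paper, the cue labels are shorthand loosely based on Table 2, and the "observed" cues are invented.

```python
def cue_overlap(benchmark, observed):
    """Jaccard overlap between two sets of cue labels (1.0 means identical sets)."""
    benchmark, observed = set(benchmark), set(observed)
    if not benchmark and not observed:
        return 1.0  # two empty sets are treated as identical
    return len(benchmark & observed) / len(benchmark | observed)

# Shorthand labels loosely based on Table 2 (hypothetical encoding).
benchmark = {
    "introduction": {"citations", "previous findings", "tone phrases"},
    "results": {"statistics", "numbers"},
}
# Invented cues that test subjects might report for a new web-based design.
observed = {
    "introduction": {"citations", "previous findings"},
    "results": {"figures"},
}
scores = {
    section: cue_overlap(benchmark[section], observed.get(section, set()))
    for section in benchmark
}
```

A low score for a section would flag that the design may not be reproducing that section's genre conventions, or, for a deliberately rule-breaking design, that it is producing a different interactional experience. What counts as "low" would have to be decided by the evaluator.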


Information is organized at various levels of abstraction, with genre being a cognitively relevant level that has rarely been addressed in discussions of digital documents. Our research suggests that this level of abstraction has tractable manifestations that could usefully inform the design of new applications. By exploiting these forms we may produce more cognitively compatible representations that overcome the major usability obstacle to current implementations - user disorientation. The methods by which such an analysis could be achieved are not limited to those presented here. Traditional usability measures of efficiency, effectiveness, and satisfaction can still play a central role in that they too can indicate to what extent design changes, as suggested by generic analysis, are making a difference at a behavioral level for the user. Other methods of gauging users' comprehension and ability to process information could also be employed to determine the effectiveness of designing for different generic forms.

The existing tools of analysis for digital information spaces can be usefully expanded to include notions of genre. Genre, defined as both a social construction and a cognitive function, re-frames our understanding of digital documents to focus not only on the behavioral components but also on the semantic components of interaction. This review and extension of genre theory and cognitive psychology to the problems of digital documents will hopefully provide a new means of improving design and evaluation from a user-oriented perspective.


