Because It's Not There: Ekphrasis and the Threat of Graphics in Interactive Fiction

Aaron Kashtan  <akashtan_at_ufl_dot_edu>, Department of English, University of Florida


Existing scholarship on interactive fiction (IF, also known as the text adventure) tends to treat it as a video game genre and/or as a category of electronic literature. In this essay I argue that IF can be understood as participating in traditions of visual prose and ekphrastic textuality, insofar as IF consists of room and object descriptions which direct the player to visualize the things they describe. Unlike traditional ekphrastic literature, however, IF also asks the player to take practical actions in response to the images he or she visualizes. During the commercial era of IF, ekphrasis was the most effective means available of providing players with immersive visual experiences. However, graphical video games have now surpassed IF in this area. Therefore, in order to justify the continued existence of IF, contemporary IF authors have been forced to conceive of the visuality of IF otherwise than in terms of the logic of transparency. One strategy for doing this, exemplified by Nick Montfort's game, Ad Verbum, is to abandon visuality almost entirely and emphasize IF's linguistic and textual qualities. An alternative strategy, exemplified by Emily Short's game City of Secrets, is to assert that IF is visual in a non-transparent way, because IF offers visual experiences which are user-generated rather than pre-rendered.

The genre of interactive fiction has enjoyed increasing critical attention over the past few years, particularly since the publication of Nick Montfort's Twisty Little Passages: An Approach to Interactive Fiction. [1] According to Eric Eve's definition, an interactive fiction is "a turn-based program driven by textual input from the player, responding with output that is principally or wholly textual, and involving a parser and a world model"  [Eve 2007, para. 1]. In other words, IF is a program that (1) simulates a diegetic world containing various spaces and objects (the world model), (2) presents that world to the user/player through the medium of unillustrated or sparsely illustrated text, and (3) permits the user to interact with its simulated world by inputting textual commands. IF, then, is distinguished from other genres of video games by its lack of images, and from other forms of recombinatory or procedural textuality by its inclusion of a world model.
Up to this point, IF has typically been examined from the viewpoint of its textual and programmatic aspects. For Montfort and others, IF descends from the canonical traditions of riddle-making and ergodic textuality and participates in the contemporary movement of electronic literature. According to these claims, the value of IF for scholarly study lies in what it tells us about textuality, literariness, and the transformations of both in the digital era. The existing critical discourse presents IF as a primarily textual, procedural and ludic phenomenon — as an art form or communicative medium which is composed of verbal signifiers that are subject to rule-based manipulations, and which has historically been used to produce games.[2] With rare exceptions, critics have neglected the visuality of IF. In this paper I will explain the necessity of rectifying this neglect, and take tentative steps toward doing so.
The distinction between the textual and the visual, or between the verbal and the visual signifier, is impossible to define precisely, because, as W.J.T. Mitchell argues, such distinctions are always already political: "Every theoretical answer to the questions, What is an image? How are images different from words? seemed inevitably to fall back into prior questions of value and interest that could only be answered in historical terms" [Mitchell 1990, 3]. For the narrow purposes of the present analysis, we might define a textual signifier as a sign whose visual appearance is not directly linked to its signifying value. For example, the visual appearance of the letter P can vary, to a certain predetermined extent, without altering its semantic value. Similarly, a novel can be set in a variety of typefaces while still being understood as the "same" text, and a computer program will carry out the same processes regardless of the font in which it is written. A visual signifier or image, by contrast, is one whose semantic or affective value is linked directly to its specific visual appearance, including its material embodiment and/or its phenomenological effect on the viewer. I will use the term image to refer interchangeably to "real" images and mental visualizations. While "pictorial images are inevitably conventional and contaminated by language" [Mitchell 1990, 42], the image at least tries to claim that its meaning is contingent on its physical appearance. "The image is the sign that pretends not to be a sign, masquerading as (or for the believer, actually achieving) natural immediacy and presence" [Mitchell 1990, 43]. Obviously, this word-image distinction is as problematic and open to critique as any such distinction; see, for example, [Drucker 2002] for an argument that something is indeed lost when the materiality of a letter is altered. I claim merely that this distinction represents a commonsensical understanding of what distinguishes words from images. It reflects the way in which IF critics typically understand these terms when they don't interrogate them further.
Critics have typically paid little attention to the visuality of IF — which includes both its use of actual visible signs, and the visual images it may evoke in the player's mind. This may seem hardly surprising since, by the second element of the above definition, IF consists mostly or entirely of textual signifiers and makes limited use of images. In the present paper, however, I will suggest that interacting with IF is in fact a visual experience in crucially important ways, and that IF therefore has important things to teach us about the fate of the visual aspects of verbal signifiers in the digital era.
Without denying that IF participates in various traditions of potential literature and ludic textuality, as Montfort and others suggest, I here want to suggest that IF is also an heir to equally longstanding traditions of ekphrasis and of visual prose. As such, IF poses questions of the relation between descriptive text and readerly visualization that go back as far as Homer's description of the shield of Achilles — though by virtue of its ergodic nature, IF also significantly transforms those questions. By viewing IF as a visual-textual phenomenon, we can improve our understanding of the transformation of visual prose and readerly visuality in the digital era.
Moreover, a focus on the visuality of IF can improve our understanding of how the genre defines itself. A recurring concern of ekphrastic poetry is the definition of the relation of poetry to painting and, more recently, to still photography and film. As Mitchell argues, ekphrasis is the genre in which text (in the narrow sense given above) confronts its other: "Ekphrastic poetry is the genre in which texts encounter their own semiotic 'others,' those rival, alien modes of representation called the visual, graphic, plastic, or 'spatial' arts" [Mitchell 1994]. This argument expands on James Heffernan's reading of ekphrasis as paragonal, that is, as enacting a competitive struggle between word and image. Elizabeth Bergmann Loizeaux suggests, by contrast, that ekphrasis may also be motivated by "such modest, and profound, feelings as companionship or friendship, the terms in which poets often describe their ekphrastic motives" [Loizeaux 2008, 15]. Under either model, however, a central drive behind ekphrasis is the desire to define poetry or "textual" art itself by contrast to its other. By directly addressing the image, poetry makes claims for what it can do that the image can't, and/or asks how it can do what images seem capable of doing more effectively.
This task becomes especially pressing in the present cultural moment. Ekphrastic literature has perhaps always been both fascinated and repelled by the apparently superior mimetic power of images to text. As Murray Krieger argues, ekphrasis entails "the defensive concession that language, as arbitrary and with a sensuous lack, is a disadvantaged medium in need of emulating the natural and sensible medium of the plastic arts," which exists in an ambivalent relation to "the prideful confidence in language as a medium privileged by its very intelligibility"  [Krieger 1992, 12]. However, the more images advance in both ubiquity and mimetic power, the more unequal the terms of this relation become. Loizeaux observes that twentieth-century poets' interest in ekphrasis arises from ambivalent reactions to the growing cultural importance of the image:

The widespread presence of ekphrasis in twentieth-century poetry can be understood as both a response to and a participant in what W.J.T. Mitchell has called “the pictorial turn” from a culture of words into a culture of images that began in the late nineteenth century with the advent of photography and then film, and has accelerated since the mid twentieth century with the invention of television and, now, digital media. Excited — and haunted — by a sense of images' increasing power in western culture, poets have taken up ekphrasis as a way of engaging and understanding their allure and force.  [Loizeaux 2008, 3–4]

At the same time that images have attained unprecedented cultural power, poetry has now "further lost popular readership and its significant social role" [Loizeaux 2008, 6]. Explicit confrontation with the image now becomes a way of justifying the continued appeal, if not the very existence, of poetry itself.
Similarly, IF authors and critics feel a need to distinguish IF from graphical video games in order to explain why IF should continue to exist today, despite its apparent commercial and technological inferiority to graphical video games. Graphics both threaten and fascinate IF authors in much the same way that paintings both threaten and fascinate poets.
As an example of the study of IF from a visual perspective, in this essay I offer readings of two recent works of IF that represent opposing conceptions of the genre's visual aspects. My first text, Nick Montfort's Ad Verbum (2000), goes further than perhaps any other work of IF in stressing the genre's textual properties at the expense of its visual properties. In calling attention to the textual nature of the IF interface and of the player's input, Ad Verbum defines itself as a purely verbal artifact. My second text, Emily Short's City of Secrets (2003), seeks instead to accentuate its own visuality by providing evocative descriptions accompanied by abstract imagery. Yet the mode of visuality that this game proposes is affective, evocative and phantasmal, rather than vivid and immediately present. This game proposes that IF can be a visual experience, but that its visuality differs in significant ways from that of the graphical video game. Though these games approach visuality in very different ways, a central question for both games is whether and how the visual properties of text can compete with those of more mimetic forms of imagery. This, I would argue, is as crucial a question for interactive fiction as it is for ekphrastic poetry, because it touches upon the larger question of what happens to less explicitly transparent forms of visuality and textuality at a time when transparent forms of visuality seem to have attained a position of cultural dominance. As I will argue, IF, like ekphrastic poetry, offers visual experiences which are indirect, phantasmal, and dependent on the player's imagination. How can such visual experiences compete with the transparent visual experiences offered by media like computer games and CG film? Do we still want or need such visual experiences, and if so, why?[3] The two games I'll be discussing represent two possible answers to these pressing questions.

Toward a Theory of IF Visuality

For most IF critics, IF is a verbal, textual and literary medium whose closest affinities are with the tradition of ergodic textuality that extends from the I Ching and the Exeter Book, through the Oulipo and Cortázar, to hypertext fiction. On this assumption, the visual aspects of IF, if any, are usually ignored. Espen Aarseth, for example, treats IF as "a new type of literary artifact"  [Aarseth 1997, 107]. His reading of the Infocom game Deadline considers only its literary and ludological aspects. Montfort, the leading authority on the genre, has equally little to say about its visual qualities. According to the historical narrative he provides, the antecedents of IF are textual genres, including riddles and Oulipian potential literature [Montfort 2005, 37, 65]. The major exception to this neglect of IF’s visuality is Dennis Jerz’s article "Somewhere Nearby is Colossal Cave," which compares the geography of Will Crowther and Don Woods’s Colossal Cave (or Adventure), usually considered the first work of interactive fiction, to the geography of the real cave on which the game was based. In a photo-essay, Jerz juxtaposes Crowther's room descriptions with photographs of the real-world locations on which those descriptions are based. However, Jerz's stated goal here is to "establish that Crowther's original was not only faithful to the geography of the real Colossal Cave, but was also a fantasy remediation of that site"  [Jerz 2007, para. 2]. The question that interests Jerz is the extent to which the simulated cave faithfully reflects the real one. What he leaves unexamined is the general question of whether the exploration of such simulated spaces can be a spatial and visual experience.
This critical neglect of the visuality of IF seems unsurprising, given that one might have difficulty identifying any visual aspects of the genre. What could be the importance of visuality in a medium which, by definition, includes few or no visual images and relies primarily on text? If we distinguish visual and textual signifiers according to the definitions given above, the signifiers that make up a work of IF seem to fall into the latter category, as their semantic value doesn't depend on their precise visible instantiation. Contemporary IF interpreters give the player the option of altering details such as the font, text color and background color, without altering either the precise text that the program generates, or the code that generates it.
Figure 1. A scene from Ad Verbum. Reproduced by permission.
Figure 2. The "same" scene from Ad Verbum with a different font, font color and background color. Reproduced by permission.
According to a common-sense understanding, the IF work is the source code, or perhaps the string of signifiers produced in the execution of that code, but not the material instantiation of that code. Two players who play the same version of Ad Verbum using the two sets of interface options shown in figures 1 and 2 are playing the "same" game; the differences in font and color are purely cosmetic. This is analogous to the commonsensical assumption that the identity of a literary text resides in the text — the ordered array of signifiers — and that the material instantiation of those signifiers is merely a cosmetic feature.[4]
Yet I argue, counterintuitively, that IF may be viewed as a visual and visual-verbal genre. In the first place, and even before we consider the visual aspects of the IF interface itself, a central element of nearly all works of IF is the ancient rhetorical trope of ekphrasis. In ekphrasis, an absent object is described in terms which permit the reader or listener to visualize that object, to “see” it in the mind's eye as if it were physically present.
From the reader's perspective, the principal textual components of IF are room descriptions and object descriptions. The basic purpose of both these types of text is to enable the player to visualize the phenomena described by the text. As Eric Eve explains, in IF,

the physical world is generally modelled as a series of discrete locations known as rooms. The totality of rooms in a given work of IF is often referred to as the map. Such rooms could correspond to rooms in a building, but they need not and frequently do not[...]. Conceptually, a room is that segment of physical space that is immediately accessible to the player character.  [Eve 2007, para. 7] (emphasis in original)

In other words, the typical arrangement of space in IF is that the gameworld is divided or segmented into several discrete, mutually exclusive chunks. Such a spatial arrangement is not unique to IF. The fifth item of Mark J.P. Wolf's taxonomy of video game spatial structures is "adjacent spaces displayed one at a time" [Wolf 2002, 59].[5] In graphical video games dating back to the late 1970s, such as Superman and Berzerk, "adjacent spaces or rooms are displayed as a series of nonoverlapping static screens which cut directly one to the next without scrolling" [Wolf 2002, 59]. However, in a text adventure game, by definition, these chunks of space cannot be represented by onscreen images.[6] Instead, a block of onscreen text (the "room description") is used to make the player aware of the relevant properties of the present room, including the exits from that room and the objects it contains. The room description might be said to take the place of the absent graphical image of the room, although this formulation is anachronistic insofar as IF predates graphical adventure games. Furthermore, the image of the room is not "absent" in the sense of having been removed or abstracted, inasmuch as it never existed to begin with.
Consider, for example, the following room description from Zork I: The Great Underground Empire [Blank et al. 1980]:
You are in the living room. There is a doorway to the east, a wooden door with strange gothic lettering to the west, which appears to be nailed shut, a trophy case, and a large oriental rug in the center of the room.
Above the trophy case hangs an elvish sword of great antiquity.
A battery-powered brass lantern is on the trophy case.
This text names the room and enumerates all the visible exits from the room (the doorway and the wooden door) and the visible objects in it (the door again, the trophy case, the rug, the sword and the lantern). These objects are all "implemented." That is, they are defined in the game’s source code as objects that have certain properties, one of which is that the avatar may be able to interact with them. The description mentions no objects that aren’t implemented (although room descriptions often do mention such objects), and it does not fail to mention any visible objects that are implemented.
The qualifier "visible" is necessary because there's a trap door under the rug. This object is left unmentioned because on first entering the room, the avatar can't see it. Finding the trap door (by moving the rug) is a puzzle. The player may well know about the trap door before moving the rug, perhaps from having played the game before, but such knowledge does not extend to the avatar. If the player inputs a command referring to the trap door before moving the rug, the game responds, "You can’t see any trap door here!" In this case the player may be able to visualize the trap door under the rug, and perhaps the avatar can even imagine that there's a trap door there, if we imagine the avatar as being capable of having cognitive operations that the player doesn't share. However, the avatar still can't see the trap door in the sense that it is not physically within his or her visual field.[7] Thus the room description represents what the avatar, not the player, sees when he or she looks around the room. It is a translation of the avatar's direct visual experience into words. The player then has the opportunity to back-translate those words by activating the faculty of readerly visuality — by forming an imaginary visualization of the things the avatar sees.
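The relation between implemented objects and visible objects can be illustrated with a minimal sketch. The code below is an assumption-laden toy in Python, not the source of Zork: the class names, the `visible` flag, the `conceals` field and the simplified room text are all hypothetical. It shows only the logic described above, in which an object exists in the world model but is left out of the room description, and refused as a noun, until the avatar can see it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Thing:
    name: str
    visible: bool = True            # is it inside the avatar's visual field?
    conceals: Optional[str] = None  # name of a thing revealed when this one is moved


@dataclass
class Room:
    name: str
    things: Dict[str, Thing] = field(default_factory=dict)

    def add(self, thing: Thing) -> None:
        self.things[thing.name] = thing

    def _seen(self, noun: str) -> Optional[Thing]:
        # An implemented but hidden thing is treated exactly like an absent one.
        thing = self.things.get(noun)
        return thing if thing is not None and thing.visible else None

    def describe(self) -> str:
        # Only what the avatar can see is translated into words.
        names = [t.name for t in self.things.values() if t.visible]
        return f"You are in the {self.name}. You can see: {', '.join(names)}."

    def examine(self, noun: str) -> str:
        if self._seen(noun) is None:
            return f"You can't see any {noun} here!"
        return f"You see the {noun}."

    def move(self, noun: str) -> str:
        thing = self._seen(noun)
        if thing is None:
            return f"You can't see any {noun} here!"
        if thing.conceals is not None:
            revealed = thing.conceals
            self.things[revealed].visible = True
            thing.conceals = None
            return f"Moving the {noun} reveals a {revealed}."
        return f"Moving the {noun} reveals nothing."


def make_living_room() -> Room:
    """Build a hypothetical, simplified version of the room discussed above."""
    room = Room("living room")
    for name in ("trophy case", "sword", "lantern"):
        room.add(Thing(name))
    room.add(Thing("rug", conceals="trap door"))
    room.add(Thing("trap door", visible=False))
    return room


if __name__ == "__main__":
    room = make_living_room()
    print(room.describe())
    print(room.examine("trap door"))
    print(room.move("rug"))
    print(room.describe())
```

In this sketch the player's prior knowledge of the trap door changes nothing: the program consults only the avatar's visual field, which is the distinction the room description depends on.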
The primacy of seeing in IF is indicated by the ubiquitous presence of light sources in Adventure and games descended from it. Exploration can't take place in the absence of light, and light source conservation and transport are common puzzle themes. As Jeremy Douglass observes, this made sense in Adventure "as it is highly dangerous to wander around cave systems in the dark" [Douglass 2007], but the need for light sources subsequently became divorced from its original context and evolved into a generic convention. Games like Taro Ogawa's Enlightenment (1998), where the player's goal is to extinguish all the light sources in a room, or Andrew Plotkin's Hunter, in Darkness (1999), where exploration takes place via senses other than sight, are deliberate reactions against this primacy of sight [Douglass 2007]. The default assumption in IF is that the avatar experiences the gameworld through the visual faculty, and that the text presents the avatar's visual experience to the player.
As translations of visual objects in the medium of language, IF room descriptions (and object descriptions, of which room descriptions are special cases) are examples of ekphrasis. In current critical discourse ekphrasis is most often defined as the verbal description of a visual work of art, but Janice Hewlett Koelb argues that this meaning of the term is a twentieth-century invention, dating back no earlier than Leo Spitzer’s 1955 essay on Keats's "Ode on a Grecian Urn" [Koelb 2006, 2]. Ancient rhetoricians defined ekphrasis as "[a] speech which leads one around (periegematikos) bringing the subject matter vividly (enargos) before the eyes" [Koelb 2006, 23], whatever that subject matter might be. IF games like Zork certainly meet this definition. The degree of vividness (or enargeia) with which the subject matter is "brought before the eyes" is a factor that varies between different games, and also between different players, since players might visualize the gameworld more or less vividly depending on how visually inclined they happen to be. On an anecdotal level, I tend to visualize extensively when I play IF games, but I know other IF players who claim that they don't do so, and that they understand room descriptions in a conceptual or propositional way. However, I suggest that IF games must supply the potential for visualization in order to provide a meaningful play experience.[8] What we might call visualizability is a basic requirement for traversing most if not all interactive fictions, especially those that include multiple rooms or rooms with multiple objects in physical contact with each other. In order to productively interact with the gameworld, the player must possess at least a minimal understanding of the spatial relationships between the objects in each room and between the rooms themselves. This requires constructing a mental (or actual) map, which is, to a substantial degree, a visual operation. As Eve observes, "[t]he totality of rooms in a given work of IF is often referred to as the map" [Eve 2007, para. 7] (emphasis in original), and this is "probably because someone designing a work of IF containing more than a handful of rooms almost certainly needs to draw a map indicating their spatial relations before attempting to write the game, and players often find it useful to draw schematic maps as they play" [Eve 2007, fn5].
When visualizability breaks down — that is, when room and object descriptions fail to accurately represent what the avatar can see — meaningful play and the ability to traverse the game successfully may be impeded.[9] This may happen, for example, when an object mentioned in a room description is not implemented. By convention, if the player tries to interact with such an object, the game responds that the object is not important. Sometimes, however, the game fails to acknowledge the object’s existence and instead outputs a standard response to commands that reference nonexistent objects, such as "You can’t see any such thing" or "I don’t see that here." This behavior is generally considered a design flaw or even a bug, as Eve explains: "It looks very clumsy if, having told the player that the room is decorated with striped wallpaper, the game responds with 'You see no such thing' when the player tries to examine it" [Eve 2007, para. 15].[10] Such behavior creates a gap between the visual experience of the avatar and the verbal experience of the player. Somehow, the player can read about things the avatar can’t see, and this destroys the illusion that the room description represents the avatar’s visual experience.
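The convention described above can be sketched in a few lines of Python. This is a hypothetical illustration rather than the behavior of any particular IF authoring system: the names `IMPLEMENTED`, `SCENERY`, `examine` and `find_gaps` are invented here, and the list of nouns mentioned in a room description is assumed to be supplied by hand.

```python
# Fully implemented objects, with the text shown when they are examined.
IMPLEMENTED = {"desk": "A plain oak desk."}

# Nouns that the room description mentions but that play no part in the game.
SCENERY = {"wallpaper"}


def examine(noun, implemented, scenery):
    """Return the game's reply to EXAMINE <noun>."""
    if noun in implemented:
        return implemented[noun]
    if noun in scenery:
        # Conventional reply: the object exists, but it does not matter.
        return f"The {noun} is not important."
    # Standard reply for nouns the world model does not know about.
    return "You can't see any such thing."


def find_gaps(mentioned, implemented, scenery):
    """List nouns mentioned in a description that the game cannot acknowledge."""
    return sorted(n for n in mentioned if n not in implemented and n not in scenery)


if __name__ == "__main__":
    print(examine("wallpaper", IMPLEMENTED, SCENERY))
    # With no scenery declared, the same command produces the jarring reply.
    print(examine("wallpaper", IMPLEMENTED, set()))
    print(find_gaps(["desk", "wallpaper", "window"], IMPLEMENTED, SCENERY))
```

A check like `find_gaps` suggests one way an author might catch, before release, nouns that the player can read about but the avatar supposedly cannot see.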
An opposite but perhaps more egregious breach of visualizability occurs when the text fails to mention objects that are implemented and that the avatar should be able to see. For example, in Dave Baggett and Carl de Marcken's 1994 game +=3, the avatar must give three objects to a troll as a toll to cross a bridge. The INVENTORY command reveals that the avatar is holding just one object, and the game's single room contains no other objects that can be acquired. The solution is to take off the avatar’s shirt, shoes, pants, socks, glasses and/or underwear, thereby supplying the missing two items. This solution, though perfectly logical, is cruelly unfair because none of these articles of clothing are referred to anywhere in the game.[11] In particular, they aren't mentioned in the responses to the commands INVENTORY and EXAMINE ME. According to conventions which were well established by 1994, experienced players would thus conclude that the avatar was wearing nothing important, because on looking at himself or herself, the avatar sees nothing worth mentioning. The player would assume that the avatar is wearing clothes (otherwise the avatar's nudity would be mentioned), but that the clothes have no relevance to gameplay. Objects left unmentioned are assumed to be below the avatar’s perceptual threshold, and thus either nonexistent, or irrelevant to the task of traversing the game. The underlying assumption here is that everything the avatar sees will be translated into descriptive text. In violating this assumption, +=3 precludes meaningful play.
Thus, IF is an ekphrastic medium because it consists of texts which describe visual phenomena and which prompt the reader to create imaginary visualizations of those phenomena. However, IF differs from other ekphrastic media by virtue of being prescriptive rather than autotelic. The reading of a static ekphrastic text, like Diderot's Salons or Ruskin's word-paintings, is a self-contained experience.[12] These texts describe absent visual phenomena in such a way as to permit the reader to visualize them, but they do not prompt the reader to take any action in response to these visualizations. The experience of imagining what the text describes is its own reward. By contrast, when an IF player reads a room or object description, he or she is expected to take an action in response (i.e. to do work, hence the term ergodic). The player is prompted to give commands to the avatar based on the visual and other information in the description.
My argument, thus, is that ekphrasis is the characteristic mode of visual representation in IF. During the commercial era of IF (approximately coinciding with the lifespan of Infocom, from 1979 to 1989), ekphrasis, as a means of visual rendering, had certain comparative advantages over graphics. Graphical video games predate Colossal Cave by at least 15 years, but these games ran on mainframes or dedicated arcade machines. The creation of sophisticated graphics was beyond the technological capabilities of contemporary home computers. Displaying text was much less labor-intensive. For example, the first commercially successful portable computer was the Osborne 1, released in 1981. This computer had a monochrome screen which was incapable of displaying bitmap graphics [Wikipedia 2009]. On such a platform, a visual depiction of a building with keys, a brass lamp, food and water on the ground would have been out of the question. Text made it possible to "show" visual phenomena that could not have been depicted with the graphic resources available.
Figure 3. Adventure running on an Osborne 1. Originally uploaded by Cetcom. Licensed under the GNU Free Documentation License.
Furthermore, text was far more portable across platforms than graphics. Infocom games were designed for the Z-Machine, a "software computer [which] could be implemented on many different platforms, including almost all of the popular microcomputers in the United States during the 1980s" including business machines as well as dedicated gaming machines [Montfort 2005, 126]. Since all of these computers were capable of displaying text, all the Infocom games could be ported to a new platform at once simply by writing a new interpreter for that platform. The use of graphics, by contrast, would have posed an almost insurmountable obstacle to such cross-platform availability.[13]
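The portability argument can be made concrete with a toy example. The Python sketch below is not the Z-Machine and does not reproduce its instruction set; the two opcodes, the `STORY` list and the `render` helper are hypothetical stand-ins. It illustrates only the general principle: a game is stored once as platform-neutral instructions, and each platform needs only its own small routine for putting text on the screen.

```python
# A "story file": platform-neutral instructions, written once.
STORY = [
    ("PRINT", "You are in the living room."),
    ("NEWLINE", None),
    ("PRINT", "A brass lantern is here."),
    ("NEWLINE", None),
]


def run(story, write):
    """Interpret the story, delegating all output to the platform's `write`."""
    for opcode, operand in story:
        if opcode == "PRINT":
            write(operand)
        elif opcode == "NEWLINE":
            write("\n")
        else:
            raise ValueError(f"unknown opcode: {opcode}")


def render(story, transform=lambda text: text):
    """Simulate one platform; `transform` stands in for its display quirks."""
    output = []
    run(story, lambda text: output.append(transform(text)))
    return "".join(output)


if __name__ == "__main__":
    # A platform with ordinary mixed-case text.
    print(render(STORY), end="")
    # A hypothetical platform whose display has only capital letters.
    print(render(STORY, str.upper), end="")
```

Supporting a further platform in this sketch means supplying one more output routine, while `STORY` stays untouched. A graphical game would have no comparably small platform-specific layer, since its images would have to be redrawn for each machine's display hardware.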
For these and other reasons, the use of ekphrasis rather than graphics made the commercial success of IF possible. According to the standard view of the genre’s history, however, IF's reliance on text was also the cause of its commercial decline.[14] Over the course of the 1980s, as the graphical capabilities of home computers advanced, the new genre of the graphical adventure gradually rendered IF obsolete.[15] According to Espen Aarseth, this was a natural succession because graphics, compared to ekphrasis, are a naturally superior mode of visuality: "Images, especially moving images, are more powerful representations of spatial relations than texts, and therefore this migration from text to graphics is natural and inevitable" [Aarseth 1997, 102].[16] By Aarseth's logic, the purpose of a game is to serve as a transparent window into an imagined space. According to what Bolter and Grusin call the logic of transparency [Bolter and Grusin 2003], the game seeks to erase its own materiality and present the player with a vivid, sensuously present experience of existence in another world. For this purpose to be fulfilled, the gameworld must be presented with maximum visual richness. Clearly games that translate the avatar's visual experience into text do all these things less effectively than games that display the avatar's visual experience onscreen.
The assumption here is that video game history follows a teleological progression from lesser to greater transparency. IF becomes commercially unviable because it represents an earlier stage in this progression. For some authors, this is only natural: the fact that computer graphics have outstripped the capacities of IF is cause for celebration. An example of such a view is Julian Dibbell's dismissive description of Adventure as an inferior precursor to Myst: "It's hard to believe that that world once represented the high frontier of computer gaming. Where players of latter-day quests like Myst point-and-click their way through complex graphical environments of an almost liquid radiance […] Adventure was strictly hunt and peck" [Dibbell 2001]. Other authors characterize the gaming industry's ideology of transparency as unfortunate, and describe IF nostalgically as having been sacrificed on the altar of progress. Aarseth regrets that the text adventure game, a "young, vigorous, if somewhat bland tradition of textual entertainment [...] was quickly overrun by the entertainment market" [Aarseth 1997, 128]. More recently, Andy Klien began a 2005 article on IF by writing, "Only once in my life have I seen a wonderful medium effectively wiped out by new technology" [qtd. in Douglass 2007].
Yet interactive fiction still exists today, when the graphical capabilities of personal computers are far more sophisticated than at the time of IF's commercial collapse. New IF games are now produced by independent hobbyists and artists rather than by commercial firms. However, for contemporary IF authors graphics represent an elephant in the room, a topic that may not be directly discussed but that can't be ignored. Authors of IF in the post-graphical era cannot avoid the question of why they should bother, since graphics are now better than IF text at doing what IF text does, for which reason IF will probably never again be a commercially viable medium. By way of answering this question, IF authors and critics have sought to claim for IF another type of legitimacy, emphasizing its aesthetic and scholarly appeal rather than its commercial appeal. If IF can't be a popular and commercial medium, it can be an auteurist and artistic medium. But in order to prove the aesthetic legitimacy of IF, it becomes necessary to show that IF is a medium independent of the graphical video game because IF text has properties that graphics lack.
Where contemporary IF authors and critics differ is in their conception of the precise nature of these distinctive properties of IF. Within contemporary IF work we can distinguish two very different approaches to defining the specificity of the genre. The first approach is to argue that IF is a linguistic and anti-visual medium.

Ad Verbum: Interactive Fiction and Representational Friction

One way in which IF responds to the seemingly superior representational capabilities of graphics is by ignoring ekphrasis almost entirely and foregrounding the textual and verbal qualities of the IF interface. The paradigmatic example of this approach is Nick Montfort's 2000 game Ad Verbum.
The player's goal in this game is to remove all the objects from a house belonging to the Wizard of Wordplay. Nearly all of the game’s puzzles must be solved by entering commands according to various linguistic constraints. Exploiting Bolter and Grusin's logic of hypermediacy, this game forcibly reminds the player of its nature as a text-based computer program, rather than a window into a simulated world. This is evident immediately in the introductory text of the game:

With the cantankerous Wizard of Wordplay evicted from his mansion, the worthless plot can now be redeveloped. The city regulations declare, however, that the rip-down job can't proceed until all the items within have been removed.

That's what the demolition contractor explains to you, anyway, as you stand eagerly on the adventurer's day labor corner. Once he learns of your penchant for puzzle-solving and your kleptomaniacal tendencies, he hires you for the job. You hop into the bed of his truck, type a few Zs, and arrive at the site, eager … [Montfort 2000]

"Z" is the standard abbreviation for the "wait" command, so the last sentence erases the boundaries between player and avatar, between typing commands and performing actions. Throughout the game the player is consistently reminded that he or she is not exploring a diegetic world, but typing commands in response to verbal descriptions. Some of Ad Verbum's puzzles in fact involve no interaction with objects or spaces, only manipulation of language. For example, on the first floor of the mansion, the player encounters a little boy, Georgie, who refuses to give up his toy dinosaur unless the player can name more dinosaurs than Georgie can. Georgie knows an arbitrarily large number of real dinosaur names, so the solution is to input fake dinosaur names — i.e. nonsense words ending in "saur" or "saurus" — until Georgie gets frustrated and gives up. Since all the player has to do to solve this puzzle is think of nonsense words, it doesn't matter whether or how the player visualizes the space where Georgie is located.
Other puzzles in the game do force the avatar to interact with rooms and objects, but in order to make the avatar do so, the player has to satisfy certain linguistic constraints. Most notably, the game contains several "constrained rooms" where the output text consists entirely of words starting with a specific letter. For example, at the bottom of figure 4 we see the initial room description of the "Wee Wardrobe."
Figure 4. Screenshot from Ad Verbum. Reproduced by permission.
This same constraint applies to the player's input. Obvious solutions like TAKE WEAPON don't work; if the player enters a command containing a word that doesn't start with W, the parser replies, "Wha? Wha? Withhold wrong words. Write wholesomely." The puzzle, therefore, is to command the avatar to take the two objects in the room and then leave, using only words beginning with W.[17] This constraint applies even to nondiegetic commands like HINT, SAVE, RESTART, RESTORE and QUIT, and on first entering a constrained room, the player must read a warning alerting him or her to this fact.
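The input constraint described above can be sketched in a few lines of Python. This is an illustrative sketch only, not Montfort's code: the function names and the acceptance placeholder are hypothetical, and only the rejection message is taken from the game as quoted above.

```python
# Rejection text as quoted in the essay; everything else here is hypothetical.
REJECTION = "Wha? Wha? Withhold wrong words. Write wholesomely."


def violates_constraint(command: str, letter: str = "w") -> bool:
    """Return True if any word in the command fails to start with `letter`."""
    return any(not word.lower().startswith(letter) for word in command.split())


def respond(command: str) -> str:
    """Reject any command that breaks the room's constraint.

    The check runs before ordinary parsing, so nondiegetic commands
    such as HINT or QUIT are rejected as well.
    """
    if violates_constraint(command):
        return REJECTION
    return "(command passed on to the ordinary parser)"
```

Because the check looks only at the first letter of each typed word, it operates on the player's text and never consults the state of the simulated room, which is the point the constrained rooms make.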
The constrained rooms call attention to the fact that the world of this game is a linguistic construct, a tissue of words and letters. Of course, this is true in a sense of the diegetic world of any IF game: the white house in Zork doesn't exist independently of the language that describes it.[18] Ad Verbum’s innovation is to make explicit the linguistic nature of the IF gameworld. Since the spaces of Ad Verbum are called into being by language, it's logical that these spaces can have linguistic properties, like the property of only containing objects that start with W. However, by virtue of being defined in purely verbal terms, these spaces resist translation into images. What would a room look like if it contained only things beginning with S? The first letter of an object’s name is not a property which can be perceived by looking at it, especially if the object has various possible names. One can imagine a space based on the physical form of a letter: for example, an S room where the walls, ceiling and furniture have sinuous, snaky curves, or a V room full of sharp, severe triangles. But there is no suggestion that the constrained rooms in Ad Verbum are organized according to the visual properties of their corresponding letters. These are entirely linguistic spaces, and the language of which they are composed is in a sense stripped of visuality. In Ad Verbum, a letter is defined purely in relational terms, as a member of a set with 26 members. The question of the physical instantiation of letters is ignored.[19]
If descriptions in IF are translations of what the avatar sees into words, the Ad Verbum avatar sees things that can't be seen — for example, what letter an object starts with, or whether it contains the letter E. This avatar’s visual experience is fundamentally anti-visual. So the game frustrates the player’s ability to imaginatively reproduce the avatar’s visual experience. If the things the avatar "sees" are unseeable, the player can't imagine what it's like to see those things. This forcibly reminds the player that IF is at bottom a linguistic and programmatic rather than a spatial experience.
Montfort thereby demonstrates that the world represented in an IF game is dissimilar to the material, namely language, that represents that world. This is what James Heffernan, a scholar of ekphrastic poetry, describes as the trope of representational friction, in which the ekphrastic poem calls attention to the artificiality of the artwork it describes [Heffernan 2004, 4, 18–19, 37]. For example, Homer's description of the shield of Achilles includes the statement that "the earth darkened behind [the ploughmen] and looked like earth that has been ploughed / though it was gold" [Heffernan 2004, 19]. At the same time that Homer celebrates the amazing power of art to reproduce reality, he reminds the reader that the work of art is ontologically dissimilar to the reality it reproduces. Homer celebrates "the wonder [...] of graphic verisimilitude" specifically by telling the reader "that what appears on the shield is not the ploughed earth itself, but gold that has been somehow made dark enough to resemble it" [Heffernan 2004, 19]. Because the shield is made of gold, not dirt, it can represent dirt only via artifice and convention. By analogy, because poetry is made of language and not images, it can represent images only through a similar artifice. Representational friction, thus, is a trope that foregrounds the dissimilarity between the descriptive poem and what it describes. It reminds the reader that the poem is a poem, not a painting or sculpture: that the reader is not beholding a physically present picture, but imagining a picture based on his or her interpretation of graphic signifiers. Representational friction reminds the reader of the nature of the activity he or she performs in reading a poem. It defines the specificity of poetry as distinct from painting and sculpture.
But of course IF players perform an activity that readers of poetry typically don't. In IF, the player does more than interpret signifiers; he or she also enters commands in response to those signifiers. These commands produce changes, often of a permanent nature, in the diegetic gameworld, and thereby determine what signifiers will be given for interpretation next. Montfort also reveals the verbal nature of the process of entering commands. The standard conceit is that when the player types a command, this is equivalent to, and can be visualized as, the avatar performing that action. When I type "take lantern" and press the enter key, I may imagine that my avatar reaches out his or her hand and takes the lantern. Of course, what actually happens is that the game program interprets the words "take lantern" as an action, then checks whether the action can succeed in the present condition of gameplay. If it can succeed, the lantern is moved from its current position and added to the player's inventory [Nelson 2001, 87]. But when Montfort places constraints on the player's ability to enter commands, he reminds the player that commands don't actually involve interaction with objects in or attributes of a diegetic world; all they involve is the generation of signifiers. One puzzle requires the avatar to acquire four books using commands that follow the linguistic constraints used in the text of the books. For example, the "dust casing" does not accept commands that include the letter E, and the "abecedarian book" only accepts commands in which the first word starts with A and the second word starts with B. If the player tries to take these books using inappropriate commands, "a mysterious force holds the book to the … shelves." Possible solutions include ACQUIRE BOOK and LIFT CASING.[20]
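The sequence just described (interpret the typed words as an action, check whether the action can succeed, then move the object into the inventory) can be sketched as a toy world model in Python. This is a minimal sketch under stated assumptions rather than a reconstruction of Ad Verbum or of any real IF system: the class names, the room name, the verb list, and every response string are hypothetical placeholders, and the two constraints paraphrase the rules for the "dust casing" and the "abecedarian book" given above.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


def no_letter_e(command: str) -> bool:
    """Constraint for the dust casing: the command may not contain an E."""
    return "e" not in command.lower()


def a_then_b(command: str) -> bool:
    """Constraint for the abecedarian book: first word A..., second word B..."""
    words = command.lower().split()
    return len(words) >= 2 and words[0].startswith("a") and words[1].startswith("b")


@dataclass
class Item:
    name: str
    location: str
    # Hypothetical modelling choice: a rule the whole typed command must satisfy.
    constraint: Optional[Callable[[str], bool]] = None


# Synonyms that all map to the single world-model action "take".
TAKE_VERBS = {"take", "get", "acquire", "lift"}


@dataclass
class World:
    room: str
    items: dict
    inventory: list = field(default_factory=list)

    def execute(self, command: str) -> str:
        words = command.lower().split()
        # Step 1: interpret the words as an action on an object.
        if len(words) != 2 or words[0] not in TAKE_VERBS:
            return "I don't understand that."
        item = self.items.get(words[1])
        # Step 2: check whether the action can succeed in the current state.
        if item is None or item.location != self.room:
            return "You can't see any such thing."
        # The linguistic constraint inspects the typed text, not the action.
        if item.constraint is not None and not item.constraint(command):
            return "A mysterious force holds it in place."  # placeholder wording
        # Step 3: move the object into the player's inventory.
        item.location = "inventory"
        self.inventory.append(item.name)
        return "Taken."


world = World(
    room="library",
    items={
        "lantern": Item("lantern", "library"),
        "casing": Item("casing", "library", no_letter_e),
        "book": Item("book", "library", a_then_b),
    },
)
```

In this sketch TAKE, GET, ACQUIRE, and LIFT all resolve to the same state change, so the only thing separating a successful command from a blocked one is the spelling of what the player typed.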
In the context of obtaining a book, the words TAKE, GET, ACQUIRE, and LIFT all describe the same action. When I pick up a book, I can use any of these verbs interchangeably to describe what I'm doing. But in Ad Verbum, the "mysterious force" that governs the books will accept only some of these actions and not others. The force allows the avatar to rip the casing or uproot the copybook but not take or get them, merely because the former two actions satisfy the constraint and the latter two don't, even though the four actions are not semantically distinguishable and can all be visualized in the same way. Here Montfort is deliberately subjecting the player to the notorious "guess the verb" situation, where the player knows what he or she wants the avatar to do, but has difficulty finding the specific verb that tells the avatar to do it. When this phenomenon occurs in games, players typically see it as a design flaw, because it violates the logic of transparency. In real life, if one knows what one wants to do and if one is physically capable of doing it, one can simply do it. In a graphical video game, the player can just press the button that makes the avatar take the desired action. So why should it be any different in an IF game? Though this is a rhetorical question, Montfort answers it by arguing that an IF game does not follow the procedures of real life, nor those of a graphical video game. An IF game is neither the real world nor a transparent representation thereof, but rather a computer program in which both the input and the output consist entirely of text.
In Ad Verbum, representational friction and guess-the-verb puzzles ultimately serve to define the specificity of IF as opposed to graphical video games. Since IF is clearly incapable of competing with graphical video games in terms of commercial appeal, Montfort seeks to claim for IF another type of legitimacy in terms of aesthetic or academic appeal. Montfort does this by stressing that the visual and spatial aspects of IF are metaphorical, not literal, because IF is a fundamentally linguistic medium. IF is an independent and aesthetically legitimate medium because of, not despite, its lack of graphics. Contemporary IF is not an atavistic throwback to the era before the graphical video game, but an artistic medium in its own right. By situating IF as a textual medium, Montfort is also able to connect it to earlier, more canonical forms of ludic textuality. Thus, Ad Verbum contains explicit references to famous constrained texts like Walter Abish's Alphabetical Africa and Georges Perec's La Disparition. In Twisty Little Passages, Montfort continues this project by arguing that IF has important similarities to the literary genre of the riddle.
Montfort doesn't refute the allegation that computer graphics are more effective in some ways than words at representing the contents of fictional spaces. He tacitly accepts this critique and suggests that the true strength of IF lies elsewhere, in its ability to manipulate the material of language, an ability that graphical video games lack. If the graphical video game is a visual medium, then IF is a textual medium. Visual effects are the proper province of graphical games, while textual effects are specific to IF.
A similar strategy is at work in many other more recent games that exploit the textual properties of the IF browser, although I don't know of any other game that does this to the same extent as Ad Verbum. For example, Jeremy Freese's Violet, the winner of the 2008 Interactive Fiction Competition, features a parser which is personified as the avatar's eponymous girlfriend. This effect is possible in IF because the parser is simultaneously the voice of a narrator and the means by which the diegetic world is presented to the player. The parser not only narrates the events of the gameworld, but actually produces that world for the player. In graphical video games, these two functions are separated. If Violet were a graphical game, Violet would be no more than what André Gaudreault calls a delegated narrator (see [Gaudreault & Barnard 2009, 135–146]). It would be difficult to create the illusion that Violet was actually creating the gameworld by speaking about it.
Moreover, if IF is an independent artistic medium in its own right, rather than an atavistic precursor of graphical video games, then it becomes reasonable to use IF for purposes other than gaming. This is the idea behind the genre of puzzleless IF, which uses IF scripting languages but often abandons the elements of spatial exploration and puzzle-solving. The classic example of puzzleless IF is Adam Cadre's Photopia (1998) and the genre also includes sophisticated chatbots like Emily Short's Galatea (2000).
Affective Ekphrasis in City of Secrets

City of Secrets (2003) is a game about spaces. For most of this game the avatar's goal is simply to explore the setting of the game, known only as the City, in order to find a mysterious woman named Evaine. The game's puzzles are mostly about overcoming barriers to further exploration, and the primary reward the player gets for solving these puzzles is the ability to explore previously unseen spaces. The City itself is inherently worth exploring because it's a tourist destination, a place of great historical and cultural importance. Short's innovation in City of Secrets is to encourage the player to see this space rather than simply read about it. Short's descriptive language is precise and detailed, but also deliberately limited in terms of what it reveals. By limiting the visual information she provides, Short encourages the player to supply this information by exercising the faculty of readerly visuality.
Figure 5. Screenshot from City of Secrets. Reproduced by permission.
The descriptions reproduced in figure 5 accomplish the primary practical tasks of an IF room description: they enumerate the exits from each room and the implemented objects in them, thereby making this part of the game's geography visualizable. However, the descriptions are in no way ultraprecise; they provide insufficient information to permit the player to visualize exactly what these spaces look like. Short neglects to describe the architectural style of the buildings or to specify the number of buildings or the things depicted in the statues. This omission of detail is a deliberate choice on Short's part, since she has also written descriptions which are obsessively detailed. Her 2000 game Metamorphoses contains a number of murals which can be both examined and looked at through a magnifying glass, revealing additional details which can themselves be examined. Short comments, "In writing Metamorphoses I did think of what I was doing as specifically ekphrasis, and that’s one reason there are so many layers of detail within the scenery, especially the murals: I was trying to capture a little of the sense, found in Ovid and Catullus, that worked pictorial objects have astounding levels of detail"  [Short 2009].
What happens instead in City of Secrets is that the omission of details from the text creates gaps in the player's visualization of the scene, gaps which the player then has the opportunity to fill. As Wolfgang Iser has argued, filling in gaps in a text is one of the major cognitive operations performed by readers. Iser characterizes this process as a propositional or linguistic one, but Peter Schwenger, a theorist of readerly visuality, suggests that readers perform this process with images as well as words. Schwenger notes that Iser "speaks of syntheses below the level of consciousness, which he calls 'passive syntheses'. Of such syntheses the basic element is the image" [Schwenger 1999, 57]. Another way to theorize this process is through Scott McCloud's concept of closure, the process whereby the reader of a comic creates mental images that fill the gaps (or gutters) between the comic's panels [McCloud 1993, 66–68]. If the concept of closure was designed to account for texts that consist of sequences of images, then it applies to the IF text insofar as IF, as encountered by the player, involves precisely such a sequence.[22] As explained above, in playing IF the player is presented with a series of visual experiences translated into verbal terms. Closure is what sutures the gaps in this sequence of disparate images.
Schwenger and Iser's visual "filling in," which operates when we read a verbal narrative, is closely analogous to McCloud's "closure," which operates when we read a narrative composed of images. Both these modes of reading involve a synesthetic interplay between the viewer's imagination and the signifiers of the text, whether these signifiers are defined as visual or verbal in nature. Indeed, the similarity of "closure" to "filling in" suggests that these two modes of reading are less distinct than they may appear — that the decision of whether to define a narrative as visual or verbal is to some extent an arbitrary decision, one which is influenced by cultural politics as well as by the phenomenology of the reading experience. Even if we choose to define IF as a genre that employs purely verbal means, the experience of playing IF may not be all that different from the experience of playing a game that employs (ostensibly) visual means.
Playing IF, then, could be as much a visual experience as playing a graphical video game. However, that doesn't rule out the possibility that these two experiences could be visual in different ways: the visuality of IF might differ from the model of visuality associated with graphical video games. As early as 1983, Infocom took precisely this position, arguing in an advertisement that their games "unleash[ed] the world's most powerful graphics technology," i.e. the human brain: "We draw our graphics from the limitless imagery of your imagination — a technology so powerful, it makes any picture that's ever come out of a screen look like graffiti by comparison." This argument, however, still adheres to the logic of transparency: it holds that imagined visuality is more transparent than graphical visuality and therefore better.
A more nuanced way to distinguish between readerly and graphical visuality might be to emphasize the personal, subjective or affective aspects of the former. For Schwenger, reading is necessarily accompanied by a continuous passive process of image generation, but the reader's preexisting visual inclinations and his or her mental repertory of visual images affect the way in which he or she concretizes the text's descriptions:

"[L]iterature consists of a steady stream of erased imperatives," according to Elaine Scarry, imperatives that are often instructions to produce mental pictures. Yet no matter how detailed or precise those instructions may be, they are never comprehensive enough to override the individual’s memory bank of images and associations. These play upon the author’s dictated pictures, an obbligato of the unconscious, of memory and desire.  [Schwenger 1999, 4]

Even if Short's room descriptions were more detailed than they are, they would be unable to supersede the reader's preexisting mental pictures of analogous rooms; for example, however Short described the Sun Court temple, I would inevitably imagine it as looking like the U.S. Capitol. (By contrast, when I visit a similar location in a graphical video game, say the Bevelle Temple in Final Fantasy X, I see only what the game designers want me to see, and I see the same temple as every other player. The way I understand this visual image is specific to me, but the way I visualize it is not.) What Short does do, however, is to condition how the player sees whatever it is that he or she sees, to suggest the affective resonances of the mental pictures that the player may form. Short's descriptions say less about what precisely the avatar sees than about how the avatar is affected by what is seen, as Short notes: "With City of Secrets, though, it’s true that I was trying to do something a little bit different [as compared to Metamorphoses]: to hint at the protagonist’s perceptual filters by describing styles and trends rather than straightforward physical detail" [Short 2009].
For example, the description of the mosaic in the Sun Court reads, "The mosaic is an elegant job and executed in rich materials, but the design has a facile modern quality that does not entirely appeal to you." The temple is described as "[b]uilt in an old style, but unworn, unchipped, unpolluted." Combined with the profusion of illusionistic artwork in this area of the City, especially the façade-painting, these descriptions suggest that the Sun Court is an insincere place. It is recognizably less ancient than it appears to be. This suggests that the City's government, of which this space is the public architectural symbol, is trying to pass itself off as something it's not. Inasmuch as it is conditioned by such hints as these, the player's visualization of this space becomes affectively charged. As a counterpoint to this, here is Short's description of a nightclub called Scheherazade:

Despite the light that leaks in through the windows, the place seems to be trying for a dark and anonymous ambiance, with high-backed booths and wood paneling, a ceiling painted black, and hanging swatches of brocaded purple velvet. The decorations are mostly allusions to the City's distant shady past as an outpost of thieves and smugglers on the Vuine.

Most of these details, again, are not relevant to completing the game, but they assist the player in creatively visualizing the place. The few details that Short does provide — the black ceiling, high-backed booths, and purple velvet — hint at what gives this place a "dark and anonymous ambiance," but the player is invited to fill in the remaining details in his or her own way. The decorations, involving thieves and smugglers, suggest why the place is "trying for" such an ambiance: it is a place of darkness, of secrecy and anonymity, a hideout for outlaws or at least for people who have something to conceal. But at least this is a place that doesn't seek to present itself as something it's not.
What all these descriptions do is to condition how the player visualizes the room. They add an affective dimension to the mental picture of the room that the player involuntarily creates for himself or herself in response to the textual representation of the room.
This effect is further complicated by Short's limited use of graphics. City of Secrets includes a frame containing images, located to the left of the main gameplay window. However, these images are more suggestive or symbolic than mimetic. They suggest the dominant mood or tonality of the scene the player is witnessing, rather than showing anything in that scene. Accordingly, Jeremy Douglass calls the images in this game "ambient illustrations" [Douglass 2007]. In figure 5, for example, we see a stylized representation of the sun against a field of orange fading into white. This image doesn't depict anything in the Sun Court, except perhaps the sun symbol on the pavement, but it suggests the offputting, blinding sunniness of the scene.[23] What we see here is a complex, synesthetic interplay between the images described in the text and the images that the text is. The actual images help to shape the player's mental images, at the same time that the latter inflect the player's interpretation of the former.
This is a text that attends to the way in which text is inescapably a visual phenomenon. In this context it's worth noting that although City of Secrets allows the player to change the font, text color and other such options, the title screen and the left-hand window include text which is not affected by such changes.
Figure 6. Screenshot from City of Secrets. Reproduced by permission.
Without speculating on the metaphorical associations of this font, I merely note that it was chosen deliberately. The player enters this game through the threshold of an image which is primarily composed of textual signifiers, yet contrary to my commonsensical definition of text, the precise visual instantiation of these signifiers is clearly important.
For a certain subset of the game's audience, City of Secrets was an even more material and visual experience than it is today. On releasing the game, Short offered players the opportunity to purchase a special edition of the game that came with a boxed set of "feelies." The term feelies refers to "[m]ultimedia epitexts such as journals, maps, and artifacts, bundled to illustrate the IF work. Popularized by Infocom" [Douglass 2007]. Commercial IF games were physical artifacts (floppy discs packaged in boxes and sold in brick-and-mortar stores), and the inclusion of feelies further intensified the physicality of those objects. (Feelies served the additional practical function of copy protection; games like Sorcerer and Leather Goddesses of Phobos were unsolvable without information which was printed on the feelies, and which, in a pre-World Wide Web era, would have been otherwise unavailable.) This physical side of the IF experience was lost when IF moved to a digital model of distribution. Seeing this as an unfortunate development, Short helped to create a website, feelies.org, that produced and distributed feelies for contemporary works of IF:

feelies.org started with a conversation that I had with some of my friends in the IF community, about how the one aspect of commercial IF we really missed (as players) was the feelies. Some modern IF comes with "virtual feelies" — PDF files or fake Websites or whatever that are distributed in a Zip file with the game — and I like those, but we were also missing the tangible physical objects.  [Loguidice 2004, n.p.]

The City of Secrets feelies included such items as a "[t]ourist guide to the City, including map, digitally-offset printed by Imagers.com in full color on glossy paper" and a "[q]uantity of dried liontail in a labeled plastic bag, contained in velvet and/or satin gift bag from boutique magic shop." For players who did not purchase the paper feelies, Short also created a website for the Southern Light Rail company (this website is now defunct, but has been cached by the Internet Archive's Wayback Machine). This website prominently features the same font used in the game's title screen.
The fact that Short paid so much attention to the physical and material aspects of City of Secrets indicates that for her, the visual instantiation of an IF game is not an irrelevant cosmetic detail. It directly influences the player's experience of the game, an experience which is visual in multiple senses. The visuality of City of Secrets results from a collaboration between the preexisting visual memory of the player and the visual details, verbal and graphical, supplied by the author, as focalized through the "perceptual filters" of the avatar, who, unlike the avatars in Zork and Ad Verbum, is a well-defined character with a particular personality and history. The visual experience of this game depends on a complex and shifting interplay between the player's visual memory, the details the author provides via the protagonist-avatar, and the imagetextual aspects of the game itself.
Now I suggest that such a visual experience has little to do with transparent immediacy. A transparent visual representation, by definition, is minimally mediated; it presents the visualized scene without distorting filters, so that it looks the way it would if it were present before the viewer. The goal of Short's language in this game is not to create such visual representations. In a text-based game, the only way to create such visual transparency would be to provide a large amount of precise descriptive detail, so as to permit the reader to imagine exactly what every aspect of the scene looks like. However, Short argues in her blog post “The Prose Medium and IF” that such “detail for detail's sake” is unnecessary and potentially harmful in IF, where ekphrasis is prescriptive, not autotelic.[24] The purpose of details in IF prose is to give the player the information he or she needs to complete the game. Players are expected not just to process the details but to use them as a guide for how to interact affectively with the game's operations and its diegetic world. Providing excessive detail would be distracting and tiresome. Short explains, however, that detail can do something else:

Some of the most effective writers of mood create their effect not with a large number of common details (the flowers are red, the door is yellow, etc) but with a small number of very particular ones; and I think that that is especially true in IF. Words in interactive fiction individually carry more weight than they carry in static prose, if only because of the amount of attention we demand the player give to each one. […] I think I would find [P.D. James's descriptions] to be overkill in an IF game. They’d need to be shortened and focused, because each sentence would do the work of three or four sentences in the static prose version. In this respect IF is closer to poetry than to conventional prose: it is worth taking more time to select fewer words, because each one will be inspected through a jeweler’s loupe.  [Short 2008, para. 19, 20]

Short suggests here that the purpose of details in IF is not to create a vivid, immediate and sensuously present mental picture of a scene, but to suggest the mood associated with that scene. It does this by providing sparse but carefully selected details, which serve the player as building blocks around which a more complex and personal vision of the scene can be created.[25] When Short mentions that Scheherazade has high-backed booths, a dark ceiling, and decorations that show thieves and smugglers, she does more than simply inform us that these things are present; she also hints at the affective resonance of this place. She doesn't tell us what precisely this place looks like, but she provides us with affective lenses that we can apply to our own visualization of the place. Short's goal in this game is not to match the transparency of graphical video games, but to activate a mode of visuality which is affectively rather than sensuously vivid. Ekphrasis has been used for this purpose since ancient times: Quintilian wrote, for example, that lawyers should use ekphrasis only where "motivated […] by the speaker’s emotional engagement with and amplification of his client’s plight" (qtd. in [Koelb 2006, 29]). For ancient rhetoricians, ekphrasis was not a transparent means of visual representation but a tool for augmenting the emotional resonance of the described scene. City of Secrets suggests that this effect becomes, if anything, more potent when the described scene is an interactive one.
In Ad Verbum, the player needs to directly engage with the verbal properties of IF in order to finish the game. City of Secrets doesn't similarly require the player to visualize in order to complete the game (except at the minimal level described above with reference to Zork), but this is because City of Secrets is a deliberately simple game. As Short writes in the game's ABOUT text, "This game is meant to be playable even by someone who has never encountered interactive fiction before, and be a gentle introduction to the genre. It is not terribly difficult, nor is it possible to die until the very end." However, her other works do often require the player to visualize and to do so in an affective and critical way. In Savoir Faire (2002), a deliberately challenging adventure game, the player has the magical power to create "links" between two similar objects, whereby one object takes on the properties of the other or is affected by events that occur to the other. In order to use this power effectively, the player has to observe visual (and other) similarities between the two objects, and this may require a minute inspection of the two objects involved. For example, the first puzzle in the game is to open a locked pair of doors.[26] The description of the doors reads, "A pair of white-painted doors that lead into the upstairs corridor of the house. Each door panel is decorated with the family crest, picked out in ostentatious gold, as though to warn servants not to wander that direction uninvited." In a nearby room the player finds a teapot, whose object description reads, "In order to make the linkages possible, however, it has been painted a glossy white, and the crest of the family executed on one side in intricate detail." The solution to the puzzle is to link the doors to the teapot, then open the lid of the teapot, causing the doors to open. This works because the teapot and the doors are both white, openable, and decorated with the same crest. 
To notice these similarities, the player has to read the descriptions of both objects "through a jeweler's loupe." In doing so, the player may visualize the two objects, but even if the player doesn't do this, the player's activity of closely reading the descriptions is equivalent to the avatar's activity of closely examining the objects. Solving this puzzle requires engaging in a mental operation in which reading and looking are inextricably linked. Yet this reading/looking process is not exclusively goal-directed. At the same time that the object descriptions provide the player with the information necessary to solve the puzzle, they also help the player to imagine both the visual appearance and the affective resonances of the objects referenced. As this example suggests, affective ekphrasis can be a technique of both puzzle-solving and worldbuilding; using Douglass's distinction, it contributes to both the "gamelike" and the "narrative" qualities of IF.
City of Secrets combines the emotional vividness of visual prose with the ability to interact with the visualized world through an avatar, a combination which is perhaps unique to IF. Instead of trying to match the transparent visuality of the graphical video game, it provides an IF-specific experience of affective textual visuality. This is a second possible way in which IF can define itself as an artistically viable medium and not an inferior precursor to the graphical video game.
Ad Verbum and City of Secrets adopt two opposing strategies for demonstrating the continuing value of IF in a post-graphical age. Ad Verbum suggests that IF needn't try to compete with the visuality of graphical games because IF's strengths lie in its nonvisual aspects. City of Secrets, by contrast, demonstrates that IF can be visual in a way which may be inaccessible to graphical games. What both games implicitly argue is that even if IF games can't (or shouldn't) compete with the visual transparency of graphical video games, the creation of IF games can still be a viable artistic pursuit. The coming of graphics doesn't kill IF, but it does force IF to adapt.
To summarize, I have argued that IF is an ekphrastic medium insofar as it provides the player with a textual translation of the avatar's direct visual experience. Unlike traditional ekphrastic poetry and prose, however, IF is prescriptively ekphrastic in that it asks the player to perform concrete actions in response to its textual pictures. In the post-graphical age, prescriptive ekphrasis becomes a threatened mode of visual representation because computer graphics seem to have a superior ability to model the diegetic world of the game. In order to justify the continued production of IF, contemporary IF authors have adopted at least two strategies for responding to this threat. The point of both approaches is to argue that IF offers players experiences that graphical video games cannot match — an argument which ekphrastic poetry often implicitly makes with respect to painting. Where the two approaches differ is in how they characterize these experiences which are unique to IF. One strategy, as demonstrated in Ad Verbum, is to abandon prescriptive ekphrasis and concentrate on the purely textual experiences that IF can offer. The other strategy, which we find in City of Secrets, is to employ an affective rather than a mimetic mode of ekphrasis, thereby creating emotional effects that would be difficult to replicate with graphics.
Even the first strategy, however, is still predicated on the visual properties of the IF genre. Despite claiming to present a world composed purely of linguistic signifiers, Ad Verbum still structures those signifiers according to a world model composed of rooms and objects, and such a world model, as I've argued, must be visualizable in order to be navigable. In City of Secrets, visualization of the world model becomes the primary appeal of the game. To differing extents, both texts ultimately offer the player the opportunity to collaborate with the author in imagining a world. As the product of the player's affective visualization, this world is, at least ostensibly, more intimate and personal than the vivid, transparent worlds of commercial video games can possibly be. If authors like Montfort and Short still write IF, and if players like me still play it, then this testifies to the existence of a desire for spatial and visual experiences which are more imaginary or affective than transparent. Regardless of the vivid immediacy of the spaces that graphical video games allow us to inhabit, we still want to inhabit spaces which, to quote the inscription on the living room door in Zork, are intentionally left blank.


This paper couldn't have been written without the inspiring teaching and dedicated assistance of Terry Harpold. I thank Nick Montfort and Emily Short for their encouraging comments and for permission to reproduce images.


[1]Hereafter abbreviated IF. The genre of IF is also known as the text adventure, although these terms are not precisely synonymous. The term interactive fiction includes all or nearly all works that use a text-based parser, while the term text adventure privileges gamelike works that feature "out-of-the-ordinary undertakings involving risk and danger" [Montfort 2005, 6]. (For further discussion of the difference between these terms, see [Montfort 2005, 6–8].) Interactive fiction implies a view of IF as an aesthetic object, while text adventure implies a view of IF as a gaming genre. As Jeremy Douglass argues, however, these perspectives are not mutually exclusive: "IF objects are sometimes games that are played, and sometimes stories that are read, and often both or neither. Further, their narrative and rule aspects interact continuously at a deep level" [Douglass 2007]. In this paper I'm going to focus on what Douglass broadly defines as the "narrative" aspects of IF, specifically including its manipulation of signifiers and its creation of imagined worlds, and I will give less attention to the ludic or gamelike aspects of IF. This makes it easier to compare IF to non-digital forms of visual textuality which are only minimally gamelike if at all. However, it must be understood that the aesthetic and gamelike qualities of IF are inseparable.
[2]See, for example, [Montfort 2005, 2] (claiming that IF "has been a major current in electronic literature") and [Montfort 2005, 37–63] (explaining how IF descends from and is comparable to the textual riddle). Similarly, Jeremy Douglass's dissertation on IF begins, "Re-examining historical and contemporary IF illuminates the larger fields of electronic literature and game studies" [Douglass 2007]. Douglass further suggests that the two primary critical perspectives on IF are electronic literature and game studies [Douglass 2007].
[3]The unstated assumption behind these questions is that computer graphical media, like video games and CG animated films, do in fact offer visual experiences which are transparent rather than phantasmal or indirect. Although this assumption is commonly made by authors like Dibbell, it is in fact just as questionable as the assumption that IF doesn't offer any visual experiences. I even believe that phantasmal and indirect visuality is commonly found in CG films and video games, although this claim is beyond the scope of this essay. I would suggest, however, that for IF authors and critics, it's strategically useful to mischaracterize computer graphics as primarily focused on transparent visuality. Such a mischaracterization leads to a stark divide between the visuality (or lack thereof) of IF and that of video games, which makes it possible to carve out an exclusive niche for IF.
[4]Obviously this is a naïve assumption which has long since been called into question by textual critics. See, for example, McGann's demonstration of the importance of "bibliographic codes" to the interpretation of texts [McGann 1991].
[5]This spatial arrangement is the sixth item in Wolf's taxonomy of spatial structures in video games. It is essentially the same spatial arrangement we find in graphical adventure games like Myst or in the LucasArts games developed using the SCUMM engine, except that in such games the individual chunks of space are represented in three dimensions, not two. As Wolf observes, this spatial structure is similar to the continuity editing style in film. "Adjacent spaces displayed one at a time" is also the characteristic spatial arrangement of comics and graphic novels.
[6]Some IF games, such as the late Infocom game Arthur: The Quest for Excalibur (1989), do employ graphical depictions of rooms and objects. Usually, however, these depictions serve merely to illustrate the text, and the player needn't refer to them in order to complete the game. For example, Arthur gives the player the option of turning off the graphics. Roberta and Ken Williams's Mystery House (1980), by contrast, was revolutionary because it included vector graphics alongside its textual room descriptions, and these graphics included essential information not given in the text.
[7]Similarly, many games feature hidden exits or switches that aren't mentioned in room descriptions, and can't be used, until the avatar learns about them. For example, in Jon Ingold's The Mulldoon Legacy, the avatar can open a secret passage by feeling a certain wall. However, this action only works if the avatar has already learned, from information given elsewhere in the game, that there is a secret passage behind the wall. If the player tries to feel the wall before the avatar has received this information, then the parser replies "Playing a restored game, are we?" thereby criticizing the player for breaking the fourth wall. See [Ingold 2000].
[8]According to game designers Katie Salen and Eric Zimmerman, meaningful play is the criterion by which the success of game design is measured. They define it as "the process by which a player takes action within the designed system of a game and the system responds to the action. The meaning of an action in a game resides in the relationship between action and outcome"  [Salen & Zimmerman 2003].
[9]I want to carefully distinguish here between visualizability and visualization. Visualizability is a property of the IF text. Visualization is a faculty of the player, which represents one of several possible modes of realizing visualizable information. It's possible that players might understand a visualizable text in a nonvisual way. For example, IF is popular among blind computer users, who are obviously unable to visualize the things the text mentions. I conjecture that such players might process the visual information in the text in a haptic or tactile way.
[11]In fact, this game was created specifically to prove that a puzzle could be simple and logical without being fair. During a debate on the rec.arts.int-fiction newsgroup, Baggett asserted that such a puzzle was possible, and created +=3 to demonstrate this point.
[12]These are two of the texts Alexandra Wettlaufer discusses in In the Mind's Eye: The Visual Impulse in Diderot, Baudelaire and Ruskin. She cites them as examples of the " 'visual impulse', that is, the desire to render the act of reading a visual experience"  [Wettlaufer 2003, 21].
[13]The most sophisticated treatment of the text-rendering capabilities of early video game platforms is [Whalen 2008].
[14]IF still remains significantly more cross-platform than most graphical video games, however, and this factor makes IF one of the easiest video game genres to teach.
[15]See [Montfort 2005] for a detailed account of the history of IF. Douglass critiques Montfort's historical account by arguing that the commercial era was in fact an anomalous exception to the norm of independent development of IF.
[16]See also Mark J.P. Wolf's observation that "[p]art of the reason for the use of all text, at least initially, was the difficulty of doing graphics" [Wolf 1997, 13].
[17]One solution is WIELD WEAPON, then WHACK WAINSCOTING WITH WEAPON (revealing a "weird widget") then WIN WIDGET, then WITHDRAW.
[18]To this extent, room descriptions in IF games are what John Hollander calls notional ekphrases, i.e. ekphrastic texts describing visual phenomena that don't actually exist and can't be seen [Hollander 1995, 4].
[19]In an unpublished essay, Edmond Chang asserts that Ad Verbum emphasizes the materiality of letters, including their physical presence. He suggests that when, for example, the player is prevented from using the constrained passage while carrying items, there is a sense that the words themselves are obstructing the player – whereas in any other IF game, the player would imagine that the objects the words represent were causing the obstruction [Chang 2004, paragraph 10]. I would suggest, though, that if the words in Ad Verbum are material, they are material in a strange way. No attention is paid to their typographic properties or type design; we don't know, for example, what font is used to write the words in the game. Nor does Ad Verbum engage in the sort of typographic play that we find in a text like Otto Messmer's Felix the Cat cartoons, where characters can physically interact with punctuation marks.
[20]These two books allude to Georges Perec's novel La Disparition and Robert Pinsky's poem ABC. Perec's book is written without the letter E. Pinsky's poem consists of 26 words, the first beginning with A, the second with B, and so on. Montfort's abecedarian book follows a similar constraint but is limited to two words, which is the most common length of an IF command.
[21]The fact that Emily Short can be identified with both the "anti-visual" and the "differently visual" approaches to IF is evidence that these two approaches are not mutually exclusive, although they may coexist somewhat uneasily.
[22]Douglass claims that closure operates in IF at the level of the command line, where the player makes "an attempt (which may be frustrated) to discover or solve the gap between the current state of the simulation and its next state" [Douglass 2007]. He argues, however, that closure in comics is purely retrospective (it operates only after the second panel is read) whereas closure in IF is prospective, acting to fill a gap before the player knows what's on the other side of the gap. I'd suggest that retrospective closure is also involved in IF; it operates to connect adjacent room or object descriptions, or even adjacent pieces of visual information in a single room or object description.
[23]Note also that these images aren't mapped to specific rooms; they change when the player triggers important events in the game's narrative.
[24]Excessive use of detail also tends to be considered harmful when it occurs in prose fiction; overly vivid descriptions are often criticized as purple prose. As Seymour Chatman notes, the amount and granularity of detail are among the major factors in which film differs from narrative fiction [Chatman 2004, 48], and this certainly also applies to IF and graphical video games.
[25]A literal version of this process occurs in the science fiction film Inception (2010), where "dream architects" create blueprints for dream spaces, and the individual dreamer then fleshes out these blueprints by adding "projections" of objects drawn from his or her memory.
[26]The same example is used in a different context in [Mitchell 2009].

