3D aural interactive hyperstories for blind children

Interactive stories are commonly used for learning and entertainment purposes, enhancing the development of several perceptual and cognitive skills. These experiences are not very common among blind children because most computer games and electronic toys do not have interfaces that make them accessible. This study introduces the idea of interactive hyperstories performed in a 3D acoustic virtual world. The hyperstory model enables us to build an application that helps blind children enrich their early world experiences through the exploration of interactive virtual worlds, using 3D aural representations of space. We have produced AudioDoom, interactive model-based software for blind children. The prototype was qualitatively and quantitatively field-tested with several blind children in a Chilean school setting. Our preliminary results indicate that acoustic-based entertainment applications, when carefully applied with an appropriate methodology, can stimulate diminished cognitive skills. We also found that spatialized sound experiences can create spatial navigable structures in the minds of blind children. Methodology and usability evaluation procedures and results appear to be critical to the effectiveness of interactive hyperstories performed in a 3D acoustic virtual world.


Introduction
Interactive computer games have been used for entertainment purposes for some time. More recently, these games have become available to a wider population of children. Today, most youth worldwide have had some type of experience with computer games, delivered mostly through video-based devices (Druin et al., 1996). This scenario is not the case for children with disabilities (BCC 1993), who do not have interactive entertainment software available in quantity and variety. The situation is even more critical for blind children, because they cannot take advantage of visual games.
This study responds to this growing need by introducing an application based on highly interactive hypermedia stories for blind children. Hyperstories for blind children are intended to assist the enrichment of early world experiences through the free exploration of interactive virtual worlds, using a 3D aural representation of the space and surrounding entities such as the ones explored in (Mereu et al., 1996; Savidis et al., 1996). The application introduced here stems from a design model for generic hyperstories. A hyperstory is defined by the combination of a navigable virtual world, a character manipulated by the user, a set of dynamic objects and other characters, and the traces of interaction among these entities given by the plot of the story (Lumbreras and Sánchez, 1997).
Our main research issue was to explore a model for describing an acoustic navigable environment. We also explored how a spatialized acoustic modality, combined with haptic manipulation of the environment, allows children to construct or reconstruct cognitive structures such as haptic/acoustic correlation, spatial navigation without visual cues, and object permanence in time through the hyperstory metaphor, as well as interface and usability issues related to interactive software for blind children.

What is a Hyperstory?
Several applications, disciplines, and computer environments come together in the concept of hyperstory. One type of such environments is the MUD (Multi-User Dungeon) and its variations (MOOs, etc.). In their original version, these text-based systems allow many users to connect simultaneously to virtual "worlds" composed of rooms, objects, and people. Depending on the design of a particular system, themes vary from fantasy environments with dragons and wizards to futuristic exploration with spaceships and aliens.
Our model extends these ideas by including the elements of a story: plot, roles, and characters. The main idea is to capture these elements in the representation (Hayes-Roth et al., 1996). A plot is a temporal sequence of actions involving a set of individuals. A plot and its constituent actions may be quite abstract, e.g.: A meets B, A loves B, A loses B, A wins B. A role is the class of individuals whose prototypical behaviors, relationships, and interactions are known by both actors and audience. For example, the plot outlined above is ordinarily instantiated with alternative roles, for instance: the boy in love and the girl he loves. A character is a personality defined as a coherent configuration of psychological traits; for instance, any of the characters in the present scenario might be shy and sensitive, or silly and affectionate.
We first introduce the definition of a Story Virtual Environment (SVE) as:

SVE := navigable world (I) + dynamic objects (II) + characters (III)   (1)

where:

I. Navigable world is:
In charge of modeling the virtual world, composed of several navigable environments connected by links. This can be seen as a special case of hypertext: each node basically represents a container of objects and a potential scenario of the hyperstory. Physical gates, portals, and doors, represented as links, render the connectivity. The use of hypertext as an underlying model to describe spatial navigable metaphors is discussed in (Dieberger 1996).

II. Dynamic Objects are:
In charge of representing the objects of the virtual world. They are entities that have behavior in time and react to the events produced by the user and other entities.

III. Characters are:
The entities that carry on the main course of events, involving complex behavior. There is a distinguished character called the protagonist, which is manipulated by the user and represents the user-system connection. If the protagonist is viewed in the third person, an avatar is in charge of this representation. Characters are special cases of dynamic objects and are very important at the story level. Characters carry the main plot and elicit the content of the story.
At this point a MUD or a computer adventure game could in some way fit the previous definition. A Hyperstory (HS) is an extension of this concept; structurally,

HS := SVE + narrative (plot, roles, and characters)   (2)

That is, our model extends the idea of the SVE by introducing an intentional sequence of events based on plot, roles, and characters. Other differences from MUDs arise from the idea of closure, or an explicit end, described as a good feature of narratives (Landow 1992).

THE ADDED VALUE OF HYPERSTORIES
A hyperstory is an interactive story guided by an intentional argumentative structure to a greater degree than a casual scenario. The plot in a hyperstory is not linear; it is a hyper-plot. Here, actions, object activations, and dialog can trigger a change in the flow of the story. Thus, we borrow ideas from hypertext/hypermedia technology by including narrative in a virtual environment context (Bernstein 1996). Hyperstories improve on conventional literary stories by allowing a "dynamic binding" between characters, the world in which they move, and the objects they act on. The learner becomes involved in this dynamic binding through a greater flexibility in the learning process. In other words, a hyperstory is a combination of a virtual world where the learner can navigate, a set of objects operated by the learner, and the pattern of interaction between entities (Lumbreras and Sánchez, 1997). Slight changes introduced by the child to an object's behavior can produce different hyperstories in the same world.

The Model
Our model is a design model based on object-oriented concepts, providing a framework to describe the diverse building blocks of a hyperstory. The model supplies a framework composed of three foundational classes, as described in OOD techniques (Rumbaugh et al., 1991): context, link, and entity. In addition to these classes, there is an associated constructor called channel, similar to the route constructor in Virtual Reality Modeling Language environments. Contexts model the static world, links model the connectivity between contexts, entity is the abstract class that captures any object or character definition, and channels work as a broadcast medium for events, in a fan-in or fan-out fashion, to the subscribed entities.
Each base class has a predefined behavior and a set of attributes that distinguish it from the others (e.g., a link knows about the transportation of entities between contexts). Another example of specialized behavior arises from contexts: if an entity sends an event to a context, the context forwards the event to all contained objects. Thus a context works as a diffuser of events. All these base classes have behavior based on modal (state-based) programming. We used Objectcharts as the formalism to specify behavior (Coleman et al., 1992).
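As a rough sketch of how these four constructs relate, consider the following Python fragment. All class and method names here are ours, chosen for illustration; the paper's actual implementation is not shown and the behavior specification used Objectcharts, not plain code.

```python
# Illustrative sketch of the four foundational constructs: context, link,
# entity, and channel. Contexts diffuse events to their contents, links
# transport entities between contexts, channels broadcast to subscribers.

class Entity:
    """Abstract base for any object or character in the hyperstory."""
    def __init__(self):
        self.log = []            # record of events received, for clarity

    def receive(self, event):
        self.log.append(event)

class Context(Entity):
    """A static scenario; forwards any event to every contained entity."""
    def __init__(self, name):
        self.name = name
        self.entities = []

    def receive(self, event):
        # A context works as a diffuser of events.
        for entity in self.entities:
            entity.receive(event)

class Link:
    """Knows about the transportation of entities between two contexts."""
    def __init__(self, source, target):
        self.source, self.target = source, target

    def transport(self, entity):
        self.source.entities.remove(entity)
        self.target.entities.append(entity)

class Channel:
    """Broadcasts events, fan-out style, to the subscribed entities."""
    def __init__(self):
        self.subscribers = []

    def publish(self, event):
        for entity in self.subscribers:
            entity.receive(event)
```

For example, sending an event to a context reaches every entity inside it, while a link moves an entity from one context's population to another's.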

MAIN CONCEPTUAL DESIGN BUILDING BLOCKS
A hyperstory specification can be split into two interrelated conceptual parts by using the following classes:
• static scenarios (contexts and links), and
• objects (entities) and the explicit routing mechanisms (channels).

The static world
Hyperstories with several scenarios organize them according to their physical connectivity (linking). For this purpose, we can describe the virtual world as a kind of nested context model. A virtual world is defined as a set of contexts that represent different environments. Each context contains an internal state, a set of contained contexts, a set of objects, links to other contexts, and a specific behavior. Different relationships may hold between two different contexts, such as:
• neighborhood (there is a link from one context to the other),
• inclusion (one context is included in the other), or
• none (the contexts are "disjoint").
Different "real world" metaphors can be implemented easily with this simple model, such as a town, a house, a room -or houses within a town and rooms in a house. Another important concept about context is perception: a context is a spatial container that can be perceived as a whole rendered as a unity at the interface level. At this point of the design, we are dealing with the first term of the Eq. 1.

Populating the world
In order to bring life to the hyperstory, we populate the environments with objects, some active and some passive, orthogonally combined with a navigational dimension (static or dynamic). To avoid misunderstandings, we briefly define some terms concerning objects in this context.
• Passive: the object answers only simple events like "Who am I?"
• Active: the object has a noticeable behavior as time progresses -continuous or discrete- or responds to events with some algorithm that reflects some behavior.
• Static: the object always belongs to the same context.
• Dynamic: the object can be carried between contexts by some entity or travel autonomously.
By using state-based programming (Taivalsaari 1993) we can include narrative in our hyperstory. Our approach enables us to define different object behaviors according to the hyperstory stage or the object's state. Thus, by embedding narrative in the behavior of the entities, we satisfy Eq. 2.
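A minimal sketch of this state-based idea follows: the same event produces different reactions depending on the stage the object is in, which is how narrative can be embedded in entity behavior. The entity (a door), its states, and its reactions are hypothetical examples of ours, not taken from AudioDoom's actual content.

```python
# Sketch of state-based object behavior: a transition table maps
# (current state, event) to (next state, reaction). Unknown pairs
# leave the state unchanged. States and reactions are illustrative.

class Door:
    def __init__(self):
        self.state = "locked"
        self.transitions = {
            ("locked", "hit"):     ("locked", "a dull metallic thud"),
            ("locked", "use_key"): ("closed", "the lock clicks open"),
            ("closed", "hit"):     ("open",   "the door creaks open"),
        }

    def receive(self, event):
        next_state, reaction = self.transitions.get(
            (self.state, event), (self.state, "nothing happens"))
        self.state = next_state
        return reaction
```

Advancing the hyperstory stage amounts to moving such objects through their states, so the "same" door behaves differently before and after the child finds the key.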

Audiodoom: A Hyperstory for Visually Impaired Children
Our aim was to test the hypothesis that a highly interactive and immersive aural environment can serve as a tool to stimulate and reinforce certain processes and skills in blind children, such as spatial representation. Sound serves as the output medium of the system, but the transient nature of sound imposes a bias on the interface design, leaving it tightly linked to temporal constraints. For this reason, the conceptual idea of interactive narrative, combined with the challenge of a game, must be organized and rendered in a very simple way for our target users, blind children aged 8-12 years old.
AudioDoom is the prototype that enables us to test our ideas about interactive hyperstories for visually impaired children. The software is based on the idea of user navigation in a set of corridors where the child finds and interacts with virtual objects, resembling in some way the classic Doom game. In the course of the hyperstory, the child encounters characters, objects, and challenges that may change the flow of the plot of the story.
The structure of the flying saucer is presented as a set of perpendicular corridors of different lengths (see Fig. 1). These corridors are connected by means of doors that can appear at the end of a corridor or at its side as an optional exit to another corridor. In each case the user can activate the desired door in order to access the corresponding corridor. Regarding physical navigation inside a corridor, the user is allowed to move in the forward direction step by step. Certain entities can appear suddenly after a step has finished; if this happens, the user must solve a challenge depending on the type of entity found. To interact with the virtual environment the child operates on the surrounding space, acting on voxels, or minimal discrete units of volume. This is similar to the concept of the pixel as the minimal unit of a display. The voxel concept imposes a discretization of the space, simplifying the surrounding positions of interaction and creating a concrete repository for a given entity.
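The voxel discretization can be sketched as a simple quantization of continuous positions into a coarse grid, each cell acting as a repository for at most one entity. The voxel edge length and coordinates below are assumptions for illustration; the paper does not state AudioDoom's actual grid resolution.

```python
# Sketch of voxel discretization: a continuous 3D position in the space
# surrounding the child is snapped to integer voxel coordinates, the way
# a pixel quantizes a display. VOXEL_SIZE is an illustrative value.

VOXEL_SIZE = 0.5  # voxel edge length in meters (assumed)

def to_voxel(x, y, z):
    """Quantize a continuous position into integer voxel coordinates."""
    return (int(x // VOXEL_SIZE),
            int(y // VOXEL_SIZE),
            int(z // VOXEL_SIZE))

# Each voxel is a concrete repository for a given entity.
occupancy = {}
occupancy[to_voxel(0.6, 0.1, 0.2)] = "bullet box"
```

Nearby continuous positions collapse into the same voxel, which is what simplifies the set of interaction positions around the child.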
Typical actions in AudioDoom involve getting objects (a box of bullets), shooting an entity (a monster or a mutant), or localizing and interacting with a particular character (the catcher) at a position in space. The plot of the story has multiple branches, some of which are non-deterministic, because some story entities may or may not be encountered depending on chance user-entity encounters. This brings new alternatives to each session with AudioDoom. This spatial sequencing enables the user to be involved in the story at increasing levels of complexity.
The added value of AudioDoom comes from the fact that we have used the hyperstory metaphor to evaluate how a virtual acoustic representation can build a mental spatial representation in blind children. For this reason, we built tasks in which the child interacts several times with AudioDoom and then tries to describe the taxonomy, organization, hyperstory entity locations, and spatial organization of the environment by using LEGO blocks. In short, the hyperstory serves as an engagement device to test our hypothesis.

INTERACTING WITH AUDIODOOM
The actions developed in AudioDoom occur at the interface level in a certain voxel. At any moment, a voxel can be empty or contain an entity. This entity is usually a virtual object represented acoustically: a door, a box, a character, etc. The entity can receive events from the child depending on its type: take, activate, or open. AudioDoom presents a modal interface in which the same physical event can be interpreted differently according to the context, the mode, and the entity located in the target voxel. We must take into account that an entity can have kinetic behavior, a movement in space over time; such activity involves several voxels, because a voxel is an atomic space container. This approach may appear a little restrictive, but we can divide the environments into as many voxels as needed to obtain the desired granularity.
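The modal interpretation of a click can be sketched as a small dispatch table: the same physical event (a click on a voxel) resolves to a different action depending on what the target voxel contains. The entity kinds and action names below are our hypothetical examples, loosely based on the entities mentioned in the text.

```python
# Sketch of modal event interpretation: one physical event ("click"),
# many meanings, resolved by the entity occupying the target voxel.
# Entity kinds and actions are illustrative, not AudioDoom's real set.

def interpret_click(entity_kind):
    """Map the entity in the clicked voxel to the action it triggers."""
    actions = {
        "door":       "open",
        "bullet_box": "take",
        "monster":    "shoot",
        None:         "step_forward",   # empty voxel: advance a step
    }
    return actions.get(entity_kind, "activate")
```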
From the child's point of view, AudioDoom is manipulated by using a wireless ultrasonic joystick called The Owl (Pegasus 1997). Through this device, the child can interact and move in the environment by clicking in different voxels of the surrounding space (see Fig. 2). According to the position of the sound, the child must coordinate the haptic/kinesthetic device with the perceived sound position. This action-reaction scheme strongly stimulates the child because of the strong haptic-acoustic correlation involved. We designed AudioDoom to be used mainly with an ultrasonic joystick with three degrees of freedom (X, Y, Z) and 3D sound, but the child can also use AudioDoom with the standard keyboard or the mouse.

THE DYNAMIC OF INTERACTION
The basic idea of AudioDoom is to split the entire navigable space into small atomic environments. An atomic environment is the minimal scenario of action at a given moment. In this environment the child can interact with entities in different voxels. The linear connection of atomic environments renders a corridor. This structure organizes the space into several corridors, giving a semantic and argumentative connection between the hyperstory and the space. These corridors are modeled as contexts and the doors as links.

Figure 1. Artistic version of the AudioDoom environment (A). Related to the hyperstory model, we have modeled AudioDoom as a set of contexts (grey rounded rectangles) linked by doors (lines with rounded ends), as diagram (B) shows.
The child can perform different types of activities in an atomic environment, such as:
• moving forward to the next atomic environment by taking a step,
• opening a door,
• making a turn (this action makes sense if a door appears in a direction other than the direction of advance), or
• interacting with an entity in a certain way.
If we consider the type of presentation media and the method of interaction of this hyperstory, with its strong physical metaphor, we must consider three key points at the interface: the structuring of elements at a given moment, the orientation of objects, and the dynamics of selection and interaction. In general, the system presents one or several entities at a time, each localized in a voxel. Then, after acoustic localization, the child tries to orient toward the entity and issue some events. According to the type of entity, the interaction can be reduced to a discrete event -taking a bullet box or hitting a door to open it- or can be a chain of events with a given purpose, e.g., shooting three times to destroy an alien, or shooting several times to destroy a mutant moving randomly between contiguous voxels.

INSIDE AUDIODOOM
AudioDoom was conceptualized with the following constraints:
• stimulate spatial relations by exploiting the physical environment surrounding the child,
• present disjoint and distinguishable acoustic entities located at some point in space,
• clearly isolate the input/output media in order to test various concepts according to each device, and
• reflect a real-time response to the child's actions.
Moreover, we chose to produce software that could be used by a wide population. In South America, schools and institutes for disabled children have few resources. For this reason, our software must run on a minimal platform. All these restrictions must not inhibit the chance to render a virtual acoustic environment. In this version of AudioDoom, sounds are presented only at ear level, which means that we do not include elevation cues.

IMPLEMENTATION ISSUES
Our approach followed the idea that if an entity can move among n possible voxels, we take the monophonic sound of this entity and, by convolving it with different sets of HRTFs -one pair for each position- obtain n clips of 3D sound. This processing was done off-line. The result is a large set of 3D sounds requiring only an inexpensive sound board for playback (see Fig. 3). To deal with real-time mixing -background music, sound effects, entity sounds, etc.- we use the dynamic link library wavemix.dll, included in MS Windows. Thus the execution platform needs only a PC, Windows 3.1, and a stereo sound board.
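The off-line spatialization step described above can be sketched as follows: a monophonic clip is convolved with the left and right head-related impulse responses (HRIRs) for one voxel position, yielding one stereo clip per position. The HRIR arrays here are placeholders; real ones would come from a measured HRTF set, and the paper's actual processing tools are not specified.

```python
import numpy as np

# Sketch of the off-line spatialization pass: for each of the n voxel
# positions an entity can occupy, convolve its mono clip with that
# position's HRIR pair to precompute a stereo (left/right) 3D clip.

def spatialize(mono, hrir_left, hrir_right):
    """Return a stereo clip, shape (N, 2), of the mono sound at one position."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)

def precompute_clips(mono, hrir_pairs):
    """Off-line pass: one pre-spatialized stereo clip per voxel position."""
    return [spatialize(mono, hl, hr) for hl, hr in hrir_pairs]
```

Because all convolution happens off-line, run-time playback reduces to mixing precomputed stereo clips, which is why only a cheap stereo sound board is needed.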

The Evaluation Of Audiodoom
At the beginning of our work several topics were unclear: could this study support strong inferences about causal relations? How well defined were the theoretical ideas? How confidently could we predict that our findings would be true, reliable, and not overly limited? Thus, we began our evaluation with an exploratory approach to identify the mechanisms related to perception and the externalization of psychologically quantified variables. In this context, the evaluation of AudioDoom was an exploratory experiment in which, by using several strategies, we intended to understand the domain clearly. The evaluation of AudioDoom was qualitative in the sense that we tried to establish relevant elements about the usability of interactive applications used without visual cues, and to determine whether our hypothesis was well grounded.

THE TESTING SCENARIO
AudioDoom was tested with seven Chilean blind children aged 8-11, ranging from children totally blind since birth to children with light and dark discrimination. The evaluation took place in a Chilean school for blind children.
In the first session the child interacts with AudioDoom by using the keyboard. The keys F, J, Enter, and the space bar serve as landmarks to orient the child on the keyboard. After a short oral explanation the child explores the interface and begins the hyperstory. The child interacts with AudioDoom during five hyperstory sessions, and then we run the first evaluation: using LEGO blocks, the child tries to represent the structure of the AudioDoom environment as he imagines and perceives it (see Fig. 5).
The process of perception, navigation, and structure building in AudioDoom is depicted in the flow chart of Figure 6. Several critical points arise in this process. The first one is the detection of flaws in the model built by the child (erroneous assignment of entities, misalignment of doors, etc.). One common pattern of errors arises from erroneous or complex door positioning.

Discussion
After a preliminary user evaluation we have shown that it is possible to render a spatial navigable structure by using only spatialized sound. This mechanism preserves to a notable degree the structure, topology, structural relationships, meaningful orientation, navigation, and mobility elements. The result is preliminary because we have not included free navigation in open places within the virtual environment, due to the restriction of navigation to straight corridors with divergent branches connected at 90°. Some children showed difficulties, especially with mapping transversal corridors. This problem apparently arises from the fact that turning disorients the user, because the real surrounding space -chair, table, etc.- remains fixed. For this reason, a key issue we face is the representation of distinguishable milestones in the environment to facilitate orientation; the use of an artificial auditory beacon could improve orientation.

Even though we use 3D sound, with several limitations -no head tracking, a limited quantity of voxels- children usually prefer external speakers. Children are not very articulate about this preference, but one reason could be that headphones impose isolation, limiting oral interaction with the evaluator. The discomfort of the headphones used (a Sony MDR CD30) appears to be another reason. We detected this pattern of preference at the beginning, so we adapted the HRTFs for use with external speakers by reprocessing the amplitude of the signal in each channel. This result motivates the use and study of transaural audio, which makes it possible to spatialize sound with external speakers (Gardner 1997).
We carefully observed the mechanisms of interaction in AudioDoom. The keyboard offers greater reliability because there is no ambiguity about the selected voxel. The ultrasonic joystick showed some problems due to erroneous voxel selection and undetected clicks caused by misalignment of the joystick relative to the ultrasonic sensors. Nevertheless, children reported a higher level of satisfaction with the joystick; it seems that the movement of the child's arm increases the level of immersion. Furthermore, the haptic-acoustic correlation is an excellent mechanism to stimulate the available skills of visually impaired children.
One key element in the further improvement of AudioDoom is the construction of an editor, because currently AudioDoom is a hardwired solution. DirectX technology (Microsoft 1998) is a common platform for developing high-end multimedia applications. DirectX includes a wave-audio component called DirectSound that provides low-latency mixing, hardware acceleration, and direct access to the sound device; one of its services is the rendering of real-time 3D sound. With this technology we are building a new version of AudioDoom that includes an editor and run-time head tracking. This last improvement aims at enhancing immersion and building a free-navigation environment in which the child can rotate and advance freely.

Final Remarks
One of the meaningful results derived from AudioDoom is that virtual acoustic environments may now be developed as alternative entertainment environments for blind children. Moreover, these ideas can be used to deliver educational material. It is well known that blind children need assistance to know and mentally map their neighborhood, school, downtown, etc. We are therefore exploring the possibility of rendering not only fantastic environments, but also virtual representations of real and familiar places. These representations can be modeled with the hyperstory model by including motivating elements to capture the attention of the children.
We have presented a conceptual model for building highly interactive stories. Learners have control over stories and access to diverse tools and materials to construct with, in order to develop strategies and test hypotheses, with the implicit idea of fostering the development and use of skills to determine spatial relationships and laterality. We believe that 3D aural hyperstories can help make the interaction with computers much more enjoyable and learnable for learners. Children like stories and remember them easily. When children are engaged in a story, they can identify, retrieve, and use relevant data to solve a challenge by having rapid and flexible access to the story sequence. From our experience with AudioDoom we have learned that hyperstories highly motivate learners, facilitate free navigation, and promote active constructivist learning by providing powerful materials and tools to construct with.

Acknowledgement

The photo sequence of the LEGO-block evaluation (see Fig. 5) can be summarized as follows:
1. The child begins the first hyperstory sessions by using the keyboard. After some practical training, the child tests the joystick and external speakers (see Fig. 1).
2. Several interactions with AudioDoom are sufficient for the child to begin the construction of the main corridor by using the LEGO blocks.
3. After some questions about each part of his model, the child continues building the main corridor.
4. The main corridor is practically finished, representing the path from the beginning to the center of the flying saucer. The child locates each entity at the position perceived during his traveling.
5. Confident about the main structure, the child interacts again with AudioDoom, navigating the divergent corridors. To accomplish this task the child goes back to the previously built model and extends it with the newly perceived acoustic structure.
6. In this case the child locates a door found during the navigation of the new corridor. At this moment the child shows difficulty because the door is actually located to the side instead of in the forward direction of advance.
7. While the child progresses in the construction of the model, he is asked orally about the current activity. In this case the evaluator asks about the forward direction in the model, and the child answers with the perceived forward direction, relating it adequately to the LEGO construction.
8. This is the model built by the child. Note the similarity with the artistic graphical version of the AudioDoom environment. This result suggests that an acoustic virtual environment creates a mental image. Most interestingly, the child never saw the graphical representation of the AudioDoom navigable structure.
9. This diagram represents the topology and distribution of the entities of AudioDoom. The child expressed the perceived structure with a high degree of precision, as can be seen by comparing the last photo with this diagram.