Requirements, Implementation and Applications of Hand-held Virtual Reality

Abstract: While hand-held computing devices are capable of rendering advanced 3D graphics and processing multimedia data, they are not designed to induce a sufficient sense of immersion and presence for virtual reality. In this paper, we propose minimal requirements for realizing VR on a hand-held device. Based on the proposed requirements, we have designed and implemented a low cost hand-held VR platform by adding multimodal sensors and display components to a hand-held PC. The platform enables a motion based interface, an essential part of realizing VR on a small hand-held device, and provides outputs in three modalities (visual, aural and tactile/haptic) for a reasonable sensory experience. We showcase our platform and demonstrate the possibilities of hand-held VR through three VR applications: a typical virtual walkthrough, a 3D multimedia contents browser, and a motion based racing game.


I. INTRODUCTION
One easy way to realize "virtual reality (VR)" that provides an immersive and multimodal sensory experience is simply to employ expensive sensors and large scale displays such as fully immersive displays, 6DOF trackers, motion simulators, 5.1 surround sound systems, and haptic devices. To make VR more viable, practical, available and appealing to the general public, researchers have struggled to engineer a more economical alternative, such as desktop VR, and have proposed to overcome the platform shortcomings (in terms of sensing and display capabilities) with innovative interaction and content design.
Recently, hand-held devices have emerged as one possible candidate for such an alternative platform for VR. Like the desktop computing environment, hand-held devices clearly lack sensing and display capabilities; however, they are an attractive platform because they are portable and nearly everyone seems to own one these days (e.g. cell phones, phone cams, and PDAs). The performance and functionality of hand-held computing and media devices have advanced dramatically in recent times. Hand-held devices are computer embedded systems that are small and light enough to be held in one hand, such as personal digital assistants (PDAs), cell phones, ultra mobile computers, and portable game consoles. Several researchers have used cell phones and PDAs for VR and AR applications [1,2], and hand-held console grade games have become a reality (e.g. SONY PSP®). However, it is difficult to declare that hand-held devices, in their nominal configuration, are fit for implementing VR contents. Most related works to date are either limited to playing 3D graphic contents (with a button-based interface) or targeted at a limited application domain, and are untested in terms of the degree of immersion. Moreover, one can still be skeptical whether such devices can be used for "virtual reality," e.g. to the extent of eliciting immersive feelings (not just for viewing 3D contents).
Manuscript Received on August 20, 2006. This work was supported in part by the grant from KT.
Jane Hwang, Jaehoon Jung, Sunghoon Yim, Jaeyoung Cheon, Sungkil Lee and Seungmoon Choi are with the Computer Science and Engineering Department of POSTECH, Korea.
In this paper, based on research by others and our own, we attempt to derive minimal and general requirements for a hand-held platform for virtual reality. We hope and believe that hand-held virtual reality contents can indeed deliver a sufficiently immersive and sensory experience, if the platform is built according to our proposed requirements and employs the style of interaction that the platform enables. We demonstrate our ideas by presenting our own implementation of a hand-held VR platform and its applications.
This paper is organized as follows. In the next section, we propose several requirements for a hand-held device to support a minimal level of immersion and sensory experience as a viable platform for VR. The proposals are backed by related research and our own usability experiments. Section 3 covers the actual hand-held VR platform built according to our proposal, and Section 4 illustrates three applications. Finally, in Section 5, we report our experiences, draw conclusions and outline avenues for future work.

II. REQUIREMENTS FOR HAND-HELD VIRTUAL REALITY
In order to provide a sufficiently immersive and sensory experience through hand-held devices, their sensing and display capabilities must first be considered. The nominal hand-held device is generally lacking in the number of styles and modalities of interaction it can support. In particular, we must keep in mind that in hand-held devices the places of display and interaction are co-located, and thus the user's hand-eye coordination is an important factor. In addition, the advantages and uniqueness of the hand-held device, such as portability, low cost, and high usability, must be preserved.

A. Sensing and tracking requirements
Hinckley et al. proposed a mobile interactive platform in which they distinguished between two types of sensing, i.e. foreground and background [3,4]. Foreground sensing corresponds to sensing the user's intended movement, and background sensing to sensing the environmental context, including the user's physical state. In hand-held devices, with their limited display channels (e.g. small visual display size and FOV, limited areal contact, low sound quality), a more flexible and dynamic interaction that incorporates various forms of the user state, beyond buttons and touch screen input, is required to overcome such limitations. Furthermore, as the hand-held display is physically coupled with interaction, it is particularly important that some form of tracking exists for both the device (or equivalently the hand) and the user's view (or equivalently the head).
Tracking the device (relative or absolute to the environment) enables a motion or body based interaction, at least through the holding hand of the user. Involving one's body stimulates one's sense of proprioception, and this is known to be one of the best ways to improve the virtual experience, task performance and presence [5]. In our own experimental study, we considered the use of motion based interaction as the factor for the style of interaction [6]. The results showed that motion based interaction on hand-held platforms could help improve the perceived FOV and presence/immersion up to a level comparable to nominal VR platforms with a desktop or even projection based display. The motion based interface also showed promising results in terms of task performance. This is an interesting case of interaction compensating for a limitation in the display modality.
One characteristic that distinguishes VR systems from 3D graphics or multimedia viewing is the dynamic display according to the naturally controlled viewpoint of the user, for instance, through head tracking. Note that device tracking can be coupled with user tracking, for instance, by using a camera that recognizes and follows the user's eyes (in our case, we used separate sensors for device and user tracking). Through this sensing ability, it becomes possible to render based on the viewing direction, the viewing distance and even the perceptual ability of the user in a very natural way, contributing to the minimal level of immersion and viewing interactivity.

B. Multi-sensory display requirements
Nominal hand-held devices are obviously quite limited in providing varied styles of multimodal interaction in a faithful way. Still, improvements can be made at relatively low cost. First of all, in combination with the requirement and capability to track the user (at least approximately), view dependent dynamic rendering can much improve the static nature of the nominal hand-held visual display. Although not formally tested, it is expected that dynamic rendering coupled with hand-held interaction can bring about a higher level of focused attention (and thus immersion) and association of the visual feedback with one's proprioception.
Current hand-held devices already have much support for sound generation. Software support for simple 3D sound simulation (e.g. volume/phase modulation between the right and left ear) can be added at little computational cost. Even simple and approximate sound spatialization, when combined with other modalities, can prove to be very effective [7].
Finally, tactile/haptic displays, if they can be made possible, would be most appropriate for hand-held interaction. This is because the hand-held device itself is expected to represent certain virtual objects that the user will interact with, and the device itself already provides natural passive tactile and haptic feedback. Furthermore, hand-held devices are usually equipped with vibration motors for just that purpose. As with the auditory display, a more careful implementation and utilization of this resource can easily make the hand-held device a viable platform for reasonable virtual reality. A few researchers have proposed hand-held haptic devices using mechanisms other than vibration motors [8] (also see the next subsection). However, the vibro-tactile device currently seems to be the most preferred because of its size, low price, usability and relative effectiveness. Thus, for reasonable tactile/haptic stimulation for VR, we propose to use multiple vibrators and moderately complex tactile/haptic rendering, in conjunction with the other display modalities.

C. Considerations for usability
The final requirement refers to the "must-preserve" qualities of hand-held devices: portability and ease of use. We believe that, due to the limited display channels, the hand-held device user can easily get distracted. One source of distraction is the use of, or connection to, external entities such as markers, servers, and wired modules (for purposes such as more robust sensing or tapping into more computational resources). In keeping with our definition of hand-held devices, we put forth a requirement that the hand-held VR platform be self-contained in terms of sensing, display and computation. Furthermore, any sensors or display support added to what is already contained in the nominal platform must be of reasonable size and weight, and modularized for ease of attachment and detachment.

III. A HAND-HELD VIRTUAL REALITY PLATFORM
Based on the requirements listed in Section 2, we designed and built a "general" hand-held virtual reality platform which can be applied to various virtual reality applications. In this section, we cover the implementation details. We claim that nominal hand-held devices are not sufficiently equipped to realize virtual reality. Our platform implementation uses sensors and displays that are not usually available with standard hand-held media or computing devices, but it uses relatively inexpensive off-the-shelf components that can easily be interfaced to the hand-held device. Moreover, hand-held devices are still evolving and advancing in terms of their sensing and display capabilities.

A. The proposed hand-held VR platform
Fig. 1 shows the overall system architecture of the proposed hand-held VR platform. The figure illustrates the sensing and display capabilities added to a nominal hand-held computer. As for sensing, as claimed in the previous section, sensing at least some part of the user and the operating environment, as well as the motion of the device, was deemed necessary to support a reasonable level of realistic interactivity. Using the camera, which is already integrated into many of today's hand-held devices, is thus an inexpensive way to achieve many types of sensing. It can be used not only for simple object recognition and relative tracking of the device motion, but also for augmented reality applications. The acceleration sensor is also becoming a standard part of many hand-held devices, and in our design it is devoted to sensing device motion characteristics (and, at the same time, to relieving and sharing the load of the vision processing). To reflect the status of the user, we adopted an ultrasonic/IR proximity sensor module that can approximately measure the relative viewing position of the user.
As for the added display capability, we have claimed that at least three major modalities should be supplied in one way or another. The nominal hand-held device provides a monoscopic display, basic sound production and a single on-off vibration feedback. Our design adds hardware and/or software support for simple 3D sound simulation, multiple vibration motors for improved tactile/haptic effects, and view dependent display. The following sections give more details.

B. Sensing

1) Hybrid Relative Device Motion Tracking
We use a hybrid method to track the movement of the hand-held device. That is, we mainly use the camera (and vision processing), with the acceleration sensor in a supplementary role. Relative motion tracking refers to an approximate tracking of the hand-held VR system (and thus the user's hand or body) in relation to the environment. Even though the tracking is only approximate (mostly due to hardware constraints such as the limited computing power, the use of a single camera, its resolution, etc.), we believe that the user would still be able to interact quite naturally and without much difficulty by relying on hand-eye coordination, quickly adapting to the small inconsistency in the scale of movement between the real and the virtual worlds. We note the work by Hinckley [3], which employed proximity, two axis tilt, and touch sensors to improve the interactivity of a mobile device. While we agree that this is an improvement, sensing in more degrees of freedom is required to provide the minimum "virtual reality" that we claim. Our relative tracking provides 4 DOF motion, including 3D rotation and forward/backward movement.

[Figure labels: "Proximity changes of features"; "Motion flow".]

To make our hand-held VR system as self-contained as possible, we integrated vision based motion tracking and a 3-axis accelerometer (MMA7260Q from Freescale™). Cameras (e.g. phone-cams) and accelerometers are becoming viable sensors for today's hand-held devices (e.g. Samsung SPH-S4000, SPH-S310). Our motion tracker tracks motion in 4 degrees of freedom, i.e. forward/backward movement, rotation about the Y axis (yaw) and tilts about the X and Z axes (pitch, roll) (see Fig. 2) [6]. The forward/backward motion and the rotation around the Y axis are estimated from the optical flow. We used the pyramidal implementation of the Lucas-Kanade feature tracker for matching the features between two sequential images [9]. The tilts about the X and Z axes are measured using the 3-axis accelerometer. The tilt data from the accelerometer are digitized at relatively low resolution (8 bit, 0.92°~6.51°), and relying only on them results in unstable virtual camera control. We therefore stabilize (filter) the data from the accelerometer when the motion flow recognized from the camera is not significant (within a given threshold). The particular choice of the degrees of freedom derives from our observation of the users. For instance, directing pure lateral translation (e.g. left/right) in a hand-held posture is rather unnatural. It is more natural to rotate around the Y axis (perpendicular to the ground, around the body) to gain a similar effect. A similar argument applies to moving up and down. It is hard to imagine the user walking sideways (holding the hand-held device in the middle) or moving the hand-held device sideways away from the middle of the body to achieve pure "translation." Even though the forward/backward and Y-axis rotation tracking is only approximate (mostly due to the use of a single camera without markers in the environment, its resolution, etc.), the user is still able to interact quite naturally by relying on hand-eye coordination and quickly adapting to the small inconsistency in the scale of movement between the real and the virtual worlds. Also note that the motion tracking data can be used for recognizing more abstract gestures (for interaction).
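The accelerometer side of the hybrid scheme can be illustrated with a minimal Python sketch. It is not the authors' implementation: the axis convention, the exponential smoothing, and the names `tilt_from_accel`, `mean_flow_magnitude` and `GatedTiltFilter` are assumptions made for illustration. Only the idea of filtering the tilt while the camera's motion flow stays under a threshold comes from the text.

```python
import math


def tilt_from_accel(ax, ay, az):
    """Return (pitch, roll) in degrees from a gravity reading in units of g.

    Assumed axis convention (not from the paper): Y points up when the
    device is held upright, X to the right, Z toward the user.
    """
    pitch = math.degrees(math.atan2(az, math.hypot(ax, ay)))  # tilt about X
    roll = math.degrees(math.atan2(ax, ay))                   # tilt about Z
    return pitch, roll


def mean_flow_magnitude(prev_pts, next_pts):
    """Average displacement (pixels) of features matched between two frames."""
    if not prev_pts:
        return 0.0
    total = sum(math.hypot(x1 - x0, y1 - y0)
                for (x0, y0), (x1, y1) in zip(prev_pts, next_pts))
    return total / len(prev_pts)


class GatedTiltFilter:
    """Smooth accelerometer tilt only while the camera sees little motion."""

    def __init__(self, flow_threshold=1.0, alpha=0.1):
        self.flow_threshold = flow_threshold  # pixels; hypothetical value
        self.alpha = alpha                    # smoothing factor; hypothetical
        self.pitch = None
        self.roll = None

    def update(self, accel, flow_px):
        pitch, roll = tilt_from_accel(*accel)
        if self.pitch is None:
            # First sample: nothing to smooth against yet.
            self.pitch, self.roll = pitch, roll
        elif flow_px < self.flow_threshold:
            # Little optical flow: treat tilt changes as sensor noise and
            # move only a fraction of the way toward the new reading.
            self.pitch += self.alpha * (pitch - self.pitch)
            self.roll += self.alpha * (roll - self.roll)
        else:
            # Significant flow: the device is really moving, follow it.
            self.pitch, self.roll = pitch, roll
        return self.pitch, self.roll
```

In use, `flow_px` would come from the displacements of the features tracked between two consecutive camera frames, so the same vision pass that estimates yaw and forward/backward motion also decides how strongly the tilt is smoothed.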

2) User Tracking
Aside from device motion tracking, tracking the user is also important with regard to our requirements for hand-held VR. To detect the distance of the user from the device, two range sensors were implemented (see Fig. 4). However, one of them alone is sufficient for the approximate measurement of the relative user position or distance. Currently, this module is able to detect obstacles (or the user's head) in the range of 3 cm to 6 m from the hand-held device screen. We assume that in normal use, the user is facing directly toward the hand-held device and there exists an unobstructed line of sight between the hand-held device and the user's head. The viewing distance is used for a natural dynamic view dependent display as described in the next section.

C. Multimodal Display

1) 3D Sound Simulation
To provide 3D sound, we used the 3D sound capabilities of DirectSound™ from Microsoft. DirectSound™ uses an HRTF based technique to create sounds with apparent directionality. The 3D sound is specified according to the virtual locations of the sound sources and the location of the user with respect to the device (obtained from the ultrasonic/IR sensor). For less computationally powerful hand-held devices such as cell phones or PDAs, a simpler 3D sound simulation using volume and phase modulation might be more appropriate.
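As a rough illustration of the simpler alternative mentioned above, the following Python sketch derives per-ear gains (volume modulation) and an interaural delay (the time-domain counterpart of phase modulation) from a source direction. It is a sketch under stated assumptions, not the platform's code: the function name `stereo_pan`, the constant-power panning law, the head radius and the 1/distance attenuation are all illustrative choices, and the delay uses the Woodworth spherical-head approximation.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at room temperature
HEAD_RADIUS = 0.0875    # m; a typical average value, assumed here


def stereo_pan(azimuth_deg, distance_m=1.0):
    """Return (left_gain, right_gain, left_delay_s, right_delay_s).

    azimuth_deg: 0 is straight ahead, +90 is fully to the right.
    Sources beyond +/-90 degrees are clamped (no front/back cue here).
    """
    az = math.radians(max(-90.0, min(90.0, azimuth_deg)))

    # Volume modulation: constant-power panning between the two ears.
    pan = (az / (math.pi / 2.0) + 1.0) / 2.0  # 0 = left, 1 = right
    left = math.cos(pan * math.pi / 2.0)
    right = math.sin(pan * math.pi / 2.0)

    # Simple distance attenuation (assumed), never amplifying near sources.
    attenuation = 1.0 / max(distance_m, 1.0)

    # Interaural time difference, Woodworth approximation:
    # ITD = (r / c) * (theta + sin(theta)); the far ear hears the sound later.
    itd = HEAD_RADIUS / SPEED_OF_SOUND * (abs(az) + math.sin(abs(az)))
    left_delay = itd if az > 0 else 0.0
    right_delay = itd if az < 0 else 0.0

    return left * attenuation, right * attenuation, left_delay, right_delay
```

The azimuth would be computed from the virtual source position relative to the listener, and the listener's distance could be taken from the ultrasonic/IR reading described in the text.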

2) Multiple Tactile Display
To provide any sense of tactility or haptics (the third major modality in our view) on a hand-held device, one of the most practical approaches is to use vibration motors [10,11]. Vibration motors have in fact been used very effectively on gloves (CyberGlove® from Immersion) and even on mobile devices (VibeTonz® from Immersion). While there have been proposals for hand-held haptics (e.g. of the non-exoskeleton type), their sizes are still not small enough for hand-held devices. As nominal hand-held devices usually employ only a single on-off vibration motor, we propose to use several more and to provide control at discrete levels of amplitude and frequency. Currently, our hardware (shown in Fig. 1 and 4) can support four vibration motors for various tactile effects, and when combined with the visual and aural feedback, it can even induce an illusory haptic sensation. We hope that, in the future, by manipulating the timing, intensity and placement of the multiple vibrators, a more realistic illusory haptic sensation with a finer level of directionality and magnitude can be achieved.
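One way to picture how several motors with discrete amplitude levels could convey a direction is the Python sketch below. It is purely illustrative: the motor placement (right, top, left, bottom of the device), the number of levels, the cosine weighting and the name `motor_levels` are hypothetical and are not taken from the hardware described here.

```python
import math

# Hypothetical motor placement around the device, in degrees.
MOTOR_ANGLES_DEG = {"right": 0.0, "top": 90.0, "left": 180.0, "bottom": 270.0}
LEVELS = 4  # discrete amplitude levels 0..3 (assumed)


def motor_levels(direction_deg, strength):
    """Map a cue direction and a strength in [0, 1] to per-motor levels.

    Motors facing the cue direction vibrate strongest; motors on the
    opposite side stay off (cosine falloff, clamped at zero).
    """
    strength = max(0.0, min(1.0, strength))
    levels = {}
    for name, angle in MOTOR_ANGLES_DEG.items():
        weight = max(0.0, math.cos(math.radians(direction_deg - angle)))
        levels[name] = round(weight * strength * (LEVELS - 1))
    return levels
```

For example, a collision on the right side of a virtual vehicle at full strength would drive only the right motor at the top level, while a diagonal cue would share the amplitude between two adjacent motors.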

3) View Dependent Display
The narrow FOV and small size of the hand-held display (without any other provision) can lower immersion in hand-held VR. In addition, we claim that a fixed FOV despite a changing viewing distance is also unnatural and can bring about similar effects. We suggest two different software FOV manipulation techniques that use an approximate measurement of the eye (or head) position relative to the hand-held device, obtained from the user tracking hardware described in the previous section.
The first proposed FOV technique is to adjust the visual FOV to mimic the behavior of a magnifying glass (see Fig. 5). The FOV becomes narrower as the viewing distance is reduced. This method is useful for applications in which detailed views of the object are important but size perception is not.
The second proposed FOV technique is to use the hand-held device in the opposite way, as a see-through window into the virtual environment (see Fig. 6). As the head gets closer to the screen (or window), more of the virtual environment becomes visible, and thus the FOV widens (and objects are drawn smaller). As can be seen in Fig. 6, the size of the virtual object as perceived by the user is kept the same regardless of the eye-display distance. This approach is better suited for applications in which size or spatial perception is important, such as medical training VR systems.
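The two FOV techniques can be sketched in Python as functions of the measured eye-display distance. For the see-through window, the FOV is simply the angle that the screen subtends at the eye, which follows from the geometry. For the magnifying glass, the text only states that the FOV narrows as the distance shrinks, so the linear mapping and its parameter values below are assumptions, as are the function names.

```python
import math


def window_fov_deg(screen_width_m, eye_distance_m):
    """See-through window: FOV equals the angle the screen subtends at the eye.

    The FOV widens as the eye approaches the screen, so the perceived size
    of a virtual object stays constant.
    """
    return math.degrees(2.0 * math.atan(screen_width_m / (2.0 * eye_distance_m)))


def magnifier_fov_deg(eye_distance_m, near_m=0.1, far_m=0.6,
                      min_fov=20.0, max_fov=60.0):
    """Magnifying glass: FOV narrows as the eye approaches the screen.

    The linear interpolation and all default values are hypothetical.
    """
    t = (eye_distance_m - near_m) / (far_m - near_m)
    t = max(0.0, min(1.0, t))  # clamp to the supported distance range
    return min_fov + t * (max_fov - min_fov)
```

With a 0.2 m wide screen, `window_fov_deg` gives 90° at an eye distance of 0.1 m and about 28° at 0.4 m, whereas `magnifier_fov_deg` moves in the opposite direction over the same range.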

IV. APPLICATIONS OF HAND-HELD VR
In the previous section, we described the particular hardware and software design of a hand-held VR platform conforming to the proposed minimal requirements. In this section, we showcase three different applications of the hand-held VR platform and demonstrate how it differs from the usual multimedia contents on hand-held devices.

A. Virtual Environment Walk-Through
The most typical and natural application of virtual reality is the walk-through. A VR walk-through application should differ from simple (e.g. button-based) navigation in that it must be more experiential and realistic, which calls for a motion based interaction style. Table 1 shows the motion-based interaction that uses the motion of the device (or the user's hand) to control navigation. The metaphoric use of the body is very natural and easy for users to learn. In fact, as briefly mentioned in Section 2, we carried out an extensive usability experiment and found that the motion based interface induced a wider perceived field of view and an increased sense of presence compared to the nominal button based interface. For more details, we refer the reader to [6]. Fig. 7 shows the user navigating through a virtual office using the motion based interface.

B. Multimedia contents browsing and manipulation
Hand-held devices, equipped with a camera, movie and music player, satellite TV receiver and memory cards, often hold an enormous amount of multimedia data. This in turn makes the browsing and manipulation of the contents more difficult and time consuming. The limited screen sizes and unnatural interfaces also present difficulties for the associated multimedia tasks. Several proposals have been made to tackle this particular problem [12]. Our proposal is to use 3D visualization and a motion based interface. For example, we used two axis tilt (yaw and pitch) and forward/backward movements to browse contents. Table 2 lists the complete mapping between hand-held movements and multimedia contents manipulation commands and movements. The mapping is similar to the interaction method in the walk-through application.
As for the display, we came up with three types of layouts for browsing and manipulating the multimedia objects. There is research related to display design and usability for mobile applications such as this [12], and likewise, we are still assessing the usability of the three. Our three layouts are planar, cylindrical, and fisheye. The planar layout is the one generally used in current hand-held devices (see Fig. 8, a-1, a-2). The cylindrical layout is user-centered, and the distances to the multimedia data are mostly the same (a spherical layout would be ideal in that sense, but a spherical layout results in distortion) (see Fig. 8, b-1, b-2). The fisheye layout gives emphasis to the content in the center. In the fisheye layout, the contents are laid out in a cylindrical fashion and the object in the middle, which the user is watching, moves toward the user (see Fig. 8, c-1, c-2).
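The geometric difference between the cylindrical and fisheye layouts can be sketched in a few lines of Python. This is an illustrative top-down model only: the angular spacing, radius, pull factor and function names are hypothetical, and a single row of items is assumed.

```python
import math


def cylindrical_layout(n_items, radius=1.0, step_deg=30.0):
    """Place items on a horizontal arc around a user at the origin.

    Returns a list of (x, z) positions; the user looks down the -z axis,
    and every item is at the same distance (the cylinder radius).
    """
    positions = []
    for i in range(n_items):
        angle = math.radians((i - (n_items - 1) / 2.0) * step_deg)
        positions.append((radius * math.sin(angle), -radius * math.cos(angle)))
    return positions


def fisheye_layout(n_items, focus_index, radius=1.0, step_deg=30.0, pull=0.4):
    """Cylindrical layout in which the focused item moves toward the user.

    pull is the fraction of the radius by which the focused item is
    brought closer (hypothetical value).
    """
    positions = cylindrical_layout(n_items, radius, step_deg)
    x, z = positions[focus_index]
    positions[focus_index] = (x * (1.0 - pull), z * (1.0 - pull))
    return positions
```

In such a scheme, the focus index would be updated from the yaw of the device, so that turning the hand-held device brings a different item to the center and toward the user.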

C. Hand-held Game - Car Driving Simulation
Games are another popular application on hand-held devices, as exemplified by hand-held consoles and PDA/cell phone games. Furthermore, motion-based games such as those for Nintendo's Wii™ or Samsung's Beat-box phones (Samsung SPH-S4000, SPH-S310) are gaining momentum. Our study also indicated a distinctively high level of enjoyment when a motion based interface was used [6]. Our third application is a car racing game, and Table 3 shows the mapping from the device motions to the various driving commands. Fig. 9 shows the overall system architecture of the motion based racing game, which consists of three parts: the manipulator, the simulator and the multimodal display. The manipulator converts inputs from the hand-held device into a form usable by the driving simulator. The driving simulator then applies the input to the brake system, the steering system and the engine/gear system. The simulated results are displayed to the user through three modalities. Fig. 10 shows the steering interface. When the hand-held motion is used as a control command for the game, a mismatch between the virtual camera and the actual view direction of the user occurs, resulting in cybersickness and difficulty in control. To compensate for this mismatch, we used the roll motion of the hand-held device and adjusted the virtual camera so that the user appears to be looking through the hand-held display. That is, the hand-held device acts as a steering handle prop, and thus the orientation of the scene stays the same while the device rotates to the left or right for steering control (see Fig. 10).
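The steering prop idea can be sketched as follows in Python. This is not the game's actual manipulator: the dead zone, the maximum roll, the linear mapping and the function names are hypothetical. Only two ideas come from the text, namely that device roll drives the steering and that the rendered scene is counter-rotated so that it stays level for the user.

```python
import math


def steering_from_roll(roll_deg, dead_zone_deg=3.0, max_roll_deg=45.0):
    """Map device roll to a steering command in [-1, 1].

    Small rolls inside the dead zone are ignored so that hand tremor does
    not steer the car; rolls beyond max_roll_deg saturate.
    """
    magnitude = abs(roll_deg)
    if magnitude <= dead_zone_deg:
        return 0.0
    magnitude = min(magnitude, max_roll_deg)
    value = (magnitude - dead_zone_deg) / (max_roll_deg - dead_zone_deg)
    return math.copysign(value, roll_deg)


def screen_image_rotation_deg(device_roll_deg):
    """Rotation to apply to the rendered image, relative to the screen.

    Rotating the image opposite to the device roll keeps the horizon of
    the scene level from the user's point of view while the device turns.
    """
    return -device_roll_deg
```

Each frame, the roll reported by the motion tracker would feed both functions: one value goes to the steering system of the simulator, the other to the renderer.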

V. CONCLUSIONS AND DISCUSSIONS
In this paper, we have argued for and proposed requirements that a hand-held VR platform must meet to produce a minimum level of immersive and sensory experience. An actual hardware and software implementation, built according to the proposed requirements, was carried out and tested on three different hand-held VR applications. We believe that such a platform offers an experience and enjoyment that differ from those of nominal, merely button-based hand-held media devices. Some of the claims have been validated through our own usability study (not described in this paper; see [6]). We also believe the proposed system configuration is general enough to be applied to many application areas such as education, games, and mixed reality. We are continuing to formally validate that user felt immersion or presence is possible with our proposed hand-held VR platform at a level comparable to desktop or even large scale VR systems. We are also improving both the hardware and software for various kinds of sensing and display, e.g. for creating illusory directional force feedback with multiple vibrators, view dependent display and resource optimization, environment sensing and mixed reality, and other types of multimodal interaction for hand-held VR.

Fig. 10. Using the hand-held device as a steering handle prop. Note that the orientation of the scene stays the same as the device is rotated.

Fig. 1. The overall architecture of the general purpose hand-held VR platform.

Fig. 3 shows a case of applying the motion tracked data to view control.
Fig. 4. The hardware module for the ultrasonic/IR sensing and tactile display. These modules are integrated for ease of implementation, although they should ideally be separated for modularity.

Fig. 5. Hand-held VR as a magnifying glass; the size of the virtual object looks bigger when the hand-held display is close to the eyes. See Color Plate 29
Fig. 7. A user navigating through a virtual office using the motion based interface. The motion based interface is realized by the hybrid relative device motion tracking.

(a-1) Illustration of the planar layout of multimedia contents. (a-2) Snapshot of the planar display layout. (b-1) Illustration of the cylindrical layout of multimedia contents. (b-2) Snapshot of the cylindrical display. (c-1) Illustration of the fisheye layout of multimedia contents. (c-2) Snapshot of the fisheye display.

Fig. 8. Multimedia contents layouts in the hand-held VR. See Color Plate 31

TABLE 1: INTERACTION METHODS FOR NAVIGATION AND SELECTION

TABLE 2: INTERACTION METHODS FOR MULTIMEDIA CONTENTS AND

TABLE 3: MAPPINGS FROM THE DEVICE MOTION TO THE DRIVING COMMANDS.