A Survey of Augmented Reality Technologies, Applications and Limitations

Abstract: We are on the verge of ubiquitously adopting Augmented Reality (AR) technologies to enhance our perception and help us see, hear, and feel our environments in new and enriched ways. AR will support us in fields such as education, maintenance, design and reconnaissance, to name but a few. This paper describes the field of AR, including a brief definition and development history, the enabling technologies and their characteristics. It surveys the state of the art by reviewing some recent applications of AR technology as well as some known limitations regarding human factors in the use of AR systems that developers will need to overcome.


I INTRODUCTION
Imagine a technology with which you could see more than others see, hear more than others hear, and perhaps even touch, smell and taste things that others cannot. What if we had technology to perceive completely computational elements and objects within our real-world experience, even entire creatures and structures that help us in our daily activities, while interacting almost unconsciously through mere gestures and speech?
With such technology, mechanics could see instructions on what to do next when repairing an unknown piece of equipment, surgeons could see ultrasound scans of organs while performing surgery on them, fire fighters could see building layouts to avoid otherwise invisible hazards, soldiers could see positions of enemy snipers spotted by unmanned reconnaissance aircraft, and we could read reviews for each restaurant in the street we're walking in, or battle 10-foot tall aliens on the way to work [57].
Augmented reality (AR) is this technology to create a "next generation, reality-based interface" [77], and it is moving from laboratories around the world into various industries and consumer markets. AR supplements the real world with virtual (computer-generated) objects that appear to coexist in the same space as the real world. AR was recognised as an emerging technology of 2007 [79], and with today's smart phones and AR browsers we are starting to embrace this very new and exciting kind of human-computer interaction.

Definition
On the reality-virtuality continuum by Milgram and Kishino [107] (Fig. 1), AR is one part of the general area of mixed reality. Both virtual environments (or virtual reality) and augmented virtuality, in which real objects are added to virtual ones, replace the surrounding environment by a virtual one. In contrast, AR provides local virtuality. When considering not just artificiality but also user transportation, Benford et al. [28] classify AR as separate from both VR and telepresence (see Fig. 2). Following [17,19], an AR system (i) combines real and virtual objects in a real environment; (ii) registers (aligns) real and virtual objects with each other; and (iii) runs interactively, in three dimensions, and in real time.
Three aspects of this definition are important to mention. Firstly, it is not restricted to particular display technologies such as a head-mounted display (HMD). Nor is the definition limited to the sense of sight, as AR can and potentially will apply to all senses, including hearing, touch, and smell. Finally, removing real objects by overlaying virtual ones, approaches known as mediated or diminished reality, is also considered AR.
Manuscript received on January 26, 2010. E-mail: d.w.f.vankrevelen@tudelft.nl

Brief history
The first AR prototypes (Fig. 3), created by computer graphics pioneer Ivan Sutherland and his students at Harvard University and the University of Utah, appeared in the 1960s and used a see-through display to present 3D graphics [151].
A small group of researchers at the U.S. Air Force's Armstrong Laboratory, the NASA Ames Research Center, the Massachusetts Institute of Technology, and the University of North Carolina at Chapel Hill continued research during the 1970s and 1980s. During this time mobile devices like the Sony Walkman (1979), digital watches and personal digital organisers were introduced. This paved the way for wearable computing [103,147] in the 1990s as personal computers became small enough to be worn at all times. Early palmtop computers include the Psion I (1984), the Apple Newton MessagePad (1993), and the Palm Pilot (1996). Today, many mobile platforms exist that may support AR, such as personal digital assistants (PDAs), tablet PCs, and mobile phones.
It took until the early 1990s before the term "augmented reality" was coined by Caudell and Mizell [42], scientists at Boeing Corporation who were developing an experimental AR system to help workers put together wiring harnesses. True mobile AR was still out of reach, but a few years later the authors of [102] developed a GPS-based outdoor system that presents navigational assistance to the visually impaired with spatial audio overlays. Soon computing and tracking devices became sufficiently powerful and small to support graphical overlay in mobile settings. Feiner et al. [55] created an early prototype of a mobile AR system (MARS) that registers 3D graphical tour guide information with buildings and artefacts the visitor sees.
By the late 1990s, as AR became a distinct field of research, several conferences on AR began, including the International Workshop and Symposium on Augmented Reality, the International Symposium on Mixed Reality, and the Designing Augmented Reality Environments workshop. Organisations were formed such as the Mixed Reality Systems Laboratory (MRLab) in Nottingham and the Arvika consortium in Germany. Also, it became possible to rapidly build AR applications thanks to freely available software toolkits like the ARToolKit. In the meantime, several surveys appeared that give an overview of AR advances, describe its problems, and classify and summarise developments [17,19,28]. By 2001, MRLab finished their pilot research, and the symposia were united in the International Symposium on Mixed and Augmented Reality (ISMAR), which has become the major symposium for industry and research to exchange problems and solutions.
For anyone who is interested and wants to get acquainted with the field, this survey provides an overview of important technologies, applications and limitations of AR systems. After describing technologies that enable an augmented reality experience in Section 2, we review some of the possibilities of AR systems in Section 3. In Section 4 we discuss a number of common technological challenges and limitations regarding human factors. Finally, we conclude with a number of directions that the authors envision AR research might take.
Fig. 2. Broad classification of shared spaces according to transportation and artificiality, adapted from [28].

II ENABLING TECHNOLOGIES
The technological demands for AR are much higher than for virtual environments or VR, which is why the field of AR took longer to mature than that of VR. However, the key components needed to build an AR system have remained the same since Ivan Sutherland's pioneering work of the 1960s. Displays, trackers, and graphics computers and software remain essential in many AR experiences. Following the definition of AR step by step, this section first describes display technologies that combine the real and virtual worlds, followed by sensors and approaches to track user position and orientation for correct registration of the virtual with the real, and user interface technologies that allow real-time, 3D interaction. Finally, some remaining AR requirements are discussed.
Fig. 3. The world's first head-mounted display with the "Sword of Damocles" [151].

Displays
Of all modalities in human sensory input, sight, sound and touch are currently the senses that AR systems most commonly address. This section mainly focuses on visual displays; aural (sound) displays are mentioned only briefly below.
Haptic (touch) displays are discussed with the interfaces in Section 2.3, while olfactory (smell) and gustatory (taste) displays are less developed or practically non-existent AR techniques and will not be discussed in this essay.

Aural display
Aural display application in AR is mostly limited to self-explanatory mono (0-dimensional), stereo (1-dimensional) or surround (2-dimensional) headphones and loudspeakers. True 3D aural display is currently found in more immersive simulations of virtual environments and augmented virtuality, or is still in experimental stages.
Haptic audio refers to sound that is felt rather than heard [75] and is already applied in consumer devices such as Turtle Beach's Ear Force headphones to increase the sense of realism and impact, but also to enhance user interfaces of e.g. mobile phones [44]. Recent developments in this area are presented in workshops such as the international workshop on Haptic Audio Visual Environments and the international workshop on Haptic and Audio Interaction Design [15].

Visual display
There are basically three ways to visually present an augmented reality. Closest to virtual reality is video see-through, where the virtual environment is replaced by a video feed of reality and the AR is overlaid upon the digitised images. Another way, which includes Sutherland's approach, is optical see-through: it leaves the real-world perception alone and displays only the AR overlay by means of transparent mirrors and lenses. The third approach is to project the AR overlay onto real objects themselves, resulting in projective displays. True 3-dimensional displays for the masses are still far off, although the system in [140] already achieves 1000 dots per second in true 3D free space using plasma in the air. The three techniques may be applied at varying distance from the viewer: head-mounted, hand-held and spatial (Fig. 4). Each combination of technique and distance is listed in the overview presented in Table 1 with a comparison of their individual advantages.

Video see-through
Besides being the cheapest and easiest to implement, this display technique offers the following advantages. Since reality is digitised, it is easier to mediate or remove objects from reality. This includes removing or replacing fiducial markers or placeholders with virtual objects (see for instance Figs. 7 and 22). Also, brightness and contrast of virtual objects are matched easily with the real environment. Evaluating the light conditions of a static outdoor scene is important when the computer-generated content has to blend in smoothly, and Liu et al. [101] developed a novel approach for this.
The digitised images allow tracking of head movement for better registration. It also becomes possible to match perception delays of the real and virtual. Disadvantages of video see-through include a low resolution of reality, a limited field-of-view (although this can easily be increased), and user disorientation due to parallax (eye-offset): the camera is positioned at a distance from the viewer's true eye location, causing significant adjustment effort for the viewer [35]. This problem was solved at the MR Lab by aligning the video capture [153]. A final drawback is the focus distance of this technique, which is fixed in most display types, providing poor eye accommodation. Some head-mounted setups can however move the display (or a lens in front of it) to cover a range of 0.25 meters to infinity within 0.3 seconds [150]. Like the parallax problem, biocular displays (where both eyes see the same image) cause significantly more discomfort than monocular or binocular displays, both in eye strain and fatigue [53].

Optical see-through
Optical see-through techniques with beam-splitting holographic optical elements (HOEs) may be applied in head-worn displays, hand-held displays, and spatial setups where the AR overlay is mirrored either from a planar screen or through a curved screen.
These displays not only leave the real-world resolution intact, they also have the advantage of being cheaper, safer, and parallax-free (no eye-offset due to camera positioning). Optical techniques are safer because users can still see when power fails, making this an ideal technique for military and medical purposes. However, other input devices such as cameras are required for interaction and registration. Also, combining the virtual objects holographically through transparent mirrors and lenses creates disadvantages as it reduces brightness and contrast of both the images and the real-world perception, making this technique less suited for outdoor use. The all-important field-of-view is limited for this technique and may cause clipping of virtual images at the edges of the mirrors or lenses. Finally, occlusion (or mediation) of real objects is difficult because their light is always combined with the virtual image. Kiyokawa et al. [90] solved this problem for head-worn displays by adding an opaque overlay using an LCD panel with pixels that opacify areas to be occluded.
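To illustrate why occlusion is hard for optical combiners, the following minimal sketch models the light reaching the eye at a single pixel. The linear model, the function name `perceived_intensity` and the `transmittance` value are illustrative assumptions and are not taken from any of the cited systems; the optional `mask` stands in for an opaque LCD layer of the kind used by Kiyokawa et al. [90].

```python
def perceived_intensity(real, virtual, mask=0.0, transmittance=0.5):
    """Toy model of the light reaching the eye through an optical combiner.

    real, virtual: intensities in [0, 1] of the real scene and the AR overlay.
    mask: opacity in [0, 1] of a hypothetical LCD layer in front of the scene.
    transmittance: assumed fraction of real-world light the combiner passes.
    """
    passed_real = transmittance * (1.0 - mask) * real
    reflected_virtual = (1.0 - transmittance) * virtual
    return passed_real + reflected_virtual


# A black virtual pixel cannot hide a bright real object without a mask,
# because the real light is always added to the overlay.
without_mask = perceived_intensity(real=1.0, virtual=0.0)
# An opaque mask pixel blocks the real light, so the overlay can occlude.
with_mask = perceived_intensity(real=1.0, virtual=0.0, mask=1.0)
```

Under the same assumptions the model also reflects the loss of brightness mentioned above, since both the real and the virtual contribution are scaled by a factor below one.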
Virtual retinal displays or retinal scanning displays (RSDs) solve the problems of low brightness and low field-of-view in (head-worn) optical see-through displays. A low-power laser draws a virtual image directly onto the retina, which yields high brightness and a wide field-of-view. RSD quality is not limited by the size of pixels but only by diffraction and aberrations in the light source, making (very) high resolutions possible as well. Together with their low power consumption, these displays are well-suited for extended outdoor use. Still under development at the University of Washington and funded by MicroVision and the U.S. military, current RSDs are mostly monochrome (red only) and monocular (single-eye) displays. Schowengerdt et al. [143] already developed a full-colour, binocular version with dynamic refocus to accommodate the eyes (Fig. 5) that is promised to be low-cost and light-weight.

Projective
These displays have the advantage that they do not require special eye-wear, thus accommodating the user's eyes during focusing, and they can cover large surfaces for a wide field-of-view. Projection surfaces may range from flat, plain coloured walls to complex scale models [33]. Zhou et al. [164] list multiple picoprojectors that are lightweight and low on power consumption for better integration. However, as with optical see-through displays, other input devices are required for (indirect) interaction. Also, projectors need to be calibrated each time the environment or the distance to the projection surface changes (crucial in mobile setups). Fortunately, calibration may be automated using cameras in e.g. a multi-walled Cave automatic virtual environment (CAVE) with irregular surfaces [133]. Furthermore, this type of display is limited to indoor use only due to low brightness and contrast of the projected images. Occlusion or mediation of objects is also quite poor, but for head-worn projectors this may be improved by covering surfaces with retro-reflective material. Objects and instruments covered in this material will reflect the projection directly towards the light source, which is close to the viewer's eyes, thus not interfering with the projection.
Fig. 5. Binocular (stereoscopic) vision [143].

Display positioning
AR displays may be classified into three categories based on their position between the viewer and the real environment: head-worn, hand-held, and spatial (see Fig. 4).

Head-worn
Visual displays attached to the head include the video/optical see-through head-mounted display (HMD), virtual retinal display (VRD), and head-mounted projective display (HMPD). Cakmakci and Rolland [40] give a recent detailed review of head-worn display technology. A current drawback of head-worn displays is that they have to connect to graphics computers like laptops that restrict mobility due to limited battery life. Battery life may be extended by moving computation to distant locations (clouds) and providing (wireless) connections using standards such as IEEE 802.11 or Bluetooth.

Hand-held
This category includes hand-held video/optical see-through displays as well as hand-held projectors. Although this category of displays is bulkier than head-worn displays, it is currently the best work-around to introduce AR to a mass market due to low production costs and ease of use. For instance, hand-held video see-through AR acting as magnifying glasses may be based on existing consumer products like mobile phones that show 3D objects (Möhring et al. [110], Fig. 7a), or personal digital assistants (PDAs) with e.g. navigation information [161] (Fig. 7b). The authors of [148] apply optical see-through in their hand-held "sonic flashlight" to display medical ultrasound imaging directly over the scanned organ (Fig. 8a). One example of a hand-held projective display or "AR flashlight" is the "iLamp" by Raskar et al. [134]. This context-aware or tracked projector adjusts the imagery based on the current orientation of the projector relative to the environment (Fig. 8b). Recently, MicroVision (known from the retinal displays) introduced the small Pico Projector (PicoP), which is 8 mm thick, provides full-colour imagery of 1366 × 1024 pixels at 60 Hz using three lasers, and will probably appear embedded in mobile phones soon.

Spatial
Displays in the last category are placed statically within the environment and include screen-based video see-through displays, spatial optical see-through displays, and projective displays. These techniques lend themselves well to large presentations and exhibitions with limited interaction. Early ways of creating AR are based on conventional screens (computer or television) that show a camera feed with an AR overlay. This technique is now being applied in the world of sports television, where environments such as swimming pools and race tracks are well defined and easy to augment. Head-up displays (HUDs) in military cockpits are a form of spatial optical see-through and are becoming a standard extension for production cars to project navigational directions in the windshield [113]. User viewpoints relative to the AR overlay hardly change in these cases due to the confined space. Spatial see-through displays may however appear misaligned when users move around in open spaces, for instance when the AR overlay is presented on a transparent screen such as the "invisible interface" by Ogi et al. [115] (Fig. 9a). 3D holographs solve the alignment problem, as Goebbels et al. [64] show with the ARSyS TriCorder (Fig. 9b) by the German Fraunhofer IMK (now IAIS) research centre.

Tracking sensors and approaches
Before an AR system can display virtual objects in a real environment, the system must be able to sense the environment and track the viewer's (relative) movement, preferably with six degrees of freedom (6DOF): three variables (x, y, and z) for position and three angles (yaw, pitch, and roll) for orientation.
There must be some model of the environment to allow tracking for correct AR registration. Furthermore, most environments have to be prepared before an AR system is able to track 6DOF movement, but not all tracking techniques work in all environments. To this day, determining the orientation of a user is still a complex problem with no single best solution.
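As a minimal sketch of what a 6DOF pose amounts to in software, the class below stores the three position variables and three angles and uses them to map a point from viewer coordinates into world coordinates. The class name `Pose6DOF`, the axis assignment (yaw about z, pitch about y, roll about x) and the rotation order are assumptions made for illustration; real trackers differ in their conventions.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose6DOF:
    """Hypothetical 6DOF pose: position in metres, angles in radians."""
    x: float
    y: float
    z: float
    yaw: float    # assumed rotation about the z axis
    pitch: float  # assumed rotation about the y axis
    roll: float   # assumed rotation about the x axis

    def rotation(self):
        """Return the 3x3 matrix R = Rz(yaw) * Ry(pitch) * Rx(roll)."""
        cy, sy = math.cos(self.yaw), math.sin(self.yaw)
        cp, sp = math.cos(self.pitch), math.sin(self.pitch)
        cr, sr = math.cos(self.roll), math.sin(self.roll)
        return [
            [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
            [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
            [-sp, cp * sr, cp * cr],
        ]

    def to_world(self, point):
        """Rotate a viewer-relative point and translate it into world space."""
        r = self.rotation()
        px, py, pz = point
        return (
            r[0][0] * px + r[0][1] * py + r[0][2] * pz + self.x,
            r[1][0] * px + r[1][1] * py + r[1][2] * pz + self.y,
            r[2][0] * px + r[2][1] * py + r[2][2] * pz + self.z,
        )


# A viewer at (1, 2, 3) who has turned 90 degrees about the vertical axis.
viewer = Pose6DOF(1.0, 2.0, 3.0, yaw=math.pi / 2, pitch=0.0, roll=0.0)
anchor = viewer.to_world((1.0, 0.0, 0.0))
```

Registration uses the inverse of this mapping: a virtual object with known world coordinates is transformed into viewer coordinates before it is drawn.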

Modelling environments
Both tracking and registration techniques rely on environmental models, often 3D geometrical models. To annotate for instance windows, entrances, or rooms, an AR system needs to know where they are located with regard to the user's current position and field of view.
Sometimes the annotations themselves may be occluded based on the environmental model. For instance, when an annotated building is occluded by other objects, the annotation should point to the non-occluded parts only [26].
Fortunately, most environmental models do not need to be very detailed about textures or materials. Usually a "cloud" of unconnected 3D sample points suffices, for example, to present occluded buildings and essentially let users see through walls. To create a traveller guidance service (TGS), Kim et al. [89] used models from a geographical information system (GIS), but for many cases modelling is not necessary at all, as Gross et al. [66] and Kurillo et al. [95] proved. Stoakley et al. [149] present users with the spatial model itself, an oriented map of the environment or world in miniature (WIM), to assist in navigation.

Modelling techniques
Creating 3D models of large environments is a research challenge in its own right. Automatic, semiautomatic, and manual techniques can be employed, and Piekarski and Thomas [126] even employed AR itself for modelling purposes. Conversely, a laser range finder used for environmental modelling may also enable users themselves to place notes into the environment [123]. It is still hard to achieve a seamless integration between a real and an added object. Ray-tracing algorithms are better suited to visualise huge data sets than the classic z-buffer algorithm because they create an image in time sub-linear in the number of objects, while the z-buffer is linear in the number of objects [145].
There are significant research problems involved in both the modelling of arbitrary complex 3D spatial models and the organisation of storage and querying of such data in spatial databases. These databases may also need to change quite rapidly, as real environments are often dynamic.

User movement tracking
Compared to virtual environments, AR tracking devices must have higher accuracy, a wider input variety and bandwidth, and longer ranges [17]. Registration accuracy depends not only on the geometrical model but also on the distance of the objects to be annotated. The further away an object, (i) the less impact errors in position tracking have and (ii) the more impact errors in orientation tracking have on the overall misregistration [18].
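The dependence on distance follows from simple trigonometry, sketched below under the assumption of a single lateral position error and a single angular orientation error viewed in a plane. The function names and the example error magnitudes (10 cm and 1 degree) are illustrative choices, not values from [18].

```python
import math


def angle_from_position_error(position_error_m, distance_m):
    """Angular misregistration (radians) caused by a lateral position error."""
    return math.atan2(position_error_m, distance_m)


def offset_from_orientation_error(orientation_error_rad, distance_m):
    """Lateral misregistration (metres) at the object caused by an orientation error."""
    return distance_m * math.tan(orientation_error_rad)


# Assumed tracker errors: 10 cm in position and 1 degree in orientation.
for distance in (1.0, 10.0, 100.0):
    angle = math.degrees(angle_from_position_error(0.1, distance))
    offset = offset_from_orientation_error(math.radians(1.0), distance)
    print(f"{distance:6.1f} m: position error -> {angle:.3f} deg, "
          f"orientation error -> {offset:.3f} m")
```

The first quantity shrinks as the object moves away while the second grows roughly linearly with distance, which is why orientation tracking dominates registration quality for distant outdoor annotations.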
Tracking is usually easier in indoor settings than in outdoor settings, as the tracking devices do not have to be completely mobile and wearable or deal with shock, abuse, weather, etc. Instead, the indoor environment is easily modelled and prepared, and conditions such as lighting and temperature may be controlled. Currently, unprepared outdoor environments still pose tracking problems with no single best solution.

Mechanical, ultrasonic, and magnetic
Early tracking techniques are restricted to indoor use as they require special equipment to be placed around the user. The first HMD by Sutherland [151] was tracked mechanically (Fig. 3) through ceiling-mounted hardware, also nicknamed the "Sword of Damocles." Devices that send and receive ultrasonic chirps to determine position, i.e. ultrasonic positioning, were already experimented with by Sutherland [151] and are still used today. A decade or so later, Polhemus' magnetic trackers, which measure distances within electromagnetic fields, were introduced by Raab et al. [132]. These are also still in use today and had much more impact on VR and AR research.

Global positioning systems
For outdoor tracking by global positioning system (GPS) there exist the American 24-satellite Navstar GPS [63], the Russian counterpart constellation Glonass, and the 30-satellite Galileo system, currently being launched by the European Union and expected to be operational in 2010.
Direct visibility with at least four satellites is no longer necessary with assisted GPS (A-GPS), in which a worldwide network of servers and base stations enables signal broadcast in for instance urban canyons and indoor environments. Plain GPS is accurate to about 10-15 meters, but with wide area augmentation system (WAAS) technology accuracy may be improved to 3-4 meters. For more accuracy, the environments have to be prepared with a local base station that sends a differential error-correction signal to the roaming unit: differential GPS yields 1-3 meter accuracy, while real-time-kinematic or RTK GPS, based on carrier-phase ambiguity resolution, can estimate positions accurately to within centimeters. Update rates of commercial GPS systems such as the MS750 RTK receiver by Trimble have increased from five to twenty times a second and are deemed suitable for tracking fast motion of people and objects [73].
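The accuracy figures above can be read as a simple selection rule: pick the least demanding technique whose worst-case error still meets the application's requirement. The sketch below encodes this with the upper bounds quoted in the text; the 0.05 m value for RTK is an assumption standing in for "within centimeters", and the names `GPS_WORST_CASE_M` and `choose_gps` are hypothetical.

```python
# Worst-case accuracy in metres per technique, ordered from least to most
# demanding in terms of environment preparation.
GPS_WORST_CASE_M = [
    ("plain GPS", 15.0),
    ("GPS with WAAS", 4.0),
    ("differential GPS", 3.0),
    ("RTK GPS", 0.05),  # assumption: "within centimeters" taken as 5 cm
]


def choose_gps(required_accuracy_m):
    """Return the first technique that meets the requirement, or None."""
    for name, worst_case_m in GPS_WORST_CASE_M:
        if worst_case_m <= required_accuracy_m:
            return name
    return None


street_level = choose_gps(20.0)   # coarse, street-level navigation
doorway_level = choose_gps(1.0)   # annotating individual doorways
```

A requirement tighter than the assumed RTK bound returns `None`, signalling that another tracking technique, or a hybrid approach, would be needed.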

Radio
Other tracking methods that require environment preparation by placing devices are based on ultra-wideband radio waves. Active radio frequency identification (RFID) chips may be positioned inside structures such as aircraft [163] to allow in situ positioning. Complementary to RFID, one can apply the wide-area IEEE 802.11b/g standards for both wireless networking and tracking. The achievable resolution depends on the density of deployed access points in the network. Several techniques are researched by Bahl and Padmanabhan [21] and Castro et al. [41], and vendors like InnerWireless, AeroScout and Ekahau offer integrated systems for personnel and equipment tracking in for instance hospitals.

Inertial
Accelerometers and gyroscopes are sourceless inertial sensors, usually part of hybrid tracking systems, that do not require prepared environments. Timed measurements can provide a practical dead-reckoning method to estimate position when combined with accurate heading information. To minimise errors due to drift, the estimates must periodically be updated with accurate measurements. The act of taking a step can also be measured, i.e. these sensors can function as pedometers. Currently, micro-electromechanical (MEM) accelerometers and gyroscopes are already making their way into mobile phones to allow "writing" of phone numbers in the air and other gesture-based interaction [87].
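A minimal sketch of pedometer-style dead reckoning is given below, assuming a fixed stride length and one compass heading per detected step. The function names, the 0.7 m default stride and the simple blending used for drift correction are illustrative assumptions rather than a description of any cited system.

```python
import math


def dead_reckon(start, headings, step_length_m=0.7):
    """Advance a 2D (east, north) position by one stride per detected step.

    headings: one compass heading in radians per step (0 = north, clockwise).
    Returns the list of estimated positions, including the start.
    """
    east, north = start
    track = [(east, north)]
    for heading in headings:
        east += step_length_m * math.sin(heading)
        north += step_length_m * math.cos(heading)
        track.append((east, north))
    return track


def apply_fix(estimate, fix, weight=1.0):
    """Pull a drifting estimate towards an accurate external measurement.

    weight=1.0 replaces the estimate with the fix; smaller values blend.
    """
    return (
        estimate[0] + weight * (fix[0] - estimate[0]),
        estimate[1] + weight * (fix[1] - estimate[1]),
    )


# Two steps north followed by one step east, with a 1 m stride.
track = dead_reckon((0.0, 0.0), [0.0, 0.0, math.pi / 2], step_length_m=1.0)
# An occasional accurate measurement (e.g. from GPS) removes accumulated drift.
corrected = apply_fix(track[-1], fix=(1.2, 2.1))
```

Because each step adds a small error, the estimate degrades with distance walked, which is why the periodic correction step matters in practice.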

Optical
Promising approaches for 6DOF pose estimation of users and objects in general settings are vision-based. In closed-loop tracking, the field of view of the camera coincides with that of the user (e.g. in video see-through), allowing for pixel-perfect registration of virtual objects. Conversely, in open-loop tracking, the system relies only on the sensed pose of the user and the environmental model.
Using one or two tiny cameras, model-based approaches can recognise landmarks (given an accurate environmental model) or detect relative movement dynamically between frames. There are a number of techniques to detect scene geometry (e.g. landmark or template matching) and camera motion in both 2D (e.g. optical flow) and 3D, which require varying amounts of computation.
While early vision-based tracking and interaction applications in prepared environments use fiducial markers [111,142] or light emitting diodes (LEDs) to see how and where to register virtual objects, today there is a growing body of research on "markerless AR" for tracking physical positions [46,48,58,65]. Some use a homography to estimate translation and rotation from frame to frame, others use a Harris feature detector to identify target points, and some employ the random sample consensus (RANSAC) algorithm to validate matching [80]. Recently, Pilet [130] showed how deformable surfaces like paper sheets, t-shirts and mugs can also serve as augmentable surfaces. Robustness is still improving and computational costs remain high, but results of these pure vision-based approaches (hybrid and/or markerless) for general-case, real-time tracking are very promising.
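The idea behind RANSAC-style match validation can be sketched in a few lines. For brevity the example below fits a pure 2D translation between frames instead of the homography mentioned above, so it is a simplified stand-in rather than the method of [80]; the function name, threshold and iteration count are assumptions.

```python
import random


def ransac_translation(matches, threshold=2.0, iterations=100, seed=0):
    """Estimate a 2D translation from point matches while rejecting outliers.

    matches: list of ((x1, y1), (x2, y2)) feature correspondences.
    Returns (dx, dy, inliers) for the hypothesis supported by most matches.
    """
    if not matches:
        raise ValueError("at least one match is required")
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Minimal sample: a single match fully determines a translation.
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        inliers = [
            (a, b) for a, b in matches
            if abs(b[0] - a[0] - dx) <= threshold
            and abs(b[1] - a[1] - dy) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refine the winning hypothesis by averaging over its consensus set.
    count = len(best_inliers)
    dx = sum(b[0] - a[0] for a, b in best_inliers) / count
    dy = sum(b[1] - a[1] for a, b in best_inliers) / count
    return dx, dy, best_inliers


# Eight consistent matches shifted by (5, -3) plus two gross mismatches.
points = [(0, 0), (10, 2), (4, 7), (8, 8), (1, 9), (6, 3), (12, 5), (3, 11)]
good = [((x, y), (x + 5, y - 3)) for x, y in points]
bad = [((2, 2), (40, 40)), ((9, 1), (-20, 15))]
dx, dy, inliers = ransac_translation(good + bad)
```

Replacing the translation with a homography changes only the minimal sample size (four matches instead of one) and the model-fitting step; the hypothesise-and-verify loop stays the same.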

Hybrid
Commercial hybrid tracking systems became available during the 1990s and use for instance electromagnetic compasses (magnetometers), gravitational tilt sensors (inclinometers), and gyroscopes (mechanical and optical) for orientation tracking and ultrasonic, magnetic, and optical position tracking. Hybrid tracking approaches are currently the most promising way to deal with the difficulties posed by general indoor and outdoor mobile AR environments [73]. Azuma et al. [20] investigate hybrid methods without vision-based tracking suitable for military use at night in an outdoor environment with fewer than ten beacons mounted on for instance unmanned air vehicles (UAVs).

User interface and interaction
Besides registering virtual data with the user's real-world perception, the system needs to provide some kind of interface with both virtual and real objects. Our technologically advancing society needs new ways of interfacing with both the physical and digital world to enable people to engage in those environments [67].

New UI paradigm
WIMP (windows, icons, menus, and pointing), as the conventional desktop UI metaphor is referred to, does not apply that well to AR systems. Not only is interaction required with six degrees of freedom (6DOF) rather than 2D, conventional devices like a mouse and keyboard are also cumbersome to wear and detract from the AR experience.
As in WIMP UIs, AR interfaces have to support selecting, positioning, and rotating of virtual objects, drawing paths or trajectories, assigning quantitative values (quantification) and text input. However, as a general UI principle, AR interaction also includes the selection, annotation, and, possibly, direct manipulation of physical objects. This computing paradigm is still a challenge [20].

Tangible UI and 3D pointing
Early mobile AR systems simply use mobile trackballs, trackpads and gyroscopic mice to support continuous 2D pointing tasks. This is largely because the systems still use a WIMP interface, and accurate gesturing to WIMP menus would otherwise require well-tuned motor skills from the users. Ideally the number of extra devices that have to be carried around in mobile UIs is reduced, but this may be difficult with current mobile computing and UI technologies.
Devices like the mouse are tangible and unidirectional: they communicate from the user to the AR system only. Common 3D equivalents are tangible user interfaces (TUIs) like paddles and wands. Ishii and Ullmer [76] discuss a number of tangible interfaces developed at MIT's Tangible Media Group (http://tangible.media.mit.edu/), including phicons (physical icons) and sliding instruments. Some TUIs have placeholders or markers on them so the AR system can replace them visually with virtual objects. Poupyrev et al. [131] use tiles with fiducial markers, while in StudierStube, Schmalstieg et al. [142] allow users to interact through a Personal Interaction Panel with 2D and 3D widgets that also recognises pen-based gestures in 6DOF (Fig. 10).

Haptic UI and gesture recognition
TUIs with bidirectional, programmable communication through touch are called haptic UIs. Haptics is like teleoperation, but the remote slave system is purely computational, i.e. "virtual." Haptic devices are in effect robots with a single task: to interact with humans [69]. The haptic sense is divided into the kinaesthetic sense (force, motion) and the tactile sense (tact, touch). Force feedback devices like joysticks and steering wheels can suggest impact or resistance and are well-known among gamers. A popular 6DOF haptic device in teleoperation and other areas is the PHANTOM (Fig. 11). It optionally provides 7DOF interaction through a pinch or scissors extension. Tactile feedback devices convey parameters such as roughness, rigidity, and temperature. Benali-Khoudja et al. [27] survey tactile interfaces used in teleoperation, 3D surface simulation, games, etc.
Data gloves use diverse technologies to sense and actuate and are very reliable, flexible and widely used in VR for gesture recognition.In AR however they are suitable only for brief, casual use, as they impede the use of hands in real world activities and are somewhat awkward looking for general application.Buchmann et al. [37] connected buzzers to the fingertips informing users whether they are "touching" a virtual object correctly for manipulation, much like the CyberGlove with CyberTouch by SensAble 15 .

Visual UI and gesture recognition
Instead of using hand-worn trackers, hand movement may also be tracked visually, leaving the hands unencumbered. A head-worn or collar-mounted camera pointed at the user's hands can be used for gesture recognition. Through gesture recognition, an AR system could automatically draw up reports of activities [105]. For 3D interaction, UbiHand uses wrist-mounted cameras to enable gesture recognition [14], while the Mobile Augmented Reality Interface Sign Interpretation Language 16 [16] recognises hand gestures on a virtual keyboard displayed on the user's hand (Fig. 12). A simple hand gesture using the Handy AR system can also be used for the initialization of markerless tracking, which estimates a camera pose from a user's outstretched hand [97].
Cameras are also useful to record and document the user's view, e.g. for providing a live video feed for teleconferencing, for informing a remote expert about the findings of AR field-workers, or simply for documenting and storing everything that is taking place in front of the mobile AR system user.
Common in indoor virtual or augmented environments is the use of additional orientation and position trackers to provide 6DOF hand tracking for manipulating virtual objects. For outdoor environments, Foxlin and Harrington [60] experimented with ultrasonic tracking of finger-worn acoustic emitters using three head-worn microphones.
15 http://www.sensable.com/

2.3.5 Gaze tracking
Using tiny cameras to observe user pupils and determine the direction of their gaze is a technology with potential for AR. The difficulties are that it needs to be incorporated into the eye-wear, calibrated to the user to filter out involuntary eye movement, and positioned at a fixed distance. With enough error correction, gaze tracking alternatives for the mouse such as Stanford's EyePoint 17 [94] provide a dynamic history of users' interests and intentions that may help the UI adapt to future contexts.

2.3.6 Aural UI and speech recognition
To reach the ideal of an inconspicuous UI, auditory UIs may become an important part of the solution. Microphones and earphones are easily hidden and allow auditory UIs to deal with speech recognition, speech recording for human-to-human interaction, audio information presentation, and audio dialogue. Although noisy environments pose problems, audio can be valuable in multimodal and multimedia UIs.

2.3.7 Text input
Achieving fast and reliable text input to a mobile computer remains hard. Standard keyboards require much space and a flat surface, and the current commercial options such as small, foldable, inflatable, or laser-projected virtual keyboards are cumbersome, while soft keyboards take up valuable screen space. Popular choices in the mobile community include chordic keyboards such as the Twiddler2 by Handykey 18 that require key combinations to encode a single character. Of course mobile AR systems based on handheld devices like tablet PCs, PDAs or mobile phones already support alphanumeric input through keypads or pen-based handwriting recognition (facilitated by e.g. dictionaries or shape writing technologies), but this cannot be applied in all situations. Glove-based and vision-based hand gesture tracking do not yet provide the ease of use and accuracy for serious adoption. Speech recognition, however, has improved over the years in both speed and accuracy and, when combined with a fall-back device (e.g., pen-based systems or special purpose chording or miniature keyboards), may be a likely candidate for providing text input to mobile devices in a wide variety of situations [73].
16 http://marisil.org/
17 http://hci.stanford.edu/research/GUIDe/

2.3.8 Hybrid UI
With each modality having its drawbacks and benefits, AR systems are likely to use a multimodal UI. A synchronised combination of, for instance, gestures, speech, sound, vision and haptics may provide users with a more natural and robust, yet predictable UI.

2.3.9 Context awareness
The display and tracking devices discussed earlier already provide some advantages for an AR interface. A mobile AR system is aware of the user's position and orientation and can adjust the UI accordingly. Such context awareness can reduce UI complexity, for example by dealing only with virtual or real objects that are nearby or within visual range. Lee et al. [96] already utilize AR for providing context-aware bi-augmentation between physical and virtual spaces through context-adaptable visualization.
Fig. 12. Mobile Augmented Reality Interface Sign Interpretation Language © 2004 Peter Antoniac.
18 http://www.handykey.com/
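The context-awareness idea described above, restricting the UI to objects that are nearby or within visual range, can be illustrated with a small sketch. It is not taken from any of the surveyed systems: the `SceneObject` type, the function name `visible_objects`, the flat 2D coordinate frame, and the default range and field-of-view values are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical annotated object in a flat local coordinate frame."""
    name: str
    x: float  # metres east of a local origin
    y: float  # metres north of a local origin


def visible_objects(objects, user_x, user_y, heading_deg,
                    max_range_m=50.0, fov_deg=60.0):
    """Return objects within range and inside the horizontal field of view,
    nearest first. Range and field of view are illustrative defaults."""
    selected = []
    for obj in objects:
        dx, dy = obj.x - user_x, obj.y - user_y
        distance = math.hypot(dx, dy)
        if distance > max_range_m:
            continue
        # Bearing measured clockwise from north, like a compass heading.
        bearing = math.degrees(math.atan2(dx, dy)) % 360.0
        # Smallest signed angle between bearing and heading, in [-180, 180).
        offset = (bearing - heading_deg + 180.0) % 360.0 - 180.0
        if distance == 0.0 or abs(offset) <= fov_deg / 2.0:
            selected.append((distance, obj))
    selected.sort(key=lambda pair: pair[0])
    return [obj for _, obj in selected]


scene = [
    SceneObject("cafe", 0.0, 10.0),
    SceneObject("museum", 0.0, 100.0),
    SceneObject("bank", 10.0, 0.0),
    SceneObject("kiosk", 2.0, 20.0),
]
facing_north = [o.name for o in visible_objects(scene, 0.0, 0.0, 0.0)]
facing_east = [o.name for o in visible_objects(scene, 0.0, 0.0, 90.0)]
```

A real system would also need to account for occlusion, elevation, and tracking uncertainty, which this sketch ignores.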

Towards human-machine symbiosis
Another class of sensors gathers information about the user's state. Biometric devices can measure heart-rate and bioelectric signals, such as galvanic skin response, electroencephalogram (neural activity), or electromyogram (muscle activity) data in order to monitor biological activity. Affective computing [125] identifies some challenges in making computers more aware of the emotional state of their users and able to adapt accordingly. Although the future may hold human-machine symbioses [99], current integration of UI technology is restricted to devices that are worn or perhaps embroidered to create computationally aware clothes [54].

2.4 More AR requirements
Besides tracking, registration, and interaction, Höllerer and Feiner [73] mention three more requirements for a mobile AR system: computational framework, wireless networking, and data storage and access technology. Content is of course also required, so some authoring tools are mentioned here as well.

Frameworks
AR systems have to perform some typical tasks like tracking, sensing, display and interaction (Fig. 13). These can be supported by fast prototyping frameworks that are developed independently from their applications. Easy integration of AR devices and quick creation of user interfaces can be achieved with frameworks like the ARToolKit 19, probably the best known and most widely used. Other frameworks include StudierStube 20 [152], DWARF 21, D'Fusion by Total Immersion 22 and the Layar 23 browser for smart phones.

2.4.2 Networks and databases
AR systems usually present a lot of knowledge to the user which is obtained through networks. Mobile and collaborative AR systems in particular will require suitable (wireless) networks to support data retrieval and multi-user interaction over larger distances. Moving computation load to remote servers is one approach to reduce weight and bulk of mobile AR systems [25,103]. How to get to the most relevant information with the least effort from databases, and how to minimise information presentation are still open research questions.

Content
The author believes that commercial success of AR systems will depend heavily on the available types of content. Scientific and industrial applications are usually based on specialised content, but presenting commercial content to the common user will remain a challenge if AR is not applied in everyday life.
Some of the available AR authoring tools are the CREATE tool from Information in Place 24, the DART toolkit 25 and the MARS Authoring Tool 26. Companies like Thinglab 27 assist in 3D scanning or digitising of objects. Optical capture systems, capture suits, and other tracking devices available at companies like Inition 28 are tools for creating some live AR content beyond "simple" annotation.
Creating or recording dynamic content could benefit from techniques already developed in the movie and games industries, but also from accessible 3D drawing software like Google SketchUp 29. Storing and replaying user experiences is a valuable extension to MR system functionality and is provided for instance in HyperMem [49].

III APPLICATIONS
Over the years, researchers and developers have found more and more areas that could benefit from augmentation. The first systems focused on military, industrial and medical applications, but AR systems for commercial use and entertainment appeared soon after. Which of these applications will trigger wide-spread use is anybody's guess. This section discusses some areas of application, grouped similarly to the ISMAR 2007 symposium 30 categorisation.

Personal information systems
Höllerer and Feiner [73] believe one of the biggest potential markets for AR could prove to be in personal wearable computing. At MIT a wearable gestural interface is being developed, which attempts to bring information out into the tangible world by means of a tiny projector and a camera mounted on a collar [108]. AR may serve as an advanced, immediate, and more natural UI for wearable and mobile computing in personal, daily use. For instance, AR could integrate phone and email communication with context-aware overlays, manage personal information related to specific locations or people, provide navigational guidance, and provide a unified control interface for all kinds of appliances in and around the home.

Personal Assistance and Advertisement
Available from Accenture is the Personal Awareness Assistant (Fig. 14), which automatically stores names and faces of people you meet, cued by phrases such as "how do you do". Speech recognition also provides a natural interface to retrieve the information that was recorded earlier. Journalists, police, geographers and archaeologists could use AR to place notes or signs in the environment they are reporting on or working in. On a larger scale, AR techniques for augmenting, for instance, deformable surfaces like cups and shirts [130] and environments also present direct marketing agencies with many opportunities to offer coupons to passing pedestrians, place virtual billboards, show virtual prototypes, etc. With all these different uses, AR platforms should preferably offer a filter to manage what content they display.

Navigation
Navigation in prepared environments has been tried and tested for some time. Rekimoto [136] presented NaviCam for indoor use that augmented a video stream from a hand-held camera using fiducial markers for position tracking. Starner et al. [147] consider applications and limitations of AR for wearable computers, including problems of finger tracking and facial recognition. Narzt et al. [112,113] discuss navigation paradigms for (outdoor) pedestrians (Fig. 15a) and cars that overlay routes, highway exits, follow-me cars, dangers, fuel prices, etc. They prototyped video see-through PDAs and mobile phones and envision eventual use in car windshield heads-up displays. Tönnis et al. [157] investigate the success of using AR warnings to direct a car driver's attention towards danger (Fig. 15b). Kim et al. [89] describe how a 2D traveller guidance service can be made 3D using GIS data for AR navigation. Results clearly show that the use of augmented displays results in a significant decrease in navigation errors and issues related to divided attention when compared to using regular displays [88]. Nokia's MARA project 31 researches deployment of AR on current mobile phone technology.

3.1.3 Touring
Höllerer et al. [72] use AR to create situated documentaries about historic events, while Vlahakis et al. [159] present the ArcheoGuide project that reconstructs a cultural heritage site in Olympia, Greece. With this system, visitors can view and learn about ancient architecture and customs. Similar systems have been developed for the Pompeii site [122]. The lifeClipper 32 project does about the same for structures and technologies in medieval Germany and is moving from an art project to a serious AR exhibition. Bartie and Mackaness [24] introduced a touring system to explore landmarks in the cityscape of Edinburgh that works with speech recognition. The theme park Futuroscope in Poitiers, France, hosts a show called The Future is Wild 33, designed by Total Immersion, which allows visitors to experience a virtual safari set in the world as it might be 200 million years from now. The animals of the future are superimposed on reality, come to life in their surroundings and react to visitors' gestures.

3.2 Industrial and military applications
Design, assembly, and maintenance are typical areas where AR may prove useful. These activities may be augmented in both corporate and military settings.

Design
Fiorentino et al. [59] introduced the SpaceDesign MR workspace (based on the StudierStube framework) that allows, for instance, visualisation and modification of car body curvature and engine layout (Fig. 16a). Volkswagen intends to use AR for comparing calculated and actual crash test imagery [61]. The MR Lab used data from Daimler-Chrysler's cars to create Clear and Present Car, a simulation where one can open the door of a virtual concept car and experience the interior, dashboard layout and interface design for usability testing [154,155]. Notice how the steering wheel is drawn around the hands, rather than over them (Fig. 16b). Shin and Dunston [144] classify application areas in construction where AR can be exploited for better performance. Another interesting application presented by Collett and MacDonald [47] is the visualisation of robot programs (Fig. 17). With small robots such as the automated vacuum cleaner Roomba from iRobot 34 entering our daily lives, visualising their sensor ranges and intended trajectories might be welcome extensions.

3.2.2 Assembly
Since BMW experimented with AR to improve welding processes on their cars [141], Pentenrieder et al. [124] have shown how Volkswagen uses AR in construction to analyse interfering edges, plan production lines and workshops, compare variance and verify parts. Assisting the production process at Boeing, Mizell [109] uses AR to overlay schematic diagrams and accompanying documentation directly onto wooden boards on which electrical wires are routed, bundled, and sleeved. Curtis et al. [51] evaluate this AR system and find that workers using AR create wire bundles as well as with conventional approaches, even though tracking and display technologies were limited at the time.
At EADS, AR support for the EuroFighter's nose gear assembly is being researched [61], while [163] researches AR support for Airbus' cable and water systems (Fig. 18). Leading (and talking) workers through the assembly process of large aircraft is not suited for stationary AR solutions, yet mobility and tracking with so much metal around also prove to be challenging.
An extra benefit of augmented assembly and construction is the possibility to monitor and schedule individual progress in order to manage large complex construction projects. An example by Feiner et al. [56] generates overview renderings of the entire construction scene while workers use their HMD to see which strut is to be placed where in a space-frame structure. Distributed interaction on construction is further studied by Olwal and Feiner [120].
34 http://www.irobot.com/

3.2.3 Maintenance
Complex machinery or structures require a lot of skill from maintenance personnel and AR is proving useful in this area, for instance in providing "x-ray vision" or automatically probing the environment with extra sensors to direct the user's attention to problem sites. Klinker et al. [91] present an AR system for the inspection of power plants at Framatome ANP (today AREVA). Friedrich [61] describes plans to support electrical troubleshooting of vehicles at Ford, and according to a MicroVision employee 35, Honda and Volvo ordered Nomad Expert Vision Technician systems to assist their technicians with vehicle history and repair information [83].

3.2.4 Combat and simulation
Satellite navigation, heads-up displays for pilots, and also much of the current AR research at universities and corporations are the result of military funding. Companies like Information in Place have contracts with the Army, Air Force and Coast Guard, as land warrior and civilian use of AR may overlap in, for instance, navigational support, communications enhancement, repair and maintenance and emergency medicine. Extra benefits specific for military users may be training in large-scale combat scenarios and simulating real-time enemy action, as in the Battlefield Augmented Reality System (BARS) by Julier et al. [81] and research by Piekarski et al. [128]. Not overloading the user with too much information is critical and is being studied by Julier et al. [82]. The BARS system also provides tools to author the environment with new 3D information that other system users see in turn [22]. Azuma et al. [20] investigate the projection of reconnaissance data from unmanned air vehicles for land warriors.

3.3 Medical applications
Similar to maintenance personnel, roaming nurses and doctors could benefit from important information being delivered directly to their glasses [68]. Surgeons, however, require very precise registration, while AR system mobility is less of an issue. An early optical see-through augmentation is presented by Fuchs et al. [62] for laparoscopic surgery 36, where the overlaid view of the laparoscopes inserted through small incisions is simulated (Fig. 19). Pietrzak et al. [129] confirm that the use of 3D imagery in laparoscopic surgery still has to be proven, but the opportunities are well documented. There are many AR approaches being tested in medicine with live overlays of ultrasound, CT, and MR scans. Navab et al. [114] already took advantage of the physical constraints of a C-arm x-ray machine to automatically calibrate the cameras with the machine and register the x-ray imagery with the real objects. Vogt et al. [160] use a video see-through HMD to overlay MR scans on heads and provide views of tool manipulation hidden beneath tissue and surfaces, while Merten [106] gives an impression of MR scans overlaid on feet (Fig. 20). Kotranza and Lok [92] observed that augmented patient dummies with haptic feedback invoked the same behaviour by specialists as with real patients.

AR for entertainment
Like VR, AR can be applied in the entertainment industry to create AR games, but also to increase visibility of important game aspects in live sports broadcasting. Because a large audience is reached in these cases, AR can also serve advertisers to show virtual ads and product placements.

Sports broadcasting
Swimming pools, football fields, race tracks and other sports environments are well-known and easily prepared, which makes video see-through augmentation through tracked camera feeds easy. One example is the FoxTrax system [43], used to highlight the location of a hard-to-see hockey puck as it moves rapidly across the ice, but AR is also applied to annotate racing cars (Fig. 21a), snooker ball trajectories, live swimmer performances, etc. Thanks to predictable environments (uniformed players on a green, white, and brown field) and chroma-keying techniques, the annotations are shown on the field and not on the players (Fig. 21b).
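How chroma-keying keeps annotations on the field and off the players can be illustrated with a minimal sketch. It is not the method of any broadcast system cited here: the function names, the green-dominance test, and its threshold are assumptions, and only a green playing surface is keyed (the white and brown areas mentioned above would need additional key colours).

```python
def is_field_colour(pixel, dominance=1.3):
    """Rough key test: green clearly dominates red and blue.
    The dominance factor is an illustrative assumption."""
    r, g, b = pixel
    return g > dominance * r and g > dominance * b


def composite(frame, overlay):
    """Draw overlay pixels only where the camera frame shows the field.

    frame:   rows of (r, g, b) tuples from the camera
    overlay: same shape, with None where there is no annotation
    """
    out = []
    for frame_row, overlay_row in zip(frame, overlay):
        out_row = []
        for pixel, annotation in zip(frame_row, overlay_row):
            if annotation is not None and is_field_colour(pixel):
                out_row.append(annotation)  # annotation lands on the grass
            else:
                out_row.append(pixel)       # players and empty areas stay untouched
        out.append(out_row)
    return out


GRASS, PLAYER, YELLOW = (30, 160, 40), (200, 30, 30), (255, 255, 0)
frame = [[GRASS, PLAYER, GRASS]]
overlay = [[YELLOW, YELLOW, None]]
result = composite(frame, overlay)
```

Because the player pixel fails the key test, the annotation appears to pass behind the player, which is the occlusion effect described in the text.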

Games
Extending a platform for military simulation [128] based on the ARToolKit, Piekarski and Thomas [127] created "ARQuake", where mobile users fight virtual enemies in a real environment. A general-purpose outdoor AR platform, "Tinmith-Metro", evolved from this work and is available at the Wearable Computer Lab 37 [126], as well as a similar platform for outdoor games such as "Sky Invaders" and the adventurous "Game-City" [45]. Crabtree et al. [50] discuss experiences with the mobile MR game "Bystander", where virtual online players avoid capture by real-world cooperating runners.

AR for the office
Besides games, collaboration in office spaces is another area where AR may prove useful, for example in public management or crisis situations, urban planning, etc.

Collaboration
Having multiple people view, discuss, and interact with 3D models simultaneously is a major potential benefit of AR. Collaborative environments allow seamless integration with existing tools and practices and enhance practice by supporting remote and collocated activities that would otherwise be impossible [31]. Benford et al. [28] name four examples where shared MR spaces may apply: doctors diagnosing 3D scan data, engineers discussing plans and progress data, environmental planners discussing geographical data and urban development, and distributed control rooms such as Air Traffic Control operating through a common visualisation.
Augmented Surfaces [137] leaves users unencumbered but is limited to adding virtual information to the surfaces. Examples of collaborative AR systems using see-through displays include both those that use see-through hand-held displays (such as Transvision [135] and MagicBook [32]) and those that use see-through head-worn displays (such as Emmie [38], StudierStube [152], MR2 [154] and ARTHUR [36]). Privacy management is handled in the Emmie system through such metaphors as lamps and mirrors. Making sure everybody knows what someone is pointing at is a problem that StudierStube overcomes by using virtual representations of physical pointers. Similarly, Tamura [154] presented a mixed reality meeting room (MR2) for 3D presentations (Fig. 23a). For urban planning purposes, Broll et al. [36] introduced ARTHUR, complete with pedestrian flow visualisation (Fig. 23b) but lacking augmented pointers.

Education and training
Close to earlier mentioned collaborative applications like games and planning are AR tools that support education with 3D objects. Many studies research this area of application [30,39,70,74,93,121].
Kaufmann [85], Kaufmann et al. [86] introduce the Construct3D tool for math and geometry education, based on the StudierStube framework (Fig. 24a). In MARIE (Fig. 24b), based in turn on the Construct3D tool, Liarokapis et al. [98] employ screen-based AR with Web3D to support engineering education. MIT Education Arcade introduced game-based learning in "Mystery at the Museum" and "Environmental Detectives" where each educative game has an "engaging back-story, differentiated character roles, reactive third parties, guided debriefing, synthetic activities, and embedded recall/replay to promote both engagement and learning" [83]. Lindinger et al. [100] studied collaborative edutainment in the multi-user mixed reality system "Gulliver's World." In art education, Caarls et al. [39] present multiple examples where AR is used to create new forms of visual art.

IV LIMITATIONS
AR faces technical challenges regarding, for example, binocular (stereo) view, high resolution, colour depth, luminance, contrast, field of view, and focus depth. However, before AR becomes accepted as part of users' everyday lives, just like mobile phones and personal digital assistants (PDAs), issues regarding intuitive interfaces, costs, weight, power usage, ergonomics, and appearance must also be addressed. A number of limitations, some of which have been mentioned earlier, are categorised here.

Portability and outdoor use
Most mobile AR systems mentioned in this survey are cumbersome, requiring a heavy backpack to carry the PC, sensors, display, batteries, and everything else. Connections between all the devices must be able to withstand outdoor use, including weather and shock, but universal serial bus (USB) connectors are known to fail easily. However, recent developments in mobile technology like cell phones and PDAs are bridging the gap towards mobile AR.
Optical and video see-through displays are usually unsuited for outdoor use due to low brightness, contrast, resolution, and field of view. However, laser-powered displays recently developed at MicroVision offer a new dimension in head-mounted and hand-held displays that overcomes this problem.
Most portable computers have only one CPU, which limits the amount of visual and hybrid tracking. More generally, consumer operating systems are not suited for real-time computing, while specialised real-time operating systems do not have the drivers to support the sensors and graphics in modern hardware.

Tracking and (auto)calibration
Tracking in unprepared environments remains a challenge, but hybrid approaches are becoming small enough to be added to mobile phones or PDAs. Calibration of these devices is still complicated and extensive, but this may be solved through calibration-free or auto-calibrating approaches that minimise set-up requirements. The latter use redundant sensor information to automatically measure and compensate for changing calibration parameters [19].

Latency
A large source of dynamic registration errors is system delays [19]. Techniques like precalculation, temporal stream matching (in video see-through such as live broadcasts), and prediction of future viewpoints may solve some delay. System latency can also be scheduled to reduce errors through careful system design, and pre-rendered images may be shifted at the last instant to compensate for pan-tilt motions. Similarly, image warping may correct delays in 6DOF motion (both translation and rotation).
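Two of the latency techniques named above, predicting a future viewpoint and shifting a pre-rendered image at the last instant, can be sketched for the simplest case of a pure pan (yaw) rotation. This is a hedged illustration rather than the method of [19]: the function names, the constant-rate motion model, and the pinhole camera model are assumptions, and tilt and translation are ignored.

```python
import math


def predict_yaw(yaw_deg, yaw_rate_deg_s, latency_s):
    """Extrapolate head yaw over the expected system latency,
    assuming the angular rate stays constant during that interval."""
    return yaw_deg + yaw_rate_deg_s * latency_s


def horizontal_shift_px(rendered_yaw_deg, current_yaw_deg,
                        image_width_px, hfov_deg):
    """Pixels by which a pre-rendered image should be shifted so that it
    matches the most recently measured yaw.

    Pinhole model: the focal length in pixels follows from the image width
    and the horizontal field of view. A positive result means the view has
    turned right, so the content must move left by that many pixels.
    """
    focal_px = (image_width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    delta = math.radians(current_yaw_deg - rendered_yaw_deg)
    return focal_px * math.tan(delta)


# Render for where the head is expected to be 50 ms from now ...
predicted = predict_yaw(yaw_deg=10.0, yaw_rate_deg_s=100.0, latency_s=0.05)
# ... then correct the residual error just before display.
shift = horizontal_shift_px(rendered_yaw_deg=predicted, current_yaw_deg=16.0,
                            image_width_px=640, hfov_deg=90.0)
```

In this example the prediction leaves a residual error of one degree, which the last-instant shift converts into a correction of a few pixels; full 6DOF correction would require image warping as noted in the text.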

Depth perception
One difficult registration problem is accurate depth perception. Stereoscopic displays help, but additional problems including accommodation-vergence conflicts or low resolution and dim displays cause objects to appear farther away than they should [52]. Correct occlusion ameliorates some depth problems [138], as does consistent registration for different eyepoint locations [158].
In early video see-through systems with parallax, users need to adapt to vertically displaced viewpoints. In an experiment by Biocca and Rolland [35], subjects exhibit a large overshoot in a depth-pointing task after removing the HMD.

Overload and over-reliance
Aside from technical challenges, the user interface must also follow some guidelines so as not to overload the user with information, while also preventing the user from relying on the AR system so heavily that important cues from the environment are missed [156]. At BMW, Bengler and Passaro [29] use guidelines for AR system design in cars, including orientation on the driving task, no moving or obstructing imagery, adding only information that improves driving performance, avoiding side effects like tunnel vision and cognitive capture, and only using information that does not distract, intrude or disturb given different situations.

Social acceptance
Getting people to use AR may be more challenging than expected, and many factors play a role in social acceptance of AR, ranging from unobtrusive fashionable appearance (gloves, helmets, etc.) to privacy concerns. For instance, Accenture's Assistant (Fig. 14) blinks a light when it records for the sole purpose of alerting the person who is being recorded. These fundamental issues must be addressed before AR is widely accepted [73].

V CONCLUSION
We surveyed the state of the art of technologies, applications and limitations related to augmented reality. We also contributed a comparative table on displays (Table 1) and a brief survey of frameworks as well as content authoring tools (Section 2.4). This survey has become a comprehensive overview of the AR field and hopefully provides a suitable starting point for readers new to the field.
AR has come a long way but still has some distance to go before industries, the military and the general public will accept it as a familiar user interface. For example, Airbus CIMPA still struggles to get their AR systems for assembly support accepted by the workers [163]. On the other hand, companies like Information in Place estimated that by 2014, 30% of mobile workers will be using augmented reality. Within 5-10 years, Feiner [57] believes that "augmented reality will have a more profound effect on the way in which we develop and interact with future computers." With the advent of such complementary technologies as tactile networks, artificial intelligence, cybernetics, and (non-invasive) brain-computer interfaces, AR might soon pave the way for ubiquitous (anytime-anywhere) computing [162] of a more natural kind [13] or even human-machine symbiosis as Licklider [99] already envisioned in the 1950s.

TABLE 1: CHARACTERISTICS OF SURVEYED VISUAL AR DISPLAYS.