ARTiFICe - Augmented Reality Framework for Distributed Collaboration

(cid:178) This paper introduces a flexible and powerful software framework based on an off the shelf game engine which is used to develop distributed and collaborative virtual and (cid:68)(cid:88)(cid:74)(cid:80)(cid:72)(cid:81)(cid:87)(cid:72)(cid:71)(cid:3)(cid:85)(cid:72)(cid:68)(cid:79)(cid:76)(cid:87)(cid:92)(cid:3)(cid:68)(cid:83)(cid:83)(cid:79)(cid:76)(cid:70)(cid:68)(cid:87)(cid:76)(cid:82)(cid:81)(cid:86)(cid:17)(cid:3)(cid:58)(cid:72)(cid:3)(cid:71)(cid:72)(cid:86)(cid:70)(cid:85)(cid:76)(cid:69)(cid:72)(cid:3)(cid:36)(cid:53)(cid:55)(cid:76)(cid:41)(cid:44)(cid:38)(cid:72)(cid:182)(cid:86)(cid:3)(cid:73)(cid:79)(cid:72)(cid:91)(cid:76)(cid:69)(cid:79)(cid:72)(cid:3) design and implementation and demonstrate its use in research and teaching where 97 students in two lab courses developed AR applications with it. Applications are presented on mobile, desktop and immersive systems using low cost 6-DOF input devices (Microsoft Kinect, Razer Hydra, SpaceNavigator), that we integrated into our framework.


INTRODUCTION
Developing Virtual and Augmented Reality (VR/AR) applications requires a lightweight and flexible but still powerful VR/AR framework, which is extendable to easily integrate new devices and technologies.A wide variety of VR/AR hardware and software setups have been built in the past.However, all share a common general system architecture.The key elements of an VR/AR system are shown in Fig. 1 and comprise input and output devices whose spatial position and orientation might be tracked, a computing platform with a powerful graphics processor and a VR/AR software framework handling input, output and application behaviour.The most important part is the user (or multiple, collaborating users) working on a certain task and interacting with the system.
Diverse technologies for tracking are in use to determine the location of input and output devices as well as specific body parts of the user up to full body motion capture.Tracking data of these devices is received by the computing platform (e.g.workstation, mobile device) and handed over to the VR/AR and transforms input data to provide it for subsequent usage within the application.Based on this input data 3D interaction techniques can be supplied to the user.Recent low-cost video game controllers as well as powerful mobile hardware offer great opportunities for novel interaction supporting multiple users for collaborative virtual reality applications.
If multiple users work together, communication is controlled by a network layer while 3D interaction is handled by an event handling mechanism.Subsequently, the virtual scene is visualized to the user on its output device using the rendering engine.Ideally, a VR/AR framework should offer high quality real-time rendering, physics support, networking and scene II.
RELATED WORK Since the mid-1990s, a large number of VR/AR frameworks have been developed and a variety of systems supporting distributed VR applications emerged [5].Most software frameworks are based on scene graph libraries, e.g., open source toolkits such as Studierstube [16], VR Juggler [3], Avango [7] or commercial ones like 3DVIA Virtools [22].They provide varying support of multiple input and output devices.In the following, we briefly discuss frameworks related to our work.
Studierstube is an application framework for collaborative Augmented Reality.Its development started in 1996 and continued for almost ten years while it was used for research and teaching.It simultaneously supports multiple users as well as multiple applications, which are embedded as nodes in a scene graph.While this open source C++ based framework is very powerful, it is hard to maintain, does not provide a graphical user interface for scene management and is difficult to learn within a short period of time, which is important if used for teaching.Additionally, recent technologies such as mobile phones or depth imaging sensors can hardly be integrated.
3DVIA Virtools is a commercial development and deployment platform for interactive 3D content creation.It supports multiple users and physics behaviour to create immersive and distributed applications using industry standard VR peripherals.It offers a comprehensive graphical development environment and can deploy to a wide range of output devices.However, its application for research and teaching is limited due to high licensing costs.
One of the first AR frameworks using off the shelf software to design and develop AR applications was DART [8].DART is based on the Macromedia Director multimedia programming environment.It uses the familiar Director paradigms of a score, sprites and behaviours to allow a user to visually create complex AR applications.DART also provides low-level support for the management of trackers, sensors, and cameras via a Director plug-in Xtra.However, DART is not suitable for research and teaching due to licencing costs for Director.In addition, it lacks stereo output support and the timeline based scene management is rather made for story telling environments than VR/AR applications.
Similar to Virtools, Unity3D [17] features an editor for authoring 2D and 3D content and compromises a game engine for executing the application.Nevertheless, Unity3D by itself is no VR/AR framework but is designed for creating 3D video games and other interactive content.It offers a powerful render engine providing lighting, physics, network communication for collaboration and content distribution.Furthermore, it provides an integrated programming environment using C#, JavaScript or Boo while development can be done under Windows as well as Mac OS X.The final application can be built generally without changes for various platforms such as Windows, Mac, iOS, Android, all major game consoles, Flash and web clients.Unity3D is available for free and applications can be deployed at no charge to Windows and Mac.
Our framework uses Unity3D as its underlying development platform and rendering engine.All VR/AR specific extensions were built around Unity3D.

III. FRAMEWORK OVERVIEW & DATA FLOW
Concerning our motivation stated in section I we aimed on developing a loosely coupled modular software framework which can easily be adapted to support novel devices and interaction techniques.In Fig. 2, an overview of the framework with its components and the data flow is illustrated.
Tracking data from various workstation-based input devices as well as mobile devices are fed into the framework using a transparent and adaptive middleware layer.The middleware transforms all input data in a consistent way and delivers it to the application layer.The application layer is built on top of an external game engine.Within the application layer, the ARTiFICe core handles the tracking input data, provides interaction technique and distribution support and delivers the with real-time interaction is then visualized on different output To provide various 2D as well as 3D interaction techniques the ARTiFICe core comprises a flexible interaction framework, which offers multiple pre-defined interaction techniques and is easy to extend with novel interaction metaphors.The interaction framework offers a single interface to process tracking data from a workstation based device or from a mobile phone to control the virtual interaction device for selecting and manipulating virtual content.
Besides interaction with 3D content, the co-presence of multiple users interacting with the same content at the same point in time opens up great possibilities for collaborative work.Hence, we integrated a distribution framework into the ARTiFICe core to enable real time user-managed collaboration for various hardware setups for two or more users over the network.Further details about the ARTiFICe core are given in the next section.

IV. IMPLEMENTATION
We decided to use Unity3D as the base infrastructure for the application layer of our proposed VR/AR framework.It offers a powerful rendering engine, the possibility to extend its functionality with own features and a strong developer community.Furthermore, the free to use license which includes all features we need for authoring, rendering and physics support.All implementations are done using C++ and C#.

Middleware Layer
To gather all tracking input by various devices in one single software layer we use OpenTracker (OT) [15].It is open source tracking middleware, which offers transparent and flexible device integration, pre-processes input events and passes them to the application layer.It provides a framework for the different tasks involved in tracking input devices in VR/AR applications and eases the development and maintenance of hardware setups in a flexible manner.This is achieved by using an object-oriented design based on XML and utilizing standard XML tools for development, configuration and documentation.A multi-threaded execution model takes care of tunable performance, filters and transformations can be applied to tracking data.XML based configuration files are used to describe tracking configurations that usually consist of multiple input devices.To fetch tracking data from remote input devices, Virtual-Reality Private Network (VRPN) [18] can be used.It is a device-independent and network-transparent framework for devices used in VR/AR systems.
Hence, to provide a transparent interface and for loose coupling between the set of physical devices and the application layer we integrated OT as middleware in ARTiFICe to hand over the tracking data from input devices.Therefore, we developed a new OT node calle sink for all tracking devices.The Unity-node is referenced during run-time by the ARTiFICe core for fetching tracking data to provide them within the application.To access tracking data over the network in OT we use VRPN interfaces.

Tracking Devices & Integration
We integrated support for several tracking systems in our framework to enable desktop-based, semi-immersive, full immersive as well as mobile VR/AR setups.
For desktop setups ARToolKit [6] as well as ARToolkit+ [21] are easy to use tracking libraries providing a square planar shape for pose estimation and an embedded 2D pattern for distinguishing markers.They calculate camera position and orientation relative to physical markers in real time and thereby enable the development of a wide range of Augmented Reality applications.ARToolkit is usually applied for desktop based AR environments while ARToolkit+ enhances the original ARToolkit library and is optimized for usage on mobile devices.We use ARToolkit+ within our framework, which has been previously integrated into OT.OpenVideo [10], a data integration-and processing framework, is applied to acquire video frames from the webcam, which are processed by ARToolkit+ and later streamed into Unity3D to provide a view of the real world scene.
We integrated a 3D mouse and Razer Hydra into the OT Unity-node to enable 6DOF desktop interaction.For semi-as well as fully immersive mixed reality applications iotracker [13] as a passive marker based infrared optical tracking system is suitable.It tracks arbitrary physical objects furnished with passive markers with an update rate of 60 Hz and very low latency (20-40ms), minimal jitter (RMS less than 0.05mm), submillimeter location resolution and an absolute accuracy of ±0.5cm.For integration we interfaced OT with iotracker using VRPN and the OT Unity-node.
Besides marker-based optical tracking we also wanted to allow markerless full-body motion tracking.Therefore, we integrated Microsoft Kinect using the OpenNI/NITE [9][12] framework and FAAST [17].OpenNI/NITE provides an API to access raw depth data as well skeleton data, which are calculated based on the depth data.FAAST runs as self-contained application and reads this data.It provides gesture recognition support and full body tracking data via VRPN to OT.Using the OT Unity-node we feed real-time skeleton tracking and gestures into the ARTiFICe core.
The development of AR applications for current handheld devices must also be supported by a modern VR/AR framework.Vuforia [20] is a tracking framework that enables creation of mobile AR applications by using the display of the mobile device as a "magic lens" into an augmented world.It detects a variety of real 2D and 3D objects by robust natural feature tracking and hands over tracking-and pose-information to the AR application to visualize artificial content within the live camera image in the smartphone display.It runs on iOS and Android.In our framework Vuforia acts as mobile middleware and mobile data processing component and is interfaced to the ARTiFICe core.
Our flexible middleware concept allows configuration of all devices and any combination of them using a single OT XML configuration file.Configuration of mobile devices is treated separately using Vuforia.

Application Layer
In Fig. 3, a detailed view on our framework with its data flow and core components is given.The ARTiFICe core comprises a Manager and a tracking-, interaction-as well as distribution framework.scene authoring.Therefore, Unity3D provides a C# container class called GameObject to which geometry, transformation nodes, textures, physical properties as well as own C# classes can be attached to control visual appearance and overall behaviour of a virtual scene object.These GameObjects can be grouped in a logical manner forming a transformation hierarchy, which corresponds to the matrix stack concept in OpenGL.For in depth explanations please refer to the Unity3D documentation [19].

1) Tracking Framework:
To feed tracking data of an input device into the transformation node of a GameObject for mapping the real physical position and orientation to a virtual object we developed the ARTiFICe tracking framework.It inherits from the Unity3D base class MonoBehaviour to be able to attach the deriving classes to any virtual scene object.The overall design of the tracking framework is shown in Fig. 4.
The tracking framework differs between mobile and workstation-based tracking input.In the mobile setup TrackMobile accesses the current device pose by using Vuforia.TrackerBehaviour.For the various workstation setups a subclass for each supported device was implemented.Depending on the input device the corresponding subclass is attached to the virtual scene object.As soon as the application starts TrackProvider creates ARTiFICe Trackers through the ARTiFICe Manager, which is implemented as singleton.Each ARTiFICe Tracker is interfaced to the corresponding OT Unity-node and thereby provides the input data of all used devices to the tracking framework.In addition, for 2D marker tracking we developed our own multi-marker tracking support to be able to track cuboid-formed 3D objects and determine its absolute physical pose.All concrete tracking subclasses provide a consistent tracking data layer, which can be used as tracker object for further processing within the interaction framework.

2) Interaction Framework
Raw tracking data which is fed into the transformation node of a virtual scene Unity3D.GameObject should be controllable to provide interaction techniques (IT) for 3D objects selection and manipulation.Therefore, we developed an interaction framework, which is illustrated in Fig. 5.

Fig. 5. Interaction class hierarchy
A tracker object is used to access raw tracking data.This data is then processed by the specific IT and handed to the interaction framework.The abstraction layer ObjectSelectionBase provides a straight forward and clean interface of data handling for workstation as well as mobile setups and offers a transparent layer to integrate new techniques into the framework.The only information which must be handed over to ObjectSelectionBase is a list of the selected object(s) and the absolute pose of the interaction object, calculated by the IT.This data is then processed by the InteractionBase class and delivered to all selected virtual scene objects.Virtual scene objects which should be selectable must have the ObjectController class attached.Depending on the given pose the ObjectController manipulates the position and orientation of the selected scene object.
As concrete 3D interaction techniques, we implemented several standard VR interaction metaphors such as a simple VirtualHand, GoGo [14], Aperture [4] and HOMER [2].For interacting in a three-dimensional manner on the mobile phone, we adapted the HOMER interaction technique by using the physical pose of the device for object manipulation.

3) Collaboration & Distribution Framework
To provide multi-user support for interaction with different interaction devices and collaboration on one scene over large distances, we implemented a collaboration and distribution framework.It is loosely coupled with the interaction framework and enables distribution for mobile as well as for all workstation setups.The network functions are based on the Unity3D network layer using UDP for communication.We are using a client-server architecture with a direct connection between the server and all clients (star topology).For data exchange remote procedure calls (RPC) and state synchronization are employed.To prevent data loss the state synchronization is buffered.
An overview of the distribution framework and its connection to the interaction framework is given in Fig. 6.The NetworkBase class provides functions to initialize the server and to connect a client to the server.All connected clients are managed by the UserManager class, implemented as singleton.
To reduce necessary hardware for realizing a client-server application and to improve overall usability, one device can act as server and client simultaneously.For distributed collaboration each selectable scene object must attach a NetworkObjectController, which distributes selection and manipulation functionality over the network.To enable exclusive access of a scene object ExclusiveAccessObjectController prevents simultaneous usage by multiple users.As long as a user manipulates the scene object it is locked for other users.To provide exclusive object access to a specific user the UserManagmentObjectController is used.

VR/AR Application Prototyping
In the following we briefly describe the workflow to build a new VR/AR application.First, a new Unity3D project is created and the ARTiFICe framework is added to the project folder.If using a mobile device, the Vuforia Unity3D extension must be included into the Unity3D project as well.All workstation based input devices are configured in the single OT configuration file.Cameras, lights and scene objects are then added to the virtual environment using the Unity3D graphical scene management.They are encapsulated as Unity3D.GameObjects.Virtual entities which act as tracker-, interaction-or selectable scene objects are subsequently connected to the according classes of the ARTiFICe framework.Finally, the project is built and deployed to the desired platform and run as single or multi user VR/AR application.

Supported Devices
Currently, ARTiFICe supports the use of multiple ARToolkit+ markers, the 3D Connexion SpaceNavigator, Microsoft Kinect, Razer Hydra, 6DOF devices tracked by iotracker and mobile phones using Android 2.1 or higher.In addition, all devices that are supported by OpenTracker and VRPN can be used in ARTiFICe.

V. RESULTS
Our framework was intensely tested and evaluated using different setups (section 5.1).Currently, the whole framework runs on Windows and most parts of it (except Kinect and ARToolkit+) on Mac OS X, too.In addition, ARTiFICe was utilized to develop AR applications for the Android platform.
during winter term 2011/12.80 students built four small VR/AR applications using ARToolkit+ markers for interaction.Subsequently, students developed a distributed and collaborative VR/AR application using various interaction techniques with 3D Connexion SpaceNavigator and Microsoft Kinect.Additionally, our framework was employed for the lab Sciences Upper Austria in winter term 2011/12.17 students were able to develop a mobile distributed and collaborative VR/AR application within just four weeks.Besides that our framework is currently used within a number of on-going research projects which involve fully immersive setups, distributed mobile interaction, 6DOF desktop interaction using Razer Hydra and the Sony Move controller.
By focusing on a well-defined virtual scene management, loose coupling of input devices and interaction techniques, we created an environment which allows technically experienced users to adapt the framework to their needs for application development.In addition, we provide an easy to use framework to help students getting over the initial hurdles of creating quick prototypes of an embodied AR experience.

Setup Examples
In the following section, we present different hardware setups and interaction devices in various applications, all based on ARTiFICe.

1) Mobile Setup
In Fig. 7, a collaborative and distributed mobile application is shown providing an interactive AR game with physics elements.Therefore, an arbitrary image is tracked with a mobile phone and a virtual scene is mapped onto this image within the mobile display.Multiple users can collaborate, either by pointing their phone on the same physical image or at different physical images showing the same motive.The user on the left hand side currently manipulates a virtual block by HOMER interaction technique while the user on the right observes this interaction.

2) Desktop Setups
A VR desktop application was realized using Razer Hydra as a highly accurate 6DOF interaction device.In an application for geometry education virtual scene objects are controlled using the Hydra, as illustrated in Fig. 8.A collaborative and distributed desktop AR application using multiple ARToolkit+ markers forming a MagicBook [1] is shown in Fig. 9.An ARToolkit+ cube is used as multiple-purpose interaction device.

3) Desktop and Semi-Immersive Setup
In addition to pure VR/AR desktop setups, we used ARTiFICe to realize combined desktop and semi-immersive environments.In Fig. 10 and Fig. 11, a collaborative and distributed multi-user scenario is shown.User1 controls the speed and direction of a flying object by gesture recognition and motion capture using Microsoft Kinect (Fig. 10).User2 interacts with a 3D mouse controlling the height of the flight and clearing the obj technique (Fig. 11).

4) Immersive Setup
We used our framework to develop a server-client application to provide training for prosthesis patients.The software consists of a server application to control all parameters and a client application to visualize the virtual environment in the HMD.In Fig. 12, a demo setup of this fully immersive application is shown; a 6DOF rigid body target is Tracking of HMD-and arm target is done using iotracker.

VI. FUTURE WORK
Currently we are working on a proper Sony Move integration in ARTiFICe by implementing OpenTracker support and blob tracking for sensor fusion.More details can be found on the open source Sony PlayStation Move project site [11].
We will focus on improving mobile support and interaction.Therefore, we are currently evaluating the concurrent usage of a semi and fully immersive setup in combination with a mobile phone.Here, we aim on the flexible management of provided user interaction depending on the used device.Furthermore, integration of other mobile AR frameworks besides Vuforia will be assessed and testing of the framework on iOS is planned.
Moreover, we plan to provide our framework as open source project to the developers and research community.

Fig. 3 .
Fig. 3. Detailed framework components The ARTiFICe Manager controls the data flow between middleware and application layer.At application start-up it reads the OpenVideo and OpenTracker configuration files and loads the dependent tracking libraries.It starts an OpenTracker instance and an OpenVideo handler for ARToolkit+ marker tracking.It also closes OpenVideo and stops OpenTracker at application shutdown.A detailed explanation of the tracking-,

Fig. 12 .
Fig. 12. Immersive setup with 6DOF passive targets Stereo projection provides the HMD view to an audience to explanations.