Artificial Neural Network in Virtual Reality: A Survey

E-mail: greeshmacct@gmail.com


I. INTRODUCTION
Jaron Lanier coined the term Virtual Reality (VR). From psychological, sociological, philosophical and cognitive perspectives, VR has become a tool that assists in understanding the role of human perception. Through simulation, it bridges the gap between the computer and a person's five senses. The amalgamation of software and hardware in the development of VR and VR-related applications presents various challenges, and Artificial Neural Networks (ANNs) can be a resourceful tool for addressing many of them. ANNs are well established in curve fitting, classification and pattern recognition problems (Haykin and Network 2004; Duda, Hart, and Stork 2012; Smits et al. 1992; Bishop and Roach 1992; Bekker and Bouwman 2009). Before examining the role of ANNs in VR, it is useful to trace how VR evolved into modern systems.
The concept of VR was pioneered in 1965 by Ivan Sutherland in his paper "The Ultimate Display". He later built a head-mounted display (HMD) that was tethered to a mainframe. In 1977, Sandin et al. developed the first data glove, which allowed users to communicate with VR through hand movements. After the HMD and the data glove, Krueger (1985) investigated projecting virtual environments onto walls and introduced the term "VIDEOPLACE". It was the first virtual environment (VE) of its kind and was used in an art gallery. Extending this idea, the Cave Automatic Virtual Environment (CAVE) was developed in 1992 and used for data visualization, with the data displayed on the walls of a room rather than in an HMD. In 2007, Google launched Street View for navigation purposes. It took a few decades for VR to evolve from heavy HMDs to the Android platform.
The pace of VR's evolution depends on its basic properties. Immersivity and interactivity are the two key elements of VR (Borghese 1997). Immersivity occurs when all five senses are engaged in VR. Interactivity is achieved when users can interact with their environment through body movements. Sensory feedback and the virtual world are two further elements needed to create a virtual environment (Sherman and Craig 2002). Sensory feedback mirrors a subject's sensory state at given time intervals during exposure. The virtual world comprises the virtual objects and the relationships among them that make the resulting environment meaningful. The principal applications of VR are in the military and in rehabilitation. This review paper gives insights into various ANN algorithms and their respective training functions, and summarizes how these algorithms help solve problems of VR development and interfacing. We also try to articulate which algorithm should be used for which type of application. In addition, the paper addresses whether other techniques must be integrated with ANN for a comprehensive VR experience.

II. ARTIFICIAL NEURAL NETWORK
The brain consists of billions of neurons and trillions of connections between them. To accomplish a particular task, each neuron fires either individually or as part of a group. According to the connectionist model of McCulloch and Pitts (1943), neurons are connected to one another by synapses, which gives a logical account of the behaviour of real neurons. However, this model did not accommodate learning, since its connections and thresholds were fixed. To overcome this limitation, Frank Rosenblatt (1958) proposed the Perceptron. The perceptron is the basic processing unit of a neural network. It enables pattern recognition by receiving inputs through associative units, which select certain features on the basis of functions. The perceptron was originally built as a machine, but it later evolved into programs and laid the foundation of neural network programming. Neural networks have the inherent advantage of learning from and adapting to their environment, which makes them suitable for pattern recognition. ANNs can also build connectivity patterns based on error approximation, and their generalization property helps in identifying similar patterns in test data (Benardos and Vosniakos 2007). Learning can be either supervised or unsupervised. The multilayer perceptron (feed forward net) and the radial basis function network are the two main models, and the other models are derived from them (Bishop 1995). Most training algorithms are derivative-based approximation schemes that adapt the weight values.
Following benchmark methodology (Prechelt 1995), the data acquired for each problem is divided into three sets. The first is a training set, which is used to modify the weights. The second is a validation set, which is used to decide when to stop training and thereby improves generalization performance (Islam et al. 2009): as soon as the validation error starts to increase, training is halted. The third is a test set, which is used to assess the prediction or pattern recognition performance of the trained network. Depending on the algorithm, training uses the gradient together with either the Jacobian matrix or the Hessian matrix. A limitation of Jacobian-based training is that the performance measure must be the mean or the sum of squared errors, and the mean is a very generic measure of performance. (A performance measure is a mathematical index that distinguishes the quality of one solution from another.) To see how the choice of algorithm varies with the application, consider an example. The Back Propagation Algorithm (BPA) is commonly used in face recognition and face detection tasks (Bouzalmat et al. 2011; Juell and Marsh 1996; Smach et al. 2006; Shah et al. 2013). It is beneficial because it can train both feed forward and recurrent nets. It can form arbitrarily complex nonlinear mappings, although the precise conditions under which a given mapping can be learned are not well understood. Its learning is slow, and the number of hidden layers and neurons is not known in advance. The slow learning can be compensated for by second-order methods. These methods use the Hessian of the error with respect to the weights, $\frac{\partial^2 E}{\partial \omega^2}$, to adapt the step size towards the optimal weight update.
ANNs provide distinct training algorithms for problems in computational and biological fields. For example, three training algorithms have been considered for classifying Computed Tomography (CT) images (Sharma and Venugopalan 2014). The first is gradient descent, which updates the weights and biases in the direction of the negative gradient of the performance function. The second is the conjugate gradient algorithm, which adjusts the weights along conjugate search directions and generally converges faster than following the steepest descent direction. The third is the quasi-Newton algorithm, which avoids computing the Hessian exactly by approximating it from successive gradients. When the objective function is a sum of squares, the Hessian can be approximated from the Jacobian (Gavin 2011). Different training functions suit different needs, for example when a module requires fast convergence or when the task is a pattern recognition problem. However, each has drawbacks: for these two criteria, the former requires large storage and the latter performs poorly on function approximation problems. Most learning algorithms are based on gradient descent, while others use a metaheuristic approach, which employs a global search strategy that explores a more diverse set of solutions. To combine the two, a hybrid approach was proposed (Yaghini, Khoshraftar, and Fallahi 2013). It joins a global metaheuristic with a local gradient-based algorithm: Improved Opposition-based Particle Swarm Optimization with Back Propagation (IOPSO-BPA). Its training time and accuracy were evaluated on eight benchmark problems and compared with BPA, IOPSO and IOPSO-GA (Genetic Algorithm). BPA needs more time and more iterations to converge, although its solutions are less dispersed. IOPSO-BPA also produces less dispersed solutions and, in addition, is more stable under varying starting conditions.
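The three-way data split and the validation-based stopping rule described above can be illustrated with a minimal sketch. To keep the example short, it trains a single linear unit rather than a model from the cited works. Training uses batch gradient descent on the mean squared error, and the loop keeps the weights with the lowest validation error. The function names, learning rate, patience value and synthetic data are all hypothetical.

```python
import random

def mse(w, b, data):
    """Mean squared error of the linear model y = w * x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train_with_early_stopping(train, val, lr=0.05, max_epochs=500, patience=10):
    """Batch gradient descent on the training MSE, keeping the weights with the
    lowest validation error and stopping after `patience` epochs without improvement."""
    w, b = 0.0, 0.0
    best_err, best_w, best_b = float("inf"), w, b
    stale = 0
    epochs_run = 0
    for _ in range(max_epochs):
        epochs_run += 1
        n = len(train)
        # Gradient of the training MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in train) / n
        # Move in the direction of the negative gradient.
        w -= lr * grad_w
        b -= lr * grad_b
        val_err = mse(w, b, val)
        if val_err < best_err:
            best_err, best_w, best_b = val_err, w, b
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # validation error has stopped improving
    return best_w, best_b, epochs_run

# Synthetic data from y = 2x + 1 plus Gaussian noise, split into three sets.
random.seed(0)
points = [(x / 10, 2 * (x / 10) + 1 + random.gauss(0, 0.1)) for x in range(-30, 30)]
random.shuffle(points)
train, val, test = points[:36], points[36:48], points[48:]

w, b, epochs = train_with_early_stopping(train, val)
print(f"w={w:.2f}  b={b:.2f}  epochs={epochs}  test MSE={mse(w, b, test):.4f}")
```

The test set is used only once, after training, which mirrors the role given to it in the text.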
In summary, this section introduced the basics of ANNs and their different training algorithms. The following sections describe ANN's contributions to a wide range of VR applications.

III. APPLICATIONS IN VR

Facial expression recognition
Facial expression is a component of nonverbal communication that can carry a great deal of information. It reveals a person's emotional state and helps in understanding human behaviour. ANN in conjunction with VR finds application in tasks such as identifying users over the internet. This is accomplished with the back propagation algorithm, which trains the ANN on the user's audio, video and image inputs (Sait and Raza 2011). Beyond its wide-ranging applications in entertainment, this combination is also of great importance in rehabilitation. Besides these approaches, different ANN architectures also compete with one another. For example, to identify static faces with different expressions, Lawrence et al. [38] compared the Convolutional Network (CN) with the multilayer perceptron (MLP). They found CNs to perform better because they exploit spatially local correlations, even though the CN architecture was inspired by the MLP. In the 21st century, VR has found many insightful applications in virtual showmanship, video conferencing, online chatting and online video games (Liang, Pan, and Chen 2004).
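Counting weights makes concrete why exploiting spatially local correlations helps the CN. The sketch below compares a fully connected layer with a convolution layer on a hypothetical 64 x 64 grayscale image. The layer sizes are illustrative assumptions, not the architecture used by Lawrence et al. The small `conv2d_valid` helper shows that each convolution output depends only on a local patch.

```python
def dense_params(height, width, channels, units):
    """Weights + biases of a fully connected layer applied to a flattened image."""
    return height * width * channels * units + units

def conv_params(kernel, channels, filters):
    """Weights + biases of a kernel x kernel convolution layer.
    The same small kernel is shared across all image positions, so the count
    does not depend on the image size."""
    return kernel * kernel * channels * filters + filters

def conv2d_valid(image, kernel):
    """Single-channel 'valid' 2-D cross-correlation: each output value depends
    only on the local patch under the kernel (spatially local connectivity)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + a][j + c] * kernel[a][c] for a in range(kh) for c in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]

# Hypothetical sizes: a 64 x 64 grayscale face image.
print(dense_params(64, 64, 1, 100))  # 409700 parameters for 100 hidden units
print(conv_params(5, 1, 20))         # 520 parameters for 20 filters of size 5 x 5
print(conv2d_valid([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, -1]]))  # [[-4, -4], [-4, -4]]
```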

Human Body Tracking
Human body tracking is relevant to a plethora of applications in diverse fields ranging from cognitive enhancement to sports and rehabilitation. This section explores the role of ANN in rehabilitation. One neural approach tracks the human body silhouette (Goffredo et al. 2006). It combines the Snake algorithm (Active Contour Models) with an ANN trained by Resilient BP, using segmentation and contours to track the human arm. Here, the ANN acts as a predictor that improves the accuracy with which arm movement is tracked. The velocity and acceleration of each contour point are the ANN inputs, and the ANN output serves as the input to the Snake. The results show an estimate of the silhouette over the recorded time series. With further advances in algorithms, an action recognition system can be built by tracking the human body with Kinect (Martínez-Zarzuela et al. 2014). Kinect tracks the body position, and the subsequent processing and classification are done with a fuzzy neural network based on Adaptive Resonance Theory (ART). This combination facilitates the application of VR in sports psychology. Subsequently, the Flexible Action and Articulated Skeleton Toolkit (FAAST) was developed, which demonstrated applications of action recognition in VR (Suma et al. 2011).
ANN plays a role not only in human body tracking but also in hand tracking and gesture recognition by means of data gloves. Interfacing data gloves with VR makes it possible to distinguish gestures in VR, to move an avatar, or to perform tasks using body movements. However, this becomes difficult as the number of degrees of freedom in hand motion increases. Using hand gestures to pilot and control objects in VR has become the norm, and data gloves have been combined with neural network classifiers to form control systems for VR applications. This was first demonstrated by Weissmann and Salomon (Weissmann and Salomon 1999). They used CyberGlove data measuring the angles of 18 joints of the hand to recognize gestures such as a fist, a pointing index finger and the victory sign. The authors compared BP with a Radial Basis Function (RBF) network and found RBF more advantageous because its linear character allowed it to be retrained easily at run time. BP, however, achieved higher recognition rates. To resolve this trade-off, Salomon and Weissmann (Salomon and Weissmann 2000) used an evolutionary algorithm to tune the RBF network until its recognition rate was comparable to that of BP. Similarly, Deyou Xu (Xu 2006) used a CyberGlove to interface with a VR-based driving training system for a Self-Propelled Gun (SPG), training a BPNN on 300 static hand gestures.
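The run-time retraining advantage of the RBF network comes from its linear output layer. With the Gaussian centres fixed, only linear weights need updating. The sketch below is a minimal illustration and not Weissmann and Salomon's implementation. It assumes three synthetic 18-value glove readings (the joint indices assigned to each finger are hypothetical) and trains only the output layer with the delta (LMS) rule.

```python
import math

N_JOINTS = 18  # joint angles per glove reading, as in the CyberGlove study above

def gesture(extended):
    """Hypothetical glove reading (radians): listed joints extended, all others flexed."""
    return [0.1 if j in extended else 1.4 for j in range(N_JOINTS)]

def gaussian(x, centre, sigma):
    d2 = sum((a - b) ** 2 for a, b in zip(x, centre))
    return math.exp(-d2 / (2 * sigma ** 2))

class RBFNet:
    """Gaussian hidden layer with fixed centres and a linear output layer."""

    def __init__(self, centres, n_classes, sigma=1.0):
        self.centres = centres
        self.sigma = sigma
        # For each class: one weight per hidden unit plus a bias weight.
        self.weights = [[0.0] * (len(centres) + 1) for _ in range(n_classes)]

    def hidden(self, x):
        return [gaussian(x, c, self.sigma) for c in self.centres] + [1.0]

    def outputs(self, x):
        h = self.hidden(x)
        return [sum(w * v for w, v in zip(row, h)) for row in self.weights]

    def predict(self, x):
        out = self.outputs(x)
        return out.index(max(out))

    def train_output_layer(self, samples, lr=0.1, epochs=200):
        """Delta (LMS) rule applied to the linear output weights only."""
        for _ in range(epochs):
            for x, label in samples:
                h = self.hidden(x)
                out = self.outputs(x)
                for k, row in enumerate(self.weights):
                    err = (1.0 if k == label else 0.0) - out[k]
                    for i in range(len(row)):
                        row[i] += lr * err * h[i]

FIST = gesture(set())
INDEX = gesture({3, 4, 5})             # hypothetical index-finger joints
VICTORY = gesture({3, 4, 5, 6, 7, 8})  # hypothetical index + middle joints
samples = [(FIST, 0), (INDEX, 1), (VICTORY, 2)]

net = RBFNet([FIST, INDEX, VICTORY], n_classes=3)
net.train_output_layer(samples)
noisy_index = [a + 0.05 for a in INDEX]
print(net.predict(noisy_index))
```

Because the hidden layer never changes, retraining reduces to a linear problem. This is cheap enough to repeat while the application is running.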
All the papers mentioned above used the Stuttgart Neural Network Simulator. Besides the BP and RBF training algorithms, Luzanin and Plancak (Luzanin and Plancak 2014) used a five-sensor data glove to recognize simple and complex hand gestures. They trained a Probabilistic Neural Network (PNN) and found that it became slow with large training sets. A clustering ensemble can solve this problem by reducing the volume of the training set without degrading the quality of training. From desktop VR to immersive VR, ANNs offer numerous ways to solve VR's problems. For example, human arm movement in a Collaborative Virtual Environment (CVE) can be predicted with neural networks (Stakem and AlRegib), and feed forward back propagation networks are an effective solution in this regard. Gestures also find application in sign language and language translation (Fels 1994; Fels and Hinton 1995), where hand gestures are transformed into speech using NNs. Here the hand shape is treated as the root word, and hand movements in different directions are translated into words such as come, go, I and you.
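The PNN slows down on large training sets because it stores one pattern unit per training sample. The sketch below is a textbook PNN, not the configuration used by Luzanin and Plancak. It assumes a Gaussian kernel with a hypothetical smoothing parameter `sigma` and made-up five-sensor glove readings.

```python
import math

def pnn_classify(x, training, sigma=0.3):
    """Probabilistic Neural Network: one Gaussian pattern unit per training sample,
    one summation unit per class (averaging its pattern units), and a decision unit
    that returns the class with the largest averaged activation."""
    totals, counts = {}, {}
    for sample, label in training:
        d2 = sum((a - b) ** 2 for a, b in zip(x, sample))
        totals[label] = totals.get(label, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
        counts[label] = counts.get(label, 0) + 1
    return max(totals, key=lambda label: totals[label] / counts[label])

# Hypothetical five-sensor readings (thumb, index, middle, ring, little), 0 = straight, 1 = bent.
training = [
    ([0.10, 0.10, 0.10, 0.10, 0.10], "open"),
    ([0.15, 0.10, 0.05, 0.10, 0.20], "open"),
    ([0.90, 0.90, 0.90, 0.90, 0.90], "fist"),
    ([0.85, 0.95, 0.90, 0.80, 0.90], "fist"),
    ([0.90, 0.10, 0.90, 0.90, 0.90], "point"),
    ([0.80, 0.15, 0.85, 0.90, 0.95], "point"),
]
print(pnn_classify([0.88, 0.12, 0.90, 0.85, 0.90], training))  # prints: point
```

Every query evaluates one kernel per stored sample, so classification cost grows linearly with the training set. This is consistent with the reported slowdown, and it explains why reducing the stored samples, for example by clustering, helps.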
Human body tracking is not limited to identifying hand gestures or silhouettes; it is equally applicable to foot detection. For example, a multimodal football game was built that successfully detects human foot gestures (Lv et al. 2014).

Face detection/tracking
Face tracking has always posed a problem in VR because it depends on the orientation of the face and the illumination of the environment. Face detection also becomes challenging when sensors or wires are interfaced with the virtual environment (VE). Under such conditions, VR needs a high tracking frame rate with low latency. Moreover, the detector must be trained on a set of faces in advance, and this learning phase takes time (Rowley, Baluja, and Kanade 1998). An intelligent training algorithm that can discriminate between faces and non-faces is therefore needed (Shah et al.). The advantage of BP is that it distinguishes easily between faces and non-faces. BP is preferred over other neural network algorithms because it minimizes error effectively and achieves higher accuracy (Chaudhary et al. 2012). One of the basic problems in face detection is varying lighting conditions. To solve this problem, Girado et al. (Girado et al. 2007) proposed a unique solution: a centre camera detects the 2D face position with the help of neural networks. The system can track upright, tilted, frontal and non-frontal faces. It uses two Kohonen nets: a recognizer with 128 neurons that recognizes a face and a tracker with 32 neurons that tracks the already recognized face. The neural-network ellipse followed the face using a block matching motion estimation algorithm. A different ANN approach is taken in "Second Life", where avatars are authenticated using different techniques for verification and recognition (Yampolskiy, Klare, and Jain 2012).
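Block matching, the motion estimation step mentioned above, searches for the displacement of an image block between frames that minimises a matching cost. The sketch below assumes the sum of absolute differences (SAD) as the cost and uses synthetic integer frames. It is a generic illustration, not Girado et al.'s tracker, whose cost function and search range are not given here.

```python
def sad(prev, curr, top, left, dy, dx, size):
    """Sum of absolute differences between a size x size block of `prev` at
    (top, left) and the block of `curr` displaced by (dy, dx)."""
    return sum(
        abs(prev[top + i][left + j] - curr[top + dy + i][left + dx + j])
        for i in range(size)
        for j in range(size)
    )

def block_match(prev, curr, top, left, size, search):
    """Exhaustive block matching: return the displacement (dy, dx), within
    +/- `search` pixels, that minimises the SAD. Out-of-frame candidates are skipped."""
    rows, cols = len(curr), len(curr[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= top + dy and top + dy + size <= rows
                    and 0 <= left + dx and left + dx + size <= cols):
                continue
            cost = sad(prev, curr, top, left, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]

def frame_with_patch(top, left, n=10):
    """Synthetic n x n frame containing a 3 x 3 textured patch (a stand-in for a face region)."""
    frame = [[0] * n for _ in range(n)]
    for i in range(3):
        for j in range(3):
            frame[top + i][left + j] = 1 + 3 * i + j
    return frame

prev, curr = frame_with_patch(2, 3), frame_with_patch(4, 5)
print(block_match(prev, curr, top=2, left=3, size=3, search=3))  # prints: (2, 2)
```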

Radiosity
Radiosity is a global illumination algorithm used to compute how light is transferred among the objects of a virtual world. The ROVER system uses BPA because meshing cannot be completed by radiosity alone. The BPNN is trained with light rays, and the resulting neural meshes create realistic illumination in VR (Sillion, Puech, and others 1994; Moller 1996).
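The light transfer that radiosity computes can be illustrated with the classical (non-neural) radiosity equation $B_i = E_i + \rho_i \sum_j F_{ij} B_j$. Here $B_i$ is the radiosity of patch $i$, $E_i$ its emission, $\rho_i$ its reflectance and $F_{ij}$ the form factor between patches. The sketch below is a minimal Jacobi-iteration solver with made-up two-patch values. It does not reproduce the ROVER system or the BPNN approach cited above.

```python
def solve_radiosity(emission, reflectance, form_factors, iterations=100):
    """Jacobi iteration for the classical radiosity equation
    B_i = E_i + rho_i * sum_j F_ij * B_j."""
    n = len(emission)
    b = list(emission)  # start from the emitted light only
    for _ in range(iterations):
        b = [
            emission[i] + reflectance[i] * sum(form_factors[i][j] * b[j] for j in range(n))
            for i in range(n)
        ]
    return b

# Two facing patches (hypothetical values): only patch 0 emits light.
B = solve_radiosity([1.0, 0.0], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]])
print(B)  # converges towards [4/3, 2/3]
```

Each iteration adds one more bounce of reflected light. The iteration converges here because every reflectance is below one.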

Visualization
VR is a very capable tool for visualizing data in data mining. VR allows immersion and hence leaves a lasting impression on the viewer. Users can interact with the data, understand its hidden potential effectively, manipulate it dynamically and transfer it to other platforms. VR can also be used for 3D molecular visualization (Lv et al. 2013). Collaborative Virtual Environments (CVEs) are one type of environment that aids data visualization (Churchill, Snowdon, and Munro 2001). Dimensionality reduction is one of the issues that needs attention in VR visualization. Techniques that reduce the number of dimensions, such as clustering and neural networks, are of great use in building efficient systems. In one approach, ANNs were used to create VR spaces (Valdés, Romero, and Gonzalez 2007; Valdés 2003; 2002). A VR space for the visual representation of information systems is defined as $\Upsilon = \langle O, G, B, \Re^m, g_o, l, g_r, b, r \rangle$. In this tuple, $O$ is a relational structure composed of objects and relations, and $G$ is a non-empty set of geometries representing the different objects and relations. $B$ is a non-empty set of behaviours (such as walking), and $\Re^m$ is a metric space of dimension $m$. The remaining elements $(g_o, l, g_r, b, r)$ are functions that map objects and relations to geometries, locations and behaviours. In particular, $r$ is a collection of characteristic functions that select which of the original relations will be represented in the virtual world. VR spaces can also be constructed to represent symbolic knowledge in the form of production rules, which helps to identify general rules from specific ones. NNs are used as space mapping functions to produce high-quality VR spaces. Data and symbolic knowledge can be represented in virtual spaces using SAMANN for unsupervised mapping and nonlinear discriminant networks for supervised mapping to low-dimensional feature spaces. Virtual spaces can also be used to produce perceptual maps from questionnaire-based texts. In addition, they find numerous applications in the visualization of gene expression data (Valdés, Romero, and Barton 2012) and of geophysical prospecting data (Valdés, Romero, and Gonzalez 2007).
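SAMANN is a neural network trained to reproduce Sammon's mapping. That mapping places points in a low-dimensional space so that their pairwise distances match the original ones as closely as possible, as measured by Sammon's stress. The sketch below computes that stress and minimises it with plain gradient descent on four illustrative points. The step size and data are assumptions, and the code shows the mapping criterion rather than the SAMANN network itself.

```python
import math

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def pairs(n):
    return [(i, j) for i in range(n) for j in range(i + 1, n)]

def sammon_stress(high, low):
    """Sammon's stress: sum over pairs of (D_ij - d_ij)^2 / D_ij, divided by the sum of D_ij,
    where D are distances in the original space and d are distances in the map."""
    ps = pairs(len(high))
    c = sum(dist(high[i], high[j]) for i, j in ps)
    return sum(
        (dist(high[i], high[j]) - dist(low[i], low[j])) ** 2 / dist(high[i], high[j])
        for i, j in ps
    ) / c

def sammon_step(high, low, lr=0.05):
    """One plain gradient-descent step on the stress (Sammon's original method
    uses a pseudo-Newton update instead)."""
    n = len(high)
    c = sum(dist(high[i], high[j]) for i, j in pairs(n))
    updated = []
    for k in range(n):
        grad = [0.0] * len(low[k])
        for j in range(n):
            if j == k:
                continue
            big_d, small_d = dist(high[k], high[j]), dist(low[k], low[j])
            coeff = -2.0 / c * (big_d - small_d) / (big_d * small_d)
            for m in range(len(grad)):
                grad[m] += coeff * (low[k][m] - low[j][m])
        updated.append([y - lr * g for y, g in zip(low[k], grad)])
    return updated

# Four points in 3-D mapped to 2-D (coordinates are illustrative).
high = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
initial = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.2, 0.2]]
low = initial
for _ in range(200):
    low = sammon_step(high, low)
print(round(sammon_stress(high, initial), 4), round(sammon_stress(high, low), 4))
```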

Speech recognition or voice synthesis
Real-time speech-driven face animation with expressions is extremely useful in creating virtual environments, because virtual avatars help in understanding human emotional perception. Hong et al. (Hong, Wen, and Huang 2002) described how a speech-driven talking face is created for an individual. First, an audio-visual database was built. Sentences were selected from the DARPA TIMIT speech database and recorded both without facial expressions and with expressions (sad and smile). An MLP was trained for the expressionless face and for faces with expressions. For expressionless faces, Mel-Frequency Cepstrum Coefficients (MFCCs) served as the audio feature vector input to the MLP. The MLP, a three-layer perceptron with 25 hidden units, learned the audio-to-visual mapping. It estimated the visual features from the audio features and their contextual information. To build mappings for speech-driven expressive talking faces, the MLP was retrained for each input using a two-step mapping. First, speech was mapped to Utterance Motion Unit Parameters (UMUPs). Second, the estimated UMUPs were mapped to UMUPs and Expressive MUPs. ANNs can also enhance speech fingerprinting (Matrouk et al. 2014), making it possible to recognize an isolated word, a person and a word-person combination with a high recognition rate.
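The audio-to-visual mapping described above can be sketched as a three-layer perceptron. Its input is an MFCC frame stacked with neighbouring frames, which supplies the contextual information. Only the hidden-layer size of 25 follows the description above. The number of MFCCs, the context width, the number of visual parameters and the tanh activation are hypothetical. The weights are random rather than trained, so the sketch shows the data flow, not Hong et al.'s model.

```python
import math
import random

N_MFCC = 13    # hypothetical: MFCC coefficients per audio frame
CONTEXT = 2    # hypothetical: context frames on each side of the current frame
N_HIDDEN = 25  # hidden-layer size taken from the description above
N_VISUAL = 6   # hypothetical: visual (motion unit) parameters per frame

def stack_context(frames, t, k=CONTEXT):
    """Concatenate frames t-k..t+k (clamped at the sequence edges) into one input vector."""
    idx = [min(max(t + d, 0), len(frames) - 1) for d in range(-k, k + 1)]
    return [v for i in idx for v in frames[i]]

def mlp_forward(x, w1, b1, w2, b2):
    """Three-layer perceptron: tanh hidden layer followed by a linear output layer."""
    h = [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    return [sum(wi * hi for wi, hi in zip(row, h)) + b for row, b in zip(w2, b2)]

random.seed(1)
n_in = N_MFCC * (2 * CONTEXT + 1)
w1 = [[random.gauss(0, 0.1) for _ in range(n_in)] for _ in range(N_HIDDEN)]
b1 = [0.0] * N_HIDDEN
w2 = [[random.gauss(0, 0.1) for _ in range(N_HIDDEN)] for _ in range(N_VISUAL)]
b2 = [0.0] * N_VISUAL

audio = [[random.gauss(0, 1) for _ in range(N_MFCC)] for _ in range(50)]  # stand-in MFCC frames
visual = [mlp_forward(stack_context(audio, t), w1, b1, w2, b2) for t in range(len(audio))]
print(len(visual), len(visual[0]))  # 50 frames, 6 visual parameters each
```

The two-step expressive mapping could be sketched by chaining a second network of the same form, which would take the estimated parameters as its input.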

Walkthrough or Navigation
Virtual navigation, or walkthrough, is helpful in exploring buildings or cities and finds application in real estate, hospitals and education (Fig. 6). ANN plays an important role in predicting the number of paths taken during wayfinding in the environment. This technique combines ANN with the Point Distribution Model. The paths given by this model are taken as input by the ANN, and the output is a maximum likelihood measure used to select the desired path. It helps in designing the walkthrough and in predicting the user's moves in the VE. Additionally, it predicts navigation performance from collected psychophysiological data (Courtney et al. 2013). When Multiple Linear Regression (MLR) and ANN were compared, ANN gave better prediction performance. Consequently, it is used in designing a spatial cognitive paradigm with psychophysiologically based adaptive systems.
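The selection step, in which the network's output acts as a likelihood for each candidate path, can be sketched as follows. This is not the model of Courtney et al. A single linear unit stands in for the trained ANN, and the path features and weights are hypothetical. A softmax turns the scores into likelihoods, and the path with the maximum likelihood is chosen.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities (shifted by the max for numerical stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_path(paths, weights, bias=0.0):
    """Score each candidate path with a single linear unit (a stand-in for a trained ANN),
    turn the scores into likelihoods and return the index of the most likely path."""
    scores = [sum(w * f for w, f in zip(weights, feats)) + bias for feats in paths]
    likelihoods = softmax(scores)
    best = max(range(len(paths)), key=lambda i: likelihoods[i])
    return best, likelihoods

# Hypothetical path features: [length in corridors, number of turns, visible landmarks].
paths = [[10, 3, 1], [6, 2, 2], [8, 1, 0]]
weights = [-0.5, -0.3, 0.8]  # hypothetical learned weights
best, likelihoods = select_path(paths, weights)
print(best, [round(p, 3) for p in likelihoods])
```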

Animation
Animation involves the interplay of avatar movements, virtual object appearances and varying contrast. The following examples show the use of ANN in animation. ANNs have been used to enhance contrast for low-vision users viewing a virtual environment through an HMD. One approach used an MLP with logistic sigmoid activations trained by the scaled conjugate gradient method (Everingham, Thomas, and Troscianko 1999). Another used an ANN classifier within a Markov Random Field model (Everingham, Thomas, and Troscianko 2003). This enriches the visual experience and makes VR more immersive. The technique can serve as a mobility aid for people with severe visual impairments, who can then identify objects in the displayed images. It is also possible to animate a human avatar in VR using NNs. In a kinematics-based approach to avatar animation, the NN calculates the precise joint angles that lead to obstacle-free walking (Awang and Shamsuddin 2006; Amin and Earnshaw 2000).
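In the kinematics-based approach, the network learns a mapping from a desired foot position to joint angles. The sketch below shows how such training pairs could be generated, assuming a planar two-link leg with hypothetical segment lengths. A closed-form inverse is included only to check the data. The network itself is not shown, because the cited works are not described here in enough detail to reproduce it.

```python
import math

THIGH, SHANK = 0.45, 0.45  # hypothetical segment lengths in metres

def forward_kinematics(hip, knee):
    """Planar two-link leg: joint angles (radians) -> ankle position relative to the hip."""
    x = THIGH * math.cos(hip) + SHANK * math.cos(hip + knee)
    y = THIGH * math.sin(hip) + SHANK * math.sin(hip + knee)
    return x, y

def inverse_kinematics(x, y):
    """Closed-form inverse for the same leg, returning the solution with knee >= 0."""
    c = (x * x + y * y - THIGH ** 2 - SHANK ** 2) / (2 * THIGH * SHANK)
    knee = math.acos(max(-1.0, min(1.0, c)))
    hip = math.atan2(y, x) - math.atan2(SHANK * math.sin(knee), THIGH + SHANK * math.cos(knee))
    return hip, knee

# (ankle target -> joint angles) pairs over a range of leg poses. A network could be
# fitted to these pairs so that, at run time, it outputs joint angles for a desired foot position.
dataset = []
for i in range(10):
    for j in range(1, 10):
        hip, knee = -math.pi / 2 + 0.1 * (i - 5), 0.1 * j
        dataset.append((forward_kinematics(hip, knee), (hip, knee)))
print(len(dataset))  # 90 training pairs
```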

Virtual creature controller
ANN can act as a controller by applying the laws of energy and physics. Hambli et al. (Hambli, Chamekh, and Bel Hadj Salah 2006) used an ANN to calculate the structural deformation of objects in a virtual environment, using a tennis racket and ball as an example. The ANN input was a kinetic property of the ball (its impact velocity), and the output was the resulting structural deformation of the racket. A controller can also be used to give commands to a virtual character, such as turn left or walk. Marks et al. (Marks et al. 2006) used three types of virtual characters: monoped, biped and quadruped. The neural network processes sensory information from the character's body and environment and outputs control information to the actors. In other words, motor actions are controlled by the sensory feedback of the virtual characters. A model proposed by Iwadate et al. (Iwadate et al. 2011) realized the adaptive walking behaviour of a virtual creature. This autonomous agent reaches a destination while avoiding obstacles and other creatures. As the controller, they combined a Central Pattern Generator (CPG) with an ANN optimized by particle swarm optimization. Virtual characters can not only walk or follow movement commands but also reproduce and form their own virtual ecosystem. A Continuous Time Recurrent Neural Network (CTRNN) can be used to generate behaviour in a virtual character. It learns from the environment through a brain plasticity model (an emergentist approach). The objective is to develop a neural network with growing topologies that can simultaneously give rise to multiple characteristics in an agent (Nogueira et al. 2013b). Reproduction can also be enabled in a simulated environment, which increases the life span of an agent (Nogueira et al. 2013a). In another virtual ecosystem, locomotion and food searching behaviour were controlled by a Recurrent Neural Network (RNN) with a bipolar sigmoid activation function (Ouannes et al. 2012). An RNN has the additional benefit over a traditional feed forward NN of having memory. Here, memory refers to the network's ability to take into account the time at which earlier events occurred.
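The CTRNN mentioned above is commonly written as $\tau_i \dot{y}_i = -y_i + \sum_j w_{ji}\,\sigma(y_j + \theta_j) + I_i$. In this equation, $\tau_i$ are time constants, $\theta_j$ biases, $w_{ji}$ connection weights and $I_i$ external inputs. The sketch below integrates the equation with forward Euler steps. The step size and parameter values are assumptions, and the growing topologies and plasticity used by Nogueira et al. are not modelled.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ctrnn_step(y, tau, w, theta, inputs, dt):
    """One forward-Euler step of a CTRNN:
    tau_i * dy_i/dt = -y_i + sum_j w[j][i] * sigmoid(y_j + theta_j) + I_i."""
    n = len(y)
    out = [sigmoid(y[j] + theta[j]) for j in range(n)]
    return [
        y[i] + dt / tau[i] * (-y[i] + sum(w[j][i] * out[j] for j in range(n)) + inputs[i])
        for i in range(n)
    ]

def simulate(y0, tau, w, theta, inputs, dt=0.01, steps=2000):
    y = list(y0)
    for _ in range(steps):
        y = ctrnn_step(y, tau, w, theta, inputs, dt)
    return y

# Sanity check: with no connections, each neuron relaxes exponentially to its external input.
y_final = simulate([0.0, 0.0], [1.0, 0.5], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [0.3, -0.2])
print(y_final)
```

With suitable recurrent weights, the same equations can produce sustained rhythmic activity. This is why CTRNNs are often used for CPG-like locomotion controllers.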

IV. CONCLUSION
In an era when niche areas of cutting-edge research are capturing the public imagination and moving out of the laboratory into everyday life, there is a broad impetus that could be the key to dramatic progress. Research into Virtual Environments on the one hand and Artificial Neural Networks on the other has largely been carried out by two very different groups of people with different preoccupations and interests. VR has been explored with the aim of creating ever more refined environments for training and simulation. ANN, meanwhile, has been the backbone of studies in areas such as genetics, behavioural analysis, speech recognition and computer vision. Both VR and ANN have grown considerably within their respective scopes, with many extensions and applications. Although their constructs were initially used largely independently, over the last few years the inevitable convergence between the two fields has come to light. VR provides a near-reality experience with minimal risk and cost, while ANN can handle the non-linear characteristics of large data sets and is self-adaptive. Thus, when integrated, VR and ANN can be used to develop powerful systems capable of a wide range of applications.
This paper reviews the advantages and the issues arising from combining the two to develop intelligent virtual environments. Many applications require their own algorithm. The paper addresses this recurring need by exploring the concept of an optimized algorithm that could adapt to the application being developed. Further, ANN variants such as BP, RNN and MLP were identified as promoting a higher degree of cohesiveness between the user and the environment. The instances of ANNs deployed in VR frameworks described here include facial expression detection, human body tracking, face detection, data visualization and speech recognition, amongst others. Clearly, a dense interplay of the two in the form of well-integrated models, rather than a mere juxtaposition of their constructs, opens up a whole new domain for exploration and problem solving. Hence, the synthesis of the two fields yields environments characterized by better and more responsive tracking and analysis of events. This in turn could overhaul existing practices and conventions.
Representative military and rehabilitation applications of VR are reported in (Parsons and Rizzo 2008; Parsons et al. 2008; Riener and Harders 2012; Cameirão et al. 2010; Schultheis and Rizzo 2001; Laver et al. 2012; Saposnik et al. 2010; Rizzo and Kim 2005; Seymour et al. 2002; Reger et al. 2011; Zyda 2005). Section II provides an introduction to ANN, and Section III describes its role in various applications of VR. In developing such an application, the selection of an appropriate algorithm is crucial. ANN is used in the present work because of its adaptability and connectivity. For example, gesture recognition uses the Probabilistic Neural Network, while face recognition uses the multilayer perceptron.