A Motion-based User Interface for the Control of Virtual Humans Performing Sports

—Traditional human-computer interfaces are not intuitive or natural for choreographing human motions in VR and video games. In this paper we present a novel approach to controlling virtual humans performing sports with a motion-based user interface. The user first draws gestures in the air with a Wii Remote. The system then recognizes the gestures with pre-trained hidden Markov models. Finally, the recognized gestures are employed to choreograph the simulated sport motions of a virtual human. The average recognition rate of our algorithm is over 90% on a test set of 20 gestures. Results on the interactive simulation of several kinds of sport motions demonstrate the efficiency and appeal of our system, which is easy to use even for novice users.


I. INTRODUCTION
A motion-based user interface (MUI) is a kind of perceptual user interface (PUI), which aims to make human-computer interaction more like how people interact with each other and with the physical world [1]. As its name implies, an MUI employs users' body motion directly to control programs on computers. Compared with traditional user interfaces, e.g. keyboard, mouse, and joystick, an MUI provides an intuitive way for users to interact naturally with computers. The underlying idea is to let computers adapt to human interaction habits instead of asking people to comply with fixed operation modes.
In the fields of virtual reality and computer animation, applications based on MUIs are usually referred to as performance animation, meaning "what you perform and see is what you get as the final motion". Its core technology is reconstructing expressive motions in the real world and mapping them onto characters in the virtual world [2]. Animation authoring by direct performance allows intuitive control of characters in the virtual world, which has attracted great research interest in recent years. However, most existing performance animation systems [5][6][7] employ complex vision-based motion capture systems as their input devices, which require great professional skill to operate. As a result, they are not suitable for novice users with little experience. Furthermore, vision-based systems often suffer from occlusion and can only work in a specified environment, which limits the use of such performance animation systems in everyday surroundings.
(Manuscript received on June 10, 2011. E-mail: xiubo.liang@gmail.com)
Motion sensors (accelerometers, magnetometers, gyroscopes etc.) are considered ideal alternatives to vision-based motion capture devices for MUIs in ordinary applications, because they are free of lighting limitations and can be used almost anywhere. As a typical application of a motion-sensor-based MUI, the Nintendo Wii has sparked a revolution in human-computer interaction since its release. Many game companies have followed the trend and gradually released their own MUI console games; nowadays, such games are very popular all over the world. However, one drawback of these systems is that their recognition algorithms can only detect human motions in a rough manner. For example, a swing of the arm and a slight twist of the wrist are both recognized as the swing action in the Wii tennis game. Apparently, such techniques cannot be employed directly to choreograph complex motions, such as sport motions.
In this paper, we develop a motion-based user interface to choreograph complex sport motions. To provide an easy-to-use system for novice users in everyday surroundings, we choose the Wii Remote (with an embedded accelerometer) instead of a cumbersome motion capture system as our input device. The overall working process of our system is composed of three major steps. First, the user performs gestures in the air with the Wii Remote. Second, our system identifies the user's choreographing intents by recognizing the gestures with pre-trained hidden Markov models. Finally, the recognized gestures are employed to choreograph the simulated sport motions of a virtual human. Fig. 1 shows some examples that demonstrate how our system works.

II. RELATED WORK
The two main parts of this paper are a motion-based user interface and an interactive sport motion generation algorithm.
In this section, we discuss the related work on these two aspects.

Motion-based user interfaces for character animation
Computer puppetry is an intuitive way to map the movements of a performer to an animated character in real time. Its core technology is how to transfer the observations of the motion capture sensors to an animated character whose size and proportions may differ from the performer's [3]. Johnson et al. developed a "sympathetic" interface to control virtual characters in an iconic and intentional manner. The input device in their system is a plush doll embedded with wireless sensors. Over 400 participants successfully used the system and offered positive comments [4]. Dontcheva et al. presented an acting-based animation interface for creating and editing character animation at interactive speeds. Their system creates the connection between the actor and the character through a motion capture system and a set of reflective widgets, which frees the animator from the confines of a mouse, a keyboard, and a limited workspace [5].
However, puppeteering requires a specialized input device and a significant amount of expertise, which limits its range of use. To overcome this shortcoming, researchers began to explore performance animation interfaces that allow users to control virtual characters with their body motion directly. Chai and Hodgins employed two video cameras and a small set of retro-reflective markers on the actor's body to create an easy-to-use system for character animation. The low-dimensional control signals are transformed into full-body motions by constructing a series of local models from a motion capture database [6]. Similarly, Ishigaki et al. developed their system with the same devices. Its advantage is that it integrates prerecorded motions with both online performance and dynamic simulation. As a result, their system can synthesize motions with both physical realism and the user's personal style [7].
Recently, as micromechanical technology has matured, the size and price of motion sensors have dropped dramatically, and many researchers have begun to employ them to develop new animation interfaces. Slyper et al. created an action capture system with accelerometers, whose readings are continuously matched against existing motion capture data with a sliding window; an avatar is then animated with the closest matched motion [8]. However, their simple matching metric cannot extract the semantics of motions, which may be important in some interactive applications. Liang et al. solved this problem with a pattern recognition approach [2]. Shiratori and Hodgins proposed a Wiimote-based user interface for the control of a physically simulated character through the motion of the user's arms, wrists, or legs [9]. However, their system can only deal with simple swing motions of the arms and legs, while our system can recognize many more gestures and generate more complex motions.

Responsive motion generation for animated characters
A simple way of generating new motions is to break whole motion clips into small pieces and then rearrange them according to constraints from user input or virtual environments [10][11][12]. However, natural body configurations cannot be created with such methods when the new environment is quite different from the capture environment. Physically based methods are considered an alternative for generating natural motions. Honda Motor Corporation successfully controlled the movements of real robots while keeping balance using the zero-moment point (ZMP) [13]. Macchietto et al. demonstrated a real-time motion simulation system capable of keeping balance while tracking a reference motion and responding to external perturbations by changing the center of pressure (COP), which controls the linear and angular momentum of the character [14].
Much research has focused on controllers. The benefits of such methods are effectiveness, robustness, and easy transfer to new topologies. Yin et al. developed robust controllers that track motion trajectories with a balance strategy and also respond to external pushes [15]. Shiratori et al. created controllers for the trip recovery responses that occur during walking [16]. Lee et al. presented a dynamic controller to synthesize under-actuated 3D full-body biped locomotion, which takes motion capture reference data to reproduce realistic human locomotion through real-time physically based simulation [17].
More and more methods of this kind are hybridized with captured motion data. Zordan and Hodgins presented a system to synthesize hit-and-react motions based on captured data, which uses inverse kinematics for the hit and controlled forward dynamics for the reaction [18]. Arikan et al. described an algorithm to animate characters being pushed by external forces by picking and modifying recorded motions [19]. Shapiro et al. introduced hybrid methods that usually run with kinematic models but can switch to dynamics to react to external forces, and switch back once the reaction is over [20]. Komura et al. modified motions to maintain balance in response to external influences when animating reactions for locomoting bipeds [21]. Zordan et al. incorporated unexpected impacts into a motion-capture-driven animation system through a combination of physical simulation and the best plausible re-entry motion clips [22].

III. OVERVIEW
In the fields of virtual reality and video games, users usually prefer intuitive and natural user interfaces for interacting with virtual environments. Thanks to the Wii Remote, our system can perceive the acceleration signals of users' natural gestures, which are used to detect users' interaction intents. To generate responsive animation, we captured a series of sport motions in advance and segmented them manually into distinct actions, so that recognized gestures can be mapped to the separated actions. That is to say, the gestures are employed to choreograph sport motions. However, as the force properties of a gesture differ each time it is performed, simply replaying the pre-captured motion would be uninteresting. Therefore, we employ stunt motion simulation and responsive motion simulation techniques to generate new expressive motions according to the force properties embodied in the performed gestures. The schematic block diagram of our system is shown in Fig. 2.
The major algorithmic steps of our system are explained below:
(1) Motion sensing. The real-time acceleration signals are transferred from the Wii Remote to our system through Bluetooth communication.
(2) Gesture recognition. The acceleration signals are preprocessed and recognized with pre-trained hidden Markov models to identify the user's choreographing intent.
(3) Motion generation. The recognized gestures are mapped to captured sport actions, which are further adapted by stunt motion simulation and responsive motion simulation.
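The overall pipeline can be sketched as a simple control loop. All function and clip names below are hypothetical stand-ins, not the system's actual API; the real system reads the Wii Remote over Bluetooth and scores gestures against pre-trained HMMs.

```python
import random

def read_acceleration_frame():
    """Stand-in for one Bluetooth accelerometer reading (ax, ay, az)."""
    return (random.uniform(-1, 1), random.uniform(-1, 1), random.uniform(-1, 1))

def recognize_gesture(frames):
    """Stand-in for step (2): a real implementation scores the frame
    sequence against each pre-trained HMM and returns the most likely
    gesture label."""
    return "front_flip" if frames else None

def choreograph(gesture):
    """Stand-in for step (3): map a gesture to a captured sport action."""
    action_map = {"front_flip": "flip_clip_01"}   # hypothetical clip names
    return action_map.get(gesture)

# Step (1): buffer acceleration frames while the user holds the button.
frames = [read_acceleration_frame() for _ in range(30)]
# Step (2): recognize the gesture from the buffered signal.
gesture = recognize_gesture(frames)
# Step (3): trigger the corresponding motion clip.
clip = choreograph(gesture)
```

In the actual system the buffering is delimited by a button press and release, as described in Section VII.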

IV. GESTURE RECOGNITION
To provide a good user experience, a gesture interaction system should meet three conditions: a small training workload, a high recognition rate, and a short response time. In this section, we describe the details of the gesture recognition algorithm and show why our system satisfies these conditions.

Automatic generation of training samples
Traditional machine learning methods require users to gather many samples for training. This process is quite tedious and greatly reduces users' interest in using the gesture interaction system. In this paper, we explore and validate a method for the automatic generation of training samples. Kay pointed out that adding noise can improve the recognition ability of machine learning models under some conditions [23]. Therefore, we generate a series of new samples from an original sample by adding noise. A motion sensing sample is essentially a time sequence of 3D acceleration signals, which can be represented by a high-dimensional vector $X = (x_1, x_2, \ldots, x_n)$, where $x_j \in \mathbb{R}^3$ is the 3D acceleration data at frame $j$. A new sample is then generated by adding noise to each frame (equation (1)): $\tilde{x}_j = x_j + n_j$, where the $n_j$ are Gaussian noise values drawn with parameter $b$, i.e. with mathematical expectation $\mu = 0$ and variance $\sigma^2 = b$.
The control parameter for the noise-adding algorithm is the signal-to-noise ratio (SNR), i.e. the ratio of signal variance to noise variance. The best value of the SNR is acquired through a series of experiments (see Section VI for a detailed discussion). Fig. 3 shows some examples of the sample generation algorithm.
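The sample generation step can be sketched as follows: given one recorded gesture, synthesize extra training samples by adding zero-mean Gaussian noise whose variance is set from a target SNR (the function name and default values are illustrative; the paper's experiments find SNR of about 4 works best).

```python
import numpy as np

def generate_noisy_samples(sample, snr=4.0, count=10, seed=0):
    """Create `count` noisy variants of an (n_frames, 3) acceleration sample.

    SNR = signal variance / noise variance, so the noise standard
    deviation is sqrt(var(sample) / snr).
    """
    rng = np.random.default_rng(seed)
    noise_std = np.sqrt(sample.var() / snr)
    return [sample + rng.normal(0.0, noise_std, sample.shape)
            for _ in range(count)]

# Example: one 50-frame 3D acceleration sample (a synthetic sine signal).
original = np.sin(np.linspace(0, 6, 50))[:, None] * np.ones(3)
samples = generate_noisy_samples(original, snr=4.0, count=10)
```

Each generated sample keeps the shape of the original, so the rest of the training pipeline does not need to distinguish real from synthetic samples.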

Preprocessing of training samples
When an action is performed by different users, or even by the same user at different times, its speed and size may vary greatly. Therefore, we normalize the training samples. First, we filter out the redundant data at the start and end of the sample. Then, we interpolate between frames with a C² continuous cubic spline and resample it to obtain a new sample of specified length. Finally, we normalize the amplitude of the sample with the standard normalization formula for random variables, $x' = (x - \mu)/\sigma$. Because of the multiplicity of the gestures, there is a great deal of irrelevant information in the samples, and the main features are submerged in useless data. Therefore, we employ the Principal Component Analysis (PCA) algorithm [24] to extract the key features of the samples, which are then fed to our machine learning model for training and recognition.

Gesture training and recognition
A hidden Markov model (HMM) consists of an underlying Markov chain with a finite number of states and a finite set of observation probability distributions, one associated with each state. The probability-based mechanism of the HMM makes it ideal for processing time series data, and HMMs have been used successfully in the field of speech recognition [25]. Like speech, motion signals are time sequence data, so HMMs should be able to model our data well. Therefore, we employ HMMs as our recognition model.
There are two main forms of HMM: continuous and discrete. The former has more powerful modeling ability than the latter [26]. Therefore, we choose the continuous HMM for motion recognition, which can be represented as $\lambda = (\pi, A, B)$, where $S$ is the number of states, $\pi$ is the initial state probability distribution, $A$ is the state transition probability matrix, and $B$ is the set of observation probability densities. One model $\lambda_k$ is trained for each of the $K$ gestures. Assume that after preprocessing, a sample is represented by a feature sequence $O = (o_1, o_2, \ldots, o_T)$. Once $\lambda_k$ is trained, the probability $P(O \mid \lambda_k)$ can be computed with the Forward-Backward algorithm [25]. The gesture with the maximum probability among the $K$ models is selected as the recognition result (equation (4)): $k^* = \arg\max_k P(O \mid \lambda_k)$.
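The recognition step can be illustrated with a toy sketch. For brevity this uses a *discrete* HMM over a two-symbol alphabet, whereas the system uses continuous observation densities; the models and gesture names here are invented for illustration. The key point is the same: score the observation sequence with the scaled forward algorithm for each model and take the maximum.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """log P(O | lambda) for a discrete observation index sequence."""
    alpha = pi * B[:, obs[0]]
    s = alpha.sum()
    log_p = np.log(s)
    alpha = alpha / s                 # rescale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()
        log_p += np.log(s)
        alpha = alpha / s
    return log_p

def classify(obs, models):
    """Pick the gesture whose model gives the maximum likelihood."""
    scores = {k: forward_log_likelihood(obs, *m) for k, m in models.items()}
    return max(scores, key=scores.get)

# Two toy 2-state models over a binary observation alphabet.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
m0 = (pi, A, np.array([[0.9, 0.1], [0.8, 0.2]]))  # emits mostly symbol 0
m1 = (pi, A, np.array([[0.1, 0.9], [0.2, 0.8]]))  # emits mostly symbol 1
result = classify([1, 1, 0, 1, 1], {"gesture_a": m0, "gesture_b": m1})
```

Since the test sequence is dominated by symbol 1, the model that emits symbol 1 with high probability wins.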

V. INTERACTIVE SPORT MOTION GENERATION
In this section, we present stunt motion simulation and responsive motion simulation to synthesize new sport motions based on the captured data.

Stunt motion simulation
We propose a method that modifies the velocity curve of each joint to achieve a more faithful and smooth result. The only parameter for motion exaggeration in our system is acquired automatically from the force properties embodied in the acceleration signals: the more powerful the movement the user performs, the more exaggerated the resulting motion. We use the average of the acceleration readings as the power parameter, denoted by $B$. The fade-in and fade-out principle is employed to process the velocity curve. Let $v(t)$ be the velocity curve of a joint, whose integral over the action equals $A$, the displacement between the beginning position and the ending position, and let $F(t)$ be the exaggeration function applied to the velocity curve. In general, $F(t)$ can be a linear, quadratic, or exponential function. We choose the quadratic form, because a quadratic function has a definite turning point, which can be intuitively used to control the amplitude of the exaggeration. Therefore $F(t) = at^2 + bt + c$, where $c$ is the value of $F(t)$ at $t = 0$, which can be specified by the user, and $B = (4ac - b^2)/(4a)$ is the parameter obtained from the acceleration signals, corresponding to the maximum exaggeration of the motion. We can then calculate $a$, $b$, and the exaggerated velocity $v_{ke}(t)$. Finally, the trajectory of each joint is obtained by integrating the exaggerated velocity. Fig. 5 gives an example of an exaggerated action and its reactive motion.
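A minimal sketch of the velocity-curve exaggeration follows. The quadratic $F(t)$ has $F(0) = c$ (user-specified) and vertex value $B = (4ac - b^2)/(4a)$ from the acceleration readings; since that leaves one degree of freedom, this sketch additionally assumes $F$ is symmetric over the action (fade-in equals fade-out, i.e. $F(T) = c$ with the vertex at $T/2$), which is our assumption, not stated in the paper.

```python
import numpy as np

def exaggerate_velocity(v, dt, B, c=1.0):
    """Scale a sampled velocity curve v by a quadratic F(t) with
    F(0) = F(T) = c and maximum value B at t = T/2, then integrate."""
    T = (len(v) - 1) * dt
    a = 4.0 * (c - B) / T**2        # vertex value: c - a*T^2/4 = B
    b = -a * T                      # places the vertex at t = T/2
    t = np.arange(len(v)) * dt
    F = a * t**2 + b * t + c
    v_ke = F * v                    # exaggerated velocity
    # Trajectory via trapezoidal integration of the exaggerated velocity.
    x = np.concatenate([[0.0], np.cumsum(0.5 * (v_ke[1:] + v_ke[:-1]) * dt)])
    return F, v_ke, x

v = np.ones(101)                    # unit-speed joint velocity sample, T = 1 s
F, v_ke, x = exaggerate_velocity(v, dt=0.01, B=2.0, c=1.0)
```

One can verify that with these coefficients $(4ac - b^2)/(4a) = c - aT^2/4 = B$, matching the paper's relation.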
Angular velocity scaling is suitable for generating motions that rotate in the air, such as flips and twist turns. Unlike the method of [27], which finds a repeatable fragment of a ballistic motion, copies it, and blends the copy with the original motion, we modulate the rotation directly by scaling the angular velocity of the original motion. The rotations are described with quaternions. We first extract the angular velocity of the $j$-th joint by the following equation [28]:
$$\omega_j(i) = \frac{2}{h}\log\!\left(q_j(i+1)\,q_j(i)^{-1}\right),$$
where $\omega_j(i)$ and $q_j(i)$ are the angular velocity and rotation of the $j$-th joint at time $i$ respectively, and $h$ is the time interval between $i$ and $i+1$. Then, a constant scaling factor is used to scale the original angular velocity to obtain the exaggerated one, $\omega_j^*(i)$. After that, the new angular displacement is calculated by re-integrating the scaled velocity:
$$q_j^*(n) = \exp\!\left(\frac{h}{2}\,\omega_j^*(n-1)\right)q_j^*(n-1),$$
where $\omega_j^*(i)$ is the exaggerated angular velocity of the $j$-th joint at time $i$, and $q_j^*(n)$ is the exaggerated angular displacement of the $j$-th joint at time $n$. Fig. 6 shows an example of a stunt motion generated by angular velocity scaling.
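Angular velocity scaling can be sketched with plain numpy, using the quaternion log/exp maps (quaternions stored as (w, x, y, z); the helper names are ours). The example scales a quarter turn about the Z axis into a half turn.

```python
import numpy as np

def q_mul(p, q):
    """Hamilton product of two quaternions (w, x, y, z)."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def q_conj(q):
    """Conjugate (= inverse for unit quaternions)."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def q_log(q):
    """Rotation vector (axis * angle) of a unit quaternion."""
    w = np.clip(q[0], -1.0, 1.0)
    v = q[1:]
    n = np.linalg.norm(v)
    return np.zeros(3) if n < 1e-12 else (2.0 * np.arctan2(n, w) / n) * v

def q_exp(r):
    """Unit quaternion for rotation vector r."""
    angle = np.linalg.norm(r)
    if angle < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    axis = r / angle
    return np.concatenate([[np.cos(angle / 2)], np.sin(angle / 2) * axis])

def scale_rotation(quats, h, factor):
    """Extract per-frame angular velocity, scale it, and re-integrate."""
    out = [quats[0]]
    for i in range(len(quats) - 1):
        omega = q_log(q_mul(quats[i + 1], q_conj(quats[i]))) / h
        out.append(q_mul(q_exp(factor * omega * h), out[-1]))
    return out

# A quarter turn about Z over 10 frames, scaled 2x into a half turn.
angles = np.linspace(0.0, np.pi / 2, 11)
quats = [np.array([np.cos(a / 2), 0.0, 0.0, np.sin(a / 2)]) for a in angles]
scaled = scale_rotation(quats, h=1.0, factor=2.0)
```

The final quaternion of the scaled sequence represents a rotation of π about Z, i.e. twice the original quarter turn, which is exactly the exaggeration the text describes.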

Responsive motion simulation
We consider cases where no protective steps are taken. The whole response process can generally be divided into two stages [29][30]. Experiments in physiology show that it takes the central nervous system about 100-250 ms to act after the moment of the hit. People behave very differently in the two stages, so we treat the two simulations separately. We call the response in the first stage the passive response, and the response in the second stage the active response.

Passive response
When part of the body is hit, we dynamically simulate its movement. We model the body as connected rigid bodies, but with springs attached to neighboring bones, since muscles mostly react to weaken forces coming from outside. Mia's work [31] also supports this view. We formulate it as a resistance factor $k_s$ that resists changes from the body's original state $\theta_0$:
$$\tau = -k_s(\theta - \theta_0),$$
where $\theta$ is the current joint angle, $\theta_0$ the original angle, and $k_s$ the resistance factor.
The stiffness $k_s$ of each spring varies; generally speaking, those at the extremities are smaller than those near the torso, for example the spring at the elbow versus the one at the shoulder. Moreover, in our implementation we not only exert an impulse on the colliding parts, but also break the total impulse into portions that influence adjacent body parts according to their distance from the contact area. Formally, the portion assigned to the $i$-th joint decreases with its distance $d_i$ from the position on the character being pushed, where $m$ is the impulse amount and $\omega$ is a parameter that governs how fast the impulse declines. This follows the same consideration as [32] and makes the hit more plausible.
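The two passive-response quantities can be sketched as below. The resistance torque follows the spring formulation above; for the impulse split, the paper only states that ω governs the falloff with distance, so the normalized exponential decay here is our reconstruction, not the paper's exact formula.

```python
import numpy as np

def resistance_torque(theta, theta0, k_s):
    """tau = -k_s * (theta - theta0): resists change from the original state."""
    return -k_s * (theta - theta0)

def distribute_impulse(m, distances, omega):
    """Split a total impulse m over joints with weights that decay
    with distance d_i from the contact point (assumed exponential)."""
    weights = np.exp(-omega * np.asarray(distances))
    return m * weights / weights.sum()

# Impulse of 10 units split over the contact joint and two neighbors.
portions = distribute_impulse(10.0, distances=[0.0, 0.3, 0.6], omega=2.0)
# Resistance torque pulling a joint back toward its pre-hit angle.
tau = resistance_torque(theta=0.8, theta0=0.5, k_s=50.0)
```

The portions sum to the original impulse, and the joint nearest the contact point receives the largest share, as the text requires.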

Active response
The passive stage lasts a short time; the character then realizes his situation and makes his active response, which reflects his real intention. The passive response is relatively easy to simulate, but the active one is much harder, considering the thousands of different actions a real person might take. We simply assume the character wants to regain his initial pose.
The character's state, i.e. his pose, is represented by the configurations of the joints and the root. Suppose the character's current state is $s_0$ and his initial pose is $s_d$. We take the character from state $s_0$ to $s_d$ along a valid and natural trajectory.
An inertia-scaled PD servo [18] is used in both stages, formulated as
$$\tau = I\left(k_p(\theta_d - \theta) - k_d\,\dot{\theta}\right), \quad (8)$$
where $I$ is the inertia matrix of the outboard body for each joint, $\theta$ the current joint angle, $\theta_d$ the desired angle, $\dot{\theta}$ the joint's angular velocity, and $k_p$, $k_d$ the gain and damping terms. To improve the result, we also add balance control following the algorithm suggested by Wooten and Hodgins [33]. An offset for the hips and ankles is calculated as
$$\theta_{\text{offset}} = k_p(x_d - x) - k_d\,\dot{x},$$
where $x$ is the position of the COM, $x_d$ the target position, and $k_p$, $k_d$ the offset gain and damping terms for the hips and ankles respectively. The offsets are then added to $\theta_d$ in equation (8). Fig. 7 gives an example of the responsive motion simulation for hit and reaction.
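The two active-response control laws can be sketched as follows. The PD form follows the inertia-scaled servo of [18] and the COM offset follows [33]; the gains and the inertia matrix below are illustrative values, not tuned parameters from the paper.

```python
import numpy as np

def pd_torque(I, theta, theta_d, theta_dot, k_p, k_d):
    """Inertia-scaled PD servo: tau = I (k_p (theta_d - theta) - k_d theta_dot)."""
    return I @ (k_p * (theta_d - theta) - k_d * theta_dot)

def balance_offset(x_com, x_target, x_com_dot, k_p, k_d):
    """COM-based angle offset added to theta_d at the hips and ankles."""
    return k_p * (x_target - x_com) - k_d * x_com_dot

I = np.diag([0.2, 0.2, 0.1])                    # outboard-body inertia (example)
tau = pd_torque(I,
                theta=np.array([0.1, 0.0, 0.2]),
                theta_d=np.zeros(3),
                theta_dot=np.array([0.5, 0.0, 0.0]),
                k_p=300.0, k_d=30.0)
offset = balance_offset(x_com=0.05, x_target=0.0, x_com_dot=0.1,
                        k_p=2.0, k_d=0.5)
```

In the full system the offset would be added to the desired hip and ankle angles before evaluating the PD torque, as the text describes.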

VI. EXPERIMENTAL RESULTS
We developed a prototype of our motion-based control interface for choreographing sport motions with a Wii Remote, a Bluetooth adapter, a PC, and a projector. The development kits are Qt 4.5 and GHMM 0.7.0. To evaluate our system, we captured a series of sport motions, including walking, running, jumping, leaping, flipping, and fighting. Twenty gestures were designed as the choreographing commands (see Fig. 8).
To find the best value of the SNR, we conducted a series of experiments on the recognition accuracy under different SNRs. The results are best (over 95%) when the SNR is about 4 (see Fig. 9). When using this module in practice, we therefore directly generate samples with an SNR of 4, which ensures good recognition accuracy.
Fig. 7. The process of responsive motion simulation: the transition from left to middle is the passive response, and from middle to right the active response.
We also ran experiments on the flexibility of the recognition algorithm. Acceleration samples were generated with the best SNR acquired in the previous experiment. The average recognition accuracy is over 90% for various magnitudes of the same gesture (see Fig. 10), which means our algorithm is robust to the size of the gestures performed by users.
For sport motion generation, our system can create not only faithfully simulated motions but also vividly exaggerated ones based on the pre-captured motions. That is to say, our system can synthesize artistic motions that go beyond human ability and cannot be captured from actors. For example, during motion capture our actor could only perform a single forward flip or twist turn, while the user may give the gesture commands "front flip twice" or "circle twice". In such cases, our system still converts the user's choreographing intent into the resulting motions, thanks to the stunt motion simulation techniques. See Fig. 11 and Fig. 12 for two examples of the synthesized results.

VII. CONCLUSION AND FUTURE WORK
In this paper, we present a natural user interface for the control of sport motions based on the user's live performance. Subjective feedback from users during our experiments shows that the system is very attractive to them. The physical and artistic motion simulation techniques greatly enhance the appeal of the system.
The main technical contributions of this paper are as follows:
• We present a novel approach to choreographing sport motions by live performance captured with a Wii Remote, which can be used in everyday surroundings.
• Thanks to the noise-adding mechanism, motion recognition in our system is user-independent and online training is not required.
• Our system can generate new expressive motions through stunt motion simulation and responsive motion simulation techniques, which is quite attractive to novice users.
However, the current method has some limitations. As the Wii Remote contains only an accelerometer, the acceleration readings are in the local coordinates of the device. If the user holds the Wii Remote in a different orientation when making the gestures, the recognition accuracy may decrease significantly. Furthermore, it is very difficult to segment the acceleration signals into different actions automatically, so we ask the user to press a button at the beginning of a gesture and release it at the end. In the future, we will try to overcome this problem by integrating more motion sensors (such as a gyroscope and a magnetometer) into our system. Another drawback is that the scaling-based motion exaggeration method may generate artifacts in some cases. In the future, more physical constraints, such as balance control, may be taken into consideration to ensure physical plausibility in the exaggerated motion.
We hope that this method and its future variations will play a significant role in creating expressive motions in a more intuitive and general way.