Stereo Educational Game with Vision Based Interaction in Virtual Environment

Abstract-Interaction and immersion are crucial to the quality of an educational game. We therefore propose a vision based marker localization interaction method and a stereo rendering method for such games. The approach gives players more intuitive interaction and a stronger feeling of immersion. First, a general architecture for an educational game based on computer vision marker localization is presented. Then, a new method for vision based marker localization and identification is studied. The basic idea is to use a static camera to track the position and pose of a marker attached to a handheld device, so that the game system can easily judge the player's intended operation, such as picking up or putting down a virtual object. Experimental results show that this interaction method has high tracking accuracy. Furthermore, the rendering module of the game uses a stereo rendering method, so it produces stereo vision for players. Finally, a feeding animal game for children is implemented to verify the presented approaches. Children carry different foods to different animals with a handheld device bearing a marker, and so learn which food each animal likes. The running results show that the presented approaches are effective and provide natural interaction for games in a virtual environment.


I. INTRODUCTION
With the rise of commercial games directed toward children, interest in the educational potential of computer games has grown. Much research has focused on how to better bring commercial games into the classroom [1][2][3][4][5][6][7][8].
Although this work has reached fruitful conclusions, most of it still calls for a simpler and more economical form of interaction to attract players. The first thing to settle is the definition of a game and its core. Chris Crawford identifies four common essential factors of games: representation, interaction, conflict, and safety [9]. For educational games, especially those for children, easy-to-use and intuitive human-computer interaction plays a crucial role.
Take the Nintendo Wii, one of the most popular game consoles, as an example. The chief reason for its success is that it adopted a new spatial localization device, which offers players a better and more natural way to interact than traditional game consoles such as the PS3 and Xbox 360 [10].
The core of excellent game interface design is natural, effective human-computer interaction. Players should truly enjoy the convenience of the operation [11]. Unfortunately, most games still rely on traditional mouse and keyboard interaction. These devices are inconvenient, especially in virtual reality games: it is difficult to grasp and move a 3D virtual object in a virtual environment with a mouse, which is a 2D device, or with a keyboard, which is not intuitive. Data gloves have proved to be the most effective approach because they achieve powerful and natural interaction, providing 3D control that makes it easy to grasp and move virtual models. However, a data glove with its wireless components is cumbersome and awkward to wear, and it is too expensive for regular players [12]. Another new interface, the eye mouse, operates a computer using the movement of the eyes; this eye-tracking system is intended for rehabilitation of eye motion disability [13,14]. Computer vision based methods attract more and more attention because of their clear advantages. Since mechanical devices such as a mouse, keyboard, or gloves are not required, the method is completely non-contact and gives players a more natural and unencumbered interaction experience. Furthermore, its cost is very low because it needs only a camera and a printed marker to track and locate the marker [15].
Compared with 2D viewing, 3D stereoscopic vision gives players a better comprehension of remote environments [16]. Simon Benson, a senior executive of Sony's 3D development team, said in an interview that 3D stereo games will be a key factor shaping Sony's future development. In his opinion, 3D stereoscopy may bring a major change to the game industry in the future. Bryan Del Rizzo, a spokesman for NVIDIA, said that 3D stereoscopy will grow rapidly in popularity [17]. Constraints on stereo rendering, such as real-time refresh rates, are being overcome by the development of hardware such as faster processors, memory, and buses [18].
An interaction system in the form of a virtual reality game has previously been constructed using a stereo vision system [19].
In this paper, we investigate an interaction approach based on marker localization technology and a stereo rendering approach for educational games, aiming to give players more efficient and natural interaction and a stronger feeling of immersion. The marker tracking and localization method provides good real-time performance for a virtual reality game. Our idea is to acquire the position information of the handheld device from a special marker attached to the device. The world coordinate system is defined first. Then we calculate the transformation matrix of the marker. Finally, the position information is obtained. To evaluate the tracking accuracy of the marker localization based interaction approach, experimental data are collected and analyzed. After that, we design the rendering module of the game based on stereoscopic display. Based on this investigation, a marker localization stereo game named feeding animal is implemented. This paper is organized as follows. Section II introduces the architecture of our game system. Section III illustrates the approach to marker tracking and localization. Section IV elaborates our stereo rendering method. Section V presents the implementation of the vision based marker localization stereo game. Finally, Section VI draws some conclusions. The rest of the paper describes in detail our method for designing and developing a competitive game.

II. GAME ARCHITECTURE
As shown in Fig. 1, our marker localization based stereo vision interactive game contains two sections: a hardware section and a software section. The hardware section, on the left of the figure, shows a player interacting with the system through a handheld device. The input device is a camera fixed at the top of the operation desk, which tracks the marker attached to the handheld device. The software section, on the right, shows the control flow of the game system. It consists of ten components.
Before these modules begin to work, the game goes through an initialization process. Software libraries such as OpenGL, OpenCV, and ARToolKit are initialized first. Then the hardware devices, chiefly the camera, are started. If these steps succeed, the marker recognition module is instantiated, followed by initialization of the sound effect module. After that, the 3D virtual models are loaded and the stereo rendering module begins to render some models and pictures. The system then waits for the players' input from the handheld devices.
The primary procedure is shown in Fig. 2. In the first step, we define twelve positions, of which ten are stationary and two are mobile. The stationary positions are where 3D virtual models that stay put are rendered: eight are set as correct 'dump sites' where models can be put, and the other two are model producing areas that supply models. Each of the two players controls half of the stationary positions. The two mobile positions are used to render the models that are mapped to the handheld devices.
Then the game waits for the players' input to start. Once the game begins, the virtual objects are rendered, the camera starts working, and the marker attached to the handheld device is tracked. Coordinate and angle information is then obtained. The system judges the player's intention: whether he wants to move a model onto the marker or put down his current model. When the marker enters the corresponding area, the current model in the producing area is moved to the surface of the marker. As soon as the original model has been moved from the producing area to the marker, a new random model is generated in its place. While the marker is being tracked with a model on it, the monitored angle information is used to decide whether the player intends to throw his model. When the angle of the marker is greater than a threshold value, which we set to fifty degrees, the system immediately defines the current position as the player's 'dump site' and judges whether it accords with the rules. Suppose the coordinates of the player's 'dump site' are $P_0 = (x_0, y_0, z_0)^T$, the correct 'dump sites' are $P_i = (x_i, y_i, z_i)^T$, and the distance between $P_0$ and $P_i$ is $d_i$ ($i = 1, 2, 3, 4$). Then $d_i$ is calculated as follows:
$$d_i = \sqrt{(x_i - x_0)^2 + (y_i - y_0)^2 + (z_i - z_0)^2} \qquad (1)$$
If every $d_i$ is far greater than the max-threshold, we take the operation as illegal. If the model is put at one of the four target locations, the operation is judged further. Specifically, the four distances $d_1$, $d_2$, $d_3$, and $d_4$ are first calculated and compared. We take the position whose distance from $P_0$ is smallest as the player's choice. If it matches the model on the marker, we take the operation as right; otherwise, we take it as a mistake.
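The judgment described above can be sketched in a few lines of Python. This is a minimal illustration rather than the game's actual code: the function name `classify_drop`, the target layout, the food names, and the 150 mm threshold are all assumptions made for the example, since the value of the max-threshold and the site coordinates are not stated in the text.

```python
import math

def classify_drop(p0, targets, held_model, max_threshold):
    """Judge a drop at p0 = (x0, y0, z0).

    targets is a list of (position, accepted_model) pairs, one per correct
    'dump site'. Returns (index, verdict), where verdict is 'illegal',
    'right', or 'mistake'.
    """
    # Euclidean distance d_i from P_0 to each target P_i.
    distances = [math.dist(p0, position) for position, _ in targets]
    nearest = min(range(len(targets)), key=lambda i: distances[i])
    # If even the nearest target is beyond the threshold, every d_i is too large.
    if distances[nearest] > max_threshold:
        return None, "illegal"
    accepted = targets[nearest][1]
    return nearest, ("right" if accepted == held_model else "mistake")

# Hypothetical layout: four target sites (in mm) and the food each one accepts.
TARGETS = [
    ((-150.0, 100.0, 0.0), "bone"),
    ((-50.0, 100.0, 0.0), "fish"),
    ((50.0, 100.0, 0.0), "carrot"),
    ((150.0, 100.0, 0.0), "banana"),
]

print(classify_drop((45.0, 90.0, 10.0), TARGETS, "carrot", 150.0))
print(classify_drop((45.0, 90.0, 10.0), TARGETS, "fish", 150.0))
print(classify_drop((900.0, 900.0, 0.0), TARGETS, "fish", 150.0))
```

Testing only the nearest distance against the threshold is equivalent to testing all four, because the nearest distance is the smallest one.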

III. MARKER LOCALIZATION
To achieve accurate real-time interaction between players and the computer, the key problem is to track the movement of the marker and locate it. It is therefore important to calculate the position information of the marker. The procedure is described in detail as follows.

World coordinate system definition
As shown in Fig. 3, there are three coordinate systems in our system: the world coordinate system, the camera coordinate system, and the marker coordinate system. The world coordinate system is defined at a fixed point on the operation desk. The camera coordinate system is also fixed and is defined at the front of the camera. The marker coordinate system is defined at the center of the marker and follows the marker's motion. First, we calculate the transformation matrix from the world coordinate system to the camera coordinate system.
The camera coordinate system can be made to coincide with the world coordinate system by a translation together with a rotation about an axis that does not coincide with the coordinate axes, so the camera coordinate system is obtained by composing a translation and a rotation. Assume this transformation matrix is:
$$A = \begin{pmatrix} R & P \\ 0 & 1 \end{pmatrix} \qquad (2)$$
where $R$ is the rotation matrix and $P$ is the translation vector. If the coordinates of a point in the camera coordinate system and the world coordinate system are $P_c$ and $P_w$, respectively, the translation vector can be expressed as:
$$P = P_c - P_w \qquad (3)$$
The next step is to calculate the rotation matrix. Here we choose the Euler angle method. The definitions of $\alpha$, $\beta$, and $\gamma$ are shown in Fig. 4: they are the angles of the X, Y, and Z axes between the world coordinate system and the camera coordinate system. The rotation matrix of the world coordinates relative to the camera coordinates, rotating in the order Z-X-Y, is as follows [20]:
$$R = \begin{pmatrix} R_{00} & R_{01} & R_{02} \\ R_{10} & R_{11} & R_{12} \\ R_{20} & R_{21} & R_{22} \end{pmatrix} \qquad (4)$$
where $R_{00} = \cos\alpha\cos\gamma - \sin\alpha\sin\beta\sin\gamma$; $R_{01} = \cos\alpha\sin\gamma - \sin\alpha\sin\beta\cos\gamma$; $R_{20} = \sin\alpha\cos\gamma - \cos\alpha\sin\beta\sin\gamma$; $R_{21} = \sin\beta\sin\gamma - \cos\alpha\sin\beta\cos\gamma$. In (4), $\alpha$, $\beta$, and $\gamma$ are measurable, so the transformation matrix of the world coordinate system can be obtained.
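As a small illustration of how a rotation matrix and a translation vector combine into one homogeneous transformation, the sketch below assembles a 4x4 matrix from R and P and applies it to a point. The helper names and the numeric values are hypothetical. For simplicity the rotation is a plain 90 degree turn about the Z axis, not the Euler angle matrix of (4).

```python
def make_transform(R, P):
    """Assemble the 4x4 homogeneous matrix [[R, P], [0, 1]]."""
    rows = [list(R[i]) + [P[i]] for i in range(3)]
    rows.append([0.0, 0.0, 0.0, 1.0])
    return rows

def apply_transform(A, point):
    """Map a 3D point through A using homogeneous coordinates (x, y, z, 1)."""
    v = (point[0], point[1], point[2], 1.0)
    return tuple(sum(A[i][j] * v[j] for j in range(4)) for i in range(3))

# Hypothetical example: rotate 90 degrees about Z, then shift 10 units along X.
R_EXAMPLE = [[0.0, -1.0, 0.0],
             [1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0]]
P_EXAMPLE = [10.0, 0.0, 0.0]
A_EXAMPLE = make_transform(R_EXAMPLE, P_EXAMPLE)

# The world-frame point (1, 0, 0) expressed in the example camera frame.
print(apply_transform(A_EXAMPLE, (1.0, 0.0, 0.0)))
```

Packing rotation and translation into a single matrix lets a chain of coordinate changes be written as one matrix product.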

Transformation matrix between marker and world coordinate system
Suppose the transformation matrices of the marker coordinate system relative to the camera coordinate system and to the world coordinate system are B and T, respectively.
On analysis, matrix B can also be seen as the composition of a translation matrix and a rotation matrix. The difference is that the marker is moving, so its transformation matrix changes with it. We can nevertheless obtain this matrix because the camera can recognize the marker. Here matrix B is calculated in real time by ARToolKit, which is an excellent tool for detecting marker position.
If the coordinates of a point in the world coordinate system, the camera coordinate system, and the marker coordinate system are $(x_w, y_w, z_w)^T$, $(x_c, y_c, z_c)^T$, and $(x_m, y_m, z_m)^T$, it follows that:
$$(x_c, y_c, z_c, 1)^T = A\,(x_w, y_w, z_w, 1)^T \qquad (5)$$
$$(x_c, y_c, z_c, 1)^T = B\,(x_m, y_m, z_m, 1)^T \qquad (6)$$
$$(x_w, y_w, z_w, 1)^T = T\,(x_m, y_m, z_m, 1)^T \qquad (7)$$
where $T$ is the transformation matrix from the marker coordinate system to the world coordinate system.
It follows from (5) and (6) that:
$$(x_w, y_w, z_w, 1)^T = A^{-1} B\,(x_m, y_m, z_m, 1)^T \qquad (8)$$
Based on the above, the transformation matrix $T$ can be calculated as follows:
$$T = A^{-1} B \qquad (9)$$
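The product of the inverse world-to-camera matrix with the marker-to-camera matrix can be sketched as follows. This is an illustrative sketch with hypothetical helper names and values. It assumes both matrices are rigid transformations (an orthonormal rotation plus a translation), which allows the inverse to be written in closed form instead of using a general matrix inversion. In the game, the marker-to-camera matrix would come from the marker tracker; here it is hand-made.

```python
def matmul4(X, Y):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert_rigid(A):
    """Inverse of [[R, P], [0, 1]] is [[R^T, -R^T P], [0, 1]].

    Valid only when R is orthonormal (a pure rotation).
    """
    Rt = [[A[j][i] for j in range(3)] for i in range(3)]
    t = [-sum(Rt[i][k] * A[k][3] for k in range(3)) for i in range(3)]
    return [Rt[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def marker_to_world(A, B):
    """Marker-to-world matrix: inverse of world-to-camera times marker-to-camera."""
    return matmul4(invert_rigid(A), B)

# Hypothetical world-to-camera matrix: 90 degree turn about Z plus a shift.
A_CAM = [[0.0, -1.0, 0.0, 10.0],
         [1.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
# Hypothetical marker-to-camera matrix: same orientation, origin at (10, 5, 0).
B_MARKER = [[0.0, -1.0, 0.0, 10.0],
            [1.0, 0.0, 0.0, 5.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

T_EXAMPLE = marker_to_world(A_CAM, B_MARKER)
print([row[3] for row in T_EXAMPLE[:3]])  # marker origin in world coordinates
```

In this example the marker origin lies at (5, 0, 0) in world coordinates, which can be checked by mapping that point through the world-to-camera matrix and obtaining (10, 5, 0).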

Obtain position information of marker
The position information of the marker consists of the marker's coordinates in the world coordinate system and its angle with the X axis. The coordinates are used to calculate the distances between the marker and the target positions, and the player's intention is recognized from the angle of the marker.
If we denote its coordinates by $M = (a, b, c)^T$ and its angle with the X axis by $\theta$, the above calculation gives:
$$M = (t_{03}, t_{13}, t_{23})^T \qquad (10)$$
$$\theta = \arccos(t_{22}) \qquad (11)$$
where $t_{03}$, $t_{13}$, and $t_{23}$ are the first three elements in the fourth column of matrix $T$, and $t_{22}$ is the element in the third row and third column of matrix $T$.
The marker can then be tracked and located using this position information.
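Reading the position and angle out of the marker-to-world matrix, and comparing the angle with the fifty degree threshold mentioned in Section II, might look like the sketch below. The function names and the test pose are hypothetical. The clamp before the arccosine is an added safeguard against rounding error and is not described in the text.

```python
import math

THROW_ANGLE_DEG = 50.0  # threshold value quoted in Section II

def marker_pose(T):
    """Return (position, angle in degrees) from a 4x4 marker-to-world matrix.

    The position is the first three entries of the fourth column; the angle
    is the arccosine of the entry in the third row and third column.
    """
    position = (T[0][3], T[1][3], T[2][3])
    cos_theta = max(-1.0, min(1.0, T[2][2]))  # guard against rounding error
    return position, math.degrees(math.acos(cos_theta))

def wants_to_put_down(T, threshold_deg=THROW_ANGLE_DEG):
    """True when the marker is tilted past the threshold."""
    _, theta = marker_pose(T)
    return theta > threshold_deg

def tilted_marker(angle_deg, position):
    """Hypothetical test pose: a marker tilted about its own X axis."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    x, y, z = position
    return [[1.0, 0.0, 0.0, x],
            [0.0, c, -s, y],
            [0.0, s, c, z],
            [0.0, 0.0, 0.0, 1.0]]

pose = marker_pose(tilted_marker(60.0, (100.0, 50.0, 400.0)))
print(pose)
print(wants_to_put_down(tilted_marker(20.0, (0.0, 0.0, 400.0))))
```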

Accuracy Evaluation
To verify the tracking accuracy of the marker localization approach, we carried out experiments in which the position of the marker in the world coordinate system was calculated by the above approach. We also measured the real position of the marker with measuring tools. Comparing the two sets of data shows whether the approach acquires marker coordinates precisely enough for our game. In this experiment, we set the Z-value of the marker to 400 mm, which is a typical value for ordinary players. Three quantities were acquired: the X-value, the Y-value, and the angle of the marker, as shown in Fig. 4-6.
In Fig. 4-6, $x_0$, $y_0$, and $\theta_0$ represent the measured values of the marker's coordinates and angle. Accordingly, $x$, $y$, and $\theta$ represent the values calculated by the computer using our approach. When one of the three differences $x_0 - x$, $y_0 - y$, and $\theta_0 - \theta$ is being acquired, the other two are always set to 0. The final quantities we want are $\Delta x$, $\Delta y$, and $\Delta\theta$, called the x-error, y-error, and angle-error, respectively. These errors are obtained as follows:
$$\Delta x = |x_0 - x| \qquad (12)$$
$$\Delta y = |y_0 - y| \qquad (13)$$
$$\Delta\theta = |\theta_0 - \theta| \qquad (14)$$
These three variables therefore indicate the level of precision directly. From the experimental data, the maximum errors of the X-value, Y-value, and angle of the marker are 9.24 mm, 7.66 mm, and 4.56 degrees, respectively. The corresponding average errors are 4.85 mm, 3.30 mm, and 2.92 degrees.
These errors include human error and environmental error, not only error from the computing method we use. Human error arises because our operations are not accurate enough, the position of the camera is not calibrated accurately, the measuring tools introduce error, the marker is not moved accurately, and so on. Environmental error is introduced by insufficient light and by vibration of the camera or marker. These factors have a much bigger impact on the error of the marker's position and angle. It is therefore reasonable to assume that the maximum error of the X-value or Y-value introduced by the computing method is lower than 5 mm. Similarly, the maximum error of the marker's angle can be considered lower than 2 degrees. Errors of this size are not big enough to prevent users from immersing themselves in the game. The experiment shows that our approach is effective and meets the precision requirement of the game.
Besides that, there are some regularities in the experimental data. Each of the errors mentioned above grows with the magnitude of its own value and reaches a maximum when that value is at its limit. The reason is that as the X-value or Y-value increases, the distance between the camera and the marker grows, so the sharpness of the captured image containing the marker decreases. When the marker is placed near the edge of the camera's shooting range, the sharpness of the image is lowest and the errors in the marker's position are biggest.
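The maximum and average errors reported above are simple statistics over paired measurements. A minimal sketch is given below, assuming each error is the absolute difference between a measured value and the corresponding computed value. The function name is hypothetical and the sample numbers are invented for illustration only; they are not the experimental data.

```python
def error_stats(measured, computed):
    """Return (maximum error, average error) over paired samples."""
    if len(measured) != len(computed) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(m - c) for m, c in zip(measured, computed)]
    return max(errors), sum(errors) / len(errors)

# Invented sample values in mm (NOT the paper's data).
measured_x = [0.0, 50.0, 100.0, 150.0]
computed_x = [1.0, 52.0, 96.0, 157.0]

max_err, avg_err = error_stats(measured_x, computed_x)
print(max_err, avg_err)
```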

IV. STEREO RENDERING
As shown in Fig. 7, human eyes are side by side, about 65 mm apart. There is therefore a small difference between the scenes seen by the left eye and the right eye. This difference, called parallax, produces the binocular ability to perceive depth. The brain fuses the two separate views into one image, so we naturally perceive stereoscopic vision. The basis of 3D stereo display is therefore to reproduce parallax artificially. Our approach is described as follows.

Fig. 7. Binocular stereo vision schematic (labels: projection plane, focal length).
Fig. 8 illustrates the procedure to draw the back buffers and produce stereoscopic vision. To support stereo operation, the GLUT library is initialized in the first step. Then the initial display mode flags GLUT_DOUBLE, GLUT_RGB, GLUT_DEPTH, and GLUT_STEREO are set, of which GLUT_STEREO is the bit mask that selects a stereo window. After that, the color buffers to be drawn into are specified and cleared. Here we use quad-buffered stereo mode, so both the back-left and the back-right color buffers need to be specified and cleared first; either of the two may be cleared first. Next, the two eye positions are derived [21]. To determine the projection transformation matrix, we first direct subsequent matrix operations to the projection matrix stack and replace the current projection matrix with the identity matrix. Perspective projection is then selected because the image it creates has a stronger sense of reality. The projection transformation matrix $P$ is obtained as in (15), where $A = (x_2 + x_1)/(x_2 - x_1)$; $B = (y_1 + y_2)/(y_1 - y_2)$; $C = (z_1 + z_2)/(z_1 - z_2)$; $D = (2 z_1 z_2)/(z_1 - z_2)$. After that, subsequent matrix operations are directed to the modelview matrix stack. The modelview matrix is likewise first replaced with the identity matrix. Then the view transformation is defined to set the position and direction of the viewpoint. Suppose the coordinates of the viewpoint and of the reference point denoting the center of the scene are $(x_3, y_3, z_3)^T$ and $(x_4, y_4, z_4)^T$, and the up vector is $(x_5, y_5, z_5)^T$.
We then define:
$$k = (x_3 - x_4,\; y_3 - y_4,\; z_3 - z_4)^T \qquad (16)$$
$$m = \frac{(x_5, y_5, z_5)^T}{\|(x_5, y_5, z_5)^T\|}$$
Next, we set the lights for the scene displayed for each eye and draw the two back buffers in order. Finally, the contents of one of the two back buffers and the front buffer are swapped, and then the other back buffer and the front buffer swap their contents. This procedure repeats continuously, producing stereoscopic vision. Once they put on stereo eyeglasses, users feel as though they are inside the game scene.
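The two numerical steps in this section, deriving the eye positions and building the perspective projection matrix, can be sketched as below. Several assumptions are made and are not confirmed by the text: $x_1$/$x_2$ are taken as the left/right clipping planes, $y_2$/$y_1$ as bottom/top, and $z_1$/$z_2$ as the near/far distances, which is the reading under which the coefficients $A$, $B$, $C$, $D$ match the matrix documented for OpenGL's `glFrustum`. The eye positions are obtained by shifting the viewpoint half the eye separation to each side, one common way to do it, which may differ from the method of [21]. All function names are hypothetical.

```python
import math

def frustum(x1, x2, y2, y1, z1, z2):
    """Perspective matrix in the glFrustum layout (left, right, bottom, top, near, far)."""
    A = (x2 + x1) / (x2 - x1)
    B = (y1 + y2) / (y1 - y2)
    C = (z1 + z2) / (z1 - z2)
    D = (2 * z1 * z2) / (z1 - z2)
    return [[2 * z1 / (x2 - x1), 0.0, A, 0.0],
            [0.0, 2 * z1 / (y1 - y2), B, 0.0],
            [0.0, 0.0, C, D],
            [0.0, 0.0, -1.0, 0.0]]

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def eye_positions(viewpoint, reference, up, separation=65.0):
    """Left and right eye positions, offset sideways from the viewpoint.

    k points from the reference point to the viewpoint and m is the unit
    up vector; their cross product gives the viewer's right-hand direction.
    """
    k = normalize(tuple(v - r for v, r in zip(viewpoint, reference)))
    m = normalize(up)
    right = normalize(cross(m, k))
    half = separation / 2.0
    left_eye = tuple(v - half * r for v, r in zip(viewpoint, right))
    right_eye = tuple(v + half * r for v, r in zip(viewpoint, right))
    return left_eye, right_eye

# Hypothetical setup: viewer 500 mm in front of the scene centre, Y is up.
left_eye, right_eye = eye_positions((0.0, 0.0, 500.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(left_eye, right_eye)
print(frustum(-1.0, 1.0, -1.0, 1.0, 1.0, 3.0))
```

In a quad-buffered setup the scene would be drawn once per eye, each time with its own eye position, into the back-left and back-right buffers.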

V. IMPLEMENTATION
As shown in Fig. 9-10, we developed an educational stereo game based on the marker localization approach. The game supports two players at the same time. It runs on a Core2 E4300 computer with a GeForce 7300 graphics card and an inexpensive USB camera that captures 30 frames per second. Each player's handheld device is a plastic spade with a marker attached to its surface. The two plastic spades are matched with two virtual spade models whose position and pose track the real spades in real time. The animal models are all stationary at fixed positions. A food model can be in one of two states. In the first, it stays in the setup area, and is replaced by the next random model when the current one is moved. In the second, it is on a marker, bound to it and following its movement. Players need to move the virtual food models to the corresponding animal models. The key screens of the game are shown in Fig. 11-13. Fig. 11 shows the startup screen of the feeding animal game. After pressing the start button, the player sees the main interface shown in Fig. 12. When the game is over, the result is displayed on the screen shown in Fig. 13.

VI. CONCLUSION
An educational game in a virtual environment has been developed based on marker localization and stereo rendering. The computer vision marker localization approach calculates the position and pose of the marker in real time, so it can be used to develop an intuitive human-computer interface (HCI) for educational games. The HCI based on the marker localization approach allows players to pick up, move, and put down virtual objects. The stereo rendering approach produces stereoscopic vision for players. The equipment for the game is low cost and easy to obtain (one static camera, two plastic spades, a pair of stereo glasses, and a computer running Windows XP). The implementation results demonstrate the effectiveness of our approach. The marker localization interaction and stereo rendering methods elaborated in this paper can be used to develop other kinds of games, especially educational games, because of their superior interaction and immersion. Although our method provides a more intuitive HCI than the traditional mouse and keyboard, players still need to use a special device to interact with the virtual objects, which might make them feel uncomfortable. To overcome this drawback, future work will try to let players interact with virtual objects directly with their hands.

Fig. 9. Players with stereo eyeglasses are playing the game by devices with markers.