REGISTRATION OF A 3D MODEL ON A SINGLE IMAGE IN WIDE ANGLE AUGMENTED REALITY

This paper presents a new design of a wide angle augmented reality (AR) system and an effective registration method using a single image. The AR system is designed to meet three requirements: (1) the field of view (FOV) of the AR system must be wide enough to view reference constructions in outdoor scenes; (2) the precise registration of virtual objects should be guaranteed at any position of the image; and (3) the registration should remain valid after motion of the camera. We use both camera calibration and calibration-free methodologies in our approach. The internal parameters of the camera are determined roughly by camera calibration, and the distortion coefficient of the camera lens is then refined. The external parameters are determined by the calibration-free approach. At least four coplanar points forming a projected parallelogram are needed to construct the affine coordinates that are used for the registration. The only presumed condition is that the user interactively specifies some projected parallelogram lines of the reference object. The experimental results demonstrate the display prototype of the wide angle AR system with the distorted images removed and the registration of 3D virtual models in static and dynamic camera environments.


Introduction
Augmented reality (AR) is a rapidly growing research area of virtual reality. Since the publication of the concept [1] and definition [2] of AR, many research results and applications have appeared. However, registration remains a major problem, and it is closely related to the components of the AR system.
At the component level of AR systems, several kinds of 3D sensors (magnetic sensors, mechanical sensors, optical sensors, and hybrid styles) are used to increase the precision of registration [3]. Another important component of an AR system is the display device. Previous researchers have used two kinds of display systems: the see-through head mounted display (HMD) and the video see-through HMD [4]. The field of view (FOV) of a single eye covers 75.3 degrees, and the overlapped region of both eyes is over 60 degrees [5]. However, the FOV of the camera cannot reach these angles. In the case of the see-through HMD with a camera sensor, the location gap between the camera and the eye is a critical problem [6]. In the case of the video see-through HMD, the HMD's FOV is limited to the camera's FOV.
Camera calibration has been a major research issue for several decades. Most previous research into camera calibration has employed a narrow angle lens, necessitating auxiliary equipment to achieve precise camera parameters [7].
In calibration-free approaches a registration method using projection invariant properties has been introduced [8], [9]. Kutulakos presented a registration method that used reprojection and reconstruction properties with four non-coplanar points in affine space from two reference images [10]. The approaches showed the feasibility of registration, but overlooked the internal parameters of the camera.
Accordingly, those results are not applicable to our wide angle AR system, and their registration process is not suitable for our dynamic camera environment, because the affine coordinates cannot be constructed from one image.
In this paper we propose an AR system with a new concept of a dual-purpose video see-through HMD.
The system has a wide angle CCD camera, and the FOV is wide enough to view building structures on our campus. We have developed a method to solve the distortion recovery problem of the wide angle AR system and the registration problem using a single reference image. The only presumed condition is that the user interactively specifies some projected lines forming a parallelogram on the known reference object. We can register the virtual object at any position of the affine coordinates from the reference image. Section 2 of the paper contains the overall system configuration and the functional description. In Section 3 we formulate the distortion removal problem for wide angle lenses and describe our solution.
Section 4 describes the construction of the affine coordinates from a single image and the registration procedure for virtual objects. Section 5 describes the prototype display system, distortion removal results, and registration results for static and dynamic camera environments. In Section 6, we give our conclusions and present possibilities for future research.

Proposed AR System
In this section we describe our AR system configuration and propose a new concept for the display system.

FUNCTIONAL STRUCTURE AND PROCEDURE
The overall AR system is shown in Figure 1. The CCD camera photographs objects in the real world and sends the image to the AR subsystem and display device. In the AR subsystem the camera calibration is processed to determine internal parameters of the camera. The internal parameters are lens center, focal length, uncertainty factor, and distortion coefficients. The internal parameters remain valid until the focal length of the camera is changed. The distorted input image is recentered to the lens center and distortion is removed at this stage.
For tracking we interactively specify some projected parallel lines of a known reference object.
After the tracking process we construct the affine coordinates from the single input image in an alignment process. Then we determine the external parameters of the reference object using calibration-free methods. Next we select the positions, orientations, and sizes of the virtual objects to be rendered. The selection is done only once, and the virtual objects are then registered precisely in the affine coordinates.
The registrations of virtual objects remain valid even after movement of the camera. In the rendering step the virtual objects are rendered with real objects in the image. Finally, the virtual object is projected on the display along with the real background image.

DUAL-PURPOSE DISPLAY SYSTEM USING A WIDE ANGLE LENS
The see-through HMD and the video see-through HMD represent traditional display systems. We propose a concept of a dual-purpose video see-through HMD in Figure 2. This device can be a video see-through HMD if it employs a full mirror, or a see-through HMD if it employs a half mirror. One advantage of this HMD is that the FOV of the camera and that of the eyes are geometrically equivalent.
In AR systems a narrow angle lens can avoid the distortion problem between the real objects and the virtual objects that need to be overlaid. However, the narrow FOV is uncomfortable for the user. That is why we adopt a wide-angle lens in our AR system.

Distortion Removal for Wide-angle Lens
We use Tsai's non-coplanar camera calibration method for internal parameters [11], and we refine the distortion coefficient using an iterative algorithm. In this section we describe the distortion removal process.

CAMERA CALIBRATION
Camera calibration is a nonlinear minimization problem solved by a known procedure that calculates the parameters of camera rotation and translation with respect to the world coordinates.
Using Tsai's non-coplanar calibration method for a single camera [11], we determine the internal parameters: lens center (C_x, C_y), focal length f, uncertainty factor s_x, and distortion coefficient k_1.

REMOVAL OF DISTORTION BY ITERATIVE LEAST SQUARES
Most camera systems have radial and tangential distortion. Distortion models of lenses are well known [12]; we consider radial distortion only.
We use the Hough transform to handle a line. A line in the XY plane maps to a single point in the ρθ plane, where ρ is the shortest distance from the origin of the XY plane to the line and θ is the counterclockwise angle between the x-axis and the normal through that closest point. A point p_i = (x_i, y_i) maps to the sinusoidal curve

    ρ = x_i cos θ + y_i sin θ,    −π ≤ θ ≤ π.

Two points in the XY plane produce two sinusoidal curves, and the curves intersect at exactly one point in the ρθ plane. Hence, these pairwise intersection points are measured. Let the measured points in the ρθ plane be (ρ_i, θ_i). The error at each point and the total error at stage j are defined by the deviations from the mean intersection (ρ̄, θ̄):

    e_i = (ρ_i − ρ̄)² + (θ_i − θ̄)²,    E_j = Σ_i e_i.

The minimum variance in the Hough space corresponds to a straight line in the XY plane. The initial k_1 value is taken from the result of Tsai's internal calibration; k_1 is then refined with a divide-and-conquer search. Hence, the problem is to find the distortion coefficient that minimizes E.
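The refinement loop can be sketched as follows. This is a minimal illustration, not the paper's implementation: it scores line straightness by total-least-squares residuals in the XY plane rather than by the Hough-space variance, while the first-order radial model (Tsai's k_1) and the divide-and-conquer search over k_1 follow the text. All function names are ours.

```python
import math

def undistort(points, k1):
    # First-order radial model with the lens center at the origin:
    # undistorted = distorted * (1 + k1 * r^2).
    out = []
    for x, y in points:
        s = 1.0 + k1 * (x * x + y * y)
        out.append((x * s, y * s))
    return out

def straightness_error(points):
    # Sum of squared perpendicular distances to the best-fit line,
    # i.e. the smaller eigenvalue of the 2x2 scatter matrix.
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    return tr / 2 - math.sqrt(max(tr * tr / 4 - det, 0.0))

def refine_k1(lines, lo, hi, iters=60):
    # Divide-and-conquer (ternary) search for the k1 minimizing the
    # total error E over all marked lines.
    def total(k1):
        return sum(straightness_error(undistort(pts, k1)) for pts in lines)
    for _ in range(iters):
        a, b = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if total(a) < total(b):
            hi = b
        else:
            lo = a
    return (lo + hi) / 2
```

Given several user-marked lines sampled from the distorted image, `refine_k1(lines, -0.5, 0.5)` returns the coefficient under which the undistorted samples are straightest.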

Registration of a Virtual Object
The first step of registration is to construct the affine coordinates from a reference object volume.
We can generally view many parallelogram points on the construction; because these parallelograms are specified interactively, the recognition and correspondence problems are avoided. The second step is to register a virtual object in the affine coordinates of the reference object.

FORMULATION
We use camera coordinates for the registration process in Figure 3. We have 3D models that we want to superimpose, and we assume that the length, the height, and the depth of the reference object are already known.
We can therefore measure the vertices of the reference object (x_R, y_R, z_R)^T in world coordinates. The virtual object's points (x_V, y_V, z_V)^T are given in the 3D models, and the projected points (x', y', 0)^T of objects are tracked in the camera image. We also have the projection information P as a result of the camera calibration process.
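The mapping from a reference vertex to its tracked image point can be sketched as the usual homogeneous projection. This is a minimal illustration under the assumption that P is a 3×4 projection matrix; the function name is ours, not the paper's notation.

```python
def project(P, X):
    """Project a 3D point X = (x, y, z) with a 3x4 projection matrix P
    (row-major nested lists); returns image coordinates (x', y')."""
    Xh = [X[0], X[1], X[2], 1.0]                       # homogeneous point
    xh = [sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3)]
    return xh[0] / xh[2], xh[1] / xh[2]                # perspective divide
```

For example, with the trivial projection P = [[1,0,0,0],[0,1,0,0],[0,0,1,0]], a vertex at depth z is simply scaled by 1/z.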

MEASUREMENT
We can select one, two, or three orthonormal parallelograms of the known reference object according to the viewing direction in Figure 4. There are four cases of parallelograms of the reference object according to the camera position. The four cases can be regrouped according to the number of visible faces. We can generate the affine coordinates for each case.

CONSTRUCTION OF THE AFFINE COORDINATES
A 3D line L and its projected 2D line L′ are represented in parametric form [13]. We know the size of the reference object from our assumptions. Let the half-length of the reference object along the x-axis (the width) be λ_x, along the y-axis (the height) λ_y, and along the z-axis (the depth) λ_z.

Three measured faces
Let q_1, q_2, q_3 be the three center positions of the measured faces. The three projected points are determined simply by equation (1) from the three measured faces. Also, the 2D center position O_R of the object volume is obtained from the parallelogram points p_5, p_6, p_3, p_4 by equation (1). We now know all 2D positions of the base vectors, including the center position, and their 2D lengths.
The direction cosines are determined from two projected parallel lines. Let L'_1, L'_2 be the two lines. The direction cosines (b_11, b_12, b_13)^T of B_1 are determined with three parallel pairs of lines by equation (2). The half size of the x-axis length of the reference object is the length of B_1, which is known from the given condition. If the two projected points are observed at (u_1, v_1)^T, (u_2, v_2)^T, the point (a_11, a_12, a_13)^T is given by the perspective projection equations. The center O_R is the weighted combination of the measured face centers:

    O_R = (w_1 q_1 + w_2 q_2 + w_3 q_3) / (w_1 + w_2 + w_3).    (4)

Direction cosines represent the orientation of the reference object directly. Let the angles between the x-, y-, z-axes of the camera coordinates and B_1, B_3, B_2 be γ, β, α respectively in Figure 6. We can use the model's bases to represent the orientation. Hence, we can represent the orientation of the reference object with fixed angles in the camera coordinates. The rotation of the reference object with respect to the camera coordinates is the matrix whose columns are the normalized direction cosines of B_1, B_2, B_3.

Two measured faces
The translation T_R^i and the rotation R_R^i of the reference object can be determined by the same method as in the three measured faces case, except for the weights in the calculation of T_R^i. The translation T_R^i is defined by equation (4) with w_3 = 0.
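The fixed-angle orientation above can be sketched as a rotation matrix whose columns are the normalized base directions. This is our reading of the construction, with our own function names; it assumes the three bases are mutually orthogonal, as for the reference box.

```python
import math

def normalize(v):
    # Unit vector along v.
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def rotation_from_bases(B1, B2, B3):
    """Rotation of the reference object w.r.t. camera coordinates:
    columns are the normalized direction cosines of the three bases."""
    c1, c2, c3 = normalize(B1), normalize(B2), normalize(B3)
    return [[c1[i], c2[i], c3[i]] for i in range(3)]
```

With axis-aligned bases the result is the identity, and in general the (i, j) entry is the cosine of the angle between camera axis i and base j.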

One measured face
One parallelogram can be measured from the camera position. In this case a vanishing point of the parallelogram always lies inside the projected parallelogram (Figure 5(c)). The direction cosines of B_1, B_2, and B_3 are determined from the vanishing points of the projected parallel lines. The projected position (o_1', o_2', 0)^T of O_R is determined with equation (6), and the base vectors can be determined with equation (7). Therefore the three bases of the affine coordinates can be constructed from a single reference image.

REGISTRATION OF A VIRTUAL OBJECT
After constructing the affine coordinates, a virtual object can be registered once, at the first frame, with a given scale S, translation T_V^i, and rotation R_V^i in the affine coordinates of the reference object.
The virtual object is matched with the real object in the affine coordinates. The scale, position, and orientation of the virtual object are invariant with respect to the real object after motion of the camera. We suppose that the y-axis base vector B_3 has invariant size for this construction. The 2D size of B_3 at position E in the affine coordinates is the difference between the position E(a_1, a_2, a_3) and the position F(a_1, a_2, a_3 + 1). Let the projected positions of E, F be (u_E, v_E)^T, (u_F, v_F)^T respectively. Then the 2D size of B_3 at E is defined by

    |B_3|_E = sqrt((u_F − u_E)² + (v_F − v_E)²).    (8)

The 2D size of B_3 can be determined at any position in the affine coordinates. Therefore, the distances of the virtual objects from the camera can be compared using the length in equation (8). The equation can also be applied to the reference object to handle occlusion problems.
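The projected size of B_3 at a position can be sketched as the image distance between the projections of E and F = E + B_3. This is a minimal illustration assuming a pinhole projection with focal length f, with our own names; under perspective this size shrinks as E moves away from the camera, which is what makes it usable as a depth cue for occlusion.

```python
import math

def project(point, f):
    # Pinhole projection of a camera-space point (x, y, z), z > 0.
    x, y, z = point
    return f * x / z, f * y / z

def b3_size_at(E, B3, f):
    """2D size of base vector B3 at camera-space position E: the image
    distance between the projections of E and F = E + B3."""
    F = tuple(e + b for e, b in zip(E, B3))
    uE, vE = project(E, f)
    uF, vF = project(F, f)
    return math.sqrt((uF - uE) ** 2 + (vF - vE) ** 2)
```

Comparing `b3_size_at` at two positions orders them by distance from the camera: the position with the larger projected size is closer.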

Experimental Results
The overall AR system was implemented on a Pentium PC, and the rendering part was also implemented on an SGI workstation to obtain the best rendering quality. All programs were developed in C and C++. The 3DS and SEG formats were used for virtual models. We used a Meteor board in the PC and a Kukje NTSC color camera with a 4-mm focal length and a 45-degree horizontal FOV after the distortion removal process. We constructed a prototype of the dual-purpose video see-through HMD (Figure 7(a)). An input image from the camera has 640 × 480 resolution and 24-bit RGB color.

Figure 8: Static registration process
The distorted image from a 4-mm wide-angle lens is used as the input image (Figure 7(b)). The re-centered image with distortion removed (c) is reconstructed after the 12th iteration of the recovery process, with total error E_12 < 0.03. The linearity of the longest line on the right-hand side of the image can be clearly observed.
We applied the static registration process to a library scene on the KAIST campus as an outdoor example. The library is far from the camera in Figure 8(a). We drew several lines (b) and performed the construction process for the affine coordinates. Then we selected a virtual library model (c) and registered it at (0, 0, 0) in the affine coordinates. The static registration results are shown in (d).
For dynamic registration in Figure 9, from the first input image (a), we registered a virtual model (b) at (3, 0.5, -1) using the same process as the static registration example. We registered the 'Dabotop' model, which is a famous pagoda in Korea. The registration position (3, 0.5, -1) of the virtual object was invariant in the coordinates after the camera was moved. Accordingly, we could keep the same position in the new input images (c), (d). The registrations yielded good results and are applicable to many other applications.

Conclusions
We have described a precise registration method for the wide-angle AR system and proposed a new prototype of the dual-purpose video see-through HMD for the AR system. The AR system was used to view building structures on our campus, which verified the feasibility of the dual-purpose video see-through HMD.
In the static registration results, the re-centered input image with distortion removed was in good agreement with the computer-generated graphical object. The static registration results were precise, and the virtual object could be registered at any desired position in the affine coordinates.
We needed only one image to construct the affine coordinates in the registration process, which made the method applicable to the dynamic camera environment. Furthermore, the affine coordinates were constructed from as few as four coplanar points of a parallelogram, which is well suited to dynamic camera environments. As a result, virtual objects could be registered at any position of the affine coordinates.
Currently, we are attempting to estimate the quantitative precision of registration in the results. For further work, we are interested in the occlusion problems in dynamic environments, and in research in related areas.