NAVIGATION AND ILLUMINATION CONTROL FOR IMAGE-BASED VR

Simulation sickness in virtual reality applications is often due to the non-realtime display of the virtual environment, and realtime display of arbitrarily complex scenes is a hard problem in traditional geometry-based computer graphics. Image-based modeling and rendering (IBMR) provides an alternative approach whose rendering time is independent of scene complexity. However, due to the lack of geometric information, capabilities that are straightforward in geometry-based virtual reality become difficult problems in image-based virtual reality. In this paper we discuss how two fundamental capabilities, navigation and illumination control, can be achieved in image-based virtual reality applications. To navigate an image-based scene, we need to know where pixels should be moved and how to resolve their visibility. While correspondences or optical flow can answer the former question, visibility is more difficult, as depth information may not be available. Deriving from epipolar geometry, we propose a triangle-based visibility-ordering algorithm that correctly resolves occlusion without depth information. To control the illumination, we propose a new image representation that allows not just navigation but also re-rendering under various illuminations. By treating each image pixel as an ordinary surface element, we measure the apparent BRDF of each pixel from reference images. By manipulating these apparent pixel BRDFs, we can re-render the scene (change the illumination of the scene in an image) without any geometry information. Even shadows can be correctly re-rendered.


Introduction
The inability to render arbitrarily complex scenes in realtime is one of the major causes of simulation sickness in virtual reality applications. Traditional geometry-based computer graphics requires a significant amount of time to render complex scenery, because the rendering time depends on the scene complexity. Even with state-of-the-art graphics accelerators, realtime rendering is still far from satisfactory. Image-based computer graphics provides an alternative way to render a complex scene within a short period of time, which makes it especially useful for preventing simulation sickness in virtual reality applications. Several image-based approaches [15,12,8,18] have been proposed in recent years. One well-known example of image-based computer graphics in virtual reality is Apple's QuickTime VR [3]. However, due to the lack of geometric information, capabilities that are straightforward in geometry-based computer graphics, such as navigation and illumination control, become difficult in image-based computer graphics. In this paper we describe how navigation and illumination control can be achieved without knowing the geometry of the scene.
To navigate (i.e. generate an image from a new viewpoint) in the virtual environment represented by a set of reference images, there are two sub-problems to solve. The first is where pixels should be moved; it can be solved by pixel reprojection [4] if depth is known, or by correspondence determination [6] if depth is unknown. The second is the visibility problem: determining which pixel is in front when multiple pixels move to the same position in the new image. The most straightforward method is depth buffering. However, in some cases depth information may not be available or accurate. This is especially common for real-world photographs.
In that case, only correspondences or optical flow can be determined, and visibility cannot be solved by depth buffering. McMillan [15,14] proposed a clever solution to this visibility problem: visibility is solved by drawing pixels in a specific order. Mark et al. [13] and Shade et al. [19] applied this pixel-based drawing order to resolve visibility in their work. Unfortunately, the algorithm is time-consuming, as it can only draw one pixel at a time. It would be more efficient if neighboring pixels could be grouped together and drawn at once using existing graphics hardware. However, McMillan's algorithm does not apply to image entities larger than a pixel. In this paper we extend this pixel drawing order to triangles in order to accelerate the rendering.
Another important capability of traditional computer graphics is illumination control. Without geometry, changing the illumination is no longer straightforward: standard illumination models, such as Phong's model, are not applicable to image-based computer graphics. Several attempts have been made to re-render an image without knowing the geometry. Nimeroff et al. [16] re-rendered scenes under various natural illumination (overcast or clear skylight) using empirical formulas that model the skylight. Belhumeur and Kriegman [1] determined the basis images of an object under the assumptions that the object is convex and all surfaces are Lambertian; the image can then be re-rendered as a linear combination of these basis images. However, the illumination is uncontrollable, as the coefficients of the linear combination are not directly related to the direction of the light source. In the second half of this paper we describe a new image-based BRDF representation, which allows not only the change of viewpoint but also the change of illumination. There is no restriction on the shape or surface properties of scene objects as in previous approaches. Both indoor and outdoor illumination can be synthesized. Most importantly, the illumination is controllable.
Throughout this paper we concentrate on the visibility and illumination control of planar perspective images. For virtual reality applications, panoramas are more appropriate. The discussion below extends trivially to a cube-based panorama represented by six planar perspective images: the same algorithms can be applied to each of the six faces.

Navigation
As mentioned earlier, navigation can be broken into two sub-problems. Solutions to the first sub-problem, moving the pixels, can be found in several previous works [4,11]. In this section we focus on the second sub-problem, the visibility problem.

EPIPOLAR GEOMETRY
Before describing the proposed algorithm, we first review some basics of epipolar geometry. Consider a planar perspective image I_c captured with the center of projection at the 3D point ċ. We use the overhead dot notation (e.g. ȧ) to denote a 3D point and the overhead arrow notation (e.g. a⃗) to denote a 3D directional vector. A desired image I_e is generated with a new center of projection at ė. Figure 1 shows the geometry in 3D. From Figure 1 we know that ṗ_2 will never be occluded by ṗ_1 as viewed from ė, no matter where the exact positions of ṗ_1 and ṗ_2 might be. Therefore, if we always draw pixel i_1 before pixel i_2 during reprojection, the visibility is resolved without knowing or comparing their depth values. Hence, if we can identify the pixels whose intersection points may occlude each other and derive their drawing order, the visibility problem can be solved without depth buffering.
To identify the pixels that can occlude each other, we first intersect the epipolar plane with the planar projection manifold (image I_c). The intersection line is called the epipolar line. Figure 2 illustrates these terms graphically. When the positive epipolar ray ω⃗ intersects the projection manifold I_c, the intersection point on the projection manifold is known as the positive epipole, denoted in Figure 2 by a positive sign. On the other hand, if the negative epipolar ray intersects the projection manifold, the intersection point is known as the negative epipole and is denoted by a negative sign. Note that all epipolar lines pass through the epipole (either positive or negative). When the epipolar rays are parallel to the planar projection manifold, no intersection point exists on the plane, and all epipolar lines are parallel.
All pixels in I_c that lie on the same epipolar line have a chance to occlude each other. If i_2 is closer to the positive epipole on the epipolar line than i_1, then i_1 will never occlude i_2. Hence we should always draw i_1 first. The arrow on the epipolar line in Figure 3 indicates the drawing order of pixels. On the other hand, if i_2 is closer to the negative epipole on the epipolar line than i_1, then i_2 will never occlude i_1. By intersecting all of the epipolar planes with the image I_c (Figure 2), we obtain pictures of drawing order (Figure 4). Note that once ċ and ė are known, the picture of drawing order is already determined; it is not necessary to define the epipolar planes explicitly. Hence no depth information is required in constructing the drawing order. Only three main categories of drawing order exist. If the positive epipolar ray intersects the projection manifold, a converging pattern is obtained (Figure 4(a)). On the other hand, if the negative epipolar ray intersects the projection manifold, a diverging pattern results (Figure 4(b)). If the epipolar rays are parallel to the projection manifold, the epipoles can be regarded as located infinitely far away and the epipolar lines are all parallel (Figure 4(c)). McMillan [15,14] used these drawing patterns to solve visibility without depth buffering.
Figure 4. The drawing patterns.
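The three-way classification can be sketched as follows. This is an illustrative sketch, not the paper's implementation: it assumes the reference camera's own coordinate frame with the image plane at z = 1, so the sign of the z component of the ray from ċ toward ė decides which epipole, if any, lands on the plane.

```python
import numpy as np

def classify_drawing_pattern(c, e, eps=1e-9):
    """Classify the drawing pattern for a reference camera centered at c
    and a desired viewpoint at e, both given in the reference camera's
    frame with the image plane at z = 1 (an assumption of this sketch).
    Returns (pattern, epipole), where epipole is a 2D point or None."""
    d = np.asarray(e, float) - np.asarray(c, float)  # positive epipolar ray
    if abs(d[2]) < eps:
        return "parallel", None          # epipole at infinity, parallel lines
    epipole = d[:2] / d[2]               # intersection with the plane z = 1
    if d[2] > 0:
        return "converging", epipole     # pixels nearer the + epipole drawn later
    return "diverging", epipole          # pixels nearer the - epipole drawn first
```

With a viewpoint straight ahead of the camera the positive ray pierces the plane, giving the converging pattern; a sideways move gives parallel epipolar lines.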
Pixels on different epipolar lines can be drawn in arbitrary order. However, the epipolar lines only tell us the ordering of pixel-sized entities that lie on the same line. If we group pixels to form larger entities (such as triangles) that overlap with multiple epipolar lines (Figure 5), their ordering is no longer clear.

Figure 5. Larger image entities overlap
with multiple epipolar lines.

TRIANGULATION
To warp an image efficiently with triangles, we first have to decompose the reference image into a set of triangles based on the associated depth map or optical flow map. The basic idea is to group adjacent pixels with similar depth or optical flow. Since depth is a scalar value, the depth map can be regarded as a special kind of heightfield, and several algorithms [17,5] have been proposed to "triangulate" heightfields. If only the optical flow is available, the magnitude of its gradient can be used for triangulation instead. Figure 6 shows the result of triangulation.
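As a toy illustration of the grouping idea, and not one of the cited heightfield-simplification algorithms [17,5], the sketch below splits each 2x2 block of a depth map into two triangles and discards any triangle spanning a depth discontinuity; the threshold max_jump is an assumed parameter.

```python
import numpy as np

def triangulate_depth(depth, max_jump=0.5):
    """Split each 2x2 pixel block of the depth map into two triangles and
    keep only those whose corner depths are mutually similar, leaving
    holes across depth discontinuities. Returns triangles as lists of
    three (row, col) pixel coordinates."""
    h, w = depth.shape
    tris = []
    for y in range(h - 1):
        for x in range(w - 1):
            quad = [(y, x), (y, x + 1), (y + 1, x), (y + 1, x + 1)]
            for tri in ([quad[0], quad[1], quad[2]],
                        [quad[1], quad[3], quad[2]]):
                zs = [depth[p] for p in tri]
                if max(zs) - min(zs) <= max_jump:  # similar depth: keep
                    tris.append(tri)
    return tris
```

Dropping the discontinuous triangles corresponds to the intentional holes described later, which prevent excessive elongation when warping across a depth jump.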

ORDERING OF NEIGHBORING TRIANGLES
Given two arbitrary triangles t_1 and t_2, obtained by triangulating the image, we can check whether they may occlude each other by checking the range of epipolar lines each triangle occupies. If elements (pixels) in the two triangles share any common epipolar line, the ordering of the two triangles is relevant. On the other hand, if the two triangles share no common epipolar line, their ordering is irrelevant.
Instead of considering the order of any two arbitrary triangles from the mesh, we first consider the ordering between each pair of neighboring triangles that share a common edge, as in Figure 7(a). We shall now show that the ordering of any two neighboring triangles is determined by the position of the positive or negative epipole. Theorem 1: Given two neighboring triangles sharing a common edge, the planar projection manifold can be divided into two halves by extending the shared edge. The triangle with the positive epipole on its side should be drawn later during warping. On the other hand, the triangle with the negative epipole on its side should be drawn first. If the epipole (either positive or negative) lies exactly on the shared edge, the ordering of the two triangles is irrelevant.
Proof: Denote the triangle with the positive (negative) epipole on its side as t_n and the other as t_f. All epipolar lines on the planar projection manifold are straight lines, and they all pass through the positive (negative) epipole. Now draw a straight epipolar line starting from the positive (negative) epipole and passing through both t_n and t_f. Since the positive (negative) epipole is on the same side as t_n, whenever the straight epipolar line passes through both t_n and t_f, it must first pass through t_n, then the shared edge, and finally t_f (Figure 7(a)). Therefore, whenever elements in t_n share a common epipolar line with elements in t_f, the elements in t_n are closer to the epipole than those in t_f. If the epipole is positive, all elements in t_n are closer to the positive epipole than any element in t_f. Hence no element in triangle t_f can occlude any element in t_n, and we must draw t_f before t_n during warping, denoted t_f → t_n. On the other hand, if the epipole is negative, no element in t_n can occlude any element in t_f, as all elements in t_f are farther away from the negative epipole. Then we must draw t_n before t_f during warping, denoted t_n → t_f. When the epipole lies on the shared edge (Figure 7(b)), one can always separate the two triangles by drawing a line that coincides with the shared edge. In other words, no element in t_n and t_f shares any common epipolar line. It follows that their ordering is irrelevant, and we denote this relationship as t_n ↔ t_f.
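Theorem 1 reduces to a side-of-edge test in the image plane. The following is a minimal sketch under assumed 2D image-plane coordinates; the function names and the convention of returning triangle indices in drawing order are ours, not the paper's.

```python
def edge_side(a, b, p, eps=1e-9):
    """Signed-area test: +1 if p is left of the directed edge a->b,
    -1 if right, 0 if p lies on the edge line."""
    s = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return 0 if abs(s) < eps else (1 if s > 0 else -1)

def order_neighbors(edge_a, edge_b, apex1, apex2, epipole, positive=True):
    """Order two neighboring triangles sharing the edge (edge_a, edge_b);
    apex1 and apex2 are the remaining vertices of triangles 1 and 2.
    With a positive epipole, the triangle on the epipole's side is drawn
    later; with a negative epipole, first. Returns a pair of triangle
    indices in drawing order, or None when the ordering is irrelevant."""
    se = edge_side(edge_a, edge_b, epipole)
    if se == 0:
        return None                      # epipole on the shared edge
    near = 1 if edge_side(edge_a, edge_b, apex1) == se else 2
    far = 3 - near                       # triangle opposite the epipole
    return (far, near) if positive else (near, far)
```

The epipole-on-edge case returning None corresponds to the ↔ relation, where no ordering edge is added to the graph.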
If the epipolar ray does not intersect with the planar projection manifold, the epipoles can be regarded as located infinitely far away. All epipolar lines are then parallel and pointing from the negative epipole to the positive epipole. We can still determine which triangle is on the same side as the infinite epipole by determining the direction of the epipolar line.

TOPOLOGICAL SORTING
Using the simple method described above, one can always derive the drawing order of two neighboring triangles. This ordering can be extended to cover any two arbitrary triangles from the mesh by constructing a drawing order graph. By representing each triangle as a node and the relation → as a directed edge, we can construct a graph of drawing order. No edge is needed to represent the relation ↔, as that ordering is irrelevant. Note that the constructed graph may contain disjoint subgraphs. Figure 8(a) shows seven connected triangles. The drawing order of each pair of neighboring triangles is shown as an arrow crossing the shared edge. The constructed graph is shown in Figure 8(b). Figure 8(c) shows two valid drawing orders derived from the example graph. Note that there is no unique ordering for the same graph.
There is no need to construct the graph explicitly. The graph can be represented implicitly as a set of ordering relations between each pair of neighboring triangles. Hence, for each shared edge, we determine the drawing order between the neighboring triangles using Theorem 1. The time complexity of the graph construction is clearly O(E), where E is the number of shared edges. As each triangle has three edges, E must be smaller than 3N, where N is the total number of triangles. Hence, the time complexity is linear in the total number of triangles.
The final step in discovering the ordering of all triangles is to perform a topological sort on the drawing order graph. The details of topological sort can be found in standard algorithms texts [10,21]. The basic idea is to output one triangle t at a time such that no other triangle needs to be drawn before t, i.e. t is not on the right-hand side of any → relation. The time complexity of the topological sort is O(E+N), where E is the number of relations (edges in the graph) and N is the number of triangles. Since E is at most 3N, the time complexity is again linear in the number of triangles.

Figure 13 shows three frames from an animation sequence of warping an image of a Beethoven statue. The images on the first row show the result when the image is forward-warped in a pixel-by-pixel manner. Since no splatting is performed, gaps exist between the pixels. Images on the second row are the final images after running our algorithm. The third row shows the corresponding warped triangulations, together with the drawing order. To distinguish one triangle from another, we use three distinct colors for neighboring triangles. The intensity of a triangle indicates the drawing order: the darker the color, the earlier the triangle appears in the drawing order. Figures 13(h) and 13(i) show how the visibility is resolved by the drawing order. Note that all triangles in the front (those closer to the viewpoint) are lighter in color than those in the back. The coloring in Figures 12(d) and 13(g) looks random because the drawing order there is irrelevant: all triangles are visible, and no occlusion occurs as viewed from the original viewpoint.
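The sorting step can be sketched with Kahn's algorithm, which runs in the stated O(E+N) time; representing the → relation as a list of index pairs is an assumption of this illustration.

```python
from collections import defaultdict, deque

def topo_draw_order(n_triangles, edges):
    """Kahn-style topological sort of the drawing-order graph. `edges` is
    a list of (a, b) pairs meaning triangle a must be drawn before b (the
    -> relation). Runs in O(E + N) time; raises on a cycle, which a valid
    drawing-order graph cannot contain."""
    succ = defaultdict(list)
    indeg = [0] * n_triangles
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1
    ready = deque(t for t in range(n_triangles) if indeg[t] == 0)
    order = []
    while ready:
        t = ready.popleft()              # t has no remaining predecessors
        order.append(t)
        for s in succ[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    if len(order) != n_triangles:
        raise ValueError("cycle in drawing-order graph")
    return order
```

Triangles in disjoint subgraphs, or with no ordering constraint between them, may come out in any relative order, matching the observation that the ordering is not unique.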
An indoor scene is shown in Figure 12. The first row shows the final warped images while the second row shows the warped triangles together with the drawing order. Note how the visibility is correctly resolved even though no depth buffering is used. The shaft in the attic scene correctly overlaps the background without comparing depth values. Since the image is now triangulated as a set of connected triangles, no gap exists between them. The holes between the triangles in our result are intentionally introduced to prevent excessive elongation after warping two neighboring triangles with discontinuous depth or optical flow values.
We have compared the speed of triangle-based and pixel-based warping on an SGI Octane with a MIPS R10000 CPU and an MXE graphics engine. When graphics hardware is utilized, Table 1 shows that triangle-based warping yields a significant improvement in rendering speed over pixel-based warping, even though the determination of drawing order is done purely in software. The column "Visibility sorting" indicates the average time needed for the image-based visibility sorting (the determination of drawing order). The column "Render" indicates the average time needed for drawing the triangles on screen, which is done by the graphics accelerator. As expected, visibility sorting takes longer, as it is done purely in software. Nevertheless, there is still a significant reduction in total rendering time compared with pixel-based warping: the rendering time is reduced by at least 80% in all test cases. Obviously, the speedup is strongly related to the number of triangles. Since the resolution of the two test scenes is the same, the frame rate of pixel-based warping is the same in both test cases.

Illumination Control
Another important capability in virtual reality applications is illumination control. Changing the illumination not only improves realism but also aids recognition of objects in the virtual environment. Unfortunately, once the scene has been captured and represented as a set of reference images, we cannot apply standard illumination models to change the illumination, because no geometry information is available. In this section we propose a new image representation that allows the illumination to be changed.

BRDF OF PIXEL
The bi-directional reflectance distribution function (BRDF) [9] is the most general form representing surface reflectivity. To calculate the radiance emanating from a surface element in a specific direction, the BRDF of this surface element must first be determined.
Methods for measuring and modeling the BRDF can be found in various sources [2,20]. The most straightforward way to include illumination variability in an image-based rendering system is to measure the BRDF of each object material visible in the image. However, measuring the BRDFs of all objects in a real scene is tedious and often infeasible. Imagine a scene containing thousands of small stones, each with its own BRDF. The situation is even worse when a single object exhibits spatial variation of surface properties. Furthermore, associating a BRDF with each object in the scene implies that the rendering time depends on the scene complexity.
Our solution is to treat each pixel on the image plane as a surface element with an apparent BRDF. Imagine the image plane as an ordinary planar surface; each pixel can then be regarded as a surface element that emits different amounts of radiant energy in different directions under different illumination. In order to measure the (apparent) BRDF of each pixel, the location of the image plane must be specified (see Figure 9), not just the viewing direction. By recording the BRDF of each pixel (Figure 9), we capture the aggregate reflectance of the objects visible through that pixel window. The light vector L from the light source and the viewing vector V from the viewpoint E define the two directions of the BRDF. This approach does not depend on the scene complexity. Moreover, it is a unified approach for both virtual and real-world scenes.
Note that the apparent BRDF represents the response of object(s) to light in each direction in the presence of the rest of the scene, not merely the surface reflectivity. If we work from views (natural or rendered) that include shadows, it follows that shadows appear in the reconstruction.

MEASURING BRDF
To measure the BRDF, we capture images of the virtual or real-world scene under different illuminations. A directional light source illuminates the scene from different directions, and rendered images or photographs of the scene are captured as usual. The algorithm is

For each viewpoint E
    For each directional light source direction (θ, φ)
        Render the virtual scene, or take a photograph of the real-world scene,
        illuminated by this directional light source, and store it as the
        reference image for (θ, φ)

The parameter θ is the polar angle, and φ is the azimuth. The direction (0, φ) is orthogonal to the image plane. The parameters are localized to the coordinate system of the image plane, so transforming the image plane does not affect the BRDF parameterization. The reason for using a directional light source is that the incident light direction is identical at any 3D point. In real life, a directional light source can be approximated by placing a spotlight at a sufficient distance from the scene. The BRDF ρ of each pixel inside a view is then sampled directly from these captured reference images. One assumption is that there is no intervening medium that absorbs, scatters, or emits any radiant energy.
Since the viewing direction of each pixel within one specific view of the image plane is fixed, the BRDF ρ simplifies to a unidirectional reflectance distribution function (URDF) that depends on the light vector only. Hence, the function ρ is parameterized by the two parameters (θ, φ) only. From now on, when we refer to the BRDF, we actually mean the URDF as viewed from a certain viewpoint.
Traditionally, the BRDF is sampled only on the upper hemisphere of a surface element, since the reflectance must be zero if the light source is behind the surface. In our case, however, the reflectance may be nonzero even when the light comes from behind the image plane, because the actual object surface may not align with the image plane (Figure 10). Instead, the whole sphere surrounding the pixel has to be sampled when recording its BRDF, so the range of θ should be [0, π]. Nevertheless, sampling only the upper hemisphere is usually sufficient, since the viewer seldom moves the light source behind the objects.
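The capture loop above amounts to stacking one reference image per sampled light direction into a per-pixel table. The sketch below illustrates that storage and a nearest-neighbour lookup; the table layout, unit light intensity during capture, and the function names are assumptions of this illustration, not the paper's implementation.

```python
import numpy as np

def build_pixel_brdf(images, thetas, phis):
    """Stack reference images captured under directional lights from the
    sampled (theta, phi) directions into a per-pixel BRDF table of shape
    (n_theta, n_phi, H, W). Assumes unit light intensity during capture,
    so each recorded pixel value is the apparent reflectance itself."""
    n_t, n_p = len(thetas), len(phis)
    h, w = images[0].shape
    table = np.zeros((n_t, n_p, h, w))
    k = 0
    for i in range(n_t):          # images ordered theta-major, phi-minor
        for j in range(n_p):
            table[i, j] = images[k]
            k += 1
    return table

def lookup_brdf(table, thetas, phis, theta, phi):
    """Nearest-neighbour lookup of every pixel's apparent reflectance for
    an arbitrary light direction (theta, phi)."""
    i = int(np.argmin(np.abs(np.asarray(thetas) - theta)))
    j = int(np.argmin(np.abs(np.asarray(phis) - phi)))
    return table[i, j]
```

A denser sampling grid or interpolation between samples would trade storage for smoother relighting.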

MANIPULATING THE LIGHT SOURCES
Once the BRDFs are sampled and stored, they can be manipulated. The final radiance (or simply the value) of each pixel is determined by evaluating

    value = Σ_i I_i ρ(θ_i, φ_i),    (1)

where I_i is the intensity of the i-th light source, (θ_i, φ_i) is its direction, and ρ is the aggregate (apparent) reflectance we recorded when measuring the BRDF of the pixel.
Light Direction: With equation (1), the light direction can be changed by substituting a different value of (θ, φ). Figures 14(a) and 14(b) show a teapot illuminated by a light source from the top and from the right, respectively.

Light Intensity:
Another parameter to manipulate is the intensity of the light source. This can be done by changing the value of I i for the i-th light source. Figure 14(c) shows the Beethoven statue illuminated by a blue light from the left.

Multiple Light Sources:
We can add an arbitrary number of light sources; the trade-off is computation time. Our current prototype still runs at an acceptable interactive speed with up to three directional light sources. In Figure 14(d), the Beethoven statue is illuminated by a blue light from the left and a red light from the right simultaneously.
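A minimal sketch of relighting with several directional sources: each light contributes its intensity times the pixel's recorded apparent reflectance, and the contributions are summed. The per-pixel BRDF table layout (theta-by-phi grid of images) and the nearest-neighbour lookup are assumptions of this illustration.

```python
import numpy as np

def relight(table, thetas, phis, lights):
    """Re-render a view as the sum over light sources of intensity times
    per-pixel apparent reflectance. `table` has shape
    (n_theta, n_phi, H, W); `lights` is a list of (intensity, theta, phi)
    triples. Reflectances come from nearest-neighbour lookup in the
    sampled per-pixel BRDF table."""
    out = None
    for intensity, th, ph in lights:
        i = int(np.argmin(np.abs(np.asarray(thetas) - th)))
        j = int(np.argmin(np.abs(np.asarray(phis) - ph)))
        term = intensity * table[i, j]   # one light's contribution
        out = term if out is None else out + term
    return out
```

Since the cost grows linearly with the number of lights, a handful of directional sources remains interactive, consistent with the three-light figure quoted above.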

Type of light sources:
Up to now, we have made an implicit assumption that the light source being manipulated is directional. Directional light makes equation (1) very efficient to evaluate, because all pixels on the same image plane are illuminated from the same direction (θ_i, φ_i). Nevertheless, the method is not restricted to directional light; it can also be extended to point sources and spotlights. However, evaluating equation (1) becomes more expensive for these light source types, since (θ_i, φ_i) must be recalculated from pixel to pixel.
Since the image plane where the pixels are located is only a window in 3D space (Figure 9), the surface element that actually reflects the light may be located at any point on the ray V in Figure 11. To find the light vector L correctly for other types of light sources, the intersection point of the ray and the object surface has to be located first. Note that no such problem exists for a directional source, since the light vector is the same at all points in 3D space. One way to find L is to use a depth image. While this is easily done for rendered images, real-world scenes are more difficult; a range scanner may provide a solution. Figures 15(a) and 15(b) show a box on a plane illuminated by a point source and a directional source, respectively. Note the difference in the shadows cast by these sources. Other light source types derived from the point source can also be used to re-render the image. Figures 15(c) and 15(d) show an attic scene illuminated by a spotlight and a slide projector, respectively. Note how the illumination is correctly accounted for on both foreground and background objects.
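The depth-image route can be sketched as follows: back-project each pixel through its depth to the surface point it sees, then take the direction from that point to the light. The pinhole camera at the origin looking down +z, the focal length parameter, and the principal point at the image centre are assumptions of this sketch; a directional source skips all of this, since its light vector is the same everywhere.

```python
import numpy as np

def point_light_dirs(depth, focal, light_pos):
    """Per-pixel unit light vectors L for a point source at light_pos.
    Assumes a pinhole camera at the origin looking down +z with the given
    focal length and the principal point at the image centre; `depth`
    holds each pixel's z depth. Returns an (H, W, 3) array."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    xs -= (w - 1) / 2.0                  # pixel offsets from principal point
    ys -= (h - 1) / 2.0
    # surface point on each viewing ray, scaled so its z equals the depth
    px = xs / focal * depth
    py = ys / focal * depth
    pts = np.stack([px, py, depth], axis=-1)
    l = np.asarray(light_pos, float) - pts       # surface point -> light
    return l / np.linalg.norm(l, axis=-1, keepdims=True)
```

Given these per-pixel (θ_i, φ_i), equation (1) can be evaluated exactly as in the directional case, just with a different lookup per pixel.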

Conclusions and Future Direction
In this paper we described how two important capabilities, navigation and illumination control, can be achieved in image-based virtual reality applications. We proposed a triangle-based image-warping algorithm that solves the visibility problem without depth buffering. By grouping pixels to form triangles, the image warping can be accelerated using existing graphics hardware, and the gap problem of the pixel-based approach is removed at the same time. Both the graph construction and the topological sorting have linear time complexity, and they are required only when the user changes his or her navigation direction.
We have also proposed and implemented a new image representation to allow the image-based objects to be displayed under varying illumination. The new representation does not restrict the shape or the surface properties of objects in the scene. The objects can be concave and highly specular. Moreover, the illumination is controllable.
In some cases "holes" (unfilled pixels) still exist even when multiple images are warped and blended together. The sampling scheme (placement of the camera during the capturing phase) requires further investigation. We have also extended the visibility algorithm to panoramic images [7], which are commonly used in commercial virtual reality software. Currently, all of our test data consist of synthetic scenes; we are undertaking the capture of real-world scenes with a hand-held camera. Compression is another major problem to solve in the future, especially when illumination is included; a few attempts have been made in our previous work [23,22,24]. There is still much work to do in using the image-based object as a basic rendering primitive in virtual reality, and our work is only a preliminary step in this direction.

Acknowledgement
This work is supported by Hong Kong Research Grants Council CRCs Scheme grant no. CRC4/98.