Space Time Carving for Shape and Motion Recovery
Fumihiko Sakaue, Masaaki Takami and Jun Sato
Nagoya Institute of Technology

Methods for shape and motion reconstruction have been studied extensively. This paper describes a method for simultaneous shape and motion reconstruction that requires no corresponding points. We call the method space-time carving. It is based on space carving, which reconstructs 3D shape without corresponding points by using photo-consistency to carve 3D space. Our method also uses photo-consistency, but it carves a high-dimensional space that represents motion as well as shape. As a result, it recovers both the 3D shapes and the 3D motions of objects. Experiments with planar motion, weak calibration and non-rigid motion show that the proposed method works well.


I. INTRODUCTION
The recovery of object shape and motion is one of the most important problems in computer vision, and many methods have been studied in this field [1, 2]. Recently, it has been shown that shape and motion can be recovered simultaneously by using extended multiple view geometry [3]. In this framework, an N-dimensional vector (N > 3) that contains both shape and motion information is recovered from camera images. Shape and motion are therefore recovered together, in the same way as shape alone is recovered in conventional multiple view geometry.
However, most shape and motion recovery methods need corresponding points in the camera images. Finding correspondences is not very hard for sparse shape, but it becomes a very hard problem when dense shape is to be recovered. To avoid the correspondence problem, Kutulakos et al. [6] proposed the space carving method for 3D shape recovery. This method exploits the ``photo consistency'' of object surfaces and does not require explicit corresponding points in the input images. However, it cannot recover moving objects, because it assumes that the input scene is static. Some methods recover moving objects based on the concept of space carving [4, 5], but they do not use motion information directly for shape recovery; basically, they recover a ``static'' object at each time instant using multiple cameras.
In this paper, we propose a method for simultaneous shape and motion recovery that does not use explicit corresponding points. The proposed method carves a high-dimensional space that represents motion as well as shape. As a result, we can recover 3D scenes containing moving objects without explicit corresponding points.

II. EXTENDED MULTIPLE VIEW GEOMETRY

Projection from high-dimensional space to the image plane
First, we summarize extended multiple view geometry [3]. With this geometry, object shapes and motions can be recovered simultaneously, in the same way as existing 3D shape recovery methods recover shape. In this section, we describe camera projections from 4- or 5-dimensional space to 2-dimensional space by extended projection matrices. We begin with the projection of a point from 4-dimensional space to the image plane.
A point $\mathbf{W}$ in 4D space is projected to an image point $\mathbf{m}$ as follows:
$$ k\tilde{\mathbf{m}} = \mathbf{Q}\tilde{\mathbf{W}} \qquad (1) $$
where $k$ is a non-zero scalar, and $\tilde{\mathbf{W}}$ and $\tilde{\mathbf{m}}$ are the homogeneous representations of $\mathbf{W}$ and $\mathbf{m}$. The matrix $\mathbf{Q}$ is a 3 × 5 matrix and represents the projection from 4D to 2D.
Next, we describe the projection from 5D to 2D. A point $\mathbf{V}$ in 5D space is projected to an image point $\mathbf{m}$ as follows:
$$ l\tilde{\mathbf{m}} = \mathbf{R}\tilde{\mathbf{V}} \qquad (2) $$
where $l$ is a non-zero scalar and $\tilde{\mathbf{V}}$ is the homogeneous representation of $\mathbf{V}$. The matrix $\mathbf{R}$ is a 3 × 6 matrix and represents the projection from 5D to 2D.
In general, the projection from N-dimensional space to the image plane is given by a 3 × (N+1) matrix. We call this matrix an extended projection matrix, and we call multiple view geometry for high-dimensional spaces extended multiple view geometry. Extended multiple view geometry can describe various relationships that cannot be described by the existing multiple view geometry for 3-dimensional space, including the dynamic configurations shown in the following sections.

Projection of 1D linear moving points
Let us consider the projection of a 3D point $\mathbf{X}$ moving in the direction $\mathbf{U}$ with speed $\lambda$. The position of the point at time $t$ can be represented by $\mathbf{X} + \lambda t\mathbf{U}$. Now, let us consider the case where the point is projected to the image plane by a camera $\mathbf{P} = [\mathbf{p}_1\ \mathbf{p}_2\ \mathbf{p}_3\ \mathbf{p}_4]$. The projection is described by:
$$ k\tilde{\mathbf{m}} = \mathbf{P}\begin{bmatrix}\mathbf{X} + \lambda t\mathbf{U}\\ 1\end{bmatrix} \qquad (3) $$
where $k$ is a non-zero scalar. Now, let us define a 3 × 5 extended projection matrix $\mathbf{Q}$, which represents the projection from a 4D point to the image plane, and a 4-dimensional point $\mathbf{W}$ as follows:
$$ \mathbf{Q} = \begin{bmatrix}\mathbf{p}_1 & \mathbf{p}_2 & \mathbf{p}_3 & t\bar{\mathbf{P}}\mathbf{U} & \mathbf{p}_4\end{bmatrix} \qquad (4) $$
$$ \mathbf{W} = \begin{bmatrix}\mathbf{X}^\top & \lambda\end{bmatrix}^\top \qquad (5) $$
where $\bar{\mathbf{P}} = [\mathbf{p}_1\ \mathbf{p}_2\ \mathbf{p}_3]$. By using $\mathbf{Q}$ and $\mathbf{W}$, Eq. (3) can be rewritten as follows:
$$ k\tilde{\mathbf{m}} = \mathbf{Q}\tilde{\mathbf{W}} \qquad (6) $$
where $\tilde{\mathbf{W}}$ is the homogeneous representation of $\mathbf{W}$. This equation indicates that the projection of a moving point in 3D space can be described as the projection of a static point in 4-dimensional space. Thus, we can recover the shape $\mathbf{X}$ and the motion $\lambda$ from input images.
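As a quick numerical check of Eqs. (3) to (6), the sketch below builds $\mathbf{Q}$ from an ordinary 3 × 4 camera, a time $t$ and a known direction $\mathbf{U}$, and confirms that $\mathbf{Q}\tilde{\mathbf{W}}$ equals the projection of the moved point. The helper names (`matvec`, `extended_matrix`) and the sample camera and motion are hypothetical, chosen only for illustration; integer arithmetic keeps the comparison exact.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def extended_matrix(P, t, directions):
    """Build an extended projection matrix from a 3x4 camera P.

    Columns: p1, p2, p3, then t * P_bar @ U for each motion direction U,
    then p4. One direction gives the 3x5 Q; two directions give the 3x6 R.
    """
    P_bar = [row[:3] for row in P]                      # [p1 p2 p3]
    motion_cols = [[t * c for c in matvec(P_bar, U)] for U in directions]
    return [row[:3] + [col[i] for col in motion_cols] + [row[3]]
            for i, row in enumerate(P)]

# Hypothetical camera and linear motion (not taken from the paper).
P = [[100, 0, 0, 0],
     [0, 100, 0, 0],
     [0, 0, 1, 10]]
X = [1, 2, 3]      # 3D position at t = 0
U = [1, 0, 0]      # known motion direction
lam = 2            # speed lambda
t = 3

# Eq. (3): project the moved point X + lam * t * U with the ordinary camera.
moved = [x + lam * t * u for x, u in zip(X, U)]
direct = matvec(P, moved + [1])

# Eq. (6): project the static 4D point W = (X, lam) with the extended matrix.
Q = extended_matrix(P, t, [U])
extended = matvec(Q, X + [lam, 1])

print(direct, extended)
```

Passing two directions, e.g. `extended_matrix(P, t, [U1, U2])`, gives the 3 × 6 matrix of the planar case discussed next.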

Projection of planar moving points
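Next, we discuss the projection of a point moving on a plane in 3D space. Let us consider a 3D point $\mathbf{X}$ that moves on a plane. If we take two independent vectors $\mathbf{U}_1$ and $\mathbf{U}_2$ on the plane, the position of the moving point at time $t$ can be described as $\mathbf{X} + t(\alpha\mathbf{U}_1 + \beta\mathbf{U}_2)$, where $\alpha$ and $\beta$ are the speeds of the point in the respective directions. The projection of the moving point by a camera $\mathbf{P} = [\mathbf{p}_1\ \mathbf{p}_2\ \mathbf{p}_3\ \mathbf{p}_4]$ is described by
$$ l\tilde{\mathbf{m}} = \mathbf{P}\begin{bmatrix}\mathbf{X} + t(\alpha\mathbf{U}_1 + \beta\mathbf{U}_2)\\ 1\end{bmatrix} \qquad (7) $$
We can also consider this projection in a high-dimensional space. Let us define a 3 × 6 extended projection matrix $\mathbf{R}$ and a 5-dimensional vector $\mathbf{V}$ as follows:
$$ \mathbf{R} = \begin{bmatrix}\mathbf{p}_1 & \mathbf{p}_2 & \mathbf{p}_3 & t\bar{\mathbf{P}}\mathbf{U}_1 & t\bar{\mathbf{P}}\mathbf{U}_2 & \mathbf{p}_4\end{bmatrix} \qquad (8) $$
$$ \mathbf{V} = \begin{bmatrix}\mathbf{X}^\top & \alpha & \beta\end{bmatrix}^\top \qquad (9) $$
where $\bar{\mathbf{P}} = [\mathbf{p}_1\ \mathbf{p}_2\ \mathbf{p}_3]$. By using $\mathbf{R}$ and $\mathbf{V}$, Eq. (7) can be rewritten as follows:
$$ l\tilde{\mathbf{m}} = \mathbf{R}\tilde{\mathbf{V}} \qquad (10) $$
where $\tilde{\mathbf{V}}$ denotes the homogeneous representation of $\mathbf{V}$. Eq. (10) indicates that the projection of a planar moving point in 3D space can be considered as the projection of a static point in 5D space. Eqs. (6) and (10) indicate that projections of moving points in 3D space can be considered as projections of static points in a high-dimensional space that contains both shape and motion information. Furthermore, these equations indicate that we can recover not only object shape but also motion from image points.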
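Expanding the planar projection column by column shows why the 5D form is equivalent to the moving-point form. The worked step below assumes $\mathbf{R} = [\mathbf{p}_1\ \mathbf{p}_2\ \mathbf{p}_3\ \ t\bar{\mathbf{P}}\mathbf{U}_1\ \ t\bar{\mathbf{P}}\mathbf{U}_2\ \ \mathbf{p}_4]$, $\bar{\mathbf{P}} = [\mathbf{p}_1\ \mathbf{p}_2\ \mathbf{p}_3]$ and $\tilde{\mathbf{V}} = [X\ Y\ Z\ \alpha\ \beta\ 1]^\top$ with $\mathbf{X} = [X\ Y\ Z]^\top$:

```latex
\begin{aligned}
\mathbf{R}\tilde{\mathbf{V}}
  &= X\mathbf{p}_1 + Y\mathbf{p}_2 + Z\mathbf{p}_3
     + \alpha t\,\bar{\mathbf{P}}\mathbf{U}_1 + \beta t\,\bar{\mathbf{P}}\mathbf{U}_2 + \mathbf{p}_4 \\
  &= \bar{\mathbf{P}}\bigl(\mathbf{X} + t(\alpha\mathbf{U}_1 + \beta\mathbf{U}_2)\bigr) + \mathbf{p}_4 \\
  &= \mathbf{P}\begin{bmatrix}\mathbf{X} + t(\alpha\mathbf{U}_1 + \beta\mathbf{U}_2) \\ 1\end{bmatrix}
\end{aligned}
```

Dropping the $\mathbf{U}_2$ column and replacing $\alpha$ by $\lambda$ gives the same expansion for the 4D case.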
Note that the theory can also represent arbitrary 3D motion as a projection from 6-dimensional space [3].

Multilinear constraint on high dimensional spaces
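In this section, we describe multiple view constraints between a high-dimensional space and the image space. Let us consider the case where a point $\mathbf{Y}$ in N-dimensional space is projected to image points $\mathbf{m}_i$ $(i = 1, \dots, M)$ by $M$ extended projection matrices $\mathbf{E}_i$. The projections can be described as follows:
$$ k_i\tilde{\mathbf{m}}_i = \mathbf{E}_i\tilde{\mathbf{Y}} \qquad (i = 1, \dots, M) $$
where $k_i$ are non-zero scalars. These equations can be stacked into a single linear system:
$$ \begin{bmatrix} \mathbf{E}_1 & \tilde{\mathbf{m}}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{E}_2 & \mathbf{0} & \tilde{\mathbf{m}}_2 & \cdots & \mathbf{0} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \mathbf{E}_M & \mathbf{0} & \mathbf{0} & \cdots & \tilde{\mathbf{m}}_M \end{bmatrix} \begin{bmatrix} \tilde{\mathbf{Y}} \\ -k_1 \\ \vdots \\ -k_M \end{bmatrix} = \mathbf{0} $$
Since this system has a non-trivial solution, the 3M × (N+1+M) coefficient matrix is rank deficient; when 3M ≥ N+1+M, all of its (N+1+M) × (N+1+M) minors vanish. We call this constraint the multilinear constraint. For example, the multilinear constraint for three cameras in 5D space, where the coefficient matrix is 9 × 9, is described as follows:
$$ \det\begin{bmatrix} \mathbf{R}_1 & \tilde{\mathbf{m}}_1 & \mathbf{0} & \mathbf{0} \\ \mathbf{R}_2 & \mathbf{0} & \tilde{\mathbf{m}}_2 & \mathbf{0} \\ \mathbf{R}_3 & \mathbf{0} & \mathbf{0} & \tilde{\mathbf{m}}_3 \end{bmatrix} = 0 $$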
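The three-view constraint in 5D space can be checked numerically. The sketch below assumes three arbitrary 3 × 6 extended projection matrices (random integers, purely hypothetical), projects one 5D point with each, and evaluates the 9 × 9 determinant exactly with Python's `fractions` module. For the true image points the determinant vanishes; after perturbing one image point it is, in general, non-zero. The helpers `det`, `matvec` and `multilinear_matrix` are illustrative names, not part of the method.

```python
import random
from fractions import Fraction

def det(M):
    """Exact determinant by Gaussian elimination over Fractions."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            result = -result
        result *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return result

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def multilinear_matrix(Rs, ms):
    """Stack the rows [R_i | 0 .. m_i .. 0] of every view into one matrix."""
    M = len(Rs)
    rows = []
    for i, (R, m) in enumerate(zip(Rs, ms)):
        for r in range(3):
            extra = [0] * M
            extra[i] = m[r]
            rows.append(list(R[r]) + extra)
    return rows

rng = random.Random(0)
# Hypothetical 3x6 extended projection matrices with random integer entries.
Rs = [[[rng.randint(-5, 5) for _ in range(6)] for _ in range(3)] for _ in range(3)]
V_h = [1, -2, 3, 1, 2, 1]              # homogeneous 5D point (X, alpha, beta, 1)
ms = [matvec(R, V_h) for R in Rs]      # unnormalized homogeneous image points

good = multilinear_matrix(Rs, ms)
bad = multilinear_matrix(Rs, [ms[0], ms[1], [ms[2][0] + 1, ms[2][1], ms[2][2]]])
print(det(good), det(bad))             # zero for the true triplet
```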

Space carving for 3D shape recovery
In Section II, we introduced extended multiple view geometry in order to represent object shapes and motions simultaneously. However, corresponding points must be found in the input images to recover shape and motion with this geometry, and recovering a dense 3D shape requires many of them. In contrast, Kutulakos et al. [6] proposed the space carving method, which needs no explicit corresponding points for 3D shape recovery. By extending space carving with extended multiple view geometry, we propose space-time carving, which recovers motion as well as shape. To derive space-time carving, we first summarize the existing space carving method in this section. Let us consider two cameras observing the same scene from different viewpoints at the same time, and assume that the object surface is Lambertian. In this case, when the two cameras observe the same 3D point, the observed colors are the same in both cameras; when they observe different 3D points, the colors generally differ. We call this property ``photo consistency'' in this paper.
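Let us consider a 3D point $\mathbf{X}$ projected to image points $\mathbf{m}_1$ and $\mathbf{m}_2$ by cameras $\mathbf{P}_1$ and $\mathbf{P}_2$, and let $I_1(\mathbf{m}_1)$ and $I_2(\mathbf{m}_2)$ be the colors observed at these points. The carving rule is defined as follows:
$$ \mathbf{X}\ \text{is}\ \begin{cases} \text{left} & \text{if } \|I_1(\mathbf{m}_1) - I_2(\mathbf{m}_2)\| < \varepsilon \\ \text{carved} & \text{otherwise} \end{cases} $$
where $\varepsilon$ is a threshold. The recovery of 3D shape is realized by the following steps using the carving rule.
1) The given 3D space is divided into a set of voxels.
2) Each voxel is projected to the input images, and the carving rule is applied to it.
3) Voxels that violate photo consistency are carved, and step 2 is repeated until no more voxels are carved.
The shape recovered by space carving is generally larger than the true object, because voxels off the object can still be photo-consistent by chance. The recovered shape is called the ``photo-hull''. The space carving method can recover a 3D scene without explicit corresponding points: it needs only the images taken by the cameras and the relationships between the cameras, such as fundamental matrices. In the following sections, we extend this method to recover not only shapes but also motions by using extended multiple view geometry.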
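The carving rule can be illustrated on a toy scene. The sketch below is a simplified stand-in, not Kutulakos and Seitz's full algorithm: it uses two hypothetical pinhole cameras, renders a small cube of uniformly colored voxels into dictionary "images" with distinct background colors, and keeps every grid voxel whose projections agree within a threshold. Visibility (occlusion) is ignored, so a single sweep suffices; real space carving repeats sweeps because carving changes visibility. All names, cameras and colors are assumptions made for illustration.

```python
def project(P, X):
    """Project a 3D point with a 3x4 camera and round to the nearest pixel."""
    x, y, w = (sum(a * b for a, b in zip(row, list(X) + [1])) for row in P)
    return (round(x / w), round(y / w))

def render(P, surface, background):
    """Toy image: pixels hit by surface voxels take their color."""
    image = {project(P, X): color for X, color in surface.items()}
    return lambda pixel: image.get(pixel, background)

def photo_consistent(colors, eps):
    """Lambertian check: all observed colors agree within eps."""
    return max(colors) - min(colors) <= eps

def space_carve(voxels, cameras, images, eps):
    """Keep voxels whose projections are photo-consistent (no visibility)."""
    kept = set()
    for X in voxels:
        colors = [img(project(P, X)) for P, img in zip(cameras, images)]
        if photo_consistent(colors, eps):
            kept.add(X)
    return kept

# Two hypothetical cameras looking along +z, the second shifted along x.
P1 = [[100, 0, 0, 0], [0, 100, 0, 0], [0, 0, 1, 10]]
P2 = [[100, 0, 0, -500], [0, 100, 0, 0], [0, 0, 1, 10]]

surface = {(x, y, z): 128 for x in (0, 1) for y in (0, 1) for z in (0, 1)}
images = [render(P1, surface, 0), render(P2, surface, 255)]

grid = [(x, y, z) for x in range(-3, 4) for y in range(-3, 4) for z in range(-3, 4)]
kept = space_carve(grid, [P1, P2], images, eps=10)
print(len(grid), len(kept), len(surface))
```

Every true voxel survives, while the kept set may also contain extra voxels whose projections happen to land on object pixels in both views, which is the photo-hull effect described above.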
Note that, although we assume Lambertian surfaces in this paper, other reflectance models, such as the Phong model [7] or the Torrance-Sparrow model [8], could also be used for photo consistency.

Shape and motion recovery by carving of high dimensional spaces
Now, we propose a method that recovers shape and motion simultaneously without explicit corresponding points in the input images. The method is based on the space carving method described in the previous section. Space carving carves a 3-dimensional space that represents 3D shape. Our method instead carves a high-dimensional space that describes object motions as well as shape, as described in Sections 2.2 and 2.3. Since this space consists of spatial and temporal components, it can be regarded as space-time [9]; we therefore call the method space-time carving.
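First, we describe the carving of the 4-dimensional space described in Section 2.2. Let us consider two extended projection matrices $\mathbf{Q}_1$ and $\mathbf{Q}_2$ observing the target scene. A linearly moving point $\mathbf{X}$ with speed $\lambda$ is represented by a static point $\mathbf{W}$ in the 4-dimensional space, which represents both shape and motion. A point $\mathbf{W}$ is carved or left according to the following rule:
$$ \mathbf{W}\ \text{is}\ \begin{cases} \text{left} & \text{if } \|I_1(\mathbf{m}_1) - I_2(\mathbf{m}_2)\| < \varepsilon \\ \text{carved} & \text{otherwise} \end{cases}, \qquad k_i\tilde{\mathbf{m}}_i = \mathbf{Q}_i\tilde{\mathbf{W}} $$
where $\mathbf{m}_i$ is the projection of $\mathbf{W}$ by $\mathbf{Q}_i$, $I_i(\mathbf{m}_i)$ is the color observed at $\mathbf{m}_i$, and $\varepsilon$ is a threshold. Since the point $\mathbf{W}$ includes shape and motion information, we recover shape and motion simultaneously without explicit corresponding points in the input images.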
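The 5-dimensional space, which represents 3D shape and planar motion, can be carved in the same way. When a point $\mathbf{V}$ in 5D space is projected by extended projection matrices $\mathbf{R}_1$ and $\mathbf{R}_2$, the 5D space is recovered by the following rule:
$$ \mathbf{V}\ \text{is}\ \begin{cases} \text{left} & \text{if } \|I_1(\mathbf{m}_1) - I_2(\mathbf{m}_2)\| < \varepsilon \\ \text{carved} & \text{otherwise} \end{cases}, \qquad l_i\tilde{\mathbf{m}}_i = \mathbf{R}_i\tilde{\mathbf{V}} $$
where $I_i(\mathbf{m}_i)$ is the color observed at the projection $\mathbf{m}_i$ and $\varepsilon$ is a threshold. Since the point $\mathbf{V}$ includes shape and 2-dimensional motion information, we can recover shape and planar motion simultaneously without explicit corresponding points.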
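The 4D rule can be sketched as a brute-force search over a voxel grid of (x, y, z, λ) hypotheses. This is a minimal toy, assuming a single hypothetical static camera observed at three instants (so each instant gives its own extended matrix $\mathbf{Q}_t$), a known motion direction along the x axis, a synthetic image per instant in which only the moving point is colored, and a max-minus-min color spread as the consistency test. It is not the authors' implementation, and visibility is ignored.

```python
def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def extended_matrix(P, t, directions):
    """[p1 p2 p3, t*P_bar@U for each U, p4]: Q (one U) or R (two U's)."""
    P_bar = [row[:3] for row in P]
    motion_cols = [[t * c for c in matvec(P_bar, U)] for U in directions]
    return [row[:3] + [col[i] for col in motion_cols] + [row[3]]
            for i, row in enumerate(P)]

def pixel(E, point):
    """Project a static high-dimensional point (shape + speeds) to a pixel."""
    x, y, w = matvec(E, list(point) + [1])
    return (round(x / w), round(y / w))

def space_time_carve(hypotheses, Es, images, eps):
    """Keep points of the 4D/5D space whose projections are photo-consistent."""
    kept = []
    for W in hypotheses:
        colors = [img(pixel(E, W)) for E, img in zip(Es, images)]
        if max(colors) - min(colors) <= eps:
            kept.append(W)
    return kept

# Hypothetical scene: one static camera, one point moving along U = x axis.
P = [[100, 0, 0, 0], [0, 100, 0, 0], [0, 0, 1, 10]]
U = [1, 0, 0]
true_W = (0, 0, 0, 2)                  # X = (0, 0, 0), speed lambda = 2
times = [0, 1, 2]
Qs = [extended_matrix(P, t, [U]) for t in times]

def make_image(Q, background):
    target = pixel(Q, true_W)          # where the moving point appears
    return lambda p: 200 if p == target else background

images = [make_image(Q, bg) for Q, bg in zip(Qs, [0, 60, 255])]

hypotheses = [(x, y, z, lam) for x in range(-2, 3) for y in range(-2, 3)
              for z in range(-2, 3) for lam in range(0, 4)]
kept = space_time_carve(hypotheses, Qs, images, eps=10)
print(len(hypotheses), kept)
```

For the planar case, passing `[U1, U2]` to `extended_matrix` yields the 3 × 6 matrices $\mathbf{R}_t$ and the hypotheses become 5-tuples (x, y, z, α, β); the carving loop itself does not change.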

High dimensional space recovery under weak calibrations
In Section 3.2, we proposed simultaneous shape and motion recovery by carving high-dimensional spaces. That method, however, requires fully calibrated extended projection matrices. In this section, we propose a method that can be used even when only the relative relationship between cameras, such as a trifocal tensor, is known. Let us consider the recovery of 4-dimensional space when only the trifocal tensor representing the relative relationship among three extended projection matrices is calibrated. The space-time carving based on the trifocal tensor is realized by the following steps. Fig. 2 (right) shows the voxel space used in the method described in this section. This voxel space has projective ambiguity: it is given by a projective transformation of the voxel set shown in Fig. 2 (left). Thus, the recovered point W' includes projective ambiguity. However, we do not necessarily need to eliminate the ambiguity, because various applications can be realized from a projective reconstruction [9].
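The reason the carved points carry projective ambiguity can be checked directly: for any invertible (N+1) × (N+1) matrix $\mathbf{H}$, replacing $\mathbf{Q}$ by $\mathbf{Q}\mathbf{H}^{-1}$ and $\tilde{\mathbf{W}}$ by $\mathbf{H}\tilde{\mathbf{W}}$ leaves every image projection unchanged, so photo-consistency cannot distinguish the two. The sketch below uses a hypothetical $\mathbf{Q}$, point and $\mathbf{H}$ (determinant 28) and exact fractions; the helper names are illustrative only.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def inverse(M):
    """Gauss-Jordan inverse over Fractions (M must be invertible)."""
    n = len(M)
    A = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]

# Hypothetical 3x5 extended projection matrix and homogeneous 4D point (X, lambda, 1).
Q = [[100, 0, 0, 300, 0], [0, 100, 0, 0, 0], [0, 0, 1, 0, 10]]
W_h = [1, 2, 3, 2, 1]

# An arbitrary invertible 5x5 projective transformation.
H = [[2, 1, 0, 0, 1],
     [0, 1, 3, 0, 0],
     [1, 0, 1, 2, 0],
     [0, 0, 0, 1, 4],
     [1, 0, 0, 0, 1]]

Q_proj = matmul(Q, inverse(H))   # matrix in a projective frame
W_proj = matvec(H, W_h)          # the corresponding projective point W'

print(matvec(Q, W_h), matvec(Q_proj, W_proj))
```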
A 5-dimensional space can be recovered in the same way. For 5-dimensional space recovery, the 4D point W' is replaced by a 5D point V', which includes the shape and the planar motion of the 3D point. In this case, the recovered point V' includes 5-dimensional projective ambiguity.
Note that the N-dimensional projective ambiguity can be eliminated by a projective transformation, computed as follows:
$$ s\,\tilde{\hat{\mathbf{W}}} = \mathbf{H}\tilde{\mathbf{W}}' \qquad (20) $$
where $s$ is a non-zero scalar, $\mathbf{H}$ is an (N+1) × (N+1) projective transformation matrix, and $\hat{\mathbf{W}}$ and $\mathbf{W}'$ are the true vector and the recovered vector. Since $\mathbf{H}$ has $(N+1)^2 - 1$ DoF and Eq. (20) gives N linearly independent equations for each point, the matrix can be estimated from N + 2 or more points whose shape and motion are known.
By transforming the recovered vectors with the estimated matrix, their projective ambiguity is eliminated.
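The minimum number of points follows from a short count, assuming as stated above that $\mathbf{H}$ has $(N+1)^2 - 1$ degrees of freedom and each known point contributes N independent equations:

```latex
\frac{(N+1)^2 - 1}{N} = \frac{N(N+2)}{N} = N + 2,
\qquad N = 4 \;\Rightarrow\; 6 \text{ points}, \qquad N = 5 \;\Rightarrow\; 7 \text{ points}.
```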
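Furthermore, we can eliminate the ambiguity of arbitrary components of the recovered vector. For example, consider eliminating the ambiguity of the motion $\lambda$ from the 4-dimensional vector $\mathbf{W}'$. Only the motion component and the homogeneous component of Eq. (20) are needed:
$$ \hat{\lambda} = \frac{\mathbf{h}_4^\top\tilde{\mathbf{W}}'}{\mathbf{h}_5^\top\tilde{\mathbf{W}}'} $$
where $\mathbf{h}_4^\top$ and $\mathbf{h}_5^\top$ are the 4th and 5th rows of $\mathbf{H}$. Since the 2 × 5 matrix formed by these rows has 9 DoF and each point gives one independent equation, we can estimate it from 9 corresponding points. Note that we do not need shape information for this estimation; we need only the motion information of the object.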
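A minimal sketch of the 9-point estimate, assuming the motion is the ratio of two rows of $\mathbf{H}$ applied to the recovered vector, $\hat{\lambda} = \mathbf{a}^\top\tilde{\mathbf{W}}' / \mathbf{b}^\top\tilde{\mathbf{W}}'$. Each point with known motion gives one linear equation $\hat{\lambda}\,\mathbf{b}^\top\tilde{\mathbf{W}}' - \mathbf{a}^\top\tilde{\mathbf{W}}' = 0$, and nine points determine $(\mathbf{a}, \mathbf{b})$ up to scale as the null vector of a 9 × 10 system. The rows `a_true`, `b_true` and the random points are hypothetical test data; with real, noisy reconstructions one would use more points and a least-squares (SVD) solution instead of exact elimination.

```python
import random
from fractions import Fraction

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def null_vector(rows):
    """One non-zero solution of rows @ x = 0, via exact RREF over Fractions.

    Raises ValueError unless the null space is exactly one-dimensional.
    """
    A = [[Fraction(v) for v in row] for row in rows]
    n = len(A[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        A[r] = [v / A[r][c] for v in A[r]]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                f = A[i][c]
                A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
        if r == len(A):
            break
    free = [c for c in range(n) if c not in pivots]
    if len(free) != 1:
        raise ValueError("null space is not one-dimensional")
    x = [Fraction(0)] * n
    x[free[0]] = Fraction(1)
    for row_idx, c in enumerate(pivots):
        x[c] = -A[row_idx][free[0]]
    return x

def fit_motion_map(points, speeds):
    """Estimate (a, b) with speed = a.w / b.w from 9 points (hypothetical helper)."""
    rows = [[-v for v in w] + [s * v for v in w] for w, s in zip(points, speeds)]
    x = null_vector(rows)
    return x[:5], x[5:]

rng = random.Random(1)
a_true = [1, -2, 0, 3, 1]     # hypothetical motion row of H
b_true = [0, 1, 2, -1, 4]     # hypothetical homogeneous row of H

def sample_point():
    """Random recovered vector W' with a non-zero denominator."""
    while True:
        w = [rng.randint(-5, 5) for _ in range(5)]
        if dot(b_true, w) != 0:
            return w

points = [sample_point() for _ in range(9)]
speeds = [Fraction(dot(a_true, w), dot(b_true, w)) for w in points]
a, b = fit_motion_map(points, speeds)

w_new = sample_point()
predicted = dot(a, w_new) / dot(b, w_new)
print(predicted == Fraction(dot(a_true, w_new), dot(b_true, w_new)))
```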

Recovery of planar moving objects using real images
In this section, we show experimental results of the proposed method. First, we show a motion recovery result from real images. In this experiment, objects with planar motion were taken by a camera. Note that we do not need multiple cameras for shape and motion recovery: since the extended projection matrix R depends on time t, a static camera at multiple instants can be regarded as multiple cameras in 5-dimensional space. The camera is fully calibrated, and the space-time carving method described in Section 3.2 is used. The experimental environment and the target objects are shown in Fig. 3. The target objects were moved in the direction indicated in the image at constant velocities. The target objects were taken three times by the static camera, and their shapes and motions were recovered from the three images. The captured images, of size 480 × 640 pixels, are shown in Fig. 4. In the images, the target objects were extracted beforehand by simple background subtraction.
The object motion recovery result is shown in Fig. 5. In the image, the recovered voxels are shown, and the color of each voxel indicates the direction and magnitude of its motion. The color circle at the bottom left shows the direction and speed for each color: hue indicates direction and saturation indicates speed. For example, an object moving to the right is colored yellow. The direction and speed estimated by the proposed method are correct in most cases, which indicates that the proposed method works well.
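The color coding can be sketched with Python's `colorsys` module. The mapping below is an assumption reconstructed from the description only: direction angles are measured counterclockwise from the +x axis, the hue is offset by 60 degrees so that rightward motion comes out yellow as in the example, and saturation is speed divided by a chosen maximum speed (clipped to 1). The actual color circle in Fig. 5 may use a different orientation or scaling.

```python
import colorsys
import math

def motion_color(vx, vy, max_speed):
    """Map a 2D motion to RGB: hue encodes direction, saturation encodes speed.

    Assumption: angle measured counterclockwise from +x, hue offset by
    60 degrees so that rightward motion is yellow; brightness fixed at 1.
    """
    angle = math.degrees(math.atan2(vy, vx)) % 360.0
    hue = ((angle + 60.0) % 360.0) / 360.0
    saturation = min(math.hypot(vx, vy) / max_speed, 1.0)
    return colorsys.hsv_to_rgb(hue, saturation, 1.0)

print(motion_color(2.0, 0.0, 2.0))   # rightward at full speed: yellow
print(motion_color(0.0, 0.0, 2.0))   # static: white (zero saturation)
```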

Motion recovery under weak calibration
In this experiment, we recovered shape and motion under weak calibration. The objects in this experiment move in the same direction, so their shape and motion can be represented in 4-dimensional space. The motion of the objects is shown in Fig. 6, and the input images are shown in Fig. 7. Only a trifocal tensor representing the relative relationship among the cameras is calibrated. Shape and motion are recovered by the method described in Section 3.3, and each pixel onto which the recovered information is reprojected is colored according to the recovered motion.
The upper image in Fig. 8 shows the recovered motion information. The color bars in the images indicate the direction and speed for each pixel: objects moving to the right/left become green/purple, and static objects become grey. The lower image in the figure shows the estimation errors: each pixel is colored according to its estimation error and becomes grey (the center of the color bar) when the error is small. All objects in the lower image are colored grey, which indicates that the proposed method can recover motion information under weak calibration.

Recovery of non-rigid motion
In this experiment, we recover the motion of a non-rigid object. In this scene, the shape of a 3D ball was deformed as shown in Fig. 9. The motion of each point is recovered by the method described in Section 3.2, and each point is colored according to the recovered motion. The size of the 3D scene is … cm × … cm × … cm. Fig. 10 shows the recovered motion information, the ground truth of the motions and the difference from the ground truth. The recovered result is mostly identical to the ground truth. These results indicate that the proposed method can recover motion even when the object motion is not rigid.

V. CONCLUSIONS
In this paper, we proposed a novel shape and motion recovery method that requires no corresponding points. The proposed method carves high-dimensional spaces that represent 3D motion as well as 3D shape, and can thus recover 3D scenes containing moving objects without explicit corresponding points. Furthermore, we showed that the proposed method can recover such scenes when only weak calibration is available. Finally, experimental results indicate that the proposed method can be applied to the recovery of both rigid and non-rigid object motion.