Photorealism in Mixed Reality: A Systematic Literature Review

In Augmented Reality systems, virtual objects are combined with real objects, both three-dimensional, interactively and at run time. In an ideal scenario, the user has the feeling that real and virtual objects coexist in the same space and is unable to differentiate one type of object from the other. To achieve this goal, research on rendering techniques has been conducted in recent years. In this paper, we present a Systematic Literature Review aiming to identify the main characteristics concerning photorealism in Mixed and Augmented Reality systems and to find research opportunities that can be further exploited or optimized. One objective is to verify whether a definition of photorealism in Mixed and Augmented Reality exists. We present a theoretical foundation covering the most widely used methods concerning realism in Computer Graphics. We also identify the most used methods and tools to enable photorealism in Mixed and Augmented Reality systems.


Introduction
In Augmented Reality systems, virtual objects are combined interactively, at run time, with real objects, both three-dimensional. In recent years, research in Rendering and Visualization has been increasing to generate photorealistic scenes where real and virtual objects are perfectly combined, but implementing solutions that deliver real-time photorealistic results is still a challenge. The study presented in this paper is organized as a Systematic Literature Review (SLR). An SLR is a way to summarize and evaluate relevant works on a specific topic or area (Kitchenham & Charters, 2007). This SLR aims to identify the main characteristics concerning photorealism in Mixed Reality (MR) and Augmented Reality (AR) systems to find research opportunities that can be further exploited or optimized. Examples of SLRs in the Augmented Reality field can be found for different contexts, such as identifying advantages and disadvantages of its application in industrial maintenance (Palmarini et al., 2018) or identifying applications and trends of AR in Otolaryngology (Wong et al., 2018). This paper is organized as follows. In Section 2, we present a theoretical background concerning Mixed and Augmented Reality and photorealism, with basic definitions and a discussion of the main characteristics of each. In Section 3, we address the proposed SLR, presenting its planning, its conduction, the selected papers' proposals, and the analysis of results. Section 4 presents a discussion of the results. Finally, Section 5 presents the conclusions of this paper.

Fundamentals
This section presents a theoretical contextualization of the concepts addressed in this review. Milgram and Kishino (1994) proposed a taxonomy to classify the possible computer-generated environments. Their classification considers the amount of virtual and real elements constituting the observed world, forming the Reality-Virtuality Continuum, as shown in Figure 1. Under this classification, Mixed Reality is a broader concept that embraces all configurations mixing virtual and real elements, including Augmented Reality. AR, in turn, is the part of Mixed Reality where the elements are mostly real and a smaller number of virtual elements are inserted in the world. According to Azuma's definition, AR combines real and virtual objects, both three-dimensional, interactively and at runtime (Azuma, 1997). Figure 2 shows an example of an Augmented Reality system. In an ideal AR scenario, the user feels that real and virtual objects coexist in the same space and cannot differentiate one type of object from the other. One area of research directly related to the AR field is Rendering and Visualization, in which the volume of research has grown significantly in recent years (Kim et al., 2018). Works in this area range from shading and lighting techniques to global illumination techniques. Nevertheless, implementing solutions that deliver real-time photorealistic results is still a challenge.

Photorealism
According to Ferwerda (2003), in Computer Graphics an image is classified as photorealistic if it is generated by a computer but is indistinguishable from a photograph of the same scene. However, this definition rests on the idea that a photograph is realistic, without explaining why this idea is considered valid. So, another way to define an image as photorealistic is to consider the concept of being photometrically realistic. Still according to Ferwerda (2003), photometry is the measurement of light energy as perceived by the human eye. This way, a photometrically realistic image is an image capable of producing the same visual response that the original scene would produce in the human eye. A number of techniques can promote photorealism in image rendering, from the Ray tracing algorithm to newer methods with lower computational cost.

2.2.1. Ray tracing
The idea behind the Ray tracing algorithm was presented by Albrecht Dürer in 1525, using a grid layout to create an image. Later, in 1968, Arthur Appel showed the process of shooting rays into a scene to obtain pixel colors; this technique is known as Ray casting. In 1980, Turner Whitted created the Ray tracing algorithm, building on Ray casting to solve reflections, shadows, and refractions and thus compute pixel colors more accurately. Ray tracing is a technique for image synthesis that creates a 2D picture of a 3D world (Glassner, 1989). To synthesize a 2D image, it is necessary to know the global illumination information that determines the amount of light reaching each pixel. This information is stored in a "tree of rays" starting from the viewer to the first surface encountered, bouncing off other surfaces and light sources (Whitted, 2005). As shown in Figure 3, a ray is cast from the camera through each pixel until it hits a 3D object or reaches a defined limit; when it hits a 3D object, the ray bounces in new directions. From the hit point, shadow rays are cast toward the light source, and if these rays do not reach the light source, shadows are created. If the object is reflective, the ray continues its path in the reflected direction, and if the object is refractive, it continues in the refracted direction (Whitted, 2005).
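The casting and shadow-ray steps described above can be sketched in a few lines of Python. This is an illustrative toy, not an implementation from the reviewed papers: it handles only spheres, a single point light, and a flat material, and all names are ours.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def intersect_sphere(origin, direction, center, radius):
    """Nearest positive hit distance of a normalized ray with a sphere, or None."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = 2.0 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * c          # a = 1 because direction is normalized
    if disc < 0.0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 1e-6 else None

def shade(origin, direction, spheres, light_pos):
    """Cast a primary ray; on a hit, cast a shadow ray toward the light."""
    hits = [(intersect_sphere(origin, direction, c, r), (c, r)) for c, r in spheres]
    hits = [(t, s) for t, s in hits if t is not None]
    if not hits:
        return 0.0                  # background: the ray escaped the scene
    t, (center, radius) = min(hits)
    hit = tuple(o + t * d for o, d in zip(origin, direction))
    to_light = normalize(tuple(l - h for l, h in zip(light_pos, hit)))
    # shadow ray: if any other sphere blocks the path to the light, darken
    for c, r in spheres:
        if (c, r) != (center, radius) and intersect_sphere(hit, to_light, c, r):
            return 0.1              # in shadow
    return 1.0                      # directly lit (flat white material)
```

A Whitted-style tracer would additionally spawn reflected and refracted rays recursively at each hit point; here only the primary and shadow rays are shown.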

2.2.2. Path tracing
In 1986, James Kajiya proposed an integral equation that generalizes a variety of known rendering algorithms and also presented a new form of Ray tracing, called Path tracing (Kajiya, 1986). Unlike distributed Ray tracing, Path tracing shoots only a single path, with the rays chosen probabilistically (Figure 4); this immensely cuts down the number of ray-object intersections to be computed in scenes with much reflection and refraction, yielding a speedup over Ray tracing. Kajiya's paper describes two alternatives to choosing the ray type purely at random. The first is to keep track of the quantity of each type shot, ensuring that the sample distribution of ray types nearly matches the desired distribution by varying the probability of each type. The second is to let the ray types be chosen randomly, but scale the contribution of each ray type by the ratio of the desired distribution to the resulting weighted sample distribution (Kajiya, 1986). Lafortune and Willems (1998) presented a new Monte Carlo rendering algorithm called Bidirectional Path tracing that integrates the ideas of shooting and gathering power to create a photorealistic image. In Bidirectional Path tracing, rays are shot at the same time from a light source and from the viewing point. Figure 5 shows that all hit points are connected using shadow rays, and the appropriate contributions are added to the flux of the pixel.
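Kajiya's second alternative, choosing one ray type at random and scaling its contribution by the inverse of its selection probability, can be illustrated with a toy Monte Carlo estimator. This is a hypothetical sketch with names of our own choosing, not code from any cited paper:

```python
import random

def sample_bounce(weights, rng):
    """Choose one ray type per bounce (Kajiya's single-path idea).

    `weights` maps each ray type to its selection probability. The
    returned scale 1/p keeps the one-sample estimator unbiased.
    """
    r = rng.random()
    acc = 0.0
    for ray_type, p in weights.items():
        acc += p
        if r < acc:
            return ray_type, 1.0 / p
    return ray_type, 1.0 / p  # guard against floating-point round-off

def estimate(contribution_by_type, weights, n, seed=0):
    """Estimate the total contribution over all ray types by sampling
    a single type per path, instead of branching into every type."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        ray_type, scale = sample_bounce(weights, rng)
        total += contribution_by_type[ray_type] * scale
    return total / n
```

Because each sampled contribution is divided by its selection probability, the expected value equals the sum over all ray types, while only one ray is traced per bounce.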

2.2.3. Radiosity
The origin of radiosity methods dates back to the 1950s, in engineering studies and applications involving heat transfer. Radiosity refers to a measure of radiant energy, specifically the energy leaving a surface per unit area per unit time. In 1984, researchers at Fukuyama and Hiroshima Universities in Japan and at Cornell University in the United States began to apply radiosity methods to image synthesis. Radiosity methods solve the interreflection of light between Lambertian diffuse surfaces. Radiosity is essentially a finite element method, i.e., an approach for solving complicated integrals such as the rendering equation proposed by Kajiya (1986). While in Ray tracing the illumination equation is evaluated for the directions and locations determined by the view and the pixels of the image, radiosity solves the illumination equation for locations distributed over the surfaces of the scene. Because the unknowns are independent of the position of the observer, radiosity methods are called view-independent techniques (Cohen & Wallace, 2012).
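The finite element character of radiosity can be illustrated by iterating the discrete radiosity system B_i = E_i + rho_i * sum_j F_ij * B_j, where E is emission, rho is diffuse reflectance, and F holds the form factors between patches. A toy sketch under the assumption that the form factors are already known:

```python
def solve_radiosity(emission, reflectance, form_factors, iterations=50):
    """Jacobi-style gathering iteration for the discrete radiosity system.

    Each pass updates every patch's radiosity from the radiosity of all
    other patches, independently of any observer position.
    """
    n = len(emission)
    B = list(emission)  # start from pure emission
    for _ in range(iterations):
        B = [emission[i] + reflectance[i] *
             sum(form_factors[i][j] * B[j] for j in range(n))
             for i in range(n)]
    return B
```

For two facing patches with F = [[0, 1], [1, 0]], emission [1, 0], and reflectance 0.5 each, the iteration converges to B = [4/3, 2/3]: the emitting patch is brightened by light bounced back from its neighbor. The solution depends only on the scene's surfaces, which is why radiosity is view-independent.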

2.2.4. Differential Rendering
Created by Paul Debevec, Differential Rendering is a technique for simulating light transport in mixed reality rendering. It uses measured scene radiance and global illumination to add new objects to light-based models with correct lighting (Debevec, 2008). New objects are illuminated with a high dynamic range image-based model of the scene. The light-based model is built from an approximate geometric model of the scene, and a light probe is used to estimate the incident illumination at the location of the synthetic objects. Three components are considered when computing the light in the scene: the distant scene, the local scene, and the synthetic objects (Figure 6). The distant scene is the visible part of the environment that is photometrically unaffected by the synthetic objects. The local scene has an estimated reflectance model and interacts with the synthetic objects. The final rendering is created with a standard global illumination method that simulates the interaction of light among all three components (Debevec, 2008).
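The per-pixel compositing step of Differential Rendering can be sketched as follows. This is an illustrative reduction of Debevec's method: `with_objects` and `without_objects` stand for global illumination renders of the modeled local scene with and without the synthetic objects, and all names are ours, not Debevec's.

```python
def differential_render(background, with_objects, without_objects, mask):
    """Composite a synthetic object into a real photograph, per pixel.

    Inside the object mask, the full synthetic render is used. Elsewhere,
    only the *difference* the object makes (shadows, bounced light) is
    added to the real background, so modeling errors in the local scene
    largely cancel out.
    """
    out = []
    for bg, lo, ln, covered in zip(background, with_objects, without_objects, mask):
        if covered:                     # pixel shows the synthetic object
            out.append(lo)
        else:                           # real pixel, modulated by the object's effect
            out.append(bg + (lo - ln))
    return out
```

Where the object darkens the local scene (a shadow), `lo - ln` is negative and the photograph is darkened by that amount; where the object adds bounced light, the photograph is brightened.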

2.2.5. Differential Instant Radiosity
In 2010, Knecht et al. presented a new global illumination rendering system designed to calculate the mutual influence between real and virtual objects (Knecht et al., 2010). The global illumination solution aims to be perceptually plausible, without the ambition of being physically accurate. Besides calculating the influence between real and virtual objects, the method can also relight the scene with virtual light sources. It extends Debevec's Differential Rendering (Debevec, 2008) and combines the Instant Radiosity approach from Keller (1997) with imperfect shadow maps (Ritschel et al., 2008).
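The Instant Radiosity side of this approach can be illustrated with a toy gather over virtual point lights (VPLs), the small lights Keller's method distributes over surfaces to approximate indirect illumination. This sketch ignores visibility, clamping, and the differential part of the method; the names are illustrative:

```python
import math

def shade_with_vpls(point, normal, vpls):
    """Instant-Radiosity-style gather: sum diffuse contributions of
    virtual point lights, with cosine weighting and squared-distance
    falloff. `vpls` is a list of (position, intensity) pairs."""
    total = 0.0
    for pos, intensity in vpls:
        d = [p - q for p, q in zip(pos, point)]
        dist2 = sum(x * x for x in d)
        dist = math.sqrt(dist2)
        # cosine of the angle between the surface normal and the VPL direction;
        # VPLs behind the surface contribute nothing
        cos_theta = max(0.0, sum(n * x for n, x in zip(normal, d)) / dist)
        total += intensity * cos_theta / dist2
    return total
```

In the full Differential Instant Radiosity method this gather is evaluated twice, with and without the virtual objects, and the difference is composited onto the camera image as in Differential Rendering.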

Systematic Literature Review
A Systematic Literature Review is a secondary study that allows us to identify, evaluate, and interpret the available research on a specific subject or topic area (Kitchenham & Charters, 2007). Another objective is to identify research opportunities in the area of interest. Three steps were followed in conducting this study: planning the review, conducting the review, and analyzing the results. Each of these steps is detailed in the next subsections.

Planning the Review
In this work, the proposed SLR planning was guided by the suggestions presented by Neiva and Silva (2016). Three research questions were proposed to guide this review, and we aim to obtain their answers in this work. They are:

- Question 1: What is considered photorealism in the context of augmented and mixed reality?
- Question 2: What techniques have been used to achieve photorealism in augmented reality applications?
- Question 3: What frameworks of Ray tracing are used in augmented reality applications?
From the research questions, we extracted the keywords realism, photorealism, Ray tracing, Path tracing, global illumination, total illumination, augmented reality, and mixed reality. Combining these keywords through the logical operators AND and OR (AND connecting the keyword groups and OR connecting the synonyms) and applying precedence using parentheses, we built the search string (realism OR photorealism OR "ray tracing" OR "path tracing" OR "global illumination" OR "total illumination") AND ("augmented reality" OR "mixed reality").
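The construction of the search string from the two keyword groups can be reproduced mechanically, for example:

```python
# Synonyms for the topic of interest (OR-connected) and the application
# context (OR-connected); the two groups are AND-connected.
synonyms = ['realism', 'photorealism', '"ray tracing"', '"path tracing"',
            '"global illumination"', '"total illumination"']
contexts = ['"augmented reality"', '"mixed reality"']

search_string = "({}) AND ({})".format(" OR ".join(synonyms),
                                       " OR ".join(contexts))
```

Expressing the string this way makes it easy to regenerate consistently for each database's query syntax if a keyword is added later.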
Five bases were chosen to perform the search, considering their relevance in the indexing of Computer Science works and related areas. The chosen bases were IEEEXplore, Scopus, Science Direct, Web of Science, and ACM Digital Library. The search results were analyzed in three phases. In the first two, papers were selected or removed according to inclusion and exclusion criteria. In the third, they were evaluated against quality criteria. In the first phase, the papers were analyzed only by title and abstract, considering the inclusion criteria:

- The work intends to render virtual models using photorealistic techniques.
- The proposed solution considers changes in the lighting of the real environment.
- The work proposes a method for consistent lighting in mixed reality.
Moreover, the exclusion criteria were:

- The record is a collection of papers (proceedings of a conference).
- The work does not present an augmented reality or mixed reality solution.
- The work's main focus is to analyze the impacts or perception of realism by the user.
- The work only points out problems that lead to a lack of realism and mentions solutions without exploring them in depth.
- The paper focuses only on shadow projections.
In the second phase, the introduction and conclusion of the papers were analyzed, and some papers were excluded according to the following exclusion criteria:

- The work is a short paper (less than four pages).
- The proposed solution is for outdoor environments.
- The proposed solution is for head-up displays.
The exclusion criteria were defined based on joint analysis and discussion between the authors of this study. They were inspired by other computer graphics reviews, considering each author's prior knowledge about the research questions. Finally, for the third analysis, quality criteria were defined and scored for each paper. The maximum aggregated score for a paper was 10 points. The scored items and their ranges were:

- C1. Is the research aim clearly specified? (ranging from 0 to 2)
- C2. Are the methods concerning photorealism clearly described and justified? (ranging from 0 to 2)
- C3. Are the results presented clearly and reliably? (ranging from 0 to 2)
- C4. Are the results based on experiments? (ranging from 0 to 1)
- C5. Are the conclusions well-founded? (ranging from 0 to 2)
- C6. Does the method apply to more than one reflection type (diffuse and glossy)? (ranging from 0 to 1)
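The scoring scheme above can be expressed as a small sketch; the dictionary keys mirror criteria C1 to C6, and the threshold of 8 points is the selection cut-off applied in phase 3:

```python
# Maximum score per quality criterion (total: 10 points).
MAX_SCORES = {"C1": 2, "C2": 2, "C3": 2, "C4": 1, "C5": 2, "C6": 1}

def aggregate_score(scores):
    """Sum the per-criterion scores, validating each against its maximum."""
    for criterion, value in scores.items():
        top = MAX_SCORES[criterion]
        if not 0 <= value <= top:
            raise ValueError(f"{criterion} must be in [0, {top}]")
    return sum(scores.values())

def selected(scores, threshold=8):
    """A paper is kept for answer extraction if its total reaches the threshold."""
    return aggregate_score(scores) >= threshold
```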

Conducting the Review
The review was conducted following the steps presented in Figure 7. It was carried out in the second half of 2019. As no time limit was established, the results obtained include publications prior to the second half of 2019. Since most of the analyses were performed by one of the authors, there may be a bias in the analysis. To mitigate this bias, in cases of doubt about the adequacy to some criterion, the paper was taken to a joint analysis with the other authors for a decision on its selection or exclusion. Aiming to update the review, we executed a new search in February 2021 to retrieve the articles published between the end of 2019 and the beginning of 2021. In the search step, we used the search string in the bases IEEEXplore, Scopus, Science Direct, Web of Science, and ACM Digital Library, obtaining a total of 2658 results from conferences and journals. Table 1 shows the amount and type of the selected results per base. The results were downloaded and grouped in a database. In this process, the JabRef tool was used for a first removal of duplicates. After this removal, 1933 papers remained. A second removal was performed manually, resulting in 1501 papers. These papers were analyzed in the first phase, where only the title and abstract were considered. After this analysis, considering the phase's inclusion and exclusion criteria, 90 papers were selected for phase 2. In phase 2, the corresponding exclusion criteria were applied considering the introduction and conclusion of the papers. In this phase, 45 papers were excluded, and 45 papers were approved for the third phase. Table 2 summarizes the results for phases 1 and 2. In phase 3, the papers were read in full and scored against each quality criterion. It was decided that papers with a grade greater than or equal to 8 would be used to extract the answers. Initially, 23 papers had a grade greater than or equal to 8 and were selected for answer extraction.
Table 3 shows the selected papers with their respective scores. After the complete reading, one of these papers was discarded, leaving the 22 papers considered for answer extraction.

The Selected Papers Proposals
This review showed that, even considering only the selected papers, there is a wide range of different techniques aiming to enable photorealism. In this section, we present an overview of the 22 selected papers that were considered relevant for the extraction of answers. In Kan and Kaufmann (2013), the proposal was developed aiming at realistic interior design through an Augmented Reality application; notably, it was the only article to describe a practical application of its solution. The remaining papers did not present a specific application area. A review and a classification of existing methods were presented in Grosch et al. (2007), so a specific proposal was not presented there.
In Pessoa et al. (2010), a method for adaptive color-mapping was presented. The objective was to solve the mismatch of colors between the image captured by the camera and the virtual objects, so that the virtual objects appear as if the observing camera had captured them. Some works emphasized estimating the illumination condition of the real scene. For example, in Nunes et al. (2017), the goal was to estimate the light probe that represents the lighting of a real environment from a single image of it. The work of Rohmer et al. (2014) presents a non-invasive system, using arbitrary scene geometry as a light probe for photometric registration; a general AR rendering pipeline supporting real-time global illumination techniques was also introduced. The paper of Kan and Kaufmann (2012) presents an approach for probeless light estimation and coherent rendering of Augmented Reality in dynamic scenes. The paper of Dos Santos et al. (2012) presented a method for interactive illumination of virtual objects placed in real camera images of rooms under time-varying lighting. In Gruber et al. (2012), a high-quality rendering system for AR is presented. This system uses Ray tracing-based algorithms that achieve a high degree of visual realism and visual coherence. In Gruber et al. (2015), a graphics rendering pipeline for Augmented Reality was introduced, based on a real-time Ray tracing paradigm. In Gierlinger et al. (2010), a method for high-quality production rendering in an offline process was proposed. LeGendre et al. (2019) proposed a method for indoor and outdoor lighting inference using a Low Dynamic Range (LDR) image captured by a mobile device. The visual coherence between real and virtual objects was also emphasized. In Franke (2013), a new representation for light transfer from virtual to real objects was proposed. This representation provides low-frequency shadows and can handle virtual and real light in one unified representation.
In Jacobs and Loscos (2006), the main objective was enabling glossy reflections between real and virtual objects. The paper from Agusanto et al. (2003) presented a global illumination rendering system designed to calculate the mutual influence between real and virtual objects. The global illumination solution aimed to be perceptually plausible without the ambition to be physically accurate.
Part of the works emphasized Augmented Reality systems. The work of Marques et al. (2018) explores an image-based and hardware-based approach to improve photorealism when rendering synthetic objects in augmented reality. Knecht et al. (2012) presented a distributed illumination approach for AR with consistent illumination of virtual objects with direct light, indirect light, and shadows from primary and strong secondary lights. The paper from Knecht et al. (2011) presents a solution for the photorealistic rendering of synthetic objects into real dynamic scenes in Augmented Reality applications. The work presented in Rohmer et al. (2017) proposed a method for visually coherent interactive AR on mobile devices. Tuceryan et al. (2019) propose a method to estimate the incident light from the real environment, simulate reflected light, and present virtual objects visually coherent with the lighting condition. The work from Pereira et al. (2020) proposes a method to integrate real-time Ray tracing and Augmented Reality to enable photorealistic effects for some materials. Zhang et al. (2019) combine a 3D reconstruction of the scene, illumination estimation through a method named photon emission hemispherical model, and Ray tracing to render visually coherent virtual objects. In Gierlinger et al. (2010), a real-time rendering engine specifically tailored to Mixed Reality visualization needs was presented. The renderer utilizes precomputed radiance transfer to calculate dynamic soft shadows, high dynamic range images and image-based lighting to capture incident real-world lighting, approximate bidirectional texture functions to render materials with self-shadowing, and frame post-processing filters.

Analysis of Results
The analysis of results was based on reading the papers selected in phase 3 and extracting from them the answers to the research questions. Not every paper contributed to answering all the questions, but each paper contributed to answering at least one of them.
For Question 1: What is considered photorealism in the context of augmented and mixed reality? According to Schwandt and Broll (2016), photorealism is a visual representation where realistic interactions between surfaces and light are perceived, and close and medium-range reflections are essential for a realistic representation of virtual objects in this context. According to Jacobs and Loscos (2006), the represented scene must have a consistent shadow configuration, its virtual objects must look natural, and the illumination of these virtual objects needs to resemble the illumination of the real objects. For Agusanto et al. (2003), a common illumination for virtual and real objects gives them consistent looks, along with interreflections and shadows. Finally, for Pessoa et al. (2010), a photorealistic scene is visually compelling, although not necessarily physically correct.
For Question 2: What techniques have been used to achieve photorealism in augmented reality applications?
We grouped some techniques, requirements, and tools in order to identify trends. Concerning knowledge of the real scene's geometry, some works perform a 3D reconstruction during the process using an RGB-D camera; Park et al. (2016) and Schwandt and Broll (2016) are examples of this approach. A perceived trend was the use of Differential Rendering by Debevec (2008), as in Kan and Kaufmann (2013); Rohmer et al. (2017); Franke (2013); Kan and Kaufmann (2012); Gruber et al. (2015) and Gruber et al. (2012). In Knecht et al. (2011) and Knecht et al. (2012), a specific version, Differential Instant Radiosity, was used (Knecht et al., 2010). Instant Radiosity by Keller (1997) alone was used in Park et al. (2016), and another rendering technique, Precomputed Radiance Transfer (Sloan et al., 2002), was used in Gierlinger et al. (2010). The Ray tracing algorithm was used only by Kan and Kaufmann (2013, 2012); Dos Santos et al. (2012); Pereira et al. (2020) and Zhang et al. (2019).
For Question 3: What frameworks of Ray tracing are used in augmented reality applications? We identified only one framework for Ray tracing, the OptiX Engine, used by Kan and Kaufmann (2013, 2012) and Pereira et al. (2020). Some authors implemented their own ray tracers, as is the case of Dos Santos et al. (2012) and Zhang et al. (2019).

Discussion
At first, the answer to the first research question may seem relatively trivial. However, when we proposed this question, we aimed to identify a closed definition with minimum requirements for photorealism in Mixed and Augmented Reality contexts. Through this review, it was possible to observe that there is no commonly accepted closed definition of photorealism in the context of Mixed and Augmented Reality. However, some points are common to all definitions and perceived as a goal in most of the analyzed papers, such as the existence of visual coherence in the interaction between light, real objects, and virtual objects. Shadows and reflections are elements that contribute to increasing this coherence. The analyzed papers are distributed over a considerably long period (15 years), and it was possible to see a great variety of techniques to implement photorealism in Mixed and Augmented Reality in this period. It is interesting to note that of the 22 selected works, only 8 were published in the last five years. Some hypotheses can be drawn from this observation. The first is that the criteria of our review may have limited the scope of the search. The second is that hardware limitations may have hindered the advancement of the state of the art in recent years, since photorealism demands high computational effort even outside Mixed Reality contexts. The most common type of publication we identified in this SLR is the conference paper, with 72.22% (13/18), followed by journal articles, with 27.77% (5/18). Figure 8 shows the conferences and journals from which the studies were selected. As shown in Figure 9, many works used environment geometry information, whose acquisition can add high computational cost. Most of the works aimed to propose solutions that were viable in real time, and most of the techniques prioritized frame rate over visualization quality. Another noticeable similarity was the use of Differential Rendering by many of the works.
Ray tracing algorithms, however, were used in only five works, and few Ray tracing frameworks were mentioned. This may be related to the difficulty of implementing Ray tracing in real time due to hardware limitations that existed until recently. It is possible that new frameworks are now being developed and used and will become popular in the coming years.

Conclusion
In this paper, we presented a Systematic Literature Review on photorealism in Mixed and Augmented Reality. The main objectives were to identify what is considered photorealism in the MR and AR context, what techniques are used to make it possible, and which Ray tracing frameworks have been used. Through this review, it was possible to conclude that there is no closed definition of what photorealism is in Mixed Reality, but we point out its main elements. We also noticed the use of several different techniques across the works and, in some cases, the same technique applied in different works. Finally, only one Ray tracing framework was identified in this review. As a research area that has been growing in recent years, new solutions are expected to emerge, and computationally costly solutions such as global illumination should become viable in real time. As future work, we intend to evaluate the use of modern Ray tracing frameworks in the construction of photorealistic Mixed Reality applications involving diffuse and glossy reflections.

Acknowledgment
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.