Background Info and Timeline of Exploration of Neural Rendering
In 2020, during the 10th anniversary of the film Inception (2010), I started thinking again about how to implement the film's dream device, which lets people construct realistic environments, as a kind of VR experience. I had been mulling this over since Inception's original release in 2010; back in 2011 I brainstormed a 4D motion chair with a 3D dome screen as a virtual reality device, at a time when VR was still an unfamiliar concept. After rewatching the movie a few times in a VR theater, I thought about the idea much more and considered how light fields could be an option. At the time I was reading about Paul Debevec's research on volumetric light field video, which I found to have great potential for capturing and viewing real environments. I also tried Google's Welcome to Light Fields demo, which showed volumetric captures from a GoPro rig, viewable in VR. I further thought about how I could create a light field system similar to Unity, where a user could create, combine, and visualize light fields within VR. Eventually, I imagined future research directions on different ways light fields could be edited, and futuristic ways to create synthetic, photorealistic environments with natural language. The following year, in 2021, I started research at UC Berkeley working on an asymmetrical communication system for VR and non-VR users, and brainstormed a way to incorporate light fields, but that was infeasible for the project. I also imagined projects where large light field displays would surround a user and create a VR effect without an HMD (with projectors mounted to relight the person based on the environment).
During the summer of 2021, the Unreal Engine 5 beta was released, and after seeing the potential of the Quixel Megascans available through Quixel Bridge, I imagined possible futures for virtual production. The previous year I had dabbled in building a custom virtual production pipeline with Unity and Blender, but with Unreal Engine 5 and the introduction of MetaHumans, I wanted to explore creating detailed virtual scenes and even attempt film-grade shots. As an experiment, I created a MetaHuman of myself and deepfaked my face onto it using DeepFaceLab. The results were not perfect, but they looked better than I expected. I then created a MetaHuman resembling Indiana Jones and deepfaked his face onto it, with fairly decent results (Indiana Jones 5 was filming at the same time I was training the deepfake). Due to the limitations of my PC, I was unable to properly combine the Quixel Megascans with the MetaHuman, since my PC would crash often and could not render at high quality. However, I thought up workarounds, like using SparkAR to overlay an Indy hat onto the MetaHuman, or using Unity to layer image/video planes.
In early 2021, I read about OpenAI's Dall-E (the first version) and imagined a scenario where a user could just think of or say what they pictured, and the system would create it. I was somewhat surprised that it didn't gain much attention at the time, since I found the possibilities very interesting. There was an interactive demo where it would complete the missing half of an image of a rotated cube. Based primarily on this demo, I started imagining the possibility of creating virtual objects with such a system, along with a UI to enable it. I further imagined combining light fields and Dall-E to create an Inception-like experience, such as standing in Soda Hall (a CS building at UC Berkeley) and being able to manipulate or deform my surroundings as a light field. I started implementing a demo where I rendered a light field of a Doctor Strange portal I made in Houdini and composited it into a light field of my room, with another light field of the outdoors visible inside the portal. During 2020 and 2021 I captured footage with my 360 camera in the hopes of later turning it into light fields (without knowing it, I was collecting data I would later use to create NeRFs).
In 2022, I started working on my website (at first it was just an A-Frame virtual environment where The Desk would eventually live). I took CS 184, the graphics class at my school, taught by Ren Ng, whom I knew of from Lytro (the light field camera company). I really enjoyed the class since I love all things 3D; I recognized many of the concepts throughout the lectures, and it was great to learn about them in more depth. Ren Ng and some of the TAs were co-authors of the first NeRF paper in 2020, and the TAs gave an overview of how NeRFs worked. After learning more about NeRFs, I realized they were a spiritual successor to light fields and would have many features I would find really useful. Eventually NVIDIA Instant NeRF launched, and during summer 2022 I created dozens of NeRFs with it and tried capturing 360 NeRFs as well.
I eventually thought about how to neurally complete partially captured NeRFs with diffusion models such as Dall-E. One of the TAs had created a system for synthesizing NeRFs from text (DreamFields), and I wanted to extend this idea to neurally completing NeRFs: if I captured only part of a room or a car, the system would approximate the rest with a diffusion model. Over the summer I drafted a mockup of a VR system that would let a user create a synthetic photorealistic environment by capturing, editing, and placing NeRFs entirely within VR. In this mockup, a guided AR or passthrough VR/MR experience would walk the user around a room to capture the environment within a bounding box; the user could then import the capture into the editor, where they could combine existing NeRFs or create new synthetic NeRFs on the spot through voice or text. The system could also interpret natural language commands to outpaint or inpaint parts of the room. Dall-E 2 had launched around that time, and after experimenting with neural completion on 2D images, I was convinced that diffusion models in their current state could achieve neural NeRF completion. Soon after, Stable Diffusion and other diffusion models were released publicly, and interest in both NeRFs and diffusion models skyrocketed. I was excited that I now had the APIs and the ability to create this system, and I am currently exploring ways to build it (possibly using Google's MobileNeRF or Luma Labs AI's NeRFs to run NeRFs in VR). I find it fascinating that the things I envisioned over the last few years are achievable now, rather than in the 5-10 years I believed they would take. I really want to build this vision soon, and I am sure others will create it too, but I look forward to the future.
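To make the 2D neural completion experiment above more concrete, here is a minimal sketch of diffusion-based inpainting. I originally tried this with Dall-E 2's editor; the snippet below stands in with the open-source Stable Diffusion inpainting pipeline from Hugging Face diffusers, and the file names, prompt, and model choice are illustrative placeholders rather than my exact setup.

```python
# Minimal sketch of 2D "neural completion" (inpainting) with a diffusion model.
# Assumptions: the original experiments used Dall-E 2's editor; here the
# open-source Stable Diffusion inpainting pipeline from Hugging Face diffusers
# stands in for it. File names and the prompt are illustrative placeholders.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

# The partial capture: white pixels in the mask mark the missing region
# the model should fill in (e.g., the unseen half of a room).
image = Image.open("partial_room.png").convert("RGB").resize((512, 512))
mask = Image.open("missing_region_mask.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="the rest of a cozy, sunlit living room, photorealistic",
    image=image,
    mask_image=mask,
).images[0]
result.save("completed_room.png")
```

The same idea, applied view-by-view to renders of a partially captured NeRF and then folded back into training, is roughly what I mean by neural NeRF completion.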