Enhancing Depth Estimation in Adverse Lighting Scenarios for Autonomous Driving
Accurately determining object depth from the cameras and sensors mounted on autonomous vehicles is a central challenge in autonomous driving. Monocular and stereo cameras have been used extensively for depth estimation, and recent advances in deep neural networks have substantially improved its accuracy. However, these state-of-the-art methods overlook two special cases: (a) radiometric differences between stereo images and (b) monocular depth estimation in dynamic scenes. Moreover, most of them are designed primarily for daytime scenarios, which limits their effectiveness in low-light or nighttime conditions. The poor nighttime performance of neural-network-based depth estimation stems from several factors: (a) the lack of precise ground-truth depth data for training, (b) the reduced visibility of objects under adverse lighting, and (c) elevated image noise caused by insufficient light during capture. Addressing these challenges is crucial for improving the accuracy and practicality of depth estimation so that deep neural networks can function effectively in both day and night scenarios.

In the first part of my research, I consider the radiometric differences between stereo images from the viewpoint of the Bidirectional Reflectance Distribution Function (BRDF) and propose a novel approach that removes these differences so that stereo matching can be performed effectively. The approach estimates irradiance images based on the BRDF, which describes the ratio of radiance to irradiance for a given image. I demonstrate that computing an irradiance image requires estimating only the light source direction and the object's roughness. Assuming that the dot products involving the unknown light direction parameters follow a Gaussian distribution, I use this approximation to estimate the light source direction.
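As a rough illustration of these two estimation steps, here is a minimal NumPy sketch. The function names, the use of normalized intensity samples as stand-ins for the n·l dot products, and the mapping from their Gaussian mean to a direction vector are all my own simplifying assumptions, not the exact formulation of the thesis:

```python
import numpy as np

def estimate_light_direction(gray, n_samples=2000, seed=0):
    """Estimate a single dominant light direction under the simplifying
    assumption that the dot products between surface normals and the
    unknown light direction follow a Gaussian distribution. Normalized
    pixel intensities stand in for noisy observations of those dot
    products in this sketch."""
    rng = np.random.default_rng(seed)
    samples = rng.choice(gray.ravel(), size=n_samples) / 255.0
    mu = float(np.clip(samples.mean(), 0.0, 1.0))
    elevation = np.arccos(mu)  # hypothetical mapping: mean dot product -> angle
    return np.array([np.sin(elevation), 0.0, np.cos(elevation)])

def irradiance_image(radiance, brdf, eps=1e-6):
    """Since the BRDF is the ratio of radiance to irradiance, the
    irradiance image is recovered by dividing the observed radiance by
    the estimated per-pixel BRDF value."""
    return radiance / (brdf + eps)
```

The division in `irradiance_image` follows directly from the radiance/irradiance ratio stated above; everything upstream of it (how the BRDF map itself is built) is where the roughness estimate enters.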
The object's roughness is estimated by computing the pixel-intensity variance with a local-window strategy. Applying these steps independently to the original stereo images yields illumination-invariant irradiance images that can serve as input to stereo matching methods. Experiments on well-known stereo estimation datasets demonstrate that my proposed approach significantly reduces the error rate of stereo matching methods.

In the second part of my research, I propose a novel method for monocular depth estimation in dynamic scenes. I first analyze, theoretically, the arbitrariness of object movement trajectories in dynamic scenes. To overcome this arbitrariness, I assume that points move along straight lines over short distances and formalize this assumption as a triangular constraint loss in two-dimensional Euclidean space. This loss is used as part of my proposed pixel movement prediction network, PMPNet, which estimates a dense depth map from a single input image. To overcome the depth inconsistency problem around edges, I propose a deformable support window module that learns features from objects of different shapes, making depth values more accurate near object boundaries. The proposed model is trained and tested on two outdoor datasets, KITTI and Make3D, as well as an indoor dataset, NYU Depth V2. The quantitative and qualitative results on these datasets demonstrate the success of my proposed model compared against other approaches. Ablation studies on the KITTI dataset also validate the effectiveness of the proposed pixel movement prediction module and the deformable support window module.

In the third part of my research, I propose a self-supervised model to address the lack of ground-truth data for depth estimation. To achieve this, I introduce a novel prior, the red channel attention prior, which models the relationship between the image channels using Rayleigh scattering.
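A minimal sketch of how a Rayleigh-scattering-motivated red channel prior could be turned into per-pixel weights follows. The intuition from Rayleigh scattering is that shorter (blue and green) wavelengths scatter more strongly, so the red channel tends to preserve more scene structure; the sigmoid scoring form below is my own assumption, not the thesis's exact construction:

```python
import numpy as np

def red_channel_attention(image):
    """Build a per-pixel attention map that up-weights pixels where the
    red channel dominates. `image` is an H x W x 3 RGB array; the output
    is an H x W map in (0, 1)."""
    img = image.astype(np.float64) / 255.0
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    # Relative dominance of red over the average of the other channels.
    score = r - 0.5 * (g + b)
    return 1.0 / (1.0 + np.exp(-4.0 * score))  # sigmoid -> (0, 1)
```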
This prior generates an attention map used in an attention mechanism that gives the red channel of each pixel greater weight in the neural network, improving the final accuracy of depth estimation.

In my fourth study, I reevaluate the connection between image enhancement and monocular depth estimation and propose a new approach called the "enhancement parameter prior." This approach leverages the inverse relationship between depth and visual quality to create a self-supervised signal that can effectively train depth networks. However, simply increasing the brightness of each pixel may improve image clarity for human vision but not necessarily for neural networks, yielding only minimal improvement in depth estimation. To address this issue, I propose the Gaussian Cumulative Distribution-Curve (GCD-Curve) for image enhancement. By using the enhancement parameter map, red channel attention, and geometric constraints between sequential inputs, my proposed monocular depth estimation network can effectively estimate depth from a single image without ground-truth labels. To evaluate the effectiveness of the model, I test it on four datasets: RobotCar-Night, nuScenes-Night, RobotCar-Day, and KITTI. The quantitative and qualitative results demonstrate the success of my approach compared to 14 other approaches.

In my fifth study, I present a novel approach to training an effective image denoising model using only noisy images. Recent advances in neural networks have greatly benefited image denoising, but the need for large numbers of noisy-clean image pairs for supervision remains a constraint. To overcome this limitation, my method leverages properties of the Gaussian distribution and physics-based noise modeling to generate training pairs from a single noisy image.
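Returning briefly to the fourth study's GCD-Curve: a Gaussian CDF makes a natural tone curve because, with a small mean, it lifts dark intensities much more than bright ones while staying monotonic. The following sketch illustrates the idea; the exact parameterization, the rescaling to span [0, 1], and the default values of `mu` and `sigma` are my assumptions:

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def gcd_curve(x, mu=0.2, sigma=0.3):
    """Map normalized intensities x in [0, 1] through a Gaussian CDF.
    mu and sigma play the role of enhancement parameters; a small mu
    brightens dark pixels strongly. The curve is rescaled so that
    0 maps to 0 and 1 maps to 1."""
    root2 = math.sqrt(2.0)
    cdf = 0.5 * (1.0 + _erf((x - mu) / (sigma * root2)))
    lo = 0.5 * (1.0 + math.erf((0.0 - mu) / (sigma * root2)))
    hi = 0.5 * (1.0 + math.erf((1.0 - mu) / (sigma * root2)))
    return (cdf - lo) / (hi - lo)
```

In the full method the parameters would come from a predicted enhancement parameter map rather than fixed defaults.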
First, I show that a neural network can learn to denoise from two different images, as long as the difference between them follows a Gaussian distribution. Second, I propose a physics-based sub-sampling strategy to generate the training image pairs. This strategy models the sub-sampling probability on the physics of the camera pipeline, ensuring that the paired pixels are neighbors and follow the same probability distribution as the original noisy image. Finally, the denoising network is trained on the sub-sampled pairs, leading to improved performance even in adverse lighting conditions such as nighttime scenes. Moreover, by adhering to physics-based noise modeling, the model achieves wider real-world applicability. I also develop a novel theory that proves mathematically that this self-supervised technique is effective in nighttime scenarios; this theorem lets the neural network be seen as resting on a solid foundation rather than as a black box.

Overall, my research makes several contributions to the field of computer vision.

A Computer Graphics perspective is provided for removing radiometric differences in stereo images by modeling them with the Bidirectional Reflectance Distribution Function (BRDF). Irradiance image estimation is proposed for radiometric difference removal and is robust to lighting conditions and camera exposure. The light source direction is approximated using a Gaussian distribution, and object roughness is estimated from local-window pixel-intensity variance.

I propose a novel deep neural network architecture called PMPNet. It consists of a Pixel Movement Prediction Module that produces two pixel-movement predictions and a third straight-line prediction; the relation between the pixel movements and the straight line is summarized in a novel Triangular Constraint Loss Function.
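The Triangular Constraint Loss can be sketched directly from the triangle inequality: if a point's positions in three consecutive frames are collinear, the two short sides of the triangle sum exactly to the long side. The array layout below (one row of 2-D positions per tracked pixel) is my assumption:

```python
import numpy as np

def triangular_constraint_loss(p0, p1, p2):
    """Penalize deviation from straight-line motion. p0, p1, p2 are
    (N, 2) arrays holding each pixel's 2-D position in three consecutive
    frames. By the triangle inequality d(p0,p1) + d(p1,p2) >= d(p0,p2),
    with equality exactly when the trajectory is a straight line, so the
    loss is non-negative and zero for straight-line motion."""
    d01 = np.linalg.norm(p1 - p0, axis=1)
    d12 = np.linalg.norm(p2 - p1, axis=1)
    d02 = np.linalg.norm(p2 - p0, axis=1)
    return np.mean(d01 + d12 - d02)
```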
I propose a novel deep neural network for accurate monocular depth estimation, especially in low-light conditions, using two priors: enhancement parameters from image enhancement and red channel attention derived from Rayleigh scattering. I also propose the Gaussian Cumulative Distribution-Curve (GCD-Curve) for enhancing the input image, which improves depth estimation performance.

I propose a novel and effective self-supervised image denoising model based on a simple U-Net architecture. The model can be trained on image pairs generated from a single noisy input image, greatly reducing the need for noisy-clean image pairs for supervision. I also provide a solid theoretical foundation for the proposed image denoising model, demonstrating the validity of my self-supervised denoising framework, especially in low-light conditions.
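As an illustration of the pair-generation idea behind the self-supervised denoiser, here is a Neighbor2Neighbor-style sub-sampling sketch: two distinct pixels are drawn from every 2x2 cell, yielding two half-resolution images whose pixels are spatial neighbors with identically distributed noise. The 2x2-cell scheme and uniform sampling are stand-ins; the thesis's physics-based sampling probabilities may differ:

```python
import numpy as np

def neighbor_subsample_pair(noisy, seed=0):
    """Generate a training pair from a single noisy grayscale image by
    picking two distinct pixels from each 2x2 cell. Returns two
    half-resolution images that view neighboring pixels of the same
    scene, so their noise is independent but identically distributed."""
    rng = np.random.default_rng(seed)
    h, w = noisy.shape[:2]
    h2, w2 = h // 2, w // 2
    # Group the image into (h2, w2) cells of 4 pixels each.
    cells = noisy[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2).swapaxes(1, 2)
    cells = cells.reshape(h2, w2, 4)
    idx1 = rng.integers(0, 4, size=(h2, w2))
    idx2 = (idx1 + rng.integers(1, 4, size=(h2, w2))) % 4  # distinct from idx1
    rows, cols = np.indices((h2, w2))
    return cells[rows, cols, idx1], cells[rows, cols, idx2]
```

Training then treats one sub-image as the input and the other as the target, with no clean reference required.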