Soundwise: Synthetic Acoustic Signals from Video Streams for Augmented Human Perception with Deep Learning

Date
2022
Authors
Ghose, Sanchita
Abstract

Vision and hearing are the two most essential perceptual modalities of humans. In everyday life, people must analyze enormous amounts of audio and visual information to handle multiple multisensory events, which motivates research in audiovisual learning (AVL) using robust artificial intelligence technologies. Learning the coherence between audio and visual signals is a challenging task; nevertheless, researchers are tackling these correlation challenges by leveraging the two modalities together to improve performance on tasks previously addressed with a single modality. Since sound plays a crucial role in perceiving the inherent action information of most real-world visual scenes, auditory guidance can help a person or a device analyze surrounding events more effectively. This research focuses on synthesizing sound from natural videos that is both content-matched and temporally aligned. We propose novel visuals-to-sound deep learning systems capable of serving diverse multimodal applications and enabling interactive intelligence. This research also addresses prevailing gaps in multisensory research that our proposed automatic sound generation techniques can close, with impact across multimodal learning application domains.
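The abstract describes mapping video frames to temporally aligned sound. The dissertation itself is access-restricted, so the sketch below is not the author's architecture; it is a minimal, illustrative numpy mock-up of the general pipeline such systems share: encode each video frame into a feature vector, then decode one spectrogram column per frame so the generated audio stays time-aligned with the video. All names, dimensions, and the random-projection "encoder" are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative, not from the dissertation):
# T video frames, each reduced to an F-dim visual feature,
# decoded frame-by-frame into an M-bin spectrogram column.
T, F, M = 8, 16, 4

def encode_frames(frames):
    """Stand-in visual encoder: flatten each frame, random-project to F dims.

    A real system would use a learned CNN here; the random projection
    only demonstrates the per-frame feature extraction step.
    """
    pooled = frames.reshape(T, -1)                     # (T, H*W)
    proj = rng.standard_normal((pooled.shape[1], F)) / np.sqrt(pooled.shape[1])
    return pooled @ proj                               # (T, F)

def generate_spectrogram(features, W):
    """One linear map per frame: (T, F) features -> (T, M) spectrogram.

    Decoding frame-by-frame preserves the one-to-one temporal
    alignment between video frames and audio spectrogram columns.
    """
    return np.maximum(features @ W, 0.0)               # ReLU: non-negative magnitudes

frames = rng.standard_normal((T, 12, 12))              # mock grayscale video clip
W = rng.standard_normal((F, M)) * 0.1                  # mock decoder weights
spec = generate_spectrogram(encode_frames(frames), W)
print(spec.shape)                                      # one column per video frame
```

In a trained system the decoder weights would be learned against ground-truth audio, and the spectrogram would be inverted to a waveform (e.g., via Griffin-Lim); the point here is only the frame-to-column alignment that the abstract emphasizes.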

Description
This item is available only to currently enrolled UTSA students, faculty or staff.
Keywords
Artificial Intelligence, Computer Vision, Deep Learning, Internet of Things (IoT), Multimedia Application, Sound Generation
Department
Electrical and Computer Engineering