Soundwise: Synthetic Acoustic Signals from Video Streams for Augmented Human Perception with Deep Learning

dc.contributor.advisor: Prevost, Jeff
dc.contributor.author: Ghose, Sanchita
dc.contributor.committeeMember: Browning, JoAnn
dc.contributor.committeeMember: Qian, Chunjiang
dc.contributor.committeeMember: Kudithipudi, Dhireesha
dc.creator.orcid: https://orcid.org/0000-0003-0883-5718
dc.date.accessioned: 2024-02-09T21:11:40Z
dc.date.available: 2024-02-09T21:11:40Z
dc.date.issued: 2022
dc.description: This item is available only to currently enrolled UTSA students, faculty, or staff.
dc.description.abstract: The two most essential human perceptual modalities are vision and hearing. In everyday life, people must analyze enormous amounts of audio and visual information to deal with multiple multisensory events, which motivates research in audiovisual learning (AVL) using powerful artificial intelligence technologies. Learning the coherence between audio and visual signals is a very challenging task; however, researchers are addressing these correlation challenges by leveraging the two modalities together to improve performance on tasks previously treated as single-modality problems. Since sound plays a crucial role in perceiving the inherent action information of most real-world visual scenarios, auditory guidance can help a person or a device analyze surrounding events more effectively. This research focuses on synthesizing sound from natural videos that is aligned with the video in both content and timing. We propose novel visuals-to-sound deep learning systems capable of serving diverse multimodal applications and enabling interactive intelligence. This work also addresses prevailing gaps in multisensory research that our proposed automatic sound generation techniques can help close, with impact across a range of multimodal learning application domains.
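
To make the video-to-sound task described in the abstract concrete, below is a minimal illustrative sketch of such a pipeline in PyTorch: a per-frame visual encoder, a recurrent temporal model that keeps the audio aligned with the visual action, and a decoder that emits one mel-spectrogram frame per video frame. This is a hypothetical sketch of the general technique, not the architecture proposed in the dissertation; the class name, layer choices, and dimensions are all assumptions made for illustration.

```python
# Hypothetical sketch of a video-to-sound generation pipeline.
# Not the dissertation's architecture; all sizes are illustrative.
import torch
import torch.nn as nn

class VideoToSound(nn.Module):
    def __init__(self, feat_dim=256, hidden_dim=512, n_mels=80):
        super().__init__()
        # Per-frame visual encoder (a tiny CNN here; a pretrained
        # backbone such as ResNet would be typical in practice).
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Temporal model over the frame sequence ties each output
        # audio frame to the visual action at that moment.
        self.temporal = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # Decoder maps each hidden state to one mel-spectrogram frame;
        # a vocoder (e.g., Griffin-Lim) would turn it into a waveform.
        self.decoder = nn.Linear(hidden_dim, n_mels)

    def forward(self, frames):
        # frames: (batch, time, channels, height, width)
        b, t, c, h, w = frames.shape
        feats = self.frame_encoder(frames.reshape(b * t, c, h, w))
        feats = feats.reshape(b, t, -1)
        hidden, _ = self.temporal(feats)
        return self.decoder(hidden)  # (batch, time, n_mels)

# Usage sketch: 8 RGB frames of a short clip -> 8 mel-spectrogram frames.
model = VideoToSound()
clip = torch.randn(1, 8, 3, 64, 64)
mel = model(clip)
print(mel.shape)  # torch.Size([1, 8, 80])
```

Predicting a spectrogram per frame, rather than a raw waveform, is a common simplification in this setting because it keeps the temporal correspondence between video frames and audio explicit.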
dc.description.department: Electrical and Computer Engineering
dc.format.extent: 106 pages
dc.format.mimetype: application/pdf
dc.identifier.isbn: 9798438751144
dc.identifier.uri: https://hdl.handle.net/20.500.12588/3542
dc.language: en
dc.subject: Artificial Intelligence
dc.subject: Computer Vision
dc.subject: Deep Learning
dc.subject: IoT
dc.subject: Internet of things
dc.subject: Multimedia Application
dc.subject: Sound Generation
dc.subject.classification: Artificial intelligence
dc.subject.classification: Electrical engineering
dc.subject.classification: Computer engineering
dc.title: Soundwise: Synthetic Acoustic Signals from Video Streams for Augmented Human Perception with Deep Learning
dc.type: Thesis
dc.type.dcmi: Text
dcterms.accessRights: pq_closed
thesis.degree.department: Electrical and Computer Engineering
thesis.degree.grantor: University of Texas at San Antonio
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle
Ghose_utsa_1283D_13608.pdf (5.31 MB, Adobe Portable Document Format)