Attention-based Audio Driven Facial Animation

dc.contributor.advisor: Rad, Paul
dc.contributor.author: Zand, Neda
dc.contributor.committeeMember: Desai, Kevin
dc.contributor.committeeMember: Quarles, John
dc.creator.orcid: https://orcid.org/0000-0002-1578-3947
dc.date.accessioned: 2024-04-09T15:56:34Z
dc.date.available: 2024-08-15
dc.date.available: 2024-04-09T15:56:34Z
dc.date.issued: 2022
dc.description: This item is available only to currently enrolled UTSA students, faculty or staff. To download, navigate to Log In in the top right-hand corner of this screen, then select Log in with my UTSA ID.
dc.description: The full text of this item is not available at this time because the author has placed this item under an embargo until August 15, 2024.
dc.description.abstract: In the virtual world, a human digital twin is the digital representation of its real-world counterpart. Using such digital equivalents, the performance of products can be predicted in advance, allowing them to be designed and manufactured more efficiently. Compared to other digital twins, generating a human digital twin is more demanding because it must be believable; numerous elements must be estimated to produce convincing facial movements. Driving the animation from an input signal such as audio adds further difficulty, turning the task into a multi-modal problem that must handle both audio and imagery. Synthesized facial movement is widely used in a variety of applications, such as healthcare (surgical planning and facial tissue surgical simulation, facial therapy, and prosthetics), the game industry (facial animation, real-time sequencing, and face-audio synchronization), video teleconferencing (mapping individual photographs to canonical representations of the face), and social robots (facial animation in interactive robots). This diversity of applications makes facial animation a challenging problem in computer graphics and computer vision. Modeling and animating convincing characters (2-D or 3-D, human or non-human) is the key to all of these applications. Many methods have been investigated and put into practice to produce animations that accurately represent human facial motion. While techniques such as key-framing or performance capture may produce realistic facial animation, they are either time consuming or difficult to alter. Newer methods based on machine learning have shown large improvements in both accuracy and time efficiency. This thesis provides a thorough review of the existing literature in this area, with a special focus on deep learning methods. Additionally, I propose a novel attention-based deep learning model for audio-driven facial animation. I use an encoder-decoder architecture to encode the audio features and map them to 3-D facial movements: convolutional neural networks form the encoder, augmented with spatial and channel attention modules. The key idea behind adding attention is to focus on the more relevant features; both spatial and channel attention help the convolutional layers emphasize relevant features while simultaneously suppressing less important ones. The proposed model was trained on the VOCASET dataset, and the 3-D results were visualized in Omniverse. As expected, the attention module improved lip synchronization. The results are recorded in a supplementary video.
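To make the architecture described in the abstract concrete, below is a minimal PyTorch sketch of a convolutional audio encoder with CBAM-style channel and spatial (here temporal, since audio features are 1-D) attention, followed by a linear decoder that predicts per-frame 3-D vertex offsets. The class names, layer sizes, the choice of MFCC-style inputs, and the 5023-vertex output (the FLAME mesh topology used by VOCASET) are illustrative assumptions, not the thesis's exact configuration.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Reweight feature channels using pooled global statistics (CBAM-style)."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        # x: (batch, channels, frames)
        avg = self.mlp(x.mean(dim=2))              # average-pooled over time
        mx = self.mlp(x.amax(dim=2))               # max-pooled over time
        w = torch.sigmoid(avg + mx).unsqueeze(-1)  # (batch, channels, 1)
        return x * w                               # suppress less relevant channels

class TemporalAttention(nn.Module):
    """1-D analogue of spatial attention: reweight individual time steps."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv1d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)          # (batch, 1, frames)
        mx = x.amax(dim=1, keepdim=True)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w                               # emphasize relevant frames

class AudioToFace(nn.Module):
    """Encoder-decoder: audio features -> per-frame 3-D vertex displacements."""
    def __init__(self, n_mfcc: int = 40, hidden: int = 128, n_vertices: int = 5023):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(n_mfcc, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.channel_att = ChannelAttention(hidden)
        self.temporal_att = TemporalAttention()
        self.decoder = nn.Linear(hidden, n_vertices * 3)

    def forward(self, audio_feats):
        # audio_feats: (batch, n_mfcc, frames)
        h = self.encoder(audio_feats)
        h = self.temporal_att(self.channel_att(h))  # attend before decoding
        h = h.transpose(1, 2)                       # (batch, frames, hidden)
        return self.decoder(h)                      # (batch, frames, n_vertices * 3)

# Usage: predicted offsets would be added to a neutral template mesh per frame.
model = AudioToFace()
mfcc = torch.randn(2, 40, 60)   # 2 clips, 40 coefficients, 60 audio frames
offsets = model(mfcc)           # -> (2, 60, 15069)
```

The attention placement follows the abstract's stated motivation: the channel branch decides which learned audio features matter for articulation, while the temporal branch decides which frames matter, before a shared decoder regresses the mesh offsets.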
dc.description.department: Computer Science
dc.format.extent: 72 pages
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/20.500.12588/6355
dc.language.iso: en
dc.subject: 3D facial animation
dc.subject: computer vision
dc.subject: data science
dc.subject: deep learning
dc.subject: facial animation
dc.subject: lip synchronization
dc.subject.classification: Computer science
dc.subject.classification: Artificial intelligence
dc.title: Attention-based Audio Driven Facial Animation
dc.type: Thesis
dc.type.dcmi: Text
dcterms.accessRights: pq_closed
local.embargo.terms: 2024-08-15
thesis.degree.department: Computer Science
thesis.degree.grantor: University of Texas at San Antonio
thesis.degree.level: Masters
thesis.degree.name: Master of Science
