Development of a three dimensional kinematic model of lip and jaw coordination during speech




Samaan, Michael




The neuromuscular control of speech production is highly complex because it requires the coordination of multiple systems (i.e., respiration, phonation, resonation and articulation). Understanding motor speech control is critical because speech disorders are common. For instance, developmental speech disorders occur in as much as 19% of the population, and disorders of speech occur in over 60% of people who suffer from neurological disease (Enderly et al., 1986). The goal of this research is to develop a model that can be used to further understand motor speech control and its disorders, in order to enhance differential diagnosis of speech disorders and provide a framework for understanding changes associated with treatment. The novelty of this model lies in its ability to study speech in three dimensions (3D) rather than the two dimensions (2D) used by many previous researchers.

The fluent production of speech requires precise movements of the articulators (i.e., lips, tongue, jaw). The Optotrak motion-capture system (Northern Digital, Inc.) uses infrared light-emitting diodes that allow movements to be studied in three-dimensional (3D) space in real time. 3D speech movements were collected and synchronized with an audio file recorded using a microphone, and the coordination of the upper lip (orbicularis oris superior), lower lip (orbicularis oris inferior) and jaw (masseter) was analyzed. Both the kinematic and audio data were modeled interactively using routines developed in the MATLAB programming language (MathWorks, v7.1.0.246 (R14)). This model allows users to quantify and visualize articulator movement as well as other facial aspects of speech. One of the strengths of the model is that the researcher can select certain phrases and isolate their effect on the speech articulators. This potentially allows the effects of a disorder on the production of certain phonemes to be understood and can lead to potential methods of rehabilitation. The model also allows kinematic data (i.e., displacement and velocity) to be plotted simultaneously with the subject's audio file, so that the effects of the phonemes on facial features such as the lips and jaw are synchronized with the audio. The 3D model was validated against previous work on articulator coordination and on lower lip displacement profiles in healthy subjects. In particular, the data collected with the model were evaluated using an index of speech coordination, the spatial-temporal index (STI) developed by Anne Smith at Purdue University (e.g., Smith et al., 1995; Smith et al., 2000). In this study, the effects of speech rate changes on coordination of the articulators were assessed using the STI. The study also assessed the aperture of the lips, a measure that likely captures important information about articulatory coordination and one that has received little exploration in the existing literature. The model is also capable of providing more typical kinematic information such as movement velocity, duration and displacement.
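
As an illustration of the STI mentioned above, the following is a minimal Python sketch of the index as it is commonly described in the literature: each repetition of a displacement record is amplitude-normalized (z-scored) and time-normalized by linear interpolation, the standard deviation across repetitions is taken at 2% intervals, and the 50 standard deviations are summed. The function name `spatiotemporal_index` and the parameters `n_points` and `n_samples` are assumptions made for this sketch; it is not the MATLAB routine developed in the thesis.

```python
import numpy as np


def spatiotemporal_index(trials, n_points=1000, n_samples=50):
    """Sum of across-trial SDs of normalized displacement records.

    trials: list of 1D arrays, one displacement record per repetition.
    n_points and n_samples are assumed defaults for this sketch.
    """
    new_t = np.linspace(0.0, 1.0, n_points)
    normalized = []
    for trial in trials:
        x = np.asarray(trial, dtype=float)
        # Amplitude normalization: subtract the mean, divide by the SD.
        z = (x - x.mean()) / x.std()
        # Time normalization: linear interpolation onto a common time base.
        old_t = np.linspace(0.0, 1.0, z.size)
        normalized.append(np.interp(new_t, old_t, z))
    stacked = np.vstack(normalized)
    # Across-trial SD at evenly spaced points (2% steps for 50 samples).
    idx = np.linspace(0, n_points - 1, n_samples).round().astype(int)
    return float(np.std(stacked[:, idx], axis=0, ddof=1).sum())
```

Under this definition, repetitions that share the same shape but differ in duration or amplitude give an index close to zero, while more variable repetitions give larger values.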

The speech model was validated by using the root mean square (RMS) to compare displacement profiles in each of the three directions (x, y and z). In Lucero's data, the measured ranges in x, y and z are 3 mm, 12 mm and 8 mm, respectively; in my data, the corresponding ranges are 5 mm, 14 mm and 9 mm for "flea" and 5 mm, 13 mm and 7 mm for "blip". The results obtained with my speech model agree with Lucero's conclusion that the lower lip has a strong vertical motion, suggesting a primarily one-dimensional motion of the lips (Lucero et al., 1999; Lucero et al., 2005). The average STI values of all 20 subjects at the normal, fast and slow rates were 18.4 (SD=4.45), 19.9 (SD=3.78) and 22.7 (SD=3.95), respectively. A repeated-measures ANOVA with rate as the within-subjects factor was also performed in order to validate the model. The ANOVA gave the same results as Wohlert et al. (1998), thereby validating the model.
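
The abstract does not state exactly how the RMS comparison was carried out, so the following Python sketch shows only one plausible reading: the per-axis range of a 3D displacement profile, and the per-axis RMS difference between two profiles that have already been resampled to the same length. The function names `axis_ranges` and `rms_difference` and the (frames, 3) array layout are assumptions for illustration.

```python
import numpy as np


def axis_ranges(xyz):
    """Range (max minus min) of a displacement profile along x, y and z.

    xyz: array of shape (frames, 3), an assumed layout for this sketch.
    """
    xyz = np.asarray(xyz, dtype=float)
    return xyz.max(axis=0) - xyz.min(axis=0)


def rms_difference(profile_a, profile_b):
    """Per-axis RMS difference between two equally sampled 3D profiles."""
    a = np.asarray(profile_a, dtype=float)
    b = np.asarray(profile_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("profiles must be resampled to the same shape")
    return np.sqrt(np.mean((a - b) ** 2, axis=0))
```

A dominant range along one axis, as reported for the vertical (y) direction here, would show up directly in the output of `axis_ranges`.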

New parameters were calculated and analyzed: the upper lip-lower lip (UL LOL) distance, minimum and maximum LOL velocity, total aperture, and the right lip (RL) and left lip (LEL) angles. These parameters provided new insight into the effects of speech on the articulators. An ANOVA revealed significant effects for several of the parameters. The UL LOL distance differed significantly across the phrases from Appendix A, yet did not differ significantly across the three rates of speech for the sentence "Buy Bobby a Puppy".
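
The exact definitions of these parameters are not given in the abstract. As a hedged sketch in Python, the UL LOL distance can be read as the frame-by-frame Euclidean distance between the two lip markers, velocity as the numerical time derivative of displacement, and a lip-corner angle as the angle at the corner marker between the directions to the upper and lower lip markers. All function names, the (frames, 3) marker layout and the angle definition are assumptions made for this example.

```python
import numpy as np


def marker_distance(marker_a, marker_b):
    """Frame-by-frame Euclidean distance between two 3D markers."""
    a = np.asarray(marker_a, dtype=float)
    b = np.asarray(marker_b, dtype=float)
    return np.linalg.norm(a - b, axis=1)


def velocity(displacement, sample_rate_hz):
    """Numerical derivative of a 1D displacement record (units per second)."""
    x = np.asarray(displacement, dtype=float)
    return np.gradient(x, 1.0 / sample_rate_hz)


def corner_angle_deg(corner, upper, lower):
    """Angle at a lip-corner marker between the upper and lower lip markers.

    This angle definition is an assumption, not taken from the thesis.
    """
    c = np.asarray(corner, dtype=float)
    u = np.asarray(upper, dtype=float) - c
    v = np.asarray(lower, dtype=float) - c
    cos = np.sum(u * v, axis=1) / (
        np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1)
    )
    # Clip guards against rounding slightly outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
```

Minimum and maximum velocity then follow from `velocity(...).min()` and `velocity(...).max()` over the selected phrase.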

The work presented in this thesis has been validated and tested using normal subjects. It improves on current speech models through its simple yet quantitative methods of analysis. The model was developed as a graphical user interface in order to simplify its use by clinicians and other researchers.




Articulator, Jaw, Kinematics, Lip, Rate of Speech, Speech



Biomedical Engineering