JURSW Volume 8

Permanent URI for this collection: https://hdl.handle.net/20.500.12588/1747

Can We Predict Big 5 Personality Traits from Demographic Characteristics?
(UTSA Office of Undergraduate Research, 2022-12) Woods, Ethan; Han, David
Here we aim to predict the Big Five personality traits from demographic information using a generalized linear model. Data were obtained from openpsychometrics.org, pre-processed in MS Excel, and imported into R for statistical analysis. We first attempted to predict each individual response item with an ordinal regression model, but this approach proved not viable, even after various weightings were applied to the demographic data. The response variables were then aggregated into five categories, one for each personality trait: conscientiousness, agreeableness, neuroticism, openness to experience, and extraversion. We then applied a dimension reduction technique to the country and race variables to achieve an adequate model fit. We found that although demographic information can be useful, precise prediction of the Big Five traits requires other information that was not captured in the dataset.
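The aggregation step (many item responses collapsed into one score per trait) can be illustrated with a short sketch. It is not the authors' pipeline: it assumes 1-to-5 Likert items, uses a hypothetical item-to-trait key with made-up labels such as `E1`, and assumes reverse-keyed items are rescored as 6 - x before averaging. The abstract does not specify the real key or scoring rule.

```python
from statistics import mean

# HYPOTHETICAL key for illustration: trait -> [(item_label, is_reverse_keyed), ...].
# The real questionnaire's item labels and keying are not given in the abstract.
TRAIT_KEY = {
    "extraversion": [("E1", False), ("E2", True)],
    "neuroticism": [("N1", False), ("N2", True)],
    "agreeableness": [("A1", True), ("A2", False)],
    "conscientiousness": [("C1", False), ("C2", True)],
    "openness": [("O1", False), ("O2", True)],
}


def trait_scores(responses: dict[str, int], scale_max: int = 5) -> dict[str, float]:
    """Return one score per trait: the mean of its items after reverse-keying.

    Assumes every item is answered on a 1..scale_max Likert scale.
    """
    scores = {}
    for trait, items in TRAIT_KEY.items():
        values = []
        for label, reverse in items:
            x = responses[label]
            if not 1 <= x <= scale_max:
                raise ValueError(f"{label}={x} is outside 1..{scale_max}")
            # Flip reverse-keyed items so a high value always means "more of the trait".
            values.append(scale_max + 1 - x if reverse else x)
        scores[trait] = mean(values)
    return scores


respondent = {"E1": 5, "E2": 1, "N1": 2, "N2": 4, "A1": 2, "A2": 4,
              "C1": 3, "C2": 3, "O1": 4, "O2": 2}
print(trait_scores(respondent))
```

The resulting five trait scores, rather than the individual items, would then serve as the response variables in the regression.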
An expectation-maximization algorithm for estimating the parameters of the correlated binomial distribution
(UTSA Office of Undergraduate Research, 2022-12) Bennett, Andrea; Wang, Min
The correlated binomial (CB) distribution was proposed by Luceño (Computational Statistics & Data Analysis 20, 1995, 511–520) as an alternative to the binomial distribution for analyzing data in which events are correlated. Because the mixture likelihood of the model is complex, analytical expressions for the maximum likelihood estimators (MLEs) of the unknown parameters may be impossible to derive. To overcome this difficulty, we develop an expectation-maximization (EM) algorithm for computing the MLEs of the CB parameters. Numerical results from simulation studies and a real-data application show that the proposed method is effective, consistently reaching a global maximum. Our results should also interest senior undergraduate and first-year graduate students, and their instructors, who focus on applications of the EM algorithm to finding the MLEs of parameters in discrete mixture models.
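The general EM idea for a discrete two-component mixture can be sketched as follows. For illustration only, this sketch ASSUMES the CB probability mass function can be written as f(x) = (1 - rho) * Binomial(n, p) + rho * AllOrNone(n, p), where the all-or-none component puts mass 1 - p at x = 0 and p at x = n. That form gives mean np and variance np(1 - p)[1 + (n - 1)rho], but it should be checked against Luceño's paper before being relied on; the function names are hypothetical. The E-step computes the posterior probability that each observation came from the all-or-none component, and the M-step updates rho and p in closed form.

```python
from collections import Counter
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def all_or_none_pmf(x: int, n: int, p: float) -> float:
    # Degenerate component: either all n trials fail (x = 0) or all succeed (x = n).
    if x == 0:
        return 1 - p
    if x == n:
        return p
    return 0.0


def em_correlated_binomial(data, n, p=0.5, rho=0.5, tol=1e-10, max_iter=10_000):
    """EM estimates of (p, rho) under the ASSUMED mixture
    f(x) = (1 - rho) * Binomial(n, p) + rho * AllOrNone(n, p), with n >= 2."""
    counts = Counter(data)  # only the frequency of each x in 0..n matters
    total = len(data)
    for _ in range(max_iter):
        w_sum = succ = fail = 0.0
        for x, c in counts.items():
            a = rho * all_or_none_pmf(x, n, p)
            b = (1 - rho) * binom_pmf(x, n, p)
            w = a / (a + b)  # E-step: P(all-or-none component | x)
            w_sum += c * w
            # Expected successes/failures in the complete-data likelihood for p
            succ += c * ((1 - w) * x + w * (x == n))
            fail += c * ((1 - w) * (n - x) + w * (x == 0))
        # M-step: closed-form updates for the mixing weight and success probability
        rho_new, p_new = w_sum / total, succ / (succ + fail)
        done = abs(rho_new - rho) < tol and abs(p_new - p) < tol
        rho, p = rho_new, p_new
        if done:
            break
    return p, rho
```

Grouping the data by value keeps each iteration at O(n) work regardless of sample size, which matters because EM can need many iterations when the components overlap.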
Meta-analysis of Odds Ratios from Heterogeneous Clinical Studies
(UTSA Office of Undergraduate Research, 2022-12) Song, Mina; Belle, Macy; Han, David
Many systematic reviews of randomized clinical trials require meta-analyses of odds ratios. A conventional method estimates the overall odds ratio as a weighted average of the logarithms of the individual odds ratios. However, this approach has several deficiencies arising from its underlying assumptions and approximations. The goal of this study is to understand and quantify the methodological pitfalls in conducting a meta-analysis of odds ratios. Fixed-effect and random-effects models of pooled odds ratios are compared by applying them to a meta-analysis of single-nucleotide polymorphism (SNP) studies. The analysis uses the popular statistical software R along with SPSS and SAS. We find that the point estimates and confidence intervals for the overall log odds ratio can differ substantially between the traditional and alternative methods, which would affect the resulting statistical inferences. To obtain reliable results, we recommend against using the traditional methods for meta-analysis of odds ratios.
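The conventional approach described above, an inverse-variance weighted average of log odds ratios, can be sketched next to one common random-effects alternative. This is a minimal sketch: the DerSimonian-Laird estimator of the between-study variance is used as a familiar example, not necessarily the alternative the authors studied, and the 0.5 continuity correction for zero cells is an assumption. Each study is a 2x2 table (a, b, c, d) of events and non-events in the treatment and control arms.

```python
from math import exp, log, sqrt


def log_odds_ratio(a, b, c, d):
    """Log OR and its Woolf variance for one 2x2 table.

    a, b: events / non-events in the treatment arm; c, d: same for control.
    ASSUMPTION: add 0.5 to every cell when any cell is zero.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (v + 0.5 for v in (a, b, c, d))
    return log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def weighted_mean(ys, ws):
    return sum(w * y for w, y in zip(ws, ys)) / sum(ws)


def pool_odds_ratios(tables, z=1.96):
    ys, vs = zip(*(log_odds_ratio(*t) for t in tables))
    # Fixed effect: inverse-variance weighted average of the log ORs.
    w = [1 / v for v in vs]
    fe = weighted_mean(ys, w)
    # DerSimonian-Laird moment estimate of the between-study variance tau^2.
    q = sum(wi * (yi - fe) ** 2 for wi, yi in zip(w, ys))
    scale = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(ys) - 1)) / scale) if len(ys) > 1 and scale > 0 else 0.0
    # Random effects: reweight each study by 1 / (v_i + tau^2).
    w_re = [1 / (v + tau2) for v in vs]
    re = weighted_mean(ys, w_re)
    out = {"tau2": tau2}
    for name, est, weights in (("fixed", fe, w), ("random", re, w_re)):
        se = sqrt(1 / sum(weights))
        # Report on the OR scale: exp(estimate) with a Wald interval.
        out[name] = (exp(est), exp(est - z * se), exp(est + z * se))
    return out
```

When the studies disagree, tau^2 becomes positive and the random-effects interval widens relative to the fixed-effect one, which is one way the pooled inferences can diverge.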
Performance of Machine Learning Algorithms for Heart Disease Prediction: Logistic Regressions Regularized by Elastic Net, SVM, Random Forests, and Neural Networks
(UTSA Office of Undergraduate Research, 2022-12) Ikpea, Obehi Winnifred; Han, David
Heart disease, a medical condition caused by plaque buildup in the walls of the arteries, is the leading cause of death in the U.S. and worldwide. In the U.S. alone, about 697,000 people died of heart disease in 2020. This research project assesses and compares the performance of several classification algorithms for predicting heart disease, so that the best-performing method could be considered as a clinical indicator of cardiovascular health. These methods include multiple logistic regression with and without elastic-net regularization, support vector machines, random forests, and artificial neural networks. The low prevalence of the disease shows up as class imbalance in the data, and an oversampling technique is proposed to address the computational challenges posed by this imbalance.
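The abstract does not name the oversampling technique, so the sketch below uses plain random oversampling as an assumed stand-in (methods such as SMOTE synthesize new rows instead of duplicating existing ones). Minority-class rows are resampled with replacement until every class is as large as the largest one. The function name `random_oversample` is hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every class
    matches the largest class. Labels must be hashable."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    if not by_class:
        return [], []
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Sample with replacement from this class to fill the gap.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    # Shuffle so the duplicated rows are not grouped at the end.
    order = list(range(len(y_out)))
    rng.shuffle(order)
    return [X_out[i] for i in order], [y_out[i] for i in order]


X = [[i] for i in range(10)]
y = [1, 1] + [0] * 8  # 2 cases of disease, 8 without: an imbalanced toy set
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Oversampling is normally applied only to the training split, after the train/test split, so duplicated rows do not leak into the data used to evaluate the classifiers.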
Policy-Guided Susceptible-Infected-Recovered Modeling of the COVID-19 Spread in Texas
(UTSA Office of Undergraduate Research, 2022-12) Woods, Ethan; Han, David
The goal of this research was to build a susceptible-infected-recovered (SIR) model of COVID-19 cases in Texas from state data covering March through October 2020, and to investigate the impact of public policies on the transmission of COVID-19. The data were pre-processed in Excel, where some basic time series graphs were also produced. All other data analysis, including all graphs relating to the SIR model, was performed in R. Estimating the model parameters by maximum likelihood proved difficult because the intervals between the implementation dates of successive policies designed to curb the spread of COVID-19 were short. The estimated trends of beta, gamma, and R0 showed R0 stabilizing over time, a pattern that requires further investigation to understand the epidemiology of COVID-19 in Texas.
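The basic SIR model behind this study can be sketched as a small simulation: dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I, with basic reproduction number R0 = beta/gamma. To mimic a "policy-guided" model, the sketch lets beta change at given policy dates (a hypothetical piecewise-constant schedule). It only simulates the model; it does not reproduce the authors' maximum likelihood fitting, and every parameter value below is made up.

```python
def sir_rhs(state, beta, gamma, n):
    """Right-hand side of the SIR equations: (dS/dt, dI/dt, dR/dt)."""
    s, i, _ = state
    new_infections = beta * s * i / n
    recoveries = gamma * i
    return (-new_infections, new_infections - recoveries, recoveries)


def piecewise_beta(change_days, betas):
    """HYPOTHETICAL policy schedule: betas[j] applies before change_days[j];
    the last beta applies after the final change day."""
    if len(betas) != len(change_days) + 1:
        raise ValueError("need exactly one more beta than change days")

    def beta_of_t(t):
        for day, b in zip(change_days, betas):
            if t < day:
                return b
        return betas[-1]

    return beta_of_t


def simulate_sir(beta_of_t, gamma, s_init, i_init, r_init, days, steps_per_day=10):
    """Integrate the SIR model with classic RK4; return one (S, I, R) per day."""
    n = s_init + i_init + r_init
    dt = 1.0 / steps_per_day
    state = (float(s_init), float(i_init), float(r_init))
    daily = [state]

    def f(st, t):
        return sir_rhs(st, beta_of_t(t), gamma, n)

    def shift(st, k, h):
        return tuple(x + h * dx for x, dx in zip(st, k))

    for day in range(days):
        for step in range(steps_per_day):
            t = day + step * dt
            k1 = f(state, t)
            k2 = f(shift(state, k1, dt / 2), t + dt / 2)
            k3 = f(shift(state, k2, dt / 2), t + dt / 2)
            k4 = f(shift(state, k3, dt), t + dt)
            state = tuple(
                x + dt / 6 * (a + 2 * b + 2 * c + d)
                for x, a, b, c, d in zip(state, k1, k2, k3, k4)
            )
        daily.append(state)
    return daily


# Example: R0 = beta / gamma drops from 3.0 to 0.8 when a (hypothetical) policy starts on day 40.
gamma = 0.1
beta_of_t = piecewise_beta([40], [0.3, 0.08])
traj = simulate_sir(beta_of_t, gamma, s_init=999_000, i_init=1_000, r_init=0, days=120)
peak_day = max(range(len(traj)), key=lambda d: traj[d][1])
print(f"peak infections on day {peak_day}: {traj[peak_day][1]:.0f}")
```

Fitting a separate beta for each policy interval is one way to read the estimation difficulty in the abstract: when intervals are only a few days long, each beta is informed by very few observations.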

Browsing JURSW Volume 8 by Department "Management Science and Statistics"