Carlos Alvarez College of Business Faculty Research
Permanent URI for this collection: https://hdl.handle.net/20.500.12588/251
Browsing Carlos Alvarez College of Business Faculty Research by Department "Management Science and Statistics"
Now showing 1 - 20 of 37
Item: A Conway–Maxwell–Poisson Type Generalization of Hypergeometric Distribution (2023-02-02)
Authors: Roy, Sudip; Tripathi, Ram C.; Balakrishnan, Narayanaswamy
The hypergeometric distribution has gained its importance in practice as it pertains to sampling without replacement from a finite population. It has been used to estimate the population size of rare species in ecology, discrete failure rate in reliability, fraction defective in quality control, and the number of initial faults present in software coding. Recently, Borges et al. considered a COM type generalization of the binomial distribution, called COM–Poisson–Binomial (CMPB), and investigated many of its characteristics and some interesting applications. In the same spirit, we develop here a generalization of the hypergeometric distribution, called the COM–hypergeometric distribution. We discuss many of its characteristics, such as the limiting forms, the over- and underdispersion, and the behavior of its failure rate. We write its probability-generating function (pgf) in the form of Kemp's family of distributions when the newly introduced shape parameter is a positive integer. In this form, closed-form expressions are derived for its mean and variance. Finally, we develop statistical inference procedures for the model parameters and illustrate the results by extensive Monte Carlo simulations.

Item: A Study on the X̄ and S Control Charts with Unequal Sample Sizes (2020-05-02)
Authors: Park, Chanseok; Wang, Min
The control charts based on X̄ and S are widely used to monitor the mean and variability of variables and can help quality engineers identify and investigate causes of the process variation. The usual requirement behind these control charts is that the sample sizes from the process are all equal, whereas this requirement may not be satisfied in practice due to missing observations, cost constraints, etc. To deal with this situation, several conventional methods were proposed.
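As background for the X̄ chart with unequal subgroup sizes discussed above, a minimal textbook-style sketch (not the paper's recommended method; the pooled estimator and 3-sigma limits are illustrative assumptions) of per-subgroup limits that widen as the subgroup size shrinks:

```python
import math

def xbar_limits(subgroups):
    """Per-subgroup 3-sigma X-bar chart limits with unequal sample sizes.

    Uses the grand mean as the center line and a pooled standard
    deviation; the limits for each subgroup scale with 1/sqrt(n_i).
    Textbook-style sketch, not the paper's proposed estimator.
    """
    all_obs = [x for g in subgroups for x in g]
    grand_mean = sum(all_obs) / len(all_obs)
    # Pooled variance across subgroups (weighted by degrees of freedom)
    ss = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in subgroups)
    dof = sum(len(g) - 1 for g in subgroups)
    s_pooled = math.sqrt(ss / dof)
    limits = []
    for g in subgroups:
        half_width = 3 * s_pooled / math.sqrt(len(g))
        limits.append((grand_mean - half_width, grand_mean + half_width))
    return grand_mean, limits

groups = [[10.1, 9.9, 10.2], [10.0, 10.3], [9.8, 10.1, 10.0, 10.2]]
center, lims = xbar_limits(groups)
# The smaller subgroup (n = 2) gets wider limits than the larger one (n = 4)
assert lims[1][1] - lims[1][0] > lims[2][1] - lims[2][0]
```

The sketch makes visible why naive "average sample size" shortcuts distort the limits: the correct half-width depends on each individual n_i.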
However, some methods based on weighted average approaches and an average sample size often result in degraded performance of the control charts because the adopted estimators are biased towards underestimating the true population parameters. These observations motivate us to investigate the existing methods with rigorous proofs, and we provide a guideline to practitioners for the best selection to construct the X̄ and S control charts when the sample sizes are not equal.

Item: Application of the Cox Proportional Hazards Model for the Quantitative Analysis of LC-MS Proteomics Data (Office of the Vice President for Research, 2019)
Authors: Arreola, Ivan; Han, David
Along with quantitative, analytical genomics, proteomics continues to be a growing field for determining the gene and cellular functions at the protein level. As liquid chromatography mass spectrometry (LC-MS) experiments produce protein peak intensity data, statistical and computational techniques are required to conduct quantitative analytical proteomics. The LC-MS proteomics data often have large quantities of missing peak intensities due to censoring of the low-abundance spectral features. Because of this, the observed peak intensities from the LC-MS method are all positive, skewed, and often left-censored. The classical survival analysis methods are ideal for detecting differentially expressed proteins among different groups. These methods include the non-parametric rank sum (RS) tests, such as the Kolmogorov-Smirnov (KS) and Wilcoxon-Mann-Whitney (WMW) tests, and parametric survival models such as the accelerated failure time (AFT) model with popular lifetime distributions (log-normal (LN), log-logistic (LL), and Weibull (W)) for modeling the peak intensity data. As an alternative approach, here we propose the Cox proportional hazards (PH) method, a popular semi-parametric model for modeling survival data.
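For readers unfamiliar with the Cox PH model named above, a toy sketch (illustrative only; the data and function names are invented, and tied event times are not handled) of the log partial likelihood for a single covariate:

```python
import math

def cox_log_partial_likelihood(beta, times, events, x):
    """Log partial likelihood for a single-covariate Cox PH model.

    times: observed times; events: 1 if event, 0 if censored;
    x: covariate values. Assumes no tied event times (Breslow/Efron
    tie handling is omitted for brevity).
    """
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 1:
            # Risk set: all subjects still under observation at t_i
            risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
            ll += beta * x[i] - math.log(sum(math.exp(beta * x[j]) for j in risk))
    return ll

times = [2.0, 3.0, 5.0, 8.0]
events = [1, 1, 0, 1]
x = [0.5, 1.0, 0.0, 1.5]
# At beta = 0 each event contributes -log(|risk set|)
assert abs(cox_log_partial_likelihood(0.0, times, events, x)
           + math.log(4) + math.log(3)) < 1e-12
```

The semi-parametric appeal noted in the abstract shows up here: the baseline hazard never appears in the partial likelihood, so no distributional form needs to be chosen.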
The proposed regression-based method allows for leniency on the hazard function by alleviating the requirements of distribution-specific hazard functions. With the hopes of gaining more insightful biological information for cellular functions at the protein level, the statistical properties of each method are investigated through a simulation study and an application to a Type I diabetes dataset.

Item: Big Data Analytics, Data Science, ML&AI for Connected, Data-driven Precision Agriculture and Smart Farming Systems: Challenges and Future Directions (Association for Computing Machinery, 2023-05-09)
Authors: Han, David; Rodriguez, Mia
Big data and data scientific applications in modern agriculture are rapidly evolving as data technology advances and more computational power becomes available. The adoption of big data has enabled farmers and producers to optimize their agricultural activities sustainably with cutting-edge technologies, resulting in eco-friendly and efficient farming. Wireless sensor networks and machine learning have had a direct impact on smart and precision agriculture, with deep learning techniques applied to data collected via sensor nodes. Additionally, the Internet of Things, drones, and robotics are being incorporated into farming techniques. Digital data handling has amplified the information wave, and information and communication technology has been used to deliver benefits to both farmers and consumers.
This work highlights the technological implications and challenges that arise in data-driven agricultural practices, as well as the research problems that need to be solved.

Item: Classical and Bayesian Inference of a Progressive-Stress Model for the Nadarajah–Haghighi Distribution with Type II Progressive Censoring and Different Loss Functions (2022-05-08)
Authors: Alotaibi, Refah Mohammed; Alamri, Faten S.; Almetwally, Ehab M.; Wang, Min; Rezk, Hoda
Accelerated life testing (ALT) is a time-saving technology used in a variety of fields to obtain failure time data for test units in a fraction of the time required to test them under normal operating conditions. This study investigated progressive-stress ALT with progressive type II censoring, with the lifetime of test units following a Nadarajah–Haghighi (NH) distribution. It is assumed that the scale parameter of the distribution obeys the inverse power law. The maximum likelihood estimates and estimated confidence intervals for the model parameters were obtained first. The Metropolis–Hastings (MH) algorithm was then used to build Bayes estimators for various squared error loss functions. We also computed the highest posterior density (HPD) credible ranges for the model parameters. Monte Carlo simulations were used to compare the outcomes of the various estimation methods proposed. Finally, one data set was analyzed for validation purposes.

Item: Clinical and Quality of Life Benefits for End-Stage Workers' Compensation Chronic Pain Claimants following H-Wave(R) Device Stimulation: A Retrospective Observational Study with Mean 2-Year Follow-Up (2023-02-01)
Authors: Trinh, Alan; Williamson, Tyler K.; Han, David; Hazlewood, Jeffrey E.; Norwood, Stephen M.; Gupta, Ashim
Previously promising short-term H-Wave(R) device stimulation (HWDS) outcomes prompted this retrospective cohort study of the longer-term effects on legacy workers' compensation chronic pain claimants.
A detailed chart review of 157 consecutive claimants undergoing a 30-day HWDS trial (single pain management practice) from February 2018 to November 2019 compiled data on pain, restoration of function, quality of life (QoL), and polypharmacy reduction into a summary spreadsheet for an independent statistical analysis. Non-beneficial trials in 64 (40.8%) ended HWDS use, while 19 (12.1%) trial success charts lacked adequate data for assessing critical outcomes. Of the 74 final treatment study group charts, missing data points were removed for the statistical analysis. Pain chronicity was 7.8 years, with 21.6 ± 12.2 months mean follow-up. Mean pain reduction was 35%, with 89% reporting functional improvement. Opioid consumption decreased in 48.8% of users and 41.5% completely stopped; polypharmacy decreased in 36.8% and 24.4% stopped. Zero adverse events were reported, and those who still worked usually continued working. An overall positive experience occurred in 66.2% (p < 0.0001), while longer chronicity portended the risk of trial or treatment failure. Positive outcomes in reducing pain, opioid/polypharmacy, and anxiety/depression, while improving function/QoL, occurred in these challenging chronic pain injury claimants. Level of evidence: III.

Item: Comparison of Gene Set Analysis with Various Score Transformations to Test the Significance of Sets of Genes (Office of the Vice President for Research, 2018)
Authors: Arreola, Ivan; Han, David
Microarray analysis can help identify changes in gene expression which are characteristic to human diseases. Although genome-wide RNA expression analysis has become a common tool in biomedical research, it still remains a major challenge to gain biological insight from such information. Gene Set Analysis (GSA) is an analytical method to understand the gene expression data and extract biological insight by focusing on sets of genes that share biological function, chromosomal regulation, or location.
The systematic mining of different gene-set collections could be useful for discovering potentially interesting gene sets for further investigation. Here, we seek to improve previously proposed GSA methods for detecting statistically significant gene sets via various score transformations.

Item: Comparison of Regression Methods to Identify Differential Expression in RNA-Sequencing Count Data from the Serial Analysis of Gene Expression (Office of the Vice President for Research, 2019)
Authors: Arreola, Ivan; Han, David
Comparative RNA-sequencing analysis for the Serial Analysis of Gene Expression (SAGE) can help identify changes in gene expression which are characteristic to human diseases. Since the RNA-sequencing experiment measures gene expressions in the form of counts, usually with a large degree of skewness, the analysis methods based on continuous probability distributions are generally inappropriate for modeling this type of data. Currently, the parametric regression techniques for solving this problem are based on well-known discrete probability distributions such as the Poisson and negative binomial. In order to overcome this modeling challenge with higher flexibility to account for a wide range of dispersion levels, here we introduce an alternative Generalized Linear Model (GLM) based on the Conway-Maxwell-Poisson distribution, also known as the COM-Poisson or CMP distribution. The CMP regression model generalizes the standard Poisson and negative binomial regressions, and it is suitable for fitting count data with varying degrees of over- and under-dispersion.
Using simulated and real SAGE datasets, the performance of the proposed method is assessed in comparison to the Poisson- and negative binomial-based regression models.

Item: A Density Peak Clustering Algorithm Based on the K-Nearest Shannon Entropy and Tissue-Like P System (Hindawi, 2019-07-31)
Authors: Jiang, Zhenni; Liu, Xiyu; Sun, Minghe
This study proposes a novel method to calculate the density of the data points based on K-nearest neighbors and Shannon entropy. A variant of tissue-like P systems with active membranes is introduced to realize the clustering process. The new variant of tissue-like P systems can improve the efficiency of the algorithm and reduce the computational complexity. Finally, experimental results on synthetic and real-world datasets show that the new method is more effective than the other state-of-the-art clustering methods.

Item: A DNA algorithm for the job shop scheduling problem based on the Adleman-Lipton model (Public Library of Science (PLOS), 2020-12-02)
Authors: Tian, Xiang; Liu, Xiyu; Zhang, Hongyan; Sun, Minghe; Zhao, Yuzhen
A DNA (DeoxyriboNucleic Acid) algorithm is proposed to solve the job shop scheduling problem. An encoding scheme for the problem is developed and DNA computing operations are proposed for the algorithm. After an initial solution is constructed, all possible solutions are generated. DNA computing operations are then used to find an optimal schedule. The DNA algorithm is proved to have an O(n²) complexity, and the length of the final strand of the optimal schedule is within an appropriate range.
Experiments with 58 benchmark instances show that the proposed DNA algorithm outperforms other comparative heuristics.

Item: An Extended Clustering Membrane System Based on Particle Swarm Optimization and Cell-Like P System with Active Membranes (Hindawi, 2020-01-31)
Authors: Wang, Lin; Liu, Xiyu; Sun, Minghe; Qu, Jianhua
An extended clustering membrane system using a cell-like P system with active membranes based on particle swarm optimization (PSO), named PSO-CP, is designed, developed, implemented, and tested. The purpose of PSO-CP is to solve clustering problems. In PSO-CP, evolution rules based on the standard PSO mechanism are used to evolve the objects, and communication rules are adopted to accelerate convergence and avoid prematurity. Subsystems of membranes are generated and dissolved by the membrane creation and dissolution rules, and a modified PSO mechanism is developed to help the objects escape from local optima. Under the control of the evolution-communication mechanism, the extended membrane system can effectively search for the optimal partitioning and improve the clustering performance with the help of the distributed parallel computing model. This extended clustering membrane system is compared with five existing PSO clustering approaches using ten benchmark clustering problems, and the computational results demonstrate the effectiveness of PSO-CP.

Item: Fuzzy Reasoning Numerical Spiking Neural P Systems for Induction Motor Fault Diagnosis (2022-09-28)
Authors: Yin, Xiu; Liu, Xiyu; Sun, Minghe; Dong, Jianping; Zhang, Gexiang
The fuzzy reasoning numerical spiking neural P systems (FRNSN P systems) are proposed by introducing interval-valued triangular fuzzy numbers into the numerical spiking neural P systems (NSN P systems). The NSN P systems were applied to the SAT problem, and the FRNSN P systems were applied to induction motor fault diagnosis. The FRNSN P system can easily model fuzzy production rules for motor faults and perform fuzzy reasoning.
To perform the inference process, an FRNSN P reasoning algorithm was designed. During inference, the interval-valued triangular fuzzy numbers were used to characterize the incomplete and uncertain motor fault information. The relative preference relationship was used to estimate the severity of various faults, so as to warn and repair the motors in time when minor faults occur. The results of the case studies showed that the FRNSN P reasoning algorithm can successfully diagnose single and multiple induction motor faults and has certain advantages over other existing methods.

Item: Genetic Addiction Risk and Psychological Profiling Analyses for "Preaddiction" Severity Index (2022-10-27)
Authors: Blum, Kenneth; Han, David; Bowirrat, Abdalla; Downs, Bernard William; Bagchi, Debasis; Thanos, Panayotis K.; Baron, David; Braverman, Eric R.; Dennen, Catherine A.; Gupta, Ashim; Elman, Igor; Badgaiyan, Rajendra D.; Llanos-Gomez, Luis; Khalsa, Jag; Barh, Debmalya; McLaughlin, Thomas; Gold, Mark S.
Since 1990, when our laboratory published the association of the DRD2 Taq A1 allele and severe alcoholism in JAMA, there has been an explosion of genetic candidate association studies, including genome-wide association studies (GWAS). To develop an accurate test to help identify those at risk for at least alcohol use disorder (AUD), a subset of reward deficiency syndrome (RDS), Blum's group developed the genetic addiction risk severity (GARS) test, consisting of ten genes and eleven associated risk alleles. In order to statistically validate the selection of these risk alleles measured by GARS, we applied strict analysis to studies, published from 1990 until now, that investigated the association of each polymorphism with AUD or AUD-related conditions, including pain and even bariatric surgery, as a predictor of severe vulnerability to unwanted addictive behaviors. This analysis calculated the Hardy–Weinberg equilibrium of each polymorphism in cases and controls.
Pearson's χ² test or Fisher's exact test was applied to compare the gender, genotype, and allele distributions where available. The statistical analyses found the odds ratio (OR) and the 95% CI for the OR, and the posterior risk, under an 8% estimate of the population's alcoholism prevalence, revealed significant detection. Prior to these results, United States and European patents on a ten-gene panel and eleven risk alleles had been issued. In the face of the new construct of the "preaddiction" model, similar to "prediabetes", the genetic addiction risk analysis might provide one solution missing in the treatment and prevention of the neurological disorder known as RDS.

Item: GPU-Based Parallel Particle Swarm Optimization Methods for Graph Drawing (Hindawi, 2017-07-30)
Authors: Qu, Jianhua; Liu, Xiyu; Sun, Minghe; Qi, Feng
Particle Swarm Optimization (PSO) is a population-based stochastic search technique for solving optimization problems, which has been proven to be effective in a wide range of applications. However, the computational efficiency on large-scale problems is still unsatisfactory. A graph drawing is a pictorial representation of the vertices and edges of a graph. Two PSO heuristic procedures, one serial and the other parallel, are developed for undirected graph drawing. Each particle corresponds to a different layout of the graph. The particle fitness is defined based on the concept of energy in the force-directed method. The serial PSO procedure is executed on a CPU and the parallel PSO procedure is executed on a GPU. The two PSO procedures have different data structures and strategies. The performance of the proposed methods is evaluated through several different graphs.
The experimental results show that the two PSO procedures are both as effective as the force-directed method, and the parallel procedure is more advantageous than the serial procedure for larger graphs.

Item: Identification of Tomato Disease Types and Detection of Infected Areas Based on Deep Convolutional Neural Networks and Object Detection Techniques (Hindawi, 2019-12-16)
Authors: Wang, Qimei; Qi, Feng; Sun, Minghe; Qu, Jianhua; Xue, Jie
This study develops tomato disease detection methods based on deep convolutional neural networks and object detection models. Two different models, Faster R-CNN and Mask R-CNN, are used in these methods, where Faster R-CNN is used to identify the types of tomato diseases and Mask R-CNN is used to detect and segment the locations and shapes of the infected areas. To select the model that best fits the tomato disease detection task, four different deep convolutional neural networks are combined with the two object detection models. Data are collected from the Internet, and the dataset is divided into a training set, a validation set, and a test set used in the experiments. The experimental results show that the proposed models can accurately and quickly identify the eleven tomato disease types and segment the locations and shapes of the infected areas.

Item: Identifying Flow Patterns in a Narrow Channel via Feature Extraction of Conductivity Measurements with a Support Vector Machine (2023-02-08)
Authors: Yang, Kai; Liu, Jiajia; Wang, Min; Wang, Hua; Xiao, Qingtai
In this work, a visualization experiment for rectangular channels was carried out to explore gas–liquid two-phase flow characteristics. Typical flow patterns, including bubble, elastic and mixed flows, were captured by direct imaging technology, and the corresponding measurements with fluctuation characteristics were recorded by using an electrical conductivity sensor.
Time-domain and frequency-domain characteristics of the corresponding electrical conductivity measurements of each flow pattern were analyzed with a probability density function and a power spectral density curve. The results showed that feature vectors can be constructed to reflect the time–frequency characteristics of conductivity measurements successfully by introducing the quantized characteristic parameters, including the maximum power of the frequency, the standard deviation of the power spectral density, and the range of the power distribution. Furthermore, the overall recognition rate of the four flow patterns measured by the method was 93.33% based on the support vector machine, and the intelligent two-phase flow-pattern identification method can provide new technical support for the online recognition of gas–liquid two-phase flow patterns in rectangular channels. It may thus be concluded that this method should be of great significance to ensure the safe and efficient operation of relevant industrial production systems.

Item: An Improved Apriori Algorithm Based on an Evolution-Communication Tissue-Like P System with Promoters and Inhibitors (Hindawi, 2017-02-19)
Authors: Liu, Xiyu; Zhao, Yuzhen; Sun, Minghe
The Apriori algorithm, as a typical frequent itemset mining method, can help researchers and practitioners discover implicit associations from large amounts of data. In this work, a fast Apriori algorithm, called ECTPPI-Apriori, for processing large datasets is proposed, which is based on an evolution-communication tissue-like P system with promoters and inhibitors. The structure of the ECTPPI-Apriori algorithm is tissue-like, and the evolution rules of the algorithm are object rewriting rules. The time complexity of ECTPPI-Apriori is substantially improved from that of the conventional Apriori algorithms.
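As context for the Apriori-based item above, a bare-bones sketch of classical level-wise Apriori frequent-itemset mining (the P-system acceleration described in the paper is not reproduced here; the grocery data are invented):

```python
def apriori(transactions, min_support):
    """Classical Apriori: return frequent itemsets with their counts.

    Candidate k-itemsets are joined from frequent (k-1)-itemsets and
    pruned by minimum support -- the textbook level-wise search.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    candidates = [frozenset([i]) for i in items]
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: unions of frequent sets that grow the itemset by one
        keys = sorted(level, key=sorted)
        size = len(next(iter(level))) + 1 if level else 0
        candidates = sorted({a | b for a in keys for b in keys
                             if len(a | b) == size}, key=sorted)
    return frequent

txns = [{"bread", "milk"}, {"bread", "butter"},
        {"bread", "milk", "butter"}, {"milk"}]
freq = apriori(txns, min_support=2)
assert freq[frozenset({"bread", "milk"})] == 2
```

The repeated full passes over the transactions in the counting step are exactly the cost that parallel models such as membrane computing aim to cut.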
The results give some hints to improve conventional algorithms by using membrane computing models.

Item: Influential Factors and the Realization Mechanism of Sustainable Information-Sharing in Virtual Communities from a Knowledge Fermenting Perspective (SAGE, 2020-11-25)
Authors: Zhang, Meng; Gao, Yang; Sun, Minghe; Bi, Datian
Little is known about sustainable information-sharing in virtual communities, although it is increasingly recognized as a useful information-sharing tool. The aim of this study is to explore the influential factors and the realization mechanism of sustainable information-sharing in virtual communities. Starting from the similarity between biological fermentation and the information-sharing process in virtual communities, the present study creatively introduces the knowledge fermenting theory used in the analysis. Six factors influencing sustainable information-sharing in virtual communities are first identified based on this theory: sharing bodies, interactive topics, communication mechanism, supporting technology, communication environment, and platform scale. The relations among these six factors are then analyzed using the Decision-Making Trial and Evaluation Laboratory (DEMATEL) method. The results indicate that the factor of sharing bodies has the strongest influence on the other factors and the factor of interactive topics receives the most influence from the other factors. On this basis, the realization mechanism of sustainable information-sharing in virtual communities is elaborated from the following four aspects: the four stages of the information-sharing realization, the guide role of "strain," the catalytic role of "enzyme," and the effect of environment. The results indicate that sustainable information-sharing in virtual communities is a process of spiral evolution.
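The DEMATEL analysis mentioned above reduces to a matrix computation: from a normalized direct-influence matrix D, the total relation matrix is T = D(I − D)⁻¹, and row/column sums rank influence given versus received. A minimal sketch (the two-factor matrix is hypothetical), computing T via the equivalent convergent series D + D² + D³ + …:

```python
def dematel_total_relation(direct):
    """Total relation matrix T = D (I - D)^(-1) for DEMATEL.

    direct: normalized direct-influence matrix D (spectral radius < 1),
    so T can be computed via the convergent series D + D^2 + D^3 + ...
    Pure-Python, intended only for small illustrative matrices.
    """
    n = len(direct)

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    total = [row[:] for row in direct]
    power = [row[:] for row in direct]
    for _ in range(200):  # plenty of terms for convergence here
        power = matmul(power, direct)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total

# Hypothetical normalized direct-influence matrix for two factors
D = [[0.0, 0.4],
     [0.2, 0.0]]
T = dematel_total_relation(D)
r = [sum(row) for row in T]          # influence given by each factor
c = [sum(T[i][j] for i in range(2)) for j in range(2)]  # influence received
assert r[0] > r[1]  # factor 1 exerts more total influence than factor 2
```

The "strongest influence" and "most influenced" findings in the abstract correspond to the largest row sum and largest column sum of T, respectively.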
Finally, recommendations are given to virtual community managers, users, and business firms.

Item: Job-Related Performance and Quality of Life Benefits in First Responders Given Access to H-Wave® Device Stimulation: A Retrospective Cohort Study (2022-10-08)
Authors: Williamson, Tyler K.; Rodriguez, Hugo C.; Han, David; Norwood, Stephen M.; Gupta, Ashim
Current chronic pain treatments primarily target symptoms and are often associated with harmful side effects and complications, while safer non-invasive electrotherapies like H-Wave® device stimulation (HWDS) have been less explored. The goal of this study is to evaluate first responder-reported effects of HWDS on job-related and quality-of-life measures. This is a retrospective cohort study in which first responders were surveyed following voluntary use of HWDS regarding participant experience, frequency of use, job-related performance, and quality of life. Responses were analyzed using means comparison tests, while bivariate analysis assessed responses associated with HWDS usage. Overall, 92.9% of first responder HWDS users (26/28) reported a positive experience (p < 0.0001), with 82.1% citing pain reduction (p = 0.0013), while 78.6% indicated it would be beneficial to have future device access (p = 0.0046). Participants using H-Wave® were at least six times more likely to report higher rates of benefit (100% vs. 0%, p = 0.022), including pain reduction (91.3% vs. 8.7%, p = 0.021) and improved range of motion (93.3% vs. 69.2%, p = 0.044). Spending more time with family was associated with better job performance following frequent HWDS use (50% vs. 8.3%, p = 0.032). Repetitive first responder H-Wave® use, with minimal side effects and easy utilization, resulted in significant pain reduction, improvements in job performance and range of motion, and increased time spent with family, resulting in overall positive experiences and health benefits.
Level of Evidence: III.

Item: Load-Sharing Model under Lindley Distribution and Its Parameter Estimation Using the Expectation-Maximization Algorithm (2020-11-22)
Authors: Park, Chanseok; Wang, Min; Alotaibi, Refah Mohammed; Rezk, Hoda
A load-sharing system is defined as a parallel system whose load will be redistributed to its surviving components as each of the components fails in the system. Our focus is on making statistical inference of the parameters associated with the lifetime distribution of each component in the system. In this paper, we introduce a methodology which integrates the conventional procedure under the assumption of the load-sharing system being made up of fundamental hypothetical latent random variables. We then develop an expectation-maximization (EM) algorithm for performing the maximum likelihood estimation of the system with Lindley-distributed component lifetimes. We adopt several standard simulation techniques to compare the performance of the proposed methodology with the Newton–Raphson-type algorithm for the maximum likelihood estimate of the parameter. Numerical results indicate that the proposed method is more effective by consistently reaching a global maximum.
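The last item's EM algorithm handles the incomplete-data structure of load-sharing systems; for complete i.i.d. Lindley data, by contrast, the MLE has a well-known closed form, sketched here as background (this is the standard textbook result, not the paper's EM procedure):

```python
import math

def lindley_mle(sample):
    """Closed-form MLE of the Lindley parameter theta from i.i.d. data.

    For the Lindley density f(x) = theta^2/(theta+1) * (1+x) * exp(-theta*x),
    setting the score to zero gives the quadratic
        xbar*theta^2 + (xbar - 1)*theta - 2 = 0,
    whose positive root is the MLE. (The load-sharing setting in the
    paper needs EM because its component data are incomplete.)
    """
    xbar = sum(sample) / len(sample)
    return (-(xbar - 1) + math.sqrt((xbar - 1) ** 2 + 8 * xbar)) / (2 * xbar)

def lindley_mean(theta):
    """E[X] = (theta + 2) / (theta * (theta + 1)) for the Lindley law."""
    return (theta + 2) / (theta * (theta + 1))

# A sample whose mean is exactly 1.5 yields theta-hat = 1,
# consistent with lindley_mean(1.0) == 1.5
data = [0.5, 1.0, 2.0, 2.5]
assert abs(lindley_mle(data) - 1.0) < 1e-12
assert abs(lindley_mean(1.0) - 1.5) < 1e-12
```

Because complete-data estimation is this simple, the EM machinery in the paper is needed only to impute the latent, unobserved component lifetimes before each such maximization step.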