College of Sciences
Permanent URI for this community: https://hdl.handle.net/20.500.12588/256
Browsing College of Sciences by Department "Computer Science"
Now showing 1 - 20 of 139
Item: A Federated Model for Scheduling in Wide-Area Systems (UTSA Department of Computer Science, 1996-02-06)
Weissman, Jon B.; Grimshaw, Andrew S.
In this paper a model for scheduling in wide-area systems is described. The model is federated and utilizes a collection of local site schedulers that control the use of their resources. The wide-area scheduler consults the local site schedulers to obtain candidate machine schedules. A set of issues and challenges inherent to wide-area scheduling is also described, and the proposed model is shown to address many of these problems. A distributed algorithm for wide-area scheduling is presented that relies upon information made available about the resource needs of user jobs. The wide-area scheduler will be implemented in Legion, a wide-area computing system developed at the University of Virginia.

Item: A Framework for Composing Noninterferent Languages (UTSA Department of Computer Science, 2013-09)
Gampe, Andreas; von Ronne, Jeffery
Software is frequently implemented using multiple specialized languages for different tasks. Security type systems, one technique to secure software, have previously only been studied for programs written entirely in a single language. We present a framework that facilitates reasoning over composite languages. In it, guarantees of sufficiently related component languages can be lifted to the composed language. This can significantly lower the burden of certifying that such composite programs are safe. Our simple but powerful approach relies, at its core, on computability and security-level separability. Besides non-interference, we also show how the framework can accommodate notions of declassification. To demonstrate the applicability of this framework, we show that a standard secure while-language from the literature satisfies all necessary requirements of a composition host, and we sketch how to securely embed an expressive fragment of SQL.

Item: A Genetic Algorithm Using Triplet Nucleotide Encoding and DNA Reproduction Operations for Unconstrained Optimization Problems (2017-06-30)
Zang, Wenke; Zhang, Weining; Zhang, Wenqian; Liu, Xiyu
As one of the evolutionary heuristic methods, genetic algorithms (GAs) have shown a promising ability to solve complex optimization problems. However, existing GAs still have difficulty finding the global optimum and avoiding premature convergence. To further improve the search efficiency and convergence rate of evolutionary algorithms, and inspired by the mechanisms of biological DNA genetic information and evolution, we present a new genetic algorithm, called GA-TNE+DRO, which uses a novel triplet nucleotide coding scheme to encode potential solutions and a set of new genetic operators to search for globally optimal solutions. The coding scheme represents potential solutions as sequences of triplet nucleotides, and the DNA reproduction operations mimic the DNA reproduction process more faithfully than existing DNA-GAs. We compared our algorithm with several existing GA and DNA-based GA algorithms using a benchmark of eight unconstrained optimization functions. Our experimental results show that the proposed algorithm converges to solutions much closer to the global optima in far fewer iterations than the existing algorithms. A complexity analysis also shows that our algorithm is computationally more efficient than the existing algorithms.

Item: A Kernel-Based Intuitionistic Fuzzy C-Means Clustering Using a DNA Genetic Algorithm for Magnetic Resonance Image Segmentation (2017-10-27)
Zang, Wenke; Zhang, Weining; Zhang, Wenqian; Liu, Xiyu
MRI segmentation is critically important for clinical study and diagnosis. Existing methods based on soft clustering have several drawbacks, including low accuracy in the presence of image noise and artifacts, and high computational cost. In this paper, we introduce a new formulation of the MRI segmentation problem as a kernel-based intuitionistic fuzzy C-means (KIFCM) clustering problem and propose a new DNA-based genetic algorithm to obtain the optimal KIFCM clustering. While this algorithm searches the solution space for the optimal model parameters, it also obtains the optimal clustering, and therefore the optimal MRI segmentation. We perform an empirical study comparing our method with six state-of-the-art soft clustering methods using a set of UCI (University of California, Irvine) datasets and a set of synthetic and clinical MRI datasets. The preliminary results show that our method outperforms the other methods in both clustering metrics and computational efficiency.

Item: A Logical Framework for Sequence Diagram with Combined Fragments (UTSA Department of Computer Science, 2011-11)
Shen, Hui; Robinson, Mark; Niu, Jianwei
Graphical representations of scenarios, such as UML Sequence Diagrams and Message Sequence Charts, serve as a well-accepted means for modeling the interactions among software systems and their environment through the exchange of messages. The Combined Fragments of UML Sequence Diagrams permit various types of control flow among messages (e.g., interleaving and branching) to express an aggregation of multiple traces encompassing complex and concurrent behaviors. However, Combined Fragments increase the difficulty of Sequence Diagram comprehension and analysis. This paper introduces an approach to formalizing the semantics of Sequence Diagrams with Combined Fragments in terms of Linear Temporal Logic (LTL) templates. In all the templates, different semantic aspects are expressed as separate yet simple LTL formulas that can be composed to define the semantics of basic Sequence Diagrams, all the Combined Fragments, and nested Combined Fragments. Moreover, the formalization enables us to leverage the analytical power of automated decision procedures for LTL formulas to determine whether a collection of Sequence Diagrams is consistent, safe, and deadlock-free.

Item: A Network-Based Approach for Improving Annotation of Transcription Factor Functions and Binding Sites in Arabidopsis thaliana (2023-01-21)
Najnin, Tanzira; Saimon, Sakhawat Hossain; Sunter, Garry; Ruan, Jianhua
Transcription factors are an integral component of the cellular machinery responsible for regulating many biological processes, and they recognize distinct DNA sequence patterns as well as internal/external signals to mediate target gene expression. The functional roles of an individual transcription factor can be traced back to the functions of its target genes. While such functional associations can be inferred through binding evidence from the high-throughput sequencing technologies available today, including chromatin immunoprecipitation sequencing, such experiments can be resource-consuming. On the other hand, exploratory analysis driven by computational techniques can alleviate this burden by narrowing the search scope, but the results are often deemed low-quality or non-specific by biologists. In this paper, we introduce a data-driven, statistics-based strategy to predict novel functional associations for transcription factors in the model plant Arabidopsis thaliana. To achieve this, we leverage one of the largest available gene expression compendia to build a genome-wide transcriptional regulatory network and infer regulatory relationships among transcription factors and their targets. We then use this network to build a pool of likely downstream targets for each transcription factor and query each target pool for functionally enriched gene ontology terms. The results exhibited sufficient statistical significance to annotate most of the transcription factors in Arabidopsis with highly specific biological processes. We also perform DNA binding motif discovery for transcription factors based on their target pools. We show that the predicted functions and motifs strongly agree with curated databases constructed from experimental evidence. In addition, statistical analysis of the network revealed interesting patterns and connections between network topology and system-level transcriptional regulation properties. We believe that the methods demonstrated in this work can be extended to other species to improve the annotation of transcription factors and the understanding of transcriptional regulation at a system level.

Item: A random walk based approach for improving protein-protein interaction network and protein complex prediction (UTSA Department of Computer Science, 2011-12)
Lei, Chengwei; Ruan, Jianhua
Motivation: Recent advances in high-throughput technology have dramatically increased the availability of protein-protein interaction (PPI) data and stimulated the development of many methods for predicting protein complexes, which are important in understanding the functional organization of PPI networks. However, automated protein complex prediction from PPI data alone is significantly hindered by the high level of noise, the sparseness, and the highly skewed degree distribution of PPI networks. Here we present a novel network topology-based algorithm to remove spurious interactions and recover missing ones by computational prediction, and to increase the accuracy of protein complex prediction by reducing the impact of hub nodes. The key idea of our algorithm is that two proteins sharing some high-order topological similarity, measured by a novel random walk-based procedure, are likely to interact with each other and may belong to the same protein complex. Results: Applying our algorithm to a yeast PPI network, we found that the interactions in the reconstructed network have higher biological relevance than those in the original network, as assessed by multiple types of information, including gene ontology, gene expression, essentiality, conservation between species, and known protein complexes. Comparison with existing methods shows that the network reconstructed by our method has the highest quality. Using two independent graph clustering algorithms, we found that the reconstructed network resulted in significantly improved prediction accuracy for protein complexes. Furthermore, our method is applicable to PPI networks obtained with different experimental systems, such as affinity purification, Y2H, and PCA, and evidence shows that the predicted edges are likely bona fide physical interactions.

Item: A randomized Steiner tree approach for biomarker discovery and classification of breast cancer metastasis (UTSA Department of Computer Science, 2011-12)
Jahid, Md. Jamiul; Ruan, Jianhua
Motivation: DNA microarrays have become an important tool in identifying biomarker genes for improving the prognosis of metastatic breast cancer, a leading cause of cancer-related deaths in women worldwide. Recently, pathway-level relationships between genes have been increasingly used to build more robust classification models, which can also provide useful biological insights. Because complete pathways are unavailable, protein-protein interaction (PPI) networks are becoming more popular among researchers and open a way to investigate the developmental process of breast cancer. Here, a network-based method is proposed that combines microarray gene expression profiles and a PPI network for biomarker discovery in breast cancer metastasis. The key idea is to identify a small number of genes that connect differentially expressed genes into a single component in the PPI network; these intermediate genes contain important information about the pathways involved in metastasis and have a high probability of being biomarkers. Results: We applied this approach to three breast cancer microarray datasets and identified significant numbers of well-known biomarker genes for breast cancer metastasis. The selected genes are enriched in biological processes and pathways related to the carcinogenic process and, importantly, have higher stability across different datasets than in previous studies. Furthermore, these genes significantly increased cross-data classification accuracy for breast cancer metastasis. The randomized Steiner tree based approach described here is a new way to discover biomarker genes for breast cancer and improves the prediction accuracy of metastasis. The analysis here is limited to breast cancer but can easily be applied to other diseases.

Item: A Robust Personalized Classification Method for Breast Cancer Metastasis Prediction (2022-10-29)
Adnan, Nahim; Najnin, Tanzira; Ruan, Jianhua
Accurate prediction of breast cancer metastasis in the early stages of cancer diagnosis is crucial to reduce cancer-related deaths. With the availability of gene expression datasets, many machine-learning models have been proposed to predict breast cancer metastasis using thousands of genes simultaneously. However, the prediction accuracy of models using gene expression often suffers from the diverse molecular characteristics across different datasets. Additionally, breast cancer is known to have many subtypes, which hinders the performance of models aimed at all subtypes. To overcome the heterogeneous nature of breast cancer, we propose a method to obtain personalized classifiers that are trained on subsets of patients selected using the similarities between training and testing patients. Results on multiple independent datasets showed that our proposed approach significantly improved prediction accuracy compared to models trained on the complete training dataset and models trained on specific cancer subtypes. Our results also showed that personalized classifiers trained on positively and negatively correlated patients outperformed classifiers trained only on positively correlated patients, highlighting the importance of selecting proper patient subsets for constructing personalized classifiers. Additionally, our proposed approach obtained more robust features than the other models and identified different features for different patients, making it a promising tool for designing personalized medicine for cancer patients.

Item: A Self-Supervised Learning Framework for Classifying Microarray Gene Expression Data (UTSA Department of Computer Science, 2006-10)
Lu, Yijuan; Tian, Qi; Liu, Feng; Sanchez, Maribel; Wang, Yufeng
It is important to develop computational methods that can effectively resolve two intrinsic problems in microarray data: high dimensionality and small sample size. In this paper, we propose a self-supervised learning framework for classifying microarray gene expression data using the Kernel Discriminant-EM (KDEM) algorithm. This framework applies self-supervised learning techniques in an optimal nonlinear discriminating subspace. It efficiently utilizes a large set of unlabeled data to compensate for the insufficiency of a small set of labeled data, and it extends the linear algorithm in DEM to a kernel algorithm to handle nonlinearly separable data in a lower-dimensional space. Extensive experiments on Plasmodium falciparum expression profiles show the promising performance of the approach.

Item: A Verifiable, Control Flow Aware Constraint Analyzer for Bounds Check Elimination (UTSA Department of Computer Science, 2008-06)
Niedzielski, David; Gampe, Andreas; von Ronne, Jeffery; Psarris, Kleanthis
The Java programming language requires that out-of-bounds array accesses produce runtime exceptions. In general, this requires a dynamic bounds check each time an array element is accessed. However, if it can be proven that the array index does not exceed the bounds of the array, then the bounds check can be eliminated. We present a new algorithm based on extended Static Single Assignment (eSSA) form that builds a directed, weighted inequality graph representing control flow qualified, linear constraints among program variables derived from program statements. Our system then employs a customized depth-first search algorithm to reason about relationships among variables, and it provides a verifiable proof of its conclusions. This proof can be passed to and verified by a runtime system to minimize the runtime performance impact of this analysis. Our system is novel in several respects: it takes into consideration both control flow and data flow when analyzing the constraint system; it handles general linear inequalities instead of simple difference constraints; and it provides verifiable proofs for its claims. We present experimental results demonstrating that this method eliminates more bounds checks and, when combined with runtime verification, results in a lower runtime cost than prior work. Overall, this method improves performance by up to about 10% on the reported benchmarks.

Item: Achieving accurate and context-sensitive timing for code optimization (UTSA Department of Computer Science, 2008-01-18)
Whaley, R. Clint; Castaldo, Anthony M.
Key computational kernels must run near their peak efficiency for most high performance computing (HPC) applications. Achieving this level of efficiency has always required extensive tuning of the kernel on a particular platform of interest. The success or failure of an optimization is usually measured by invoking a timer. Understanding how to build reliable and context-sensitive timers is one of the most neglected areas in HPC, and this results in a host of HPC software that looks good when reported in papers but delivers only a fraction of the reported performance when used by actual HPC applications. In this paper we motivate the importance of timer design and then discuss the techniques and methodologies we have developed to accurately time HPC kernel routines for our well-known empirical tuning framework, ATLAS.

Item: Adaptive Discriminant Projection for Content-based Image Retrieval (UTSA Department of Computer Science, 2006-10)
Yu, Jie; Tian, Qi
Content-based Image Retrieval (CBIR) is a computer vision application that aims at automatically retrieving images based on their visual content. Linear Discriminant Analysis and its variants have been widely used in CBIR applications because of their effectiveness in finding a projection that maps the original high-dimensional space to a low-dimensional one while preserving the most discriminant features. Those techniques assume images from certain class(es) are all visually similar and try to cluster them in the projected space. In this paper we show that the human high-level concept of semantic similarity between images may not arise only from low-level visual similarity, and consequently that assumption is inappropriate in many cases. We propose an Adaptive Discriminant Projection framework that can model different data distributions based on the clustering of different classes. To learn the model best fitting the real scenario, Boosted Adaptive Discriminant Projection is further proposed. Extensive experiments are designed to evaluate our methods and compare them to state-of-the-art techniques on benchmark data sets and real image retrieval applications. The results show the superior performance of our proposed methods.

Item: An Abstracting Transformation for Amino Acid Polymorphism (UTSA Department of Computer Science, 2012-02-21)
Castaldo, Anthony M.
At least within exons coding for proteins, we believe what is important is the sequence of amino acids produced, and because amino acids average about three possible codings, very different looking DNA sequences, S1 and S2, can code for precisely the same sequence of amino acids. This paper describes an abstracting transformation that replaces each letter in a DNA sequence with a single-letter mask that can be matched instead, such that if S1 and S2 both code for the same sequence of amino acids, their mask sequences will be identical. [...]

Item: An Attribute-Based Approach toward a Secured Smart-Home IoT Access Control and a Comparison with a Role-Based Approach (2022-01-25)
Ameer, Safwa; Benson, James; Sandhu, Ravi
The area of smart homes is one of the most popular for deploying smart connected devices. One of the most vulnerable aspects of smart homes is access control. Recent advances in IoT have led to several access control models being developed or adapted to IoT from other domains, with few specifically designed to meet the challenges of smart homes. Most of these models use role-based access control (RBAC) or attribute-based access control (ABAC). As of now, it is not clear what the advantages and disadvantages of ABAC over RBAC are in general, or in the context of smart-home IoT in particular. In this paper, we introduce HABACα, an attribute-based access control model for smart-home IoT. We formally define HABACα and demonstrate its features through two use-case scenarios and a proof-of-concept implementation. Furthermore, we present an analysis of HABACα as compared to the previously published EGRBAC (extended generalized role-based access control) model for smart-home IoT: first, we describe approaches for constructing a HABACα specification from EGRBAC and vice versa, in order to compare the theoretical expressive power of these models; second, we analyze the HABACα and EGRBAC models against standard criteria for access control models. Our findings suggest that a hybrid model combining both HABACα and EGRBAC capabilities may be the most suitable for smart-home IoT, and probably more generally.

Item: An Elastic Mixed-Criticality Task Model and Early-Release EDF Scheduling Algorithms (UTSA Department of Computer Science, 2016-02)
Su, Hang; Zhu, Dakai; Brandt, Scott
Many algorithms have recently been studied for scheduling mixed-criticality (MC) tasks. However, most existing MC scheduling algorithms guarantee the timely execution of high-criticality (HC) tasks at the expense of discarding low-criticality (LC) tasks, which can cause serious service interruption for such tasks. In this work, aiming at providing guaranteed services for LC tasks, we study an Elastic Mixed-Criticality (E-MC) task model for dual-criticality systems. Specifically, the model allows each LC task to specify its maximum period (i.e., minimum service level) and a set of early-release points. We propose an Early-Release (ER) mechanism that enables LC tasks to be released more frequently, improving their service levels at runtime; both conservative and aggressive approaches to exploiting system slack are considered, and the mechanism is applied to both EDF and preference-oriented earliest-deadline (POED) schedulers. We formally prove the correctness of the proposed ER-EDF scheduler in guaranteeing the timeliness of all tasks through judicious management of the early releases of LC tasks. The proposed model and schedulers are evaluated through extensive simulations. The results show that, by moderately relaxing the service requirements of LC tasks in MC task sets (i.e., by having LC tasks' maximum periods in the E-MC model be 2 to 3 times their desired MC periods), most transformed E-MC task sets can be successfully scheduled without sacrificing the timeliness of HC tasks. Moreover, with the proposed ER mechanism, the runtime performance of tasks (e.g., execution frequencies of LC tasks, response times and jitter of HC tasks) can be significantly improved under the ER schedulers when compared to the state-of-the-art EDF-VD scheduler.

Item: An Empirical Study of Continuous Integration Failures in TravisTorrent (UTSA Department of Computer Science, 2018-07-03)
Hassan, Foyzul; Wang, Xiaoyin
Continuous Integration (CI) is a widely used development practice in which developers integrate their work after submitting code changes to a central repository. CI servers usually monitor the central repository for code change submissions and automatically build the software with the changed code, perform unit and integration testing, and provide a test summary report. If a build or test fails, developers fix the issues and resubmit their code changes. Continuous submission of code modifications by developers, combined with build latency, creates stalls in the CI server's build pipeline, so developers may have to wait a long time for build outcomes. In this paper, we categorize CI failures from the TravisTorrent data set and study the frequency of different failures. Our study of 1,187 CI failures from the TravisTorrent data set shows that 829 of the CI failures are test failures, compared with 252 build script errors, 49 JavaDoc errors, and 67 style-checking errors. It also reveals that 10.8% of CI-failure repair commits contain only build script revisions, and 25.6% of CI-failure repair commits involve at least one build script revision. Furthermore, we propose a build prediction model that uses the TravisTorrent data set, build error log clustering, and AST-level code change data to predict whether a build will be successful without attempting the actual build, so that developers can get an early build outcome. With the proposed model we can predict build outcomes with an average F-measure over 87% on all three build systems (Ant, Maven, Gradle) under the cross-project prediction scenario.

Item: An Optimal Boundary-Fair Scheduling Algorithm for Multiprocessor Real-Time Systems (UTSA Department of Computer Science, 2009-06)
Zhu, Dakai; Qi, Xuan; Mossé, Daniel; Melhem, Rami
Although the scheduling problem for multiprocessor real-time systems has been studied for decades, it is still an evolving research field with many open problems. In this work, focusing on periodic real-time tasks, we propose a novel optimal scheduling algorithm, namely boundary fair (Bfair), which follows the same line of research as the well-known Pfair scheduling algorithms and can also achieve full system utilization. However, unlike the Pfair algorithms, which make scheduling decisions at every time unit to enforce proportional progress (i.e., fairness) for each task, Bfair makes scheduling decisions and enforces fairness only at tasks' period boundaries. The correctness of the Bfair algorithm in meeting the deadlines of all task instances is formally proved. The performance of Bfair is evaluated through extensive simulations. The results show that, compared to the Pfair algorithms, Bfair can significantly reduce the number of scheduling points (by up to 94%), and its time overhead is comparable to that of the most efficient Pfair algorithm (i.e., PD²). Moreover, by aggregating the time allocation of tasks over the intervals between consecutive period boundaries, the resulting Bfair schedule requires dramatically fewer context switches and task migrations, as low as 18% and 15%, respectively, of those of Pfair schedules.

Item: ATLAS Installation Guide (UTSA Department of Computer Science, 2008-01-25)
Whaley, R. Clint
This note provides a brief overview of ATLAS and describes how to install it. It includes extensive discussion of common configure options and describes why they might be employed on various platforms. In addition to discussing how to configure and build the ATLAS package, this note also describes how an installer can confirm that the resulting libraries produce correct answers and run efficiently. Extensive examples are provided, including a full-length example showing the installation of both ATLAS and LAPACK on an example architecture.

Item: Augmented Stabilized Formulations with Fictitious Boundary Methods (UTSA Department of Computer Science, 2016-12-12)
Ranjan, R.; Feng, Y.; Chronopoulos, A.
Augmentation of the SUPS formulation was introduced earlier and successfully applied to solving the incompressible Navier-Stokes equations for both two-dimensional and three-dimensional problems. Fictitious boundary methods (FBM) are a new methodology that aids the study of flow around solid obstacles on fixed Cartesian grids that do not require body-conforming meshes. FBM has been applied to lower-order finite element and spectral/hp least-squares techniques. We extend the previously introduced augmented stabilized finite element formulation (ASUPS) to the fictitious boundary context and use it to solve incompressible flow problems. Utilizing the advantages of fictitious boundary methods, we present solutions to flow around an array of two-dimensional and three-dimensional problems. In two-dimensional computations we solve flow past circular and elliptical cylinders. For the elliptical obstacles in a Newtonian flow field, we examine the effects of varying boundary conditions and aspect ratios on the flow metrics. Finally, we extend the procedures to configurations of two elliptical and two circular obstacles facing the free stream. In three-dimensional computations we examine incompressible flow around a three-dimensional ellipse-shaped obstacle at Reynolds number Re = 200.
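Several items in this listing build on random-walk-based topological similarity between network nodes (e.g., the PPI network reconstruction work by Lei and Ruan). As a minimal illustration only, and not the authors' actual procedure, the sketch below computes k-step random-walk distributions on a small invented undirected graph and scores two nodes by the cosine similarity of their distributions; the graph, node names, and step count are all hypothetical.

```python
from math import sqrt

def walk_distribution(graph, start, steps):
    """Probability distribution over nodes after `steps` uniform random-walk steps."""
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = {}
        for node, p in dist.items():
            neighbors = graph[node]
            for nb in neighbors:
                # Each neighbor receives an equal share of this node's probability.
                nxt[nb] = nxt.get(nb, 0.0) + p / len(neighbors)
        dist = nxt
    return dist

def cosine(d1, d2):
    """Cosine similarity between two sparse distributions stored as dicts."""
    keys = set(d1) | set(d2)
    dot = sum(d1.get(k, 0.0) * d2.get(k, 0.0) for k in keys)
    n1 = sqrt(sum(v * v for v in d1.values()))
    n2 = sqrt(sum(v * v for v in d2.values()))
    return dot / (n1 * n2)

# Toy undirected PPI-like graph (invented for illustration).
graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

sim = cosine(walk_distribution(graph, "A", 3),
             walk_distribution(graph, "B", 3))
print(round(sim, 3))
```

Nodes A and B here score highly because their walks visit the same neighborhood; in the paper's setting such high-order similarity is taken as evidence that two proteins interact, though the actual published procedure differs from this toy version.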