Hyper-sequence-graph mining

dc.contributor.advisorKorkmaz, Turgay
dc.contributor.advisorWang, Yufeng
dc.contributor.authorYu, Xinran
dc.contributor.committeeMemberKorkmaz, Turgay
dc.contributor.committeeMemberWang, Yufeng
dc.contributor.committeeMemberGibson, Matthew
dc.contributor.committeeMemberRuan, Jianhua
dc.contributor.committeeMemberTosun, Ali
dc.descriptionThis item is available only to currently enrolled UTSA students, faculty, or staff.
dc.description.abstractThe era of Big Data has begun, promising benefits in domains including education, healthcare, and the economy. Understanding and fully realizing these benefits requires addressing many technical issues at every stage from data acquisition to interpretation. One of the key stages is extracting useful information from real-life datasets. Because this is challenging even for small datasets, the research community has extensively investigated data mining and searching algorithms for various types of datasets. In this dissertation, a structure called a hyper-sequence-graph is investigated. A hyper-sequence-graph represents both the sequential and the non-sequential relationships in a dataset in one structure. Based on these two types of relationships, the focus is on mining datasets and exploring useful information. On the sequential side, we consider the well-known frequent pattern mining (FPM) problem and introduce a new form of pattern mining called super-sequence frequent pattern mining (SS-FPM). To solve the SS-FPM problem, a directed weighted graph called a sequence graph is generated from the given sequential dataset, and a heuristic algorithm using an adjacency matrix iteration technique is proposed. We apply the SS-FPM method to various real-world datasets, such as ADLs, a weblog dataset, and a bioinformatics dataset. Finally, an SS-FPM-based partial clustering method is investigated. On the non-sequential side, we consider the grouping relationship that is readily available in a sequential dataset. To represent this relationship, hypergraphs are used, and the focus is on how to search for sub-structures in a large hypergraph. In this direction, we generalize structure-based indexing from simple graphs to hypergraphs and propose an efficient verification method that accelerates the sub-hypergraph matching process. Experiments show the efficiency and effectiveness of the proposed solutions.
dc.description.departmentComputer Science
dc.format.extent129 pages
dc.subjectSearching algorithm
dc.subjectSequence graph
dc.subject.classificationComputer science
dc.subject.lcshBig data
dc.subject.lcshData mining
dc.subject.lcshAlgorithms -- Data processing
dc.titleHyper-sequence-graph mining
thesis.degree.departmentComputer Science
thesis.degree.grantorUniversity of Texas at San Antonio
thesis.degree.nameDoctor of Philosophy

