Hyper-sequence-graph mining
Abstract
The era of Big Data has begun, promising benefits in domains such as education, healthcare, and the economy. Understanding and fully realizing these benefits requires addressing many technical issues at every stage, from data acquisition to interpretation. One of the key stages is extracting useful information from real-life datasets. Because this problem is challenging even for small datasets, the research community has extensively investigated data mining and search algorithms for various types of datasets. In this dissertation, a structure called a hyper-sequence-graph is investigated. A hyper-sequence-graph represents both the sequential and the non-sequential relationships in a dataset in a single structure. Based on these two types of relationships, the focus is on mining datasets and extracting useful information. On the sequential side, we consider the well-known frequent pattern mining (FPM) problem and introduce a new form of pattern mining called super-sequence frequent pattern mining (SS-FPM). To solve the SS-FPM problem, a directed weighted graph called a sequence graph is generated from the given sequential dataset, and a heuristic algorithm based on an adjacency-matrix iteration technique is proposed. We apply the SS-FPM method to various real-world datasets, including ADL, weblog, and bioinformatics datasets. Finally, an SS-FPM-based partial clustering method is investigated. On the non-sequential side, we consider the grouping relationship that is readily available in a sequential dataset. To represent this relationship, hypergraphs are used, and the focus is on how to search for sub-structures in a large hypergraph.
In this direction, we generalize structure-based indexing from simple graphs to hypergraphs and propose an efficient verification method that accelerates the sub-hypergraph matching process. Experiments demonstrate the efficiency and effectiveness of the proposed solutions.
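To make the notion of a sequence graph concrete, the following is a minimal sketch of how a directed weighted graph might be derived from a sequential dataset and stored as an adjacency matrix. The abstract does not state how edge weights are defined, so the sketch assumes that the weight of an edge (a, b) is the number of times item b immediately follows item a across all sequences. The function name `build_sequence_graph` and the toy dataset are hypothetical and chosen only for illustration; the sketch does not implement the SS-FPM heuristic itself.

```python
def build_sequence_graph(sequences):
    """Build an adjacency matrix for a directed weighted sequence graph.

    Assumption (not stated in the abstract): the weight of edge (a, b)
    is the number of times b immediately follows a in the dataset.
    """
    # Collect the distinct items and give each one a stable matrix index.
    items = sorted({item for seq in sequences for item in seq})
    index = {item: i for i, item in enumerate(items)}

    n = len(items)
    matrix = [[0] * n for _ in range(n)]

    # Count every pair of consecutive items in every sequence.
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            matrix[index[a]][index[b]] += 1

    return items, matrix


# Hypothetical toy dataset of three activity sequences.
sequences = [
    ["wake", "wash", "eat"],
    ["wake", "eat"],
    ["wake", "wash", "eat", "leave"],
]

items, matrix = build_sequence_graph(sequences)
index = {item: i for i, item in enumerate(items)}

for a in items:
    for b in items:
        weight = matrix[index[a]][index[b]]
        if weight:
            print(f"{a} -> {b}: {weight}")
```

Under this assumption, heavily weighted edges mark transitions that recur across many sequences, which is the kind of information an adjacency-matrix-based search could exploit.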