Explainability with Semantic Concept Composition and Zero-Shot Learning for Anomaly Detection




Bendre, Nihar Shrikant



Video analytics is an important research problem that has been well studied across diverse research areas and application domains such as anomaly detection, safety, and explainability. Millions of connected devices, such as networked cameras producing streaming video, are introduced to smart cities every year and are a valuable source of information. However, this rich source of information is mostly left untapped. Furthermore, current video analytics systems for anomaly detection cannot detect anomalies without human intervention and depend heavily on humans for anomaly detection and prevention. These systems are susceptible to human limitations such as fatigue, limited monitoring capacity, and limits on processing, retaining, and delivering information recalled from multiple sources. In addition, traditional deep learning takes a black-box approach and fails to provide a logical reason for its decisions, raising concerns over its use in safety-critical applications such as autonomous driving and health care. To overcome these limitations, it is of utmost importance to have an "Explainable Intelligent System" that can process and highlight salient features by understanding compositionality, and filter out anomalous events from video input with minimal human intervention, while also providing human-like explanations to support its decisions. For an intelligent system, the key is to comprehend human behavior and classify it as normal or anomalous. To achieve this, it is imperative to understand human behavior in context, which makes the exploitation of compositionality an essential function in these systems. We propose a multi-modal (visual and textual) Artificial Intelligence system that exploits compositionality and provides explanations of its decision making.
The proposed system is designed not only to identify patterns when data is abundant, using traditional deep learning approaches, but also to abstract and learn new information when data is limited or entirely absent. We designed a Dynamic Neural Network (DMN) to better exploit the compositionality of the system: it comprehends a textual query and predicts an answer to that query by dynamically assembling multiple relatively shallow deep learning modules. This helps us gain better insight into complex questions and reason about why a particular answer is produced.



Cameras, Metadata, Datasets, Experiments, Optimization, Neural networks, Taxonomy, Batch processing, Algorithms, Confidence intervals, Ablation, Learning, Semantics, Computer science, Information technology, Video analytics, Compositionality, Detecting anomalies, Multi-modal, Decision making explanations



Electrical and Computer Engineering