YottaCloud: A cloud architecture for secure big image analytics
With the proliferation of cameras on smart phones and wearable devices, images have considerable value in our daily life. In the past few years, secure image analytics has found use in many applications that need reliable security in the storage and transmission of digital images, such as medical imaging systems, mobile check deposit, electronic commerce, online photo albums, and military image communication. The number of images uploaded to clouds is rapidly increasing, with Pinterest, Facebook, and Twitter users alone uploading over a billion new photos every month. Moving images in real time and at scale for processing leads to infrastructure bottlenecks and requires novel cloud network and storage architectures. With this exponential growth, the main question is how to store, secure, manage, analyze, and discover knowledge from the collected data in a timely manner. Future cloud models will need to exhibit the high degree of data and computation locality and parallelism required for real-time discovery of relevant data through content-based search over these massive image data sets.
This dissertation proposes an architecture for large-scale cluster computing systems that can address emerging big data image processing workloads. We sketch the idea of expanding cloud storage capabilities from only storing images to also performing analytics, by moving user-defined programs to the data and executing them inside the storage cloud. The philosophy behind this approach is to package applications and then move each application to the data, rather than moving the data to where the application is located. Like earlier cluster computing systems such as MapReduce and Spark, our architecture supports streaming and batch processing while retaining the scalability and data security of those systems. However, unlike the specialized systems proposed for some of these workloads, our architecture allows storage and computation scheduling to be defined and configured per workload topology, enabling rich new applications that intermix, for example, streaming and batch processing, or complex iterative machine learning analytics.
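As a minimal illustration of moving the application to the data, the following Python sketch dispatches a user-defined function to each storage node that holds the requested images, so that only the results leave the node. It is a toy model under stated assumptions, not the implementation described in this dissertation: the names StorageNode, run_near_data, and the example image identifiers are hypothetical, and a packaged application is approximated by a plain Python callable.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class StorageNode:
    """Hypothetical storage node that keeps images and can run code locally."""
    name: str
    images: Dict[str, bytes] = field(default_factory=dict)

    def run(self, app: Callable[[bytes], Any], image_ids: List[str]) -> Dict[str, Any]:
        # The application executes next to the stored bytes;
        # only its (usually much smaller) results are returned.
        return {i: app(self.images[i]) for i in image_ids if i in self.images}


def run_near_data(nodes: List[StorageNode],
                  app: Callable[[bytes], Any],
                  image_ids: List[str]) -> Dict[str, Any]:
    """Send `app` to every node that stores some of the requested images."""
    results: Dict[str, Any] = {}
    for node in nodes:
        local_ids = [i for i in image_ids if i in node.images]
        if local_ids:  # skip nodes that hold none of the requested data
            results.update(node.run(app, local_ids))
    return results


# Example: compute the size in bytes of each image where it is stored.
cluster = [
    StorageNode("node-1", {"cat.png": b"\x89PNG...."}),
    StorageNode("node-2", {"dog.png": b"\x89PNG........"}),
]
sizes = run_near_data(cluster, len, ["cat.png", "dog.png"])
```

In this sketch the image bytes never cross node boundaries; a real deployment would additionally have to handle packaging, isolation, and scheduling of the application, which are outside the scope of the example.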
Finally, we explore the generality of the proposed architecture from both a theoretical modeling perspective and a practical perspective by developing containerized image understanding and machine learning use cases, such as image segmentation, classification, and encryption, for image processing and recognition.