
Big Data Visualization: Methods and Software

Written by Anonymous

Paper category

Master Thesis


Business Administration>General




We live in a data-driven world and face major challenges and opportunities in using data efficiently and effectively. As of 2012, roughly 2.5 exabytes of data were being created each day, a volume that doubles approximately every 40 months. Steady improvements in storage capacity and data collection methods have profoundly changed the way data is processed today. As data-driven decision-making has become feasible, the capacity to generate and store data has grown at an ever faster rate. Simply acquiring data is no longer the central issue; the daunting challenge is understanding the available data, applying appropriate methods and models to generate knowledge from it, and integrating that knowledge into future decision-making. This gap between the available information and the knowledge that has not yet been extracted from it is fully in line with the quotation by Naisbitt (1982) from nearly four decades ago. Enterprises and decision makers in particular depend on timely, well-founded analysis based on relevant data. However, given the variety and volume of today's data, traditional analysis methods have reached their limits. Today, companies face massive amounts of data whose inherent value can be unlocked only with sophisticated software and algorithms designed for big data processing. One of the main goals of big data analysis is to detect and discover patterns in massive amounts of data. Visualizing such data makes it easier for humans to apply their perceptual and cognitive abilities to recognize these patterns. The interdisciplinary approach of visual analytics (VA) addresses the dynamic interaction between domain experts' knowledge, human perceptual and cognitive abilities, and machines with their computing power. Visual analytics spans several closely related research fields, such as visualization, data mining, data management, and statistics.
The core idea of VA is to combine these research fields into one. Commercial software solutions address this information overload by providing state-of-the-art visualization techniques, enabling users without any programming knowledge to explore and visualize their data effectively.

1.1 Research Questions

This thesis examines different visualization methods for high-dimensional data sets and decision support systems, with the aim of creating a hybrid intelligence of domain experts and machines.

1.2 Outline of the paper

The rest of the paper is structured as follows: Chapter 2 reviews the basic literature on visualization methods, decision-making processes, VA frameworks, and corresponding model extensions. Chapter 3 distinguishes between open-source VA toolkits and selected self-service business intelligence applications dedicated to the visualization of large data sets. Chapter 4 reviews statistical methods for processing big data, especially for visualizing high-dimensional data; in particular, it explains dimensionality reduction techniques and clustering algorithms. Chapter 5 describes the approach taken to develop a web-based hybrid intelligence VA application that amplifies decision-making by involving domain experts in the analysis. Chapter 6 applies the developed solution in three general case studies. Chapter 7 describes the limitations of the application and the outlook for researchers and practitioners. Finally, Chapter 8 concludes the thesis.

Living in a data-driven world poses many challenges for companies and researchers seeking to turn potential insights and knowledge into input for decision-making processes. However, when this is done in a structured manner, the competitive advantages and opportunities of using this knowledge outweigh the challenges.
This paper focuses on the possibility of integrating VA into the data-driven decision-making process to form a hybrid intelligence system that combines the main strengths of humans and computers. Current state-of-the-art visualization and decision support system (DSS) methods provide a basis for further research. Starting from the VA process model, the tight integration of domain experts and computers has been studied further. This development provides a more detailed view of how to integrate domain experts effectively through various interaction taxonomies. Leading software applications have solved the problem of integrating users, whereas common statistical programming languages lack the ability to process data effectively and interactively. Two leading software products are described, covering their specific functions, their advantages and particular disadvantages, and their capabilities for statistical computation.