Log File Processing – Changes, Challenges, and Chance
Log files are plain text files written by any modern computer system. Their content is determined by logging statements written during software development. Thus, any kind of event can be collected during runtime. As a result, logs contain a wealth of precious information that is used for various analyses. However, current log processing methods assume static log structures and contents. Due to today's focus on agile software development paradigms, source code and the logging statements it contains can change constantly. This leads to failing preprocessing and incorrect data points for subsequent analyses. If those failures are not detected, faulty information will be extracted, which will result in erroneous insights. This dissertation targets successful log file processing from different angles using artificial intelligence approaches. First, missing data points induced by changing software are addressed. In Magnetic Resonance Imaging (MRI) examination data, for example, new variables are added to the logging mechanism as requirements grow. As a consequence, logs from systems with earlier software versions are missing the new features. Therefore, we propose classification techniques to learn feature correlations from complete data sets. Afterwards, these trained models are applied to impute data points where the respective feature is missing. We demonstrate the effectiveness of a feed-forward neural network by successfully determining the desired feature for more than 94% of the MRI exams. Furthermore, missing information can also originate from failing parsers. Even when crucial data points are available in the raw logs, they might not be extracted correctly. Since software changes can entail adaptations of the log file structure, they can pose insurmountable challenges to existing log parsing methods. We investigate Hidden Markov Models and Deep Learning methods in order to process log files automatically and reliably despite those challenges.
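The imputation idea described above (learn feature correlations from complete records, then predict the feature where it is missing) can be illustrated with a minimal sketch. It is not the dissertation's model: a k-nearest-neighbour vote stands in for the feed-forward neural network, and the field names echo_time, slices, and body_region as well as all values are hypothetical.

```python
from collections import Counter
from math import dist


def impute_missing(records, target, features, k=3):
    """Fill in `target` where it is None by a k-nearest-neighbour vote.

    Stand-in for the classifier in the text; a neural network would be
    trained on `complete` and applied to the incomplete records instead.
    """
    # "Training set": records from newer software that log the target feature.
    complete = [r for r in records if r[target] is not None]
    imputed = []
    for r in records:
        if r[target] is not None:
            imputed.append(dict(r))
            continue
        point = [r[f] for f in features]
        # The k complete records closest to this one in feature space.
        neighbours = sorted(
            complete, key=lambda c: dist(point, [c[f] for f in features])
        )[:k]
        label = Counter(n[target] for n in neighbours).most_common(1)[0][0]
        imputed.append({**r, target: label})
    return imputed


# Hypothetical exam records; the last two come from an older software
# version that did not yet log `body_region`.
exams = [
    {"echo_time": 10.0, "slices": 20, "body_region": "head"},
    {"echo_time": 12.0, "slices": 22, "body_region": "head"},
    {"echo_time": 11.0, "slices": 21, "body_region": "head"},
    {"echo_time": 80.0, "slices": 40, "body_region": "spine"},
    {"echo_time": 85.0, "slices": 42, "body_region": "spine"},
    {"echo_time": 82.0, "slices": 41, "body_region": "spine"},
    {"echo_time": 11.5, "slices": 21, "body_region": None},
    {"echo_time": 83.0, "slices": 40, "body_region": None},
]
filled = impute_missing(exams, "body_region", ["echo_time", "slices"])
```

In practice the features would need scaling before any distance-based method is applied, and the imputation quality would be measured on held-out complete records.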
We study log file data sets from various systems with manifold log changes and outperform state-of-the-art parsers. To this end, we propose a novel pipeline for flexible parsing. It contains a stateful Long Short-Term Memory (LSTM) network to model adaptability regarding log changes. We call it FlexParser; it achieves an F1-Score of 92% or higher on all studied data sets. Successfully parsed log files offer manifold chances to find actionable insights. Therefore, we process the parsing results of MRI log files in order to predict hardware failures. High image quality is crucial for medical diagnosis; however, it depends directly not only on the selected imaging parameters but also on flawless hardware. Accordingly, we train Deep Learning models on image features and the corresponding recorded hardware conditions. Since the available data points from failing components are limited, we employ different data augmentation techniques. Furthermore, we investigate Ensemble Learning to combine insights from different models and compare the results with those achieved by time series methods. In conclusion, we propose an Ensemble Learning pipeline that reliably detects hardware failures, achieving an F1-Score of 94.14%.
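How an ensemble can combine several models, and how the reported F1-Score is computed, can be sketched as follows. The abstract does not state the combination rule, so soft voting (averaging the per-model failure probabilities) is an assumption made here for illustration, and the three probability lists are made-up values rather than results from the dissertation.

```python
def soft_vote(prob_lists, threshold=0.5):
    """Average per-model failure probabilities and threshold the mean.

    Soft voting is an assumed combination rule, not necessarily the one
    used in the dissertation's pipeline.
    """
    n_models = len(prob_lists)
    return [int(sum(p) / n_models >= threshold) for p in zip(*prob_lists)]


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (1 = hardware failure) and per-model probabilities.
y_true = [0, 0, 1, 1, 0, 1]
model_a = [0.1, 0.6, 0.9, 0.4, 0.2, 0.8]
model_b = [0.2, 0.3, 0.7, 0.7, 0.1, 0.4]
model_c = [0.3, 0.2, 0.8, 0.6, 0.4, 0.9]

single_pred = soft_vote([model_a])  # one model on its own
ensemble_pred = soft_vote([model_a, model_b, model_c])

single_f1 = f1_score(y_true, single_pred)
ensemble_f1 = f1_score(y_true, ensemble_pred)
```

In this toy example the individual errors of the models fall on different samples, so averaging their probabilities corrects them; whether an ensemble helps on real data depends on how correlated the models' errors are.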