Time Series Analysis and Machine Learning
Authors: Sophie D'Arcy, Nazli Dereli, Regie Felix
Mentor: Ambuj Singh, Professor of Computer Science and Biomolecular Science and Engineering, University of California Santa Barbara
Our research lies at the intersection of computer science and biology, or bioinformatics. A time series is a sequence of data points taken at consistent time intervals. One can analyze trends within the data and use them to predict what will happen in the future. This process is called time series analysis. We analyzed a dataset recording the number of air passengers from 1949 to 1960 and developed a model in R that accurately represented the data. Another type of analysis is time series classification, which assigns each series to a category. Machine learning techniques, such as decision trees and artificial neural networks (ANNs), are used for this type of classification. For this analysis, we used a UCI KDD dataset of EEG sensor values from 20 patients (10 alcoholic and 10 non-alcoholic) recorded while they looked at three different stimuli: one picture, two pictures that match, and two pictures that do not match. Our goal was to classify the data so that, from the EEG results alone, the model could predict whether a patient was alcoholic. We used Java to preprocess our data, R to modify our data, and Weka to classify the data. Our accuracy for both decision trees and ANNs was low at first. We then revised our project by trying different sensors, increasing the number of data points, and trying different classifiers. This classification project is ongoing; we are still trying more machine learning techniques to increase the accuracy of our model.
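As a rough illustration of using a trend in past values to predict future ones, the sketch below fits a straight line to a short series by least squares and extends it forward. It is not the model we built in R: the function names `fit_linear_trend` and `forecast` are hypothetical, the input series is made-up illustrative data rather than the air passenger dataset, and a straight line ignores the seasonal patterns that real passenger counts would show.

```python
from statistics import mean


def fit_linear_trend(values):
    """Fit y = intercept + slope * t by least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = mean(values)
    # Sums of squares and cross-products around the means
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return slope, intercept


def forecast(values, steps):
    """Extend the fitted line `steps` intervals past the last observation."""
    slope, intercept = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps)]


if __name__ == "__main__":
    # Made-up, evenly spaced observations (assumption: not real data)
    series = [10.0, 12.0, 14.0, 16.0, 18.0]
    print(forecast(series, 2))
```

A model for monthly data such as passenger counts would normally also account for seasonality; the linear fit here only shows the basic idea of extrapolating a trend.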