The Solution:
Feature selection involves deciding not only which attributes to use in the classifier, but also how many time samples to use for each decision and whether to apply a preprocessing transformation to the input time series. Some attributes are not strongly correlated with future drive failures, and including them can degrade classifier performance.
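As an illustration, the three choices above (which attributes, how many recent samples, and whether to preprocess) can be sketched as a feature-building step. This is a minimal sketch under stated assumptions: the function names, the dictionary layout of per-drive data, and the "diff" (first-difference) transform are hypothetical and not taken from DSF.

```python
from typing import Dict, List, Optional, Sequence


def make_features(series: Sequence[float], n_samples: int,
                  transform: Optional[str] = None) -> List[float]:
    """Build features from the most recent n_samples of one attribute's time series.

    transform: None keeps raw values; "diff" (a hypothetical option) takes
    first differences to illustrate the preprocessing choice.
    """
    if n_samples < 1 or n_samples > len(series):
        raise ValueError("n_samples must be between 1 and len(series)")
    window = list(series[-n_samples:])
    if transform is None:
        return window
    if transform == "diff":
        # First differences emphasise change (e.g. a growing error count) over level.
        return [b - a for a, b in zip(window, window[1:])]
    raise ValueError(f"unknown transform: {transform!r}")


def build_dataset(drives: List[Dict[str, Sequence[float]]], attributes: List[str],
                  n_samples: int, transform: Optional[str] = None) -> List[List[float]]:
    """Concatenate windowed features of the selected attributes for each drive.

    drives: one dict per drive mapping attribute name -> time series (assumed layout).
    """
    return [
        [v for attr in attributes for v in make_features(d[attr], n_samples, transform)]
        for d in drives
    ]
```

Varying `attributes`, `n_samples`, and `transform` and re-scoring the classifier for each setting is what makes exhaustive search expensive.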
Because it is computationally expensive to try every combination of attributes, we use the fast, nonparametric reverse-arrangements test and attribute z-scores within Decision Sciences Factor (DSF) to identify attributes that are potentially useful for failure prediction.
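A minimal sketch of both screening statistics follows. The reverse-arrangements test counts pairs i < j with x[i] > x[j]; with no trend (i.i.d. samples, no ties) the count A has mean N(N-1)/4 and variance N(2N+5)(N-1)/72, so a large |z| suggests a trend such as a steadily rising error count. The text does not define DSF's attribute z-score, so the two-sample z-statistic between failed and good drives used here is an assumption, and all function names are hypothetical.

```python
import math
import statistics
from typing import Sequence


def reverse_arrangements(x: Sequence[float]) -> int:
    """Count pairs (i, j) with i < j and x[i] > x[j]."""
    n = len(x)
    return sum(1 for i in range(n) for j in range(i + 1, n) if x[i] > x[j])


def reverse_arrangements_z(x: Sequence[float]) -> float:
    """Normal-approximation z-score of the reverse-arrangements count.

    Assumes no ties; ties would change the null variance.
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two samples")
    a = reverse_arrangements(x)
    mean = n * (n - 1) / 4
    var = n * (2 * n + 5) * (n - 1) / 72
    return (a - mean) / math.sqrt(var)


def attribute_z_score(failed: Sequence[float], good: Sequence[float]) -> float:
    """Assumed definition: two-sample z-statistic of failed-drive vs good-drive means.

    Each group needs at least two values (statistics.variance uses n - 1).
    """
    se = math.sqrt(statistics.variance(failed) / len(failed)
                   + statistics.variance(good) / len(good))
    return (statistics.mean(failed) - statistics.mean(good)) / se
```

An attribute could then be kept if, for example, its |z| exceeds a chosen threshold; the threshold itself would need to be tuned on held-out data.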
Highlights:
DSF can significantly improve the performance of current hard drive failure prediction algorithms.
Higher detection accuracy helps users back up their data before a drive fails.
Insights gained from DSF can be applied in other areas where rare events must be forecast from noisy, nonparametric time series, such as predicting rare diseases and failures of electronic and mechanical devices.