Studying the Effect of Class Imbalance in Ocean Turbine Fault Data on Reliable State Detection

Conference Paper

Title: Studying the Effect of Class Imbalance in Ocean Turbine Fault Data on Reliable State Detection

Author:

Duhaney, J.; Khoshgoftaar, T.; Napolitano, A.

Publication Date:

December 12, 2012

Event Name:

11th International Conference on Machine Learning and Applications (ICMLA 2012)

Publisher:

Institute of Electrical and Electronics Engineers

Affiliation:

Florida Atlantic University

Technology:

Current, Ocean Current

Language:

English

Document Access

Website:

https://ieeexplore.ieee.org/document/6406674

Citation

Duhaney, J.; Khoshgoftaar, T.; Napolitano, A. (2012). Studying the Effect of Class Imbalance in Ocean Turbine Fault Data on Reliable State Detection. Paper presented at the 11th International Conference on Machine Learning and Applications (ICMLA 2012).

@conference{Duhaney-2012-2412,
author = {Duhaney, J and Khoshgoftaar, T and Napolitano, A},
title = {Studying the Effect of Class Imbalance in Ocean Turbine Fault Data on Reliable State Detection},
year = {2012},
month = {dec},
booktitle = {11th International Conference on Machine Learning and Applications (ICMLA 2012)},
publisher = {Institute of Electrical and Electronics Engineers},
url = {https://ieeexplore.ieee.org/document/6406674},
keywords = {Current, Ocean Current},
}

TY - CONF
TI - Studying the Effect of Class Imbalance in Ocean Turbine Fault Data on Reliable State Detection
AU - Duhaney, J
AU - Khoshgoftaar, T
AU - Napolitano, A
T2 - 11th International Conference on Machine Learning and Applications (ICMLA 2012)
AB - Class imbalance is prevalent in many real-world datasets. It occurs when one or more classes in a dataset contain significantly fewer examples than the remaining classes. When trained on highly imbalanced datasets, traditional machine learning techniques can often simply ignore the minority class(es) and label all instances as belonging to the majority class to maximize accuracy. This problem has been studied in many domains, but there is little or no research on the effect of class imbalance in fault data for condition monitoring of an ocean turbine. This study takes a first step toward bridging that gap by providing insight into how class imbalance in vibration data can impact a learner's ability to reliably identify changes in the ocean turbine's operational state. To do so, we empirically evaluate the performance of three popular, but very different, machine learning algorithms when trained on four datasets with varying class distributions (one balanced and three imbalanced) to distinguish between a normal and an abnormal state. All data used in this study were collected from the testbed for an ocean turbine and were undersampled to simulate the different levels of imbalance. We find here, as in other domains, that the three learners appeared to suffer overall when trained on data with a highly skewed class distribution (0.1% of examples in a faulty/abnormal state and the remaining 99.9% in a normal operational state). However, the Logistic Regression and Decision Tree classifiers performed better when only 5% of the examples represented an abnormal state (the remaining 95% indicating normal operation) than they did when no imbalance was present.
DA - 2012/12/12/
PY - 2012
PB - Institute of Electrical and Electronics Engineers
UR - https://ieeexplore.ieee.org/document/6406674
LA - English
KW - Current
KW - Ocean Current
ER -

Abstract

Class imbalance is prevalent in many real-world datasets. It occurs when one or more classes in a dataset contain significantly fewer examples than the remaining classes. When trained on highly imbalanced datasets, traditional machine learning techniques can often simply ignore the minority class(es) and label all instances as belonging to the majority class to maximize accuracy. This problem has been studied in many domains, but there is little or no research on the effect of class imbalance in fault data for condition monitoring of an ocean turbine. This study takes a first step toward bridging that gap by providing insight into how class imbalance in vibration data can impact a learner's ability to reliably identify changes in the ocean turbine's operational state. To do so, we empirically evaluate the performance of three popular, but very different, machine learning algorithms when trained on four datasets with varying class distributions (one balanced and three imbalanced) to distinguish between a normal and an abnormal state. All data used in this study were collected from the testbed for an ocean turbine and were undersampled to simulate the different levels of imbalance. We find here, as in other domains, that the three learners appeared to suffer overall when trained on data with a highly skewed class distribution (0.1% of examples in a faulty/abnormal state and the remaining 99.9% in a normal operational state). However, the Logistic Regression and Decision Tree classifiers performed better when only 5% of the examples represented an abnormal state (the remaining 95% indicating normal operation) than they did when no imbalance was present.
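To make the setup in the abstract more concrete, the following is a minimal Python sketch, not the authors' procedure. It assumes that the abnormal (minority) class is the one being randomly undersampled and that all normal examples are kept; the abstract does not say which class was reduced or how. The function names `undersample_to_ratio` and `majority_baseline_accuracy` and the example sizes are hypothetical. The second function illustrates why a learner that labels every instance as normal still scores high accuracy on skewed data.

```python
import random


def undersample_to_ratio(normal, abnormal, minority_fraction, seed=0):
    """Randomly keep just enough abnormal examples so that they make up
    roughly `minority_fraction` of the combined dataset.

    Assumption (not stated in the abstract): all normal examples are kept
    and only the abnormal class is undersampled.
    """
    if not 0.0 < minority_fraction < 1.0:
        raise ValueError("minority_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)
    # Solve k / (k + len(normal)) = minority_fraction for k.
    k = round(minority_fraction * len(normal) / (1.0 - minority_fraction))
    # Keep at least one abnormal example and never more than are available.
    k = max(1, min(k, len(abnormal)))
    return list(normal), rng.sample(list(abnormal), k)


def majority_baseline_accuracy(n_normal, n_abnormal):
    """Accuracy of a trivial classifier that labels every instance 'normal'."""
    return n_normal / (n_normal + n_abnormal)


if __name__ == "__main__":
    # Hypothetical pools of examples; real vibration data would go here.
    normal_pool = [("normal", i) for i in range(9990)]
    abnormal_pool = [("abnormal", i) for i in range(9990)]

    # Balanced, 5% abnormal, and 0.1% abnormal (levels named in the abstract).
    for fraction in (0.5, 0.05, 0.001):
        kept_normal, kept_abnormal = undersample_to_ratio(
            normal_pool, abnormal_pool, fraction
        )
        baseline = majority_baseline_accuracy(len(kept_normal), len(kept_abnormal))
        print(fraction, len(kept_normal), len(kept_abnormal), round(baseline, 4))
```

Under these assumptions, the trivial majority-class baseline reaches 99.9% accuracy at the 0.1% level while detecting no faults at all, which is why accuracy alone is a poor measure of reliable state detection on such data.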