Abstract
Accurate prediction of Significant Wave Height (SWH) is critical for the planning and development of renewable wave energy systems. However, conventional machine learning (ML) models typically rely on large, comprehensive datasets to achieve acceptable accuracy, and collecting such datasets is often costly, time-consuming, or impractical in real-world coastal regions. This study presents a CatBoost ML model designed to address this challenge: it achieves high predictive accuracy from small datasets, making it particularly suitable for data-limited environments. Trained on historical buoy data from four Australian coastal locations, the model is deployed within a user-friendly graphical user interface (GUI) that enables intuitive SWH forecasting over horizons ranging from one month to one year. When validated against real buoy measurements, the model achieved accuracy levels between 93% and 97%, outperforming comparable approaches such as XGBoost, Random Forest, and Decision Tree algorithms. Beyond short-term forecasting, the system also offers valuable insights into wave climate characterization, supporting the data-driven selection of optimal sites for wave energy converter installation. This integrated framework holds significant promise for advancing marine renewable energy planning, coastal engineering, and oceanographic research in settings with sparse data availability.