International Journal of Environmental Science and Development


Volume 16 Number 1 (2025)

IJESD 2025 Vol.16(1): 34-40
doi: 10.18178/ijesd.2025.16.1.1507

Prediction of Water Quality Index (WQI) Using Machine Learning

Kunyanuth Kularbphettong1,*, Nareenart Raksuntorn1, and Chongrag Boonseng2
1Faculty of Science and Technology, Suan Sunandha Rajabhat University, Bangkok, Thailand
2School of Engineering department, King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand
Email: kunyanuth.ku@ssru.ac.th (K.K.); nareenart.ra@ssru.ac.th (N.R.); chongrag.bo@kmitl.ac.th (C.B.)
*Corresponding author
Manuscript received August 20, 2024; revised October 8, 2024; accepted October 15, 2024; published January 20, 2025

Abstract—This study predicts the Water Quality Index (WQI) using five machine learning techniques: Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), Extreme Gradient Boosting (XGBoost), and Adaptive Boosting (AdaBoost). The study uses Thailand as its base country and assesses the water quality of its rivers and canals. The dataset was collected from the Bangkok Metropolitan Authority of Thailand between January 2018 and January 2021. It contains 43,776 records, each comprising 12 quantitative water-quality measurements that serve as the model's input features, for example pH, DO (Dissolved Oxygen), BOD (Biochemical Oxygen Demand), TP (Total Phosphorus), TCB (Total Coliform Bacteria), FCB (Fecal Coliform Bacteria), NO3-N (Nitrate-Nitrogen), NO2-N (Nitrite-Nitrogen), NH3-N (Ammonia-Nitrogen), TS (Total Solids), and TDS (Total Dissolved Solids). During preprocessing, K-Nearest Neighbors (KNN) imputation was applied to fill missing values, and Random Forest was used to detect and remove outliers, yielding a dataset suitable for model training. Each model was evaluated with four metrics: accuracy, precision, recall, and F1 score. All five methods predicted WQI well, but XGBoost outperformed the others, achieving the highest value on every metric, including an accuracy of 91%.

Keywords—water quality index, machine learning, data imputation, SVM, random forest, decision tree, XGBoost
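The KNN imputation step described in the abstract can be illustrated with a minimal, dependency-free sketch. The abstract does not state the number of neighbors, the distance measure, or how neighbor values are combined, so this sketch assumes a NaN-aware Euclidean distance (rescaled for the features both rows share, similar in spirit to scikit-learn's `KNNImputer`) and a plain mean over the k nearest rows. The feature subset and sample values are hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing measurement


def _distance(a: Row, b: Row) -> float:
    """Euclidean distance over features observed in both rows, scaled up by
    (total features / shared features) so rows with gaps stay comparable."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(len(a) / len(shared) * squared)


def knn_impute(rows: List[Row], k: int = 3) -> List[List[float]]:
    """Replace each missing value with the mean of that feature over the
    k nearest other rows in which the feature is observed."""
    filled = []
    for i, row in enumerate(rows):
        new_row = []
        for j, value in enumerate(row):
            if value is not None:
                new_row.append(value)
                continue
            # Donor candidates: every other row where feature j is present.
            donors = [
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            if not donors:
                raise ValueError(f"feature {j} is missing in every row")
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            new_row.append(sum(nearest) / len(nearest))
        filled.append(new_row)
    return filled


# Hypothetical readings: [pH, DO (mg/L), BOD (mg/L)]; None = missing.
samples = [
    [7.0, 5.0, 2.0],
    [7.2, 4.8, 2.2],
    [6.8, None, 2.1],
    [8.5, 1.0, 9.0],
]
imputed = knn_impute(samples, k=2)
print(imputed[2])  # DO is filled from the two closest rows (0 and 1)
```

Averaging only the closest rows keeps the polluted outlier sample (row 3) from dragging the imputed DO value down, which is the usual motivation for KNN imputation over a global column mean.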
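The four evaluation metrics named in the abstract can be computed as below. This is a sketch that assumes WQI is predicted as a set of discrete quality classes and that precision, recall, and F1 are macro-averaged over classes; the abstract does not specify the class labels or the averaging scheme, so the labels "good", "fair", and "poor" are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List


def classification_metrics(y_true: List[Hashable],
                           y_pred: List[Hashable]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1 score."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    labels = sorted(set(y_true) | set(y_pred), key=str)
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)
    actual = Counter(y_true)

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = true_pos[c] / predicted[c] if predicted[c] else 0.0
        rec = true_pos[c] / actual[c] if actual[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(true_pos.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,  # mean of per-class F1 scores
    }


# Hypothetical WQI classes for six water samples.
y_true = ["good", "good", "fair", "poor", "fair", "poor"]
y_pred = ["good", "fair", "fair", "poor", "fair", "good"]
scores = classification_metrics(y_true, y_pred)
print(scores)
```

Macro averaging weights every WQI class equally, so a model cannot score well simply by predicting the most common class; a weighted average would be the alternative if class frequencies should count.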


Cite: Kunyanuth Kularbphettong, Nareenart Raksuntorn, and Chongrag Boonseng, "Prediction of Water Quality Index (WQI) Using Machine Learning," International Journal of Environmental Science and Development vol. 16, no. 1, pp. 34-40, 2025.

Copyright © 2025 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).