doi: 10.18178/ijesd.2025.16.1.1507
Prediction of Water Quality Index (WQI) Using Machine Learning
2School of Engineering department, King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand
Email: kunyanuth.ku@ssru.ac.th (K.K.); nareenart.ra@ssru.ac.th (N.R.); chongrag.bo@kmitl.ac.th (C.B.)
*Corresponding author
Abstract—This study assesses the Water Quality Index (WQI)
using five machine learning techniques: Support Vector Machine
(SVM), Random Forest (RF), Decision Tree (DT), Extreme Gradient
Boosting (XGBoost), and Adaptive Boosting (AdaBoost). Thailand
serves as the case study, with a focus on the water quality of
its rivers and canals. The data set was collected from the
Bangkok Metropolitan Authority of Thailand during the period
January 2018 to January 2021. The data set included 43,776
records and each record comprised 12 quantitative
measurements related to water quality. These measurements served
as feature inputs to the assessment model and included pH, DO
(Dissolved Oxygen), BOD (Biochemical Oxygen Demand), TP
(Total Phosphorus), TCB (Total Coliform Bacteria), FCB
(Fecal Coliform Bacteria), NO3-N (Nitrate-Nitrogen), NO2-N
(Nitrite-Nitrogen), SS (Suspended Solids), NH3-N (Ammonia-Nitrogen),
TS (Total Solids), and TDS (Total Dissolved Solids). During the phase
of preprocessing, K-Nearest Neighbors (KNN) and Random
Forest were employed to handle missing data and detect
outliers. KNN imputation was applied to fill missing values,
while Random Forest was used to identify and remove outliers,
thereby producing a dataset suitable for model training. The
effectiveness of each machine learning model was assessed
using four principal metrics: accuracy, precision, recall,
and F1 score. The findings showed that all five models
performed well in predicting WQI; however, the XGBoost model
outperformed the others, attaining the highest values across all
metrics, including an accuracy of 91%.
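
As an illustration of two steps named in the abstract, the following is a minimal sketch in plain Python of KNN imputation and of the four evaluation metrics. It is not the authors' implementation. The abstract does not give the number of neighbors, the distance measure, the WQI class labels, or the averaging scheme for the metrics. The choices here (k = 2, Euclidean distance over shared features, the labels "good", "fair", and "poor", and macro averaging) are therefore assumptions, and the function names and sample values are hypothetical.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Assumption: distance is Euclidean over the features both rows observed,
    divided by the number of shared features. No feature scaling is applied.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other rows in which feature j was observed.
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )[:k]
            if donors:
                filled[i][j] = sum(v for _, v in donors) / len(donors)
    return filled


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical records with two features (pH, DO); None marks a missing value.
records = [[7.0, 5.0], [7.25, None], [6.75, 4.0], [3.0, 1.0]]
imputed = knn_impute(records, k=2)

# Hypothetical WQI classes for six samples.
truth = ["good", "good", "fair", "poor", "poor", "fair"]
predicted = ["good", "fair", "fair", "poor", "poor", "fair"]
scores = macro_metrics(truth, predicted)
print(imputed[1], scores)
```

In this toy example the missing DO value is filled from the two records whose pH is closest. In practice the features would normally be scaled first so that no single measurement dominates the distance.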
Keywords—water quality index, machine learning, data
imputation, SVM, random forest, decision tree, XGBoost
Cite: Kunyanuth Kularbphettong, Nareenart Raksuntorn, and Chongrag Boonseng, "Prediction of Water Quality Index (WQI) Using Machine Learning," International Journal of Environmental Science and Development, vol. 16, no. 1, pp. 34-40, 2025.
Copyright © 2025 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).