Forecasting the cumulative cases of COVID-19 in four large Brazilian cities using machine learning approaches

Abstract

Coronavirus disease 2019 (COVID-19) has infected millions of people since it was first reported. Efficient short-term forecasting models make it possible to estimate the number of future COVID-19 cases, which in turn supports strategic planning in the public health system to avoid deaths. In this paper, the autoregressive integrated moving average (ARIMA) model and the machine learning approaches cubist regression (CUBIST), k-nearest neighbor (kNN), support vector regression (SVR), and stacked generalization (STACK) are evaluated for forecasting, six days ahead, the cumulative confirmed COVID-19 cases in four Brazilian cities with high daily incidence. In the STACK approach, the kNN and SVR models are adopted as base learners and CUBIST as the meta-learner. The models' effectiveness is evaluated using the improvement index, mean absolute error, root mean squared error, and symmetric mean absolute percentage error. In most of the evaluated cases, STACK outperformed the other models on the adopted criteria. In general, the developed models generate accurate forecasts, achieving errors in the range of 0.28% to 1.62% six days ahead. The ranking of the models in most scenarios is STACK, ARIMA, SVR, and kNN. The evaluated models are recommended for forecasting and monitoring the ongoing growth of COVID-19 cases, since they can assist managers in decision-making support systems.
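
The error metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the definitions below are assumptions: sMAPE is taken in its common form with the mean of the absolute actual and forecast values in the denominator, and the improvement index is taken as the percentage reduction in error of one model relative to another. All function names are hypothetical.

```python
import math


def mae(actual, forecast):
    """Mean absolute error between observed and forecast values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error between observed and forecast values."""
    squared = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    return math.sqrt(squared / len(actual))


def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent.

    Assumed form: mean of |a - f| / ((|a| + |f|) / 2) * 100.
    Terms whose denominator is zero contribute zero error.
    """
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        if denom > 0:
            total += abs(a - f) / denom
    return 100 * total / len(actual)


def improvement_index(error_reference, error_candidate):
    """Assumed form: percentage reduction in error of the candidate
    model relative to a reference model (positive means better)."""
    return 100 * (error_reference - error_candidate) / error_reference
```

For a six-days-ahead horizon, each function would be called with the six observed cumulative counts and the six corresponding forecasts. The improvement index would then compare, for example, the error of STACK against the error of another model.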

Publication
INnovation for Systems Information and Decision meeting (INSID)
