TimesFM: A Robust, Decoder-only Foundation Model for Time-Series Forecasting

Time-series forecasting is used across a wide range of domains, including retail, finance, manufacturing, healthcare, and the natural sciences. In retail, for instance, improving the accuracy of demand forecasting can significantly cut inventory costs and boost revenue. While deep learning (DL) models have proven successful at forecasting diverse, multivariate time-series data, they present several challenges.

Most DL architectures require lengthy, involved training and validation cycles before a user can test the model on new time-series data. To address this problem, Google researchers Rajat Sen and Yichen Zhou introduced a foundation model for time-series forecasting called TimesFM in the paper "A decoder-only foundation model for time-series forecasting". Unlike most DL architectures, TimesFM delivers good out-of-the-box forecasts on unseen time-series data, producing usable results without any additional training.

This article walks through the mechanics of TimesFM: how it works, what it was pretrained on, and how it performs relative to existing methods. But first, let's briefly review what a foundation model means in the context of time-series forecasting.

A foundation model must adapt to variable context lengths (what we observe) and variable horizon lengths (what we ask the model to forecast), while having enough capacity to encode the patterns in a large pretraining dataset. TimesFM uses stacked transformer layers as its principal building blocks. A group of contiguous time-points, referred to as a patch, is treated as a token. The model is then tasked with forecasting the (i+1)-th patch of time-points, given the first i patches.
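The patching idea can be illustrated with a minimal sketch. The code below is not TimesFM's actual implementation; it only shows how a 1-D series can be split into contiguous patches (tokens) and paired up so that each patch serves as the target for the one before it, as in decoder-only next-token training. The function name `make_patches` and the patch length of 32 are illustrative choices.

```python
import numpy as np

def make_patches(series: np.ndarray, patch_len: int) -> np.ndarray:
    """Split a 1-D series into contiguous, non-overlapping patches (tokens)."""
    n_patches = len(series) // patch_len
    return series[: n_patches * patch_len].reshape(n_patches, patch_len)

# Toy example: a 128-point series tokenized into 4 patches of 32 points each.
series = np.sin(np.linspace(0, 8 * np.pi, 128))
patches = make_patches(series, patch_len=32)
print(patches.shape)  # (4, 32): 4 tokens, each covering 32 time-points

# Decoder-style training pairs: predict patch i+1 from patches 1..i.
inputs, targets = patches[:-1], patches[1:]
```

Treating a patch, rather than a single time-point, as a token shortens the effective sequence length the transformer must attend over, which is what makes long contexts tractable.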

But before TimesFM, or any other model for that matter, can forecast anything, a large volume of realistic time-series data is required for pretraining. The authors found that synthetic data helps the model learn the basic grammar of time series, while real-world data grounds it in realistic patterns, enhancing the model's ability to generalize.
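To make the synthetic-data idea concrete, here is a hedged sketch of a toy generator that composes a random trend, a seasonal cycle, and noise. This is not the authors' actual data-generation process; the function `synthetic_series` and all parameter ranges are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthetic_series(length: int = 256) -> np.ndarray:
    """Toy synthetic series: random linear trend + seasonality + noise.
    Illustrative only -- not the authors' actual generator."""
    t = np.arange(length, dtype=float)
    trend = rng.uniform(-0.05, 0.05) * t
    period = rng.integers(8, 64)
    seasonal = rng.uniform(0.5, 2.0) * np.sin(2 * np.pi * t / period)
    noise = rng.normal(0.0, 0.1, size=length)
    return trend + seasonal + noise

# A tiny "pretraining corpus" of 4 series, 256 points each.
corpus = np.stack([synthetic_series() for _ in range(4)])
print(corpus.shape)  # (4, 256)
```

Generators of this flavor let a model see unlimited clean examples of trend and seasonality, patterns that real-world corpora exhibit only noisily.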

The team evaluated TimesFM zero-shot on unseen data from popular time-series benchmarks. They found that TimesFM outperformed most statistical methods, such as ARIMA (AutoRegressive Integrated Moving Average) and ETS (Error, Trend, Seasonality), and could even match or surpass powerful DL models such as DeepAR and PatchTST that were explicitly trained on the target time series.
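Zero-shot evaluation of this kind boils down to splitting an unseen series into a context window and a held-out horizon, then scoring forecasts with an error metric. The sketch below uses mean absolute error (MAE) and a naive last-value baseline; it is a generic illustration of the evaluation setup, not the benchmarks' exact protocol, and the 96/24 split is an arbitrary choice.

```python
import numpy as np

def mae(forecast: np.ndarray, actual: np.ndarray) -> float:
    """Mean absolute error between a forecast and the actual values."""
    return float(np.mean(np.abs(forecast - actual)))

# Split an unseen series into observed context and a held-out horizon.
series = np.sin(np.linspace(0, 4 * np.pi, 120))
context, horizon = series[:96], series[96:]

# Score a naive baseline that repeats the last observed value.
naive_forecast = np.full_like(horizon, context[-1])
print(mae(naive_forecast, horizon))
```

A model's forecast would be scored the same way on the same horizon, so any method, statistical, deep, or foundation, can be compared on equal footing.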

Disclaimer: The above article was written with the assistance of AI. The original sources can be found on Google AI.