Comprehensive Tutorial on Time Series Modelling and Forecasting

(kanoki.org)

279 points · min2bro · 6y ago · 48 comments

riyadparvez · 6y ago

Is there any good resource on time series modeling and forecasting that covers something other than exponential smoothing and variants of ARIMA? Pretty much every tutorial on the web is on exponential smoothing and ARIMA, or is some lazy LSTM tutorial.

em500 · 6y ago

Some good free textbooks are Rob Hyndman's online book https://otexts.com/fpp2/ and Brockwell and Davis' old textbook https://link.springer.com/book/10.1007/978-3-319-29854-2. They focus heavily on ARIMA and exponential smoothers because most time series data sets are pretty small (a few dozen to at most a few thousand samples), so there's really not that much else you can do.

Most of Hyndman's textbook approaches (mostly ARIMA and various exponential smoothers) are implemented in his 'forecast' R package.

ARIMA and exponential smoothers tend to be a bit hard to get working well on daily data (they come from the era when most data was monthly or quarterly). A modern take on classical frequency-domain Fourier regression is Facebook Prophet (https://facebook.github.io/prophet/), which tends to work pretty well if you have a few years of daily data.

claytonjy · 6y ago

FPP is great, but limited to the simplest possible timeseries: a single number recorded at evenly-spaced intervals.

Anyone know of good resources for multivariate, multimodal, irregular timeseries forecasting? I know some great practical tools and tutorials (prophet, fast.ai), but I'd love to inject some statistical knowledge like FPP offers.

pyromine · 6y ago

Furthering this, Hyndman is writing a new R package, 'fable', which is a 'tidy' approach to forecasting in R.

Reading his book will, at the very least, give you a lot of insight into the standard practice among people writing forecasting code in the R world.

bradstewart · 6y ago

Another vote for Hyndman's book. It does a great job of explaining the what and why behind different modeling techniques.

madcaptenor · 6y ago

Seconding Hyndman. Thanks for the link to Brockwell and Davis, which I wasn't familiar with.

anthony_doan · 6y ago

If you're only talking about forecasting, and not medical applications or inference, then most statistical models are those (exponential smoothing and ARIMA) plus GARCH variations.

There are multivariate models, but I don't know much about them. Most of the good resources are in the econometrics domain. Multivariate time series within econometrics, from what I've seen, is mostly about portfolio balancing.

For a general overview of the statistical domain I would recommend:

https://otexts.com/fpp2/

For ARIMA I love this book:

Time Series Econometrics by Levendis

For GARCH: Financial Modeling Under Non-Gaussian Distributions

If you want to learn more about statistics and time series in medical data, there is (1) longitudinal analysis and (2) survival analysis. There are nonlinear time series models, but those are rare because most of our tools work within the linear setting. There are also circular time series and spatio-temporal statistics, but I don't have any relevant knowledge of those to give you. I'm sure there are others within statistics that I don't know about.

Another interesting one is change point detection: https://en.wikipedia.org/wiki/Change_detection

There is also a Coursera course on time series that I've taken. I will post it here when I get off work and have a better internet connection.

If you want an idea of what forecasting models are out there, you should read the papers from https://en.wikipedia.org/wiki/Makridakis_Competitions

There are 4 papers now, and most of them are on statistical models, which have traditionally dominated this domain. Data science/ML models are slowly getting in there. In M4 the best model was a highly tailored hybrid of ML and statistical techniques; the person who created it was employed by Uber and wrote an article about it.

The 5th competition, M5, is currently underway and is split into 2 contests. I'm eagerly waiting to read the paper on the results.

traK6Dcm · 6y ago

I can recommend this [0] book. It's focused on financial time series and trading, but the techniques covered in the book are generic enough to apply to all kinds of time series, you can just ignore the finance parts. If you search hard enough you can find the PDF for free online. The way they treat convolution operators and efficiently approximate them with fixed-size EMAs was quite interesting to me. It's definitely a bit dated, but that's some of its charm.

[0] https://www.amazon.com/Introduction-High-Frequency-Finance-R...
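To make the EMA remark concrete, here is a minimal sketch of the general idea, not the book's own operators: an exponential moving average is a convolution of the series with exponentially decaying weights, yet it can be updated from a single stored value. The function names `ema_stream` and `ema_by_convolution`, the smoothing factor `alpha`, and the toy prices are assumptions made up for illustration.

```python
def ema_stream(values, alpha):
    """Yield the exponential moving average after each new value.

    Only the previous average is stored, so the state has a fixed size
    no matter how long the series gets.
    """
    avg = None
    for x in values:
        # The first observation seeds the average; later ones are blended in.
        avg = x if avg is None else alpha * x + (1 - alpha) * avg
        yield avg


def ema_by_convolution(values, alpha):
    """Compute the final EMA value as an explicit weighted sum over the history.

    This is the "convolution" view: each past value gets an exponentially
    decaying weight. It needs the whole history, unlike ema_stream.
    """
    n = len(values)
    # The seed value carries whatever weight is left after n - 1 updates.
    total = (1 - alpha) ** (n - 1) * values[0]
    for age, x in enumerate(reversed(values[1:])):
        total += alpha * (1 - alpha) ** age * x
    return total


if __name__ == "__main__":
    prices = [100.0, 101.5, 99.8, 102.2, 103.0]  # made-up numbers
    streamed = list(ema_stream(prices, alpha=0.3))
    print("streaming EMA:  ", streamed[-1])
    print("convolution EMA:", ema_by_convolution(prices, alpha=0.3))
```

Both functions should agree up to floating-point error; the streaming version is the one that stays cheap on long or high-frequency series.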

ne9xt · 6y ago

How has ML/AI impacted the HFT scene?

madhadron · 6y ago

Try looking under the name "signal processing" instead. The toolbox under "time series analysis" is usually a variation on the contents of the old book by Box.

latentdeepspace · 6y ago

Everyone repeat after me: "we need a baseline model".

You should always try some "dumb" models first. You'd be surprised how hard it is to beat a historical average model with a more sophisticated method (of course, this depends on your KPIs).
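A minimal sketch of what such "dumb" baselines could look like in plain Python. The function names, the toy numbers, and the season length of 3 are all made up for illustration. The historical-average model is the one mentioned above; the naive and seasonal-naive variants are other common baselines, added here as an assumption about what else one might try first.

```python
from statistics import mean


def mean_forecast(history, horizon):
    """Forecast every future step as the historical average."""
    avg = mean(history)
    return [avg] * horizon


def naive_forecast(history, horizon):
    """Forecast every future step as the last observed value."""
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, season_length):
    """Forecast by repeating the last full season of observations."""
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


if __name__ == "__main__":
    # Made-up series with an upward trend and a period-3 pattern.
    series = [10, 12, 14, 11, 13, 15, 12, 14, 16, 13, 15, 17]
    train, test = series[:9], series[9:]
    horizon = len(test)

    baselines = {
        "historical mean": mean_forecast(train, horizon),
        "naive (last value)": naive_forecast(train, horizon),
        "seasonal naive": seasonal_naive_forecast(train, horizon, season_length=3),
    }
    for name, forecast in baselines.items():
        print(f"{name}: MAE = {mae(test, forecast):.3f}")
```

Whatever fancier model comes next has to beat these numbers on the same held-out data and the same error metric to justify its complexity.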

_v7gu · 6y ago

Not to mention the plethora of issues that arise from trying to fit an ARIMA onto an AR(1) process... It's weird that people just jump into using insanely complicated models right off the bat.

laretluval · 6y ago

People love making complicated models. It’s fun and feels like you are having insights. Fitting a boring old ARIMA feels like work.

rmrfstar · 6y ago

But how am I going to get that VC money if I don't say "deep learning"?

powersnail · 6y ago

If NN (Neural Network) beats baseline, present the NN solution.

If baseline beats NN, present NN as the baseline, and say you have an algorithm even better than NN.

(Joke only.)

jointpdf · 6y ago

I mean...you can always appeal to “old school” AI. Just dig in to the old papers and use their words. Latent semantic analysis (LSA) is an example of a hard to beat baseline model for text:

“By inducing global knowledge indirectly from co-occurrence data in a large body of representative text, LSA acquired knowledge about the full vocabulary of English at a comparable rate to schoolchildren.” (http://www.stat.cmu.edu/~cshalizi/350/2008/readings/Landauer...)

bhl · 6y ago

Is there a library that automates benchmark models for a given dataset? That would be useful in helping people focus on the model they’re making.

platz · 6y ago

To my amateur eyes, normally the method for dealing with 'time series' is really just finding ways to turn a non-stationary distribution into a stationary distribution, where you can then apply classic statistical methods on them. So you're just finding ways to factor out the time component in the data so you can use the standard non-time sensitive regression models on the transformed data.

It seems like it's very challenging to either have time as a first-class component in the model or somehow treat the data points as not independent. Indeed most models require independence so often it seems like we try to force the data to look that way by smoothing and transformations. You can assume this anytime an algorithm is asking you to provide 'Stationarity'. It just seems like trying to look for the keys (prediction) where the streetlight is (model distributions with nice calculation properties).
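As a concrete instance of "turning a non-stationary series into a stationary one", here is a minimal sketch of first differencing, which is one common transformation of this kind (an assumption about what is meant; log transforms and detrending are others). The simulated random walk, the drift of 0.5, and the helper names `difference` and `undifference` are all made up for illustration.

```python
import random
from statistics import mean


def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both exist."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def undifference(first_value, diffs):
    """Invert lag-1 differencing by cumulatively summing from the first value."""
    out = [first_value]
    for d in diffs:
        out.append(out[-1] + d)
    return out


# Simulate a random walk with drift: the level keeps wandering upward,
# so its mean depends on which time window you look at.
random.seed(0)
drift = 0.5
walk = [0.0]
for _ in range(499):
    walk.append(walk[-1] + drift + random.gauss(0, 1))

diffs = difference(walk)
half = len(walk) // 2
half_d = len(diffs) // 2

if __name__ == "__main__":
    print("level mean, first vs second half:",
          round(mean(walk[:half]), 2), round(mean(walk[half:]), 2))
    print("diff mean,  first vs second half:",
          round(mean(diffs[:half_d]), 2), round(mean(diffs[half_d:]), 2))
```

The level has a very different mean in each half, while the differenced series hovers around the drift in both. A forecast made on the differences is mapped back to levels with something like `undifference`.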

montecarl · 6y ago

The problem with time is that it is (typically) not a causal variable. If you are modeling the price of a stock, for example, time is certainly not what is causing it to go up or down! Yes, it is true that the price at time t+1 is highly correlated with the price at time t, but extrapolating further out requires a more sophisticated model that includes the real causal variables.

platz · 6y ago

So then, setting aside making time itself a causal variable, it seems like methods that rely on stationary distributions still treat the data, after pre-processing, as i.i.d., rather than predicting values from their correlated history.

I'm interested in methods that don't "subtract" simple "trends" and "seasonality" from the data (which may work for bog-standard templates such as sales data, but not for what I'm interested in), and that instead respond to sequential relationships in the data itself, exploiting exactly the correlations you describe.
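For what it's worth, one of the simplest models that predicts directly from correlated history is an autoregression, where each value is regressed on the previous one. Below is a minimal sketch of an AR(1) fit by ordinary least squares in plain Python. The names `fit_ar1` and `forecast_ar1` and the simulated data (intercept 2.0, coefficient 0.8) are assumptions for illustration, and the model still assumes the relationship between consecutive values is stable over time.

```python
import random
from statistics import mean


def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]; returns (c, phi)."""
    x = series[:-1]  # lagged values
    y = series[1:]   # values to predict
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    phi = sxy / sxx
    c = y_bar - phi * x_bar
    return c, phi


def forecast_ar1(last_value, c, phi, horizon):
    """Iterate the fitted recursion forward from the last observed value."""
    out = []
    value = last_value
    for _ in range(horizon):
        value = c + phi * value
        out.append(value)
    return out


# Simulate an AR(1) process so the true parameters are known.
random.seed(1)
true_c, true_phi = 2.0, 0.8
series = [10.0]
for _ in range(1999):
    series.append(true_c + true_phi * series[-1] + random.gauss(0, 1))

c, phi = fit_ar1(series)

if __name__ == "__main__":
    print(f"estimated c = {c:.3f}, phi = {phi:.3f}")
    print("next 3 steps:", [round(v, 2) for v in forecast_ar1(series[-1], c, phi, 3)])
```

Here the history itself is the predictor: nothing is subtracted beforehand, and the estimated `phi` measures how strongly each value depends on the previous one.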

proverbialbunny · 6y ago

> If you are modeling the price of a stock, for example, time is certainly not what is causing it to go up or down!

Actually, time is a valuable feature. E.g., if a stock goes sideways too long, day traders will get out of the trade even if it didn't go up to the levels they were looking for. Also, if the market goes up a lot beyond a trader's expectations in a short amount of time, oftentimes a trader will wait a little bit longer. Likewise, many of the popular indicators day traders use today to be profitable have time as a key ingredient, e.g. TD.

doctoboggan · 6y ago

The readers interested in this article are probably able to give me good advice. I've been collecting stats daily on myself for the past year (weight, activity, calories consumed, sleep hours, etc) and I would love to be able to explore and extract interesting trends and relationships from the data.

Is there an easy tool where I can just drop in all the data and it presents me with some sort of dashboard? I would love it if the tool could identify and present interesting relationships (e.g. weight and calories consumed are strongly correlated).

Does anyone know if something like that exists? Or should I start rolling my own using python/pandas?
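If you do roll your own, a first pass could be as small as ranking every pair of tracked metrics by correlation. A minimal sketch in plain Python follows; the column names, the numbers, and the helper names `pearson` and `rank_correlations` are all made up for illustration. With pandas, `DataFrame.corr()` computes the same pairwise Pearson correlations.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_correlations(columns):
    """Return (name_a, name_b, r) for every pair of columns, strongest first."""
    pairs = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Made-up week of self-tracking data, one value per day.
daily = {
    "weight_kg": [80.1, 80.3, 80.0, 79.8, 79.9, 79.5, 79.6],
    "calories": [2500, 2600, 2300, 2200, 2400, 2100, 2150],
    "sleep_hours": [7.0, 6.5, 8.0, 7.5, 6.0, 7.0, 8.5],
}

if __name__ == "__main__":
    for a, b, r in rank_correlations(daily):
        print(f"{a} vs {b}: r = {r:+.2f}")
```

One caveat: two series that both trend over the year will look correlated even if they are unrelated, so it may be worth correlating day-to-day changes rather than raw levels.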

lowdose · 6y ago

https://github.com/arielf/weight-loss

doctoboggan · 6y ago

Thanks!

cakeofzerg · 6y ago

Currently learning GluonTS; seems good.

ranc1d · 6y ago

Adding a link in case others are interested!

https://gluon-ts.mxnet.io/

yuy910616 · 6y ago

woah. Very interesting lib. Thanks for sharing!

elteto · 6y ago

What would be some good graduate programs (I'm thinking Master's level) in the US that specialize in time series modeling and forecasting? Any available online?

siegelzero · 6y ago

Penn State has a bunch of their graduate stats courses online [1]. I worked through some of their time series class [2] and found it to be pretty good quality.

[1] https://online.stat.psu.edu/statprogram/

[2] https://online.stat.psu.edu/statprogram/stat510

ngcc_hk · 6y ago

The last time I used it was 1981. Is it still relevant today, in the ML era?

tomrod · 6y ago

Yes.

splittingTimes · 6y ago

For the interested, here is an overview of neural forecasting from the folks at Amazon research:

Neural forecasting: Introduction and literature overview

https://arxiv.org/pdf/2004.10240.pdf

pupdogg · 6y ago

Your page has a bigger focus on Google ads than the subject matter itself.

leeoniya · 6y ago

coincidentally, i posted this not too long ago:

https://news.ycombinator.com/item?id=23045207
