Prophet: forecasting at scale

(research.fb.com)

520 points · benhamner · 9y ago · 110 comments

confounded9y ago· 10 in thread

Worth noting that Prophet is a set of R/Python wrappers around some models with reasonable defaults, written in and fit by Stan, a probabilistic programming language and Bayesian estimation framework.

Stan is amazing in that you can fit pretty much any model you can describe in an equation (given enough time and compute, of course)!

More on Stan here: http://mc-stan.org/

dragandj9y ago

... and if you like Clojure, you might try Bayadera, which has its own engine running the analysis on the GPU.

http://github.com/uncomplicate/bayadera

diab0lic9y ago

I'm pretty interested in this as I do most of my work on the JVM, and I'd love to try this out on our stream processor at work.

Cloned and tried to build it but I'm getting an error regarding uncomplicate:commons:0.3.0-SNAPSHOT being unavailable on clojars. Is that something you currently have installed to your local maven repo? I don't see it here: https://clojars.org/repo/uncomplicate/commons/

I can get it to build with 0.2.2 but it is missing the "releaseable?" function.

In any case this looks awesome and I'll be keeping an eye on it / playing with it for some new projects.

EDIT: I was able to get it building by cloning your commons library and running "lein install". :)

mej109y ago

This looks like it could be awesome but it has almost no information about what its purpose is or how to use it.

bpicolo9y ago

Readme has neither useful docs, nor any link to docs. =/

treigerm9y ago

Do you know of any good beginner tutorials for Stan or probabilistic programming in general? All the examples that I found seemed quite complex and I was a bit overwhelmed by all the math. Which might also be a sign that I should brush up my math skills. What kind of math/stats should I revise to be able to better understand probabilistic programming?

jritchie9y ago

Probabilistic Programming & Bayesian Methods for Hackers [1] by Cameron Davidson-Pilon is exactly what you want, starting from a computational-first perspective, then introducing the maths later, although it uses PyMC rather than Stan. It's freely available as a set of Jupyter notebooks, as well as a printed edition.

[1] http://camdavidsonpilon.github.io/Probabilistic-Programming-...

muraiki9y ago

Doing Bayesian Data Analysis, Second Edition: A Tutorial with R, JAGS, and Stan. It is very approachable and also has lots of practice problems. It's not a math-heavy book at all.

Edit: I wouldn't recommend Probabilistic Programming and Bayesian Methods for Hackers. When I tried using it, I felt that too much was glossed over. The book that I recommend excels at conveying a strong intuition for how these various techniques work.

projectorlochsa9y ago

Probabilistic graphical models.

That's the foundation. The way you set up your model is by nodes and edges that specify the flow of influence (directed or undirected). Then it seems that there are general methods for inference and learning on any kind of graph one might pose.

For simple graphs (and simple is something one might want when modelling) the methods should be fairly effective.

Unfortunately, the biggest book on the subject that I know (Koller & Friedman) isn't accessible. Koller's course is also not that accessible.

multani-hn9y ago

Stan is nice but its GPL license is taboo in my corporate environment :( .

I am puzzled how they managed to release Prophet under BSD with such a dependency.

matthjensen9y ago

Stan has a BSD core. Prophet must avoid the GPLv3 interfaces.

rodionos9y ago· 7 in thread

I didn't know wikipedia page view counters are available for public usage.

The wikipediatrend R package relies on http://stats.grok.se/, which in turn relies on https://dumps.wikimedia.org/other/pagecounts-raw/ which has been deprecated.

The new dump is located at https://dumps.wikimedia.org/other/pageviews/

Data is available in hourly intervals.

* pageviews-20170227-050000

  en Peyton_Manning 58 0

[edit] There is a wikipedia-hosted OSS viewer for these logs, e.g. Swedish crime stats:

https://tools.wmflabs.org/pageviews/?project=en.wikipedia.or...
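A minimal sketch of reading one line of the hourly dump quoted above. It assumes the four space-separated fields are project code, page title, view count, and response size (the last one is 0 in the sample line). `PageviewRow` and `parse_pageview_line` are illustrative names, not part of any Wikimedia tooling.

```python
from typing import NamedTuple


class PageviewRow(NamedTuple):
    project: str     # e.g. "en" for English Wikipedia
    title: str       # page title, with underscores instead of spaces
    views: int       # views counted in that hour
    bytes_sent: int  # response size field; 0 in the sample line


def parse_pageview_line(line: str) -> PageviewRow:
    """Split one dump line into typed fields (assumed layout, see above)."""
    project, title, views, bytes_sent = line.strip().split(" ")
    return PageviewRow(project, title, int(views), int(bytes_sent))


row = parse_pageview_line("en Peyton_Manning 58 0")
print(row.title.replace("_", " "), row.views)
```

Summing `views` per title across the 24 hourly files of a day would give the kind of daily series used in the Prophet example.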

anton_tarasenko9y ago

BigQuery also has the public dataset of Wikipedia page views. Handy for quick SQL and sampling.

An intro by Felipe Hoffa (Google): https://www.reddit.com/r/bigquery/comments/3dg9le/analyzing_...

abbe989y ago

The Wikimedia Foundation provides a public page view API for most Wikimedia projects:

https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI

rodionos9y ago

Thanks, that's a good resource. I'm surprised though. It seems that Top-1000 articles by monthly views are 90% about celebrities and movies. I think tags or categories would be most useful.

https://wikimedia.org/api/rest_v1/metrics/pageviews/top/en.w...

JoelSanchez9y ago

https://tools.wmflabs.org/pageviews/?project=en.wikipedia.or...

What's up with Java? (Set "logarithmic scale" to improve the visualization)

Cthulhu_9y ago

The wiki page is about Java the island, not the programming language. Haven't found any relevant news around that time though.

fpvracing9y ago

Cool! I wonder what spiked the views for artificial intelligence on 10/11/2016?

https://tools.wmflabs.org/pageviews/?project=en.wikipedia.or...

T-A9y ago

I think that spike peaks on October 12, which is when this was released: https://obamawhitehouse.archives.gov/blog/2016/10/12/adminis...

asafira9y ago· 7 in thread

So... how well will this do at forecasting stock prices? =)

Very cool though --- I would be interested to dive into the methods they've implemented sometime in the near future!

matheweis9y ago

> So... how well will this do at forecasting stock prices? =)

Probably quite poorly (due to stocks appearing "random" at scale), especially for indexes, which are a sum of their parts.

On the other hand, this would probably be quite useful for things that have non-random trends (like the Global Energy Forecasting Competition: http://www.drhongtao.com/gefcom)

syntaxing9y ago

It would probably perform pretty poorly, as others have suggested. This is mainly because stock prices are, by themselves, a pretty non-stationary dataset/measurement. Most of these probabilistic models are poorly equipped to make accurate predictions for non-stationary data, since its trends look very similar to noise.

curuinor9y ago

Faced with phenomena I view as self-affine, other students take an extremely different tack. Most economists, scientists and engineers from diverse fields begin by subdividing time into alternating periods of quiescence and activity. Examples are provided by the following contrasts: between turbulent flow and its laminar inserts, between error-prone periods in communication and error-free periods, and between periods of orderly and agitated ("quiet" and "turbulent") Stock Market activity. Such subdivisions must be natural to human thinking, since they are widely accepted with no obvious mutual consultation. Rene Descartes endorsed them by recommending that every difficulty be decomposed into parts to be handled separately. Such subdivisions were very successful in the past, but this does not guarantee their continuing success. Past investigations only tackled variability and randomness that are mild, hence, local. In every field where variability / randomness is wild, my view is that such subdivisions are powerless. They can only hide the important facts, and cannot provide understanding. My alternative is to move to the above-mentioned apparatus centered on scaling.

-Mandelbrot, in the foreword to Multifractals and 1/f Noise.

It's worth saying that Mandelbrot was apparently a large influence on E. Fama, who proposed the efficient market hypothesis in the first place.

blazespin9y ago

Probably just help verify that the stock market is a random walk with a meager trend upwards that doesn't beat inflation + trading costs.

etjossem9y ago

> Probably just help verify that the stock market is a random walk with a meager trend upwards that doesn't beat inflation + trading costs.

That doesn't sound right. Let me clear that up for you. Since 1950:

  S&P 500 Annual Price Change: 7.2%
  S&P 500 Annual Div Dist: 3.6%
  S&P 500 Annual Total Return: 11.0%
  Annual Inflation: 3.8%
  Annual Real Price Change: 3.3%
  Annual Real Total Return: 7.0 %

Buying the straight S&P 500 beats inflation by seven percent, on average, every year. You're welcome!
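For reference, the real figures in that table follow from the nominal ones by dividing out inflation rather than subtracting it. A quick check using only the rounded numbers quoted above, so the last digit can differ slightly from the table:

```python
def real_rate(nominal: float, inflation: float) -> float:
    """Convert a nominal annual rate to a real one: (1 + n) / (1 + i) - 1."""
    return (1.0 + nominal) / (1.0 + inflation) - 1.0


inflation = 0.038
real_price = real_rate(0.072, inflation)  # price change only
real_total = real_rate(0.110, inflation)  # price change plus dividends

print(f"real price change: {real_price:.1%}")
print(f"real total return: {real_total:.1%}")
```

With these inputs the real price change comes out at roughly 3.3% and the real total return at roughly 6.9%, in line with the quoted 3.3% and 7.0%.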

matheweis9y ago

> the stock market is a random walk with a meager trend upwards that doesn't beat inflation + trading costs.

That assumes that the efficient-market hypothesis holds true, but it has yet to be thoroughly proven or disproven... (and funds like Medallion would strongly suggest otherwise for the medium term: https://www.bloomberg.com/news/articles/2016-11-21/how-renai...)

ainiriand9y ago

Some people are making a pretty penny for it being so random.

nodesocket9y ago· 3 in thread

Are there any startups/services where you pass it a series and it returns forecast models? That's something I'd be willing to pay for.

yoghurtio9y ago

You can try https://yoghurt.io/. It's a fully managed platform, with no need to set anything up yourself. For example, say you want to predict your app downloads for the coming week: upload the data as a time series of dates and app downloads from the last 30 weeks, and it will return the predicted app downloads for the next 7 days along with the analytical confidence. It can predict any KPI, such as visitors, app downloads, or conversions. Just sign up and start predicting.

kriro9y ago

I'm curious... are you worried about this release? Seems like all I'd need to compete with you (vastly simplified, but for argument's sake) is to hack together a simple webpage with a submit button that uses Prophet. That's assuming both models yield reasonably useful results (obviously you could compete on accuracy or ease of use, where you're currently ahead for business-y customers).

nodesocket9y ago

Is it possible for example to send you monthly revenue numbers for my startup for the last two years (24 data points) and have yoghurt predict the next two years of monthly revenue?

hubot9y ago· 3 in thread

can someone explain what's the meaning of this line

> df['y'] = np.log(df['y'])

llimllib9y ago

I have not read the code, but assuming df is a pandas dataframe, it sets the 'y' column to the log of what was previously the 'y' column.

https://gist.github.com/llimllib/385230f38c3f9b70c3e46158e60...

slashcom9y ago

df is a dataframe, which is like a spreadsheet. This line takes the logarithm of the column named 'y' and updates it in place.

hubot9y ago

thanks. that part i can understand but why do that?
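One common reason for a log transform like this (not necessarily the Prophet authors' own) is that it turns multiplicative growth into a constant additive step, which an additive trend model fits more easily. A small sketch with a made-up series that grows 10% per period, using plain Python lists in place of the pandas column so it runs without dependencies:

```python
import math

# Hypothetical series growing 10% per period (multiplicative growth).
y = [100.0 * 1.1 ** t for t in range(6)]

# Equivalent of df['y'] = np.log(df['y']) for a plain list.
log_y = [math.log(v) for v in y]

# On the raw scale the step size keeps growing...
raw_steps = [b - a for a, b in zip(y, y[1:])]
# ...while on the log scale every step is the same: log(1.1).
log_steps = [b - a for a, b in zip(log_y, log_y[1:])]

# Extrapolate one step on the log scale, then undo the transform.
next_log = log_y[-1] + log_steps[-1]
next_y = math.exp(next_log)

print(raw_steps)
print(log_steps)
print(next_y)
```

Anything forecast on the log scale has to be mapped back with `exp` before it is read as a value in the original units.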

yoghurtio9y ago· 2 in thread

We at https://yoghurt.io/ have been working towards a similar forecasting solution. So far the feedback has been that automated solutions can also bring good results at a far lower cost compared to hiring an expert analyst.

It's a completely managed solution. No need to set anything up yourself. Just upload the data and predict next week's data, today itself. There is a free trial, and if anyone here is looking for an extended trial, they can reach out to me.

redindian759y ago

your website is very sparse on details - any examples/demos?

yoghurtio9y ago

Example: say you want to predict your app downloads for the coming week. Upload the data as a time series of dates and app downloads from the last 30 weeks, and it will return the predicted app downloads for the next 7 days along with the analytical confidence. It can predict any KPI, such as visitors, app downloads, or conversions. Just sign up and start predicting.

jl69y ago· 2 in thread

I wonder what Sungard/FIS think of the name, which is the same as their commercial financial modelling/forecasting tool.

vinw9y ago

FIS Prophet is targeted at actuaries, and really no-one else so I don't know if anyone will care. They have had the name a lot longer than Facebook though!

T-A9y ago

This other Prophet has also been around for a while: https://github.com/Emsu/prophet

hnarayanan9y ago· 2 in thread

Is there a way to extend these models to handle spatial variation (e.g. weather forecasting, property price estimation etc.) as well?

rodionos9y ago

This would be non-trivial. Consider this paper on marijuana usage where the researchers had to group statistics by adjacent counties in Oregon and Washington in order to control the tests.

https://papers.ssrn.com/sol3/papers2.cfm?abstract_id=2841267

hnarayanan9y ago

Thank you for the pointer, will read the article.

All my attempts thus far have pointed me to something called Gaussian Processes, which I am still working through grokking.

dmichulke9y ago· 2 in thread

I have been working for a few years on a similar project using evolutionary algorithms on top of other models (linear / ann). It works quite well (e.g., for equidistant energy demand / supply forecasts) but there's still lots of stuff to do.

Its major benefit is that it figures out the relationship to the target time series by itself, so you can just throw in all time series and see what comes out.

Language is Clojure, 20kloc, incanter, encog. If anyone is interested in working for/with it, let me know. I'm currently developing a REST API for it and plan to release it as open source once the major code smells are dealt with.

feld9y ago

Why not release sooner and document the code smells? Maybe you'll get patches

dmichulke9y ago

I'd like to have a tested use case that mostly and simply works. Something to put in readme.md that shows how it works and that it works. Almost there...

recurser9y ago· 2 in thread

Very cool. Could this be re-purposed for detecting anomalies/outliers in time series data?

techno_modus9y ago

>Could this be re-purposed for detecting anomalies/outliers in time series data?

If you define an anomaly as something unexpected, then yes. In this case, if the reality differs significantly from the forecast (=expectation), then it is an anomaly (according to our definition). In the numeric univariate case, there could be positive anomalies, where you get more than expected, and negative anomalies, where you get less than expected.
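That definition can be sketched in a few lines: a point is flagged when the observed value falls outside the forecast's uncertainty interval (for Prophet, the `yhat_lower` and `yhat_upper` columns of the forecast could play that role). The numbers and the `flag_anomalies` helper below are made up for illustration.

```python
from typing import List, Tuple


def flag_anomalies(
    actual: List[float],
    lower: List[float],
    upper: List[float],
) -> List[Tuple[int, str]]:
    """Return (index, kind) for points outside the forecast interval."""
    flagged = []
    for i, (a, lo, hi) in enumerate(zip(actual, lower, upper)):
        if a > hi:
            flagged.append((i, "positive"))  # more than expected
        elif a < lo:
            flagged.append((i, "negative"))  # less than expected
    return flagged


# Made-up observations and interval bounds for five time steps.
actual = [10.0, 12.0, 30.0, 11.0, 2.0]
lower = [8.0, 9.0, 9.0, 9.0, 8.0]
upper = [13.0, 14.0, 14.0, 14.0, 13.0]

print(flag_anomalies(actual, lower, upper))
```

How wide the interval is decides how "significant" a deviation has to be, so the interval width is effectively the sensitivity knob.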

fgpwd9y ago

My guess would be yes. I'm thinking this could be used to find out how effective a particular marketing campaign was. Just compare the forecast with actuals and the difference would be the number of sales/clicks you got from that campaign.
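As a rough sketch of that comparison, with made-up numbers: the forecast is produced from pre-campaign data only, and the lift is the sum of actual minus forecast over the campaign days.

```python
# Made-up daily sales during a campaign and the forecast made beforehand.
forecast = [100.0, 102.0, 101.0, 103.0]
actual = [120.0, 131.0, 118.0, 125.0]

daily_lift = [a - f for a, f in zip(actual, forecast)]
total_lift = sum(daily_lift)
relative_lift = total_lift / sum(forecast)

print(daily_lift, total_lift, f"{relative_lift:.1%}")
```

The estimate is only as good as the forecast, so differences that stay inside the forecast's uncertainty interval should probably not be read as campaign effect.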

saosebastiao9y ago· 1 in thread

This is an interesting project, and in one of the areas where almost all businesses could do better. Anecdotally, there is a ton of money left on the table by established businesses that do it poorly, which also leaves lots of room for resume-padding technical experience. So anything that claims to improve the state of the art of automated forecasting is definitely worth watching.

That being said this claim in point #1 baffles me:

> Prophet makes it much more straightforward to create a reasonable, accurate forecast. The forecast package includes many different forecasting techniques (ARIMA, exponential smoothing, etc), each with their own strengths, weaknesses, and tuning parameters. We have found that choosing the wrong model or parameters can often yield poor results, and it is unlikely that even experienced analysts can choose the correct model and parameters efficiently given this array of choices.

The forecast package contains an auto.arima function which does full parameter optimization using AIC which is just as hands free as is claimed of Prophet. I have been using it commercially and successfully for years now. Maybe prophet produces better models (I'll definitely take a look myself), but to claim that it's not possible to get good results without experience seems a bit disingenuous.

As an aside, anybody interested in a great introductory book on time series forecasting should check out Rob Hyndman's book which is freely available online. https://www.otexts.org/fpp

dxbydt9y ago

> Anecdotally, there is a ton of money left on the table by established businesses...

True. fwiw, I worked on the same project at Twitter 4 years back - the Facebook folks call it capacity planning at scale, we called it capacity utilization modeling. The goal was the same - there are all these "jobs" - 10s of 1000s of programs running on distributed clusters, hogging CPU, memory and disk. Can we look at a snapshot in time of the jobs usage, and then predict/forecast what the next quarter jobs usage would be ? If you get these forecasts right ( within reasonable error bounds ), the folks making purchasing decisions ( how many machines to lease for the next quarter for the datacenters) can save a bundle.

From an engineering pov, every job would need to log its p95 and p99 CPU usage, memory stats, disk stats... Since Twitter was running some 50k programs back then (2013ish) on these Mesos clusters, the underlying C++ API had hooks to obtain CPU and memory stats, even though the actual programs running were all coded up in Scala (mostly), or Python/Ruby (bigger minority), or C/Java/R/Perl (smaller minority). There's an interesting Quora discussion on why Mesos was in C++ while the rest of Twitter is Scalaland... mostly because you can't do this sort of CPU/memory/disk profiling in JVM-land as well as you can in C++.

OK, so you now have all these CPU stats. What do you do with them ? Before you get to that, you have the usual engineering hassles - how often should you obtain the CPU stats ? Where would you store them ?

So at Twitter we got these stats every minute ( serious overkill :) and stored them in a monstrous JSON ( horrible idea given 50000 programs * number of minutes in day * all the different stats you were storing :))

So every day I'd get a gigantic 20gb JSON from infra, then I'd have to do the modeling.

In those days, you couldn't find a single Scala JSON parser that would load up that gigantic JSON without choking. We tried them all. Finally we settled on GSON - Google's JSON parser written in Java, that handled these gigantic jsons with no hiccups.

Before you get to the math, you would have to parse the JSON and build a data structure that would store these (x,t) tuples in memory. You had 50k programs, so each program would get a model, each model originated from a shitton of (x,t) tuples, the t being minutely and the fact that some of these programs had been running for years, meant you had very large datasets.

The math was relatively straightforward...I used so called "LAD" - least absolute deviation from mean, as opposed to simple OLS, because least squares wasn't quite predictive for that use case. Building the LAD modeling thing in Scala was somewhat interesting...Most of the work was done by the commons math Apache libraries, I mostly had to ensure the edge cases wouldn't throw you off, because LAD admits multiple solutions to the same dataset - it's not like OLS where you give it a dataset and it finds a unique best fit line. Here you'd have many lines sitting in an array, depending on how long you let the Simplex solver run. Then came the problem of visualizing these 50,000 piecewise line models using javascript heh heh. The front end guys had a ball with the models I spit out.
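To illustrate why LAD can behave better than OLS when a few samples are wild, here is a small sketch on made-up data. It is not the Simplex-based approach described above: for a single straight line, some optimal LAD fit passes through two of the data points, so this version simply tries every pair, which is only practical for small inputs.

```python
from itertools import combinations
from typing import List, Tuple


def ols_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Closed-form least-squares fit; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def lad_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Brute-force LAD fit: try the line through every pair of points and
    keep the one with the smallest sum of absolute residuals."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, not a function of x
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        cost = sum(abs(yi - (slope * xi + intercept)) for xi, yi in zip(x, y))
        if best is None or cost < best[0]:
            best = (cost, slope, intercept)
    return best[1], best[2]


# Made-up data: y = 2x + 1 with one large outlier at the end.
x = [float(i) for i in range(10)]
y = [2.0 * xi + 1.0 for xi in x]
y[-1] = 100.0

print("OLS:", ols_line(x, y))
print("LAD:", lad_line(x, y))
```

The single outlier drags the OLS slope well above 2, while the LAD fit stays on the line through the other nine points. When several pairs tie on cost, this sketch keeps the first one it finds, which is one simple way of handling the non-uniqueness mentioned above.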

If someone's doing this from scratch these days, NNs would be your best bet. Regime changes are a big part of that.

schlarpc9y ago· 1 in thread

Moderately relevant short story: https://www.facebook.com/notes/robin-sloan/julie-rubicon/985...

JoshTriplett9y ago

That was the first thing I thought of when I saw the title.

techno_modus9y ago· 1 in thread

It seems that they have developed a model for only univariate forecasts and only numeric regular time series which is a classical use case in statistics. Yet, most data sources have many dimensions (for example, energy consumption, temperature, humidity etc.) as well as categorical data like current state (On, Off). The situation is even more difficult if the data is not a regular time series but is more like asynchronous event stream. It would be interesting to find a good forecasting model for some of these use cases. In particular, it is interesting if this Prophet model can be generalized and applied to multivariate data.

unoti9y ago

> most data sources have many dimensions (for example, energy consumption, temperature, humidity etc.) as well as categorical data like current state (On, Off). The situation is even more difficult if the data is not a regular time series but is more like asynchronous event stream. It would be interesting to find a good forecasting model for some of these use cases.

I'm guessing you already know about this based on the way you described the situation, but the Hyndman Forecasting book [1] discusses various models at length for doing multivariate forecasting models. It's loaded with code and samples in R.

1. https://www.otexts.org/fpp

nickfzx9y ago· 1 in thread

This looks amazing, congratulations.

We're planning to add forecasting to our SaaS analytics product (https://chartmogul.com) later this year, I'm going to look and see if we can use this in our product now.

tommynicholas9y ago

I was trying to sort out whether adding this to an existing charting/analytics product makes sense but it looks like you've checked it out and think it does. I couldn't tell only because it seems to be built to do the charting/plotting itself, but I guess you can just use the data/API to get the forecasts then plot them yourself yes?

I may do a test implementation into Airbnb Superset actually to see how it flies.

paulvs9y ago· 1 in thread

For a corporate credit analyst working at a bank, what is some good introductory material for getting into forecasting using tools like these?

I see this being applicable to analysts when deciding on a company's credit worthiness.

zebrafish9y ago

There are some models out there which could be used, but I'm not sure that forecasting is actually what you would use.

I would think if you're already assigning credit ratings, you can set that as your dependent variable and use things like company revenue, number of employees, age of company, etc. as your independent variables. You can use a number of different models to assess credit worthiness based on this data. Evaluate several to determine the most accurate.

agounaris9y ago· 1 in thread

How different is this framework from statsmodels?

adw9y ago

Statsmodels is a grab-bag of various statistical models from linear regression upwards. This is an opinionated library for (some relevant parts of) econometrics.

cardosof9y ago

That's very cool, congrats and thank you to the Facebook guys!

A few days ago I was asked to do some forecasting with a daily revenue series for a client. Due to the nature of her business, the series was really tricky, with weekdays and months/semesters having some specific effects on the data. Like many, I use Hyndman's forecast package, but I threw this data at Prophet and it delivered a nice plot with the (correct) overall trend and seasonalities. Very cool and easy to use.

anacleto9y ago

This is so great!

I've been using CausalImpact by Google [0] for months. This seems pretty straightforward.

[0] https://google.github.io/CausalImpact/CausalImpact.html

pacifika9y ago

The more Facebook grows, the more its tooling aligns with that of intelligence services.

Steeeve9y ago

This actually looks incredibly useful and pretty simple to learn.

Between this and Stan I think my free time for the next week is gone.

zebrafish9y ago

So.... I don't understand how this is better or worse than using forecast.

You talk about having to choose the best algorithm but it seems like Prophet is just another algorithm to choose from. Is there some kind of built in grid-search or are you just stating that results from your AM have been more accurate than ARIMA?

hn_username9y ago

This is a nice piece of work - thanks for sharing with the community!

Some feedback: it'd be nice to see you actually quantify how accurate Prophet's forecasts are on the landing page for the project. In the Wikipedia page view example, you go as far as showing a Prophet forecast, but it'd be nice to have you take it one step further and quantify its performance. Maybe withhold some of the data you use to fit the model and see how it performs on that out of sample data. It's nice that you show qualitatively that it captures seasonality, but you make bold claims about its accuracy and the data to back those claims up is conspicuously absent. Related, it might be worth benchmarking its performance against existing automated forecasting tools.
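A minimal sketch of the out-of-sample check suggested above, on a made-up daily series: hold out the final week, forecast it from the earlier data, and score the result. The forecaster here is a seasonal-naive baseline (repeat last week), chosen only because it needs no libraries; predictions from Prophet or any other tool could be passed to `mape` in the same way.

```python
from typing import List


def mape(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute percentage error; assumes no actual value is zero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def seasonal_naive(train: List[float], horizon: int, period: int) -> List[float]:
    """Baseline forecast: repeat the last full season of the training data."""
    last_season = train[-period:]
    return [last_season[h % period] for h in range(horizon)]


# Made-up series: three weeks of daily values with a weekly pattern.
series = [10, 12, 11, 13, 15, 20, 22,
          11, 13, 12, 14, 16, 21, 23,
          12, 14, 13, 15, 17, 22, 24]

train, test = series[:14], series[14:]  # hold out the final week
baseline = seasonal_naive(train, horizon=len(test), period=7)

print(f"seasonal-naive MAPE on held-out week: {mape(test, baseline):.1%}")
```

A baseline like this also gives the benchmark a reference point: a forecasting tool is only interesting if it beats "same as last week" on the held-out data.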

I'll definitely be checking it out!

SmellTheGlove9y ago

For us insurance/financial services folks, I would like to simply clarify that this is not the Sungard/FIS risk management platform that is also called Prophet! :D

I got really excited for a second. Actually, I'm still pretty excited about this even if it was something else entirely.

minimaxir9y ago

Interesting definition of "scale" in this context, as it does not imply "big data" like every other usage of the word scale in data science. The tool works on, and is optimized for, day-to-day, mundane data.

See also the R vignette, which shows that the data is returned per-column which gives it a lot of flexibility if you only want certain values: https://cran.r-project.org/web/packages/prophet/vignettes/qu...

syntaxing9y ago

The fact that Prophet follows the "sklearn model API" and that it's very well integrated with pandas makes it super appealing and usable!

monkeydust9y ago

Very cool. I've got loads of sensor data from around my house, over a year's worth, so I'm curious to throw it at Prophet.

Has anyone managed to get this working on Windows with Jupyter (Anaconda build)? I'm struggling with PyStan errors. Any guidance welcomed.

eternalban9y ago

/please ignore: Oracle & Prophet. Oracle sifts through signs but Prophet has a line to the larger picture. I suppose the next 'product' will be called Messiah to complete the picture.

elwell9y ago

Why do we need Prophet when we already have Temple OS (http://www.templeos.org/)?

alexpetralia9y ago

Slightly inconvenient that the main image <figure> needs to be replaced by an <img> tag just to have the image appear in print outs.

poppingtonic9y ago

This is very interesting. Forecasters who participate in the Good Judgment Project, such as myself, will find this useful.

ayayecocojambo9y ago

Can we use other features (like temperature?), or does it have to be only time-based?

Helmet9y ago

just wanted to point out to potential windows users - this will only run on python 3.5 due to dependencies (pystan only works on python 3.5 for windows)

fagnerbrack9y ago

Facebook...

j / k navigate · click thread line to collapse

110 comments

79 comments · 33 top-level

confounded9y ago· 10 in thread

Worth noting Prophet is R/Python wrappers to some models with reasonable defaults, written in and fit by Stan, a probabilistic programming language, and Bayesian estimation framework.

Stan is amazing in that you can fit pretty much any model you can describe in an equation (given enough time and compute, of course)!

More on Stan here: http://mc-stan.org/

dragandj9y ago

... and if you like Clojure, you might try Bayadera, which has its own engine running the analysis on the GPU.

http://github.com/uncomplicate/bayadera

diab0lic9y ago

I'm pretty interested in this as I do most of my work on the JVM and I love trying this out on our stream processor at work.

I can get it to build with 0.2.2 but it is missing the "releaseable?" function.

In any case this looks awesome and I'll be keeping an eye on it / playing with it for some new projects.

EDIT: I was able to get it building by cloning your commons library and running "lein install". :)

2 more replies

mej109y ago

This looks like it could be awesome but it has almost no information about what its purpose is or how to use it.

1 more reply

bpicolo9y ago

Readme has neither useful docs, nor any link to docs. =/

treigerm9y ago

jritchie9y ago

[1] http://camdavidsonpilon.github.io/Probabilistic-Programming-...

muraiki9y ago

Doing Bayesian Data Analysis, Second Edition: A Tutorial with R, JAGS, and Stan. It is very approachable and also has lots of practice problems. It's not a math-heavy book at all.

2 more replies

projectorlochsa9y ago

Probabilistic graphical models.

For simple graphs (and simple is something one might want when modelling) the methods should be fairly effective.

Unfortunately, the biggest book on the subject that I know (Koller & Friedman) isn't accessible. Koller's course is also not that accessible.

multani-hn9y ago

Stan is nice but its GPL license is taboo in my corporate environment :( .

I am puzzled how they managed to release Prophet under BSD with such a dependency.

matthjensen9y ago

Stan has a BSD core. Prophet must avoid the GPLv3 interfaces.

2 more replies

rodionos9y ago· 7 in thread

I didn't know wikipedia page view counters are available for public usage.

The wikipediatrend R package relies on http://stats.grok.se/, which in turn relies on https://dumps.wikimedia.org/other/pagecounts-raw/ which has been deprecated.

The new dump is located at https://dumps.wikimedia.org/other/pageviews/

Data is available in hourly intervals.

* pageviews-20170227-050000

  en Peyton_Manning 58 0

[edit] There is a wikipedia-hosted OSS viewer for these logs, e.g. Swedish crime stats:

https://tools.wmflabs.org/pageviews/?project=en.wikipedia.or...

anton_tarasenko9y ago

BigQuery also has the public dataset of Wikipedia page views. Handy for quick SQL and sampling.

An intro by Felipe Hoffa (Google): https://www.reddit.com/r/bigquery/comments/3dg9le/analyzing_...

abbe989y ago

The Wikimedia foundation provides an public page view API for most Wikimedia projects:

https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI

rodionos9y ago

Thanks, that's a good resource. I'm surprised though. It seems that Top-1000 articles by monthly views are 90% about celebrities and movies. I think tags or categories would be most useful.

https://wikimedia.org/api/rest_v1/metrics/pageviews/top/en.w...

JoelSanchez9y ago

https://tools.wmflabs.org/pageviews/?project=en.wikipedia.or...

What's up with Java? (Set "logarithmic scale" to improve the visualization)

Cthulhu_9y ago

the wiki page is about Java the country, not the programming language. Haven't found any relevant news around that time though.

1 more reply

fpvracing9y ago

Cool! I wonder what spiked the views for artificial intelligence on 10/11/2016?

https://tools.wmflabs.org/pageviews/?project=en.wikipedia.or...

T-A9y ago

I think that spike peaks on October 12, which is when this was released: https://obamawhitehouse.archives.gov/blog/2016/10/12/adminis...

asafira9y ago· 7 in thread

So...How much will this do at forecasting stock prices? =)

Very cool though --- I would be interested to dive into the methods they've implemented sometime in the near future!

matheweis9y ago

> So...How much will this do at forecasting stock prices? =)

Probably quite poorly (due to stocks appearing "random" at scale), especially for indexes, which are a sum of their parts.

On the other hand, this would probably be quite useful for things that have non-random trends (like the Global Energy Forecasting Competition: http://www.drhongtao.com/gefcom)

syntaxing9y ago

curuinor9y ago

-Mandelbrot, in the foreward to Multifractals and 1/f Noise.

it's worth saying that Mandelbrot was apparently a large influence to E Fama, who proposed the efficient market hypothesis in the first place.

blazespin9y ago

Probably just help verify that the stock market is a random walk with a meager trend upwards that doesn't beat inflation + trading costs.

etjossem9y ago

> Probably just help verify that the stock market is a random walk with a meager trend upwards that doesn't beat inflation + trading costs.

That doesn't sound right. Let me clear that up for you. Since 1950:

  S&P 500 Annual Price Change: 7.2%
  S&P 500 Annual Div Dist: 3.6%
  S&P 500 Annual Total Return: 11.0%
  Annual Inflation: 3.8%
  Annual Real Price Change: 3.3%
  Annual Real Total Return: 7.0%

Buying the straight S&P 500 beats inflation by seven percent, on average, every year. You're welcome!
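
A quick sketch of how the "real" rows relate to the nominal ones, assuming the usual compounding adjustment (1 + nominal) / (1 + inflation) - 1 rather than plain subtraction. The function name is made up for illustration, and the inputs are just the rounded figures from the table, so the results only approximately match the quoted rows.

```python
def real_return(nominal, inflation):
    """Inflation-adjusted return, assuming returns compound multiplicatively."""
    return (1 + nominal) / (1 + inflation) - 1


# Rounded annual figures taken from the table above.
real_price = real_return(0.072, 0.038)   # price change only
real_total = real_return(0.110, 0.038)   # price change plus dividends

print(f"real price change: {real_price:.1%}")  # 3.3%
print(f"real total return: {real_total:.1%}")  # 6.9%
```

The second figure comes out at about 6.9% rather than 7.0%, which is what you would expect when starting from inputs already rounded to one decimal.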

matheweis9y ago

> the stock market is a random walk with a meager trend upwards that doesn't beat inflation + trading costs.

ainiriand9y ago

Some people are making a pretty penny for being so random.

nodesocket9y ago· 3 in thread

Are there any startups/services where you pass it a series and it returns forecast models? That's something I'd be willing to pay for.

yoghurtio9y ago

kriro9y ago

nodesocket9y ago

Is it possible for example to send you monthly revenue numbers for my startup for the last two years (24 data points) and have yoghurt predict the next two years of monthly revenue?

hubot9y ago· 3 in thread

Can someone explain the meaning of this line?

> df['y'] = np.log(df['y'])

llimllib9y ago

I have not read the code, but assuming df is a pandas dataframe, it sets the 'y' column to the log of what was previously the 'y' column.

https://gist.github.com/llimllib/385230f38c3f9b70c3e46158e60...

slashcom9y ago

df is a dataframe, which is like a spreadsheet. This line takes the logarithm of the column named 'y' and updates it in place.
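
A small sketch of what that line does, using a made-up four-row dataframe (the `ds` and `y` values are invented for illustration). One common motivation for a log transform, which I am assuming rather than quoting from the Prophet authors, is that multiplicative growth becomes additive, so a series growing by a constant factor turns into a straight line. `np.exp` undoes the transform when you want values back on the original scale.

```python
import numpy as np
import pandas as pd

# Toy data: the value grows tenfold each day (invented for illustration).
df = pd.DataFrame({
    "ds": pd.date_range("2017-01-01", periods=4, freq="D"),
    "y": [10.0, 100.0, 1000.0, 10000.0],
})

# The line in question: overwrite column 'y' with its natural logarithm.
df["y"] = np.log(df["y"])

# After the transform, consecutive differences are constant (ln 10 each),
# i.e. exponential growth has become a linear trend.
steps = df["y"].diff().dropna()

# Applying exp recovers the original scale.
recovered = np.exp(df["y"])

print(steps.round(4).tolist())
print(recovered.round(4).tolist())
```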

hubot9y ago

Thanks. That part I can understand, but why do that?

yoghurtio9y ago· 2 in thread

redindian759y ago

your website is very sparse on details - any examples/demos?

yoghurtio9y ago

jl69y ago· 2 in thread

I wonder what Sungard/FIS think of the name, which is the same as their commercial financial modelling/forecasting tool.

vinw9y ago

FIS Prophet is targeted at actuaries, and really no-one else so I don't know if anyone will care. They have had the name a lot longer than Facebook though!

T-A9y ago

This other Prophet has also been around for a while: https://github.com/Emsu/prophet

hnarayanan9y ago· 2 in thread

Is there a way to extend these models to handle spatial variation (e.g. weather forecasting, property price estimation etc.) as well?

rodionos9y ago

This would be non-trivial. Consider this paper on marijuana usage where the researchers had to group statistics by adjacent counties in Oregon and Washington in order to control the tests.

https://papers.ssrn.com/sol3/papers2.cfm?abstract_id=2841267

hnarayanan9y ago

Thank you for the pointer, will read the article.

All my attempts thus far have pointed me to something called Gaussian Processes that I am still working through grokking.

dmichulke9y ago· 2 in thread

Its major benefit is that it figures out the relationship to the target time series by itself, so you can just throw in all time series and see what comes out.

feld9y ago

Why not release sooner and document the code smells? Maybe you'll get patches

dmichulke9y ago

I'd like to have a tested use case that mostly and simply works. Something to put in readme.md that shows how it works and that it works. Almost there...

recurser9y ago· 2 in thread

Very cool. Could this be re-purposed for detecting anomalies/outliers in time series data?

techno_modus9y ago

>Could this be re-purposed for detecting anomalies/outliers in time series data?

fgpwd9y ago

saosebastiao9y ago· 1 in thread

That being said this claim in point #1 baffles me:

As an aside, anybody interested in a great introductory book on time series forecasting should check out Rob Hyndman's book which is freely available online. https://www.otexts.org/fpp

dxbydt9y ago

> Anecdotally, there is a ton of money left on the table by established businesses...

So every day I'd get a gigantic 20gb JSON from infra, then I'd have to do the modeling.

If someone's doing this from scratch these days, NNs would be your best bet. Regime changes are a big part of that.

schlarpc9y ago· 1 in thread

Moderately relevant short story: https://www.facebook.com/notes/robin-sloan/julie-rubicon/985...

JoshTriplett9y ago

That was the first thing I thought of when I saw the title.

techno_modus9y ago· 1 in thread

unoti9y ago

1. https://www.otexts.org/fpp

nickfzx9y ago· 1 in thread

This looks amazing, congratulations.

We're planning to add forecasting to our SaaS analytics product (https://chartmogul.com) later this year, I'm going to look and see if we can use this in our product now.

tommynicholas9y ago

I may do a test implementation into Airbnb Superset actually to see how it flies.

paulvs9y ago· 1 in thread

For a corporate credit analyst working at a bank, what is some good introductory material for getting into forecasting using tools like these?

I see this being applicable to analysts when deciding on a company's creditworthiness.

zebrafish9y ago

There are some models out there which could be used, but I'm not sure that forecasting is actually what you would use.

agounaris9y ago· 1 in thread

How different is this framework from statsmodels?

adw9y ago

Statsmodels is a grab-bag of various statistical models from linear regression upwards. This is an opinionated library for (some relevant parts of) econometrics.

cardosof9y ago

That's very cool, congrats and thank you to the Facebook guys!

anacleto9y ago

This is so great!

I've been using CausalImpact by Google [0] for months. This seems pretty straightforward.

[0] https://google.github.io/CausalImpact/CausalImpact.html

pacifika9y ago

The more Facebook grows, the more its tooling aligns with that of intelligence services.

Steeeve9y ago

This actually looks incredibly useful and pretty simple to learn.

Between this and Stan I think my free time for the next week is gone.

zebrafish9y ago

So.... I don't understand how this is better or worse than using forecast.

hn_username9y ago

This is a nice piece of work - thanks for sharing with the community!

I'll definitely be checking it out!

SmellTheGlove9y ago

For us insurance/financial services folks, I would like to simply clarify that this is not the Sungard/FIS risk management platform that is also called Prophet! :D

I got really excited for a second. Actually, I'm still pretty excited about this even if it was something else entirely.

minimaxir9y ago

syntaxing9y ago

The fact that Prophet follows the "sklearn model API" and that it's very well integrated with pandas makes it super appealing and usable!

monkeydust9y ago

Very cool. I've got loads of sensor data from around my house, over a year's worth, so I'm curious to throw it at Prophet.

Has anyone managed to get this working on Windows with Jupyter (Anaconda build)? I'm struggling with PyStan errors. Any guidance welcomed.

eternalban9y ago

/please ignore: Oracle & Prophet. Oracle sifts through signs but Prophet has a line to the larger picture. I suppose the next 'product' will be called Messiah to complete the picture.

elwell9y ago

Why do we need Prophet when we already have Temple OS (http://www.templeos.org/)?

alexpetralia9y ago

Slightly inconvenient that the main image <figure> needs to be replaced by an <img> tag just to have the image appear in print outs.

poppingtonic9y ago

This is very interesting. Forecasters who participate in the Good Judgment Project, such as myself, will find this useful.

ayayecocojambo9y ago

Can we use other features (like temperature?), or does it have to be only time-based?

Helmet9y ago

Just wanted to point out to potential Windows users: this will only run on Python 3.5 due to dependencies (PyStan only works on Python 3.5 for Windows).

fagnerbrack9y ago

Facebook...
