Linear regression by hand

(dsgazette.com)

129 points · tarheeljason · 8y ago · 34 comments

30 comments · 9 top-level

Rainymood · 8y ago · 11 in thread

Can someone explain to me why this has (so many) upvotes? This is like elementary undergraduate econ stats and kind of trivial?

There's very little content either, it's literally a reformulation of the formula, no interesting graphs or geometric interpretation. What I expected from a title like "Linear Regression By Hand" was the minimization of some quadratic error function, by hand (i.e. using pencil and paper).

triplesec · 8y ago

My suspicion, for the Eeyores round here, is that this has upvotes because people like to bookmark references that remind them of the maths behind what most of us do automagically nowadays, for when we might need it. And often it's the comments that provide even better and more advanced sources, as is the case here. And indeed there are some programmers here who are new to this kind of thing.

VHRanger · 8y ago

I think a lot of people here didn't take econometrics courses.

If you see all regression problems through the lens of maximum likelihood estimation, you might not know that ordinary least squares regression has a closed-form solution.
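For readers who have not seen it, here is a minimal sketch of that closed form in the one-predictor case: slope = S_xy / S_xx and intercept = mean(y) - slope * mean(x). The function name `ols_fit` and the sample data are made up for illustration and are not taken from the linked article.

```python
def ols_fit(xs, ys):
    """Closed-form simple linear regression; returns (slope, intercept)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered sums of squares and cross-products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, chosen only so the arithmetic is easy to check by hand.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
slope, intercept = ols_fit(xs, ys)
print(slope, intercept)
```

Working it by hand for this data: mean(x) = 3, mean(y) = 4, S_xx = 10, S_xy = 6, so the slope is 0.6 and the intercept is 2.2 (up to floating-point rounding).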

theophrastus · 8y ago

One issue that I nearly always find missing in intro discussions about linear regression is the near universal assumption of no error in the abscissa ("x") values. And while this is true-ish for time series data (we know for certain which day we collected the data on, yet the same hour every day?), I'd be rich if I had a nickel for every time I saw standard linear regression done when the "x" had significant (and known) error. In which case you're biasing yourself unless you use some sort of 2d regression, like Deming.[1]

[1] https://en.wikipedia.org/wiki/Deming_regression
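A rough sketch of the Deming fit for one predictor, using the standard closed-form slope. It assumes the ratio `delta` of the y-error variance to the x-error variance is known; `delta = 1` gives orthogonal regression. The name `deming_fit` and the data are hypothetical.

```python
import math


def deming_fit(xs, ys, delta=1.0):
    """Deming regression; delta = var(y errors) / var(x errors), assumed known."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; the slope is undefined")
    diff = syy - delta * sxx
    slope = (diff + math.sqrt(diff * diff + 4.0 * delta * sxy * sxy)) / (2.0 * sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Noise-free points on y = 2x + 1: the Deming fit should recover the line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
deming_slope, deming_intercept = deming_fit(xs, ys)
print(deming_slope, deming_intercept)
```

On noise-free data this agrees with ordinary least squares; the two only differ once the x values carry error, which is the case the comment is about.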

VHRanger · 8y ago

Regression with measurement error is usually treated in much higher-level statistics/econometrics classes.

If you're interested in this, you can read more in Mostly Harmless Econometrics [1] about addressing this with IV methods.

[1] http://www.development.wne.uw.edu.pl/uploads/Main/recrut_eco...

thess24 · 8y ago

I agree, it seems like no help just to list a formula to memorize. If someone knows enough linear algebra to understand what the formula represents, they can do the derivation. This link [1] is a good one if anyone is interested.

[1] https://eli.thegreenplace.net/2014/derivation-of-the-normal-...
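For reference, the derivation is short. This is the standard textbook argument written out, with X the n-by-p design matrix, y the response vector, and β the coefficient vector; the notation is an assumption here and not copied from the linked post.

```latex
% Sum of squared residuals as a function of the coefficients
S(\beta) = \lVert y - X\beta \rVert^2 = (y - X\beta)^\top (y - X\beta)

% Set the gradient to zero
\nabla_\beta S(\beta) = -2\, X^\top (y - X\beta) = 0

% Normal equations, and their solution when X^\top X is invertible
X^\top X \beta = X^\top y
\qquad\Longrightarrow\qquad
\hat\beta = (X^\top X)^{-1} X^\top y
```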

abecedarius · 8y ago

What I expected "by hand" to mean was something like a handmade analog computer. E.g. print your scatterplot, tape a penny over each data point, push a tack through the origin and let the printout swing around it under gravity until it comes to rest (while keeping it flat -- I guess I should've first stuck the printout onto some cardboard). Is there some generalization of this idea that lets the intercept vary too?

luckyt · 8y ago

Well, unless you're doing original research, everything you write will be trivial to somebody.

mr_toad · 8y ago

There aren’t that many good articles on basic statistics that are freely available on the net. Textbooks are expensive, and most of the free material is of dubious quality. Wikipedia is particularly terrible on Statistics.

bluusteel · 8y ago

It seems like blog posts on simple statistical methods like this one land on the front page of HNews a lot more than one would expect.

thaumaturgy · 8y ago

https://www.xkcd.com/1053/

I almost certainly know something I'd consider "trivial" that you haven't encountered yet. I try to be really excited when that happens.

baking · 8y ago

I think that perhaps the issue is that machine learning courses skip over the fact that there is a trivial closed form solution to 99.9% of all real-world machine learning problems.

tw1010 · 8y ago · 4 in thread

It's way more fun to know how to derive least squares than to memorize some formula: https://see.stanford.edu/materials/lsoeldsee263/05-ls.pdf (page 4)

xadhominemx · 8y ago

Also useful to understand least squares as a special case of maximum likelihood estimation. MLE I think is very intuitive.
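The connection can be written out in a few lines. This sketch assumes the usual model with independent Gaussian errors of constant variance; under other error distributions the maximum likelihood estimate is generally not the least squares one.

```latex
% Model assumption: independent Gaussian noise with constant variance
y_i = x_i^\top \beta + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \ \text{i.i.d.}

% Log-likelihood of the n observations
\log L(\beta, \sigma^2) = -\frac{n}{2} \log(2\pi\sigma^2)
  - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left( y_i - x_i^\top \beta \right)^2

% Only the last term depends on beta, so maximizing the likelihood
% is the same as minimizing the sum of squared residuals
\hat\beta_{\mathrm{MLE}} = \arg\max_\beta \log L(\beta, \sigma^2)
  = \arg\min_\beta \sum_{i=1}^{n} \left( y_i - x_i^\top \beta \right)^2
```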

vecter · 8y ago

I know a bit of stats, can you explain why MLE is intuitive?

sampo · 8y ago

[misunderstood]

tw1010 · 8y ago

You're confusing linear regression and least squares. (They're connected but not identical in that way.) Least squares gives the orthogonal (perpendicular) projection onto the range of the matrix, i.e. the closest point in that range. The slide is correct.
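The projection view can be checked numerically: at the least squares solution, the residual vector is perpendicular to every column of the design matrix. A small sketch for the one-predictor case, where the columns are the all-ones vector and x; the data and variable names are hypothetical.

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form least squares fit for a line with an intercept.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

# Residual = observed y minus its projection onto span{1, x}.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Dot products of the residual with each column of the design matrix.
dot_with_ones = sum(residuals)
dot_with_x = sum(r * x for r, x in zip(residuals, xs))
print(dot_with_ones, dot_with_x)
```

Both dot products come out as zero up to floating-point rounding, which is the orthogonality condition X'(y - Xb) = 0.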

thaumaturgy · 8y ago · 3 in thread

MySQL is perfectly capable of calculating a linear regression for you, btw. In my case, I needed to be able to estimate trends from sparse time series data. Here's how you do that:

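    -- Caveat: each @variable is read in the same SELECT that assigns it, which
    -- relies on left-to-right evaluation; MySQL does not guarantee that order.
    SELECT
        @a_count := avg(`count`) as mean_count,
        @a_weeks := avg(`week`) as mean_weeks,
        -- population covariance: E[xy] - E[x]E[y]
        @covariance := (sum(`week` * `count`) - sum(`week`) * sum(`count`) / count(`week`)) / count(`week`) as covariance,
        -- stddev() is the population standard deviation, matching the covariance above
        @stddev_count := stddev(`count`) as stddev_count,
        @stddev_week := stddev(`week`) as stddev_week,
        @r := @covariance / (@stddev_count * @stddev_week) as r,
        @slope := @r * @stddev_count / @stddev_week as slope,
        @y_int := @a_count - (@slope * @a_weeks) as y_int,
        @this_week_no := timestampdiff(WEEK, (select min(`date`) from dataset), curdate()) as this_week_no,
        @predicted := round(greatest(1, @y_int + (@slope * @this_week_no))) as predicted
    
    -- Group on the week offset rather than WEEK(date), so the same calendar week
    -- of different years is not merged (assumes dataset has no column named week).
    FROM (SELECT timestampdiff(WEEK, (select min(`date`) from dataset), `date`) as `week`, count(`date`) as `count` FROM dataset group by `week`) series;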

I had to figure out how to translate the math into SQL, now you don't have to.

This performs well enough to be able to crunch tens of millions of rows of data in "reasonable time" on a wimpy VPS.
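For anyone who wants to sanity-check the formulas outside the database, here is a sketch that mirrors the query's arithmetic step by step in plain Python. It uses the population covariance and population standard deviation, as MySQL's `stddev()` does. The function name and the sample counts are made up for illustration.

```python
import statistics


def trend_from_weekly_counts(weeks, counts, this_week_no):
    """Mirror of the SQL: slope = r * sd(count) / sd(week), then extrapolate."""
    n = len(weeks)
    mean_weeks = sum(weeks) / n
    mean_count = sum(counts) / n
    # Population covariance, same expression as in the query.
    covariance = (
        sum(w * c for w, c in zip(weeks, counts)) - sum(weeks) * sum(counts) / n
    ) / n
    stddev_week = statistics.pstdev(weeks)
    stddev_count = statistics.pstdev(counts)
    r = covariance / (stddev_count * stddev_week)
    slope = r * stddev_count / stddev_week
    y_int = mean_count - slope * mean_weeks
    # Like the query, never predict fewer than one event.
    predicted = round(max(1, y_int + slope * this_week_no))
    return slope, y_int, predicted


# Hypothetical weekly counts that grow by exactly 2 per week.
weeks = [0, 1, 2, 3, 4]
counts = [1, 3, 5, 7, 9]
sql_slope, sql_y_int, sql_predicted = trend_from_weekly_counts(weeks, counts, 6)
print(sql_slope, sql_y_int, sql_predicted)
```

Note that r * sd(count) / sd(week) simplifies to covariance / variance(week), so the correlation coefficient is only needed if you want to report it.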

soVeryTired · 8y ago

Yes, but why? Right tool for the right job and all that...

barrkel · 8y ago

Normally the reason you write anything beyond trivial SQL is because you only have a small amount of code to run and lots of data to run it over. Pushing the code to the data is more efficient than pulling the data to the code.

The latter might be conceptually cleaner (though it's debatable, relational is a fairly nice programming model and a lot more consistent and well-founded than object orientation, for one), but it's seldom optimal.

Speedups of three orders of magnitude or more are not unexpected when pushing the code to the data.

taeric · 8y ago

I would guess access to the data is a decent reason to try. No need to pump the data elsewhere, if this is available.

I share your doubt that this is worth it, to be clear.

dankohn1 · 8y ago · 2 in thread

I remember being blown away as an undergrad that least squares (which I had first learned algebraically) had such an obvious geometric meaning:

http://www.statisticshowto.com/wp-content/uploads/2014/11/le...

You need to square the values so that positive and negative differences (between the points and the regression line) don't cancel out.

pliny · 8y ago

If you only needed to do that, you could just take the absolute error. (Even the 0-1 loss function, where every point that the regression hyperplane doesn't pass through contributes 1 to the error, fulfills this criterion.)

eat_veggies · 8y ago

I'm super new to statistics and math, so can you fill me in on why the error is squared rather than absolute valued? Is it because it's easier to take the derivative of, and therefore minimize analytically?
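Differentiability is one reason: squared error has a smooth minimum with a closed-form solution, while absolute error does not. The two losses also pick different answers. In the simplest case of fitting a single constant, squared error is minimized by the mean and absolute error by the median, so squared error is pulled much harder by outliers. A small sketch with made-up numbers:

```python
import statistics

# Hypothetical data with one large outlier.
values = [1.0, 2.0, 3.0, 4.0, 100.0]


def squared_loss(c):
    """Sum of squared errors when every point is predicted as c."""
    return sum((v - c) ** 2 for v in values)


def absolute_loss(c):
    """Sum of absolute errors when every point is predicted as c."""
    return sum(abs(v - c) for v in values)


mean_value = statistics.mean(values)      # minimizes squared loss
median_value = statistics.median(values)  # minimizes absolute loss

print(mean_value, squared_loss(mean_value), squared_loss(median_value))
print(median_value, absolute_loss(median_value), absolute_loss(mean_value))
```

Here the mean is dragged to 22 by the single outlier, while the median stays at 3.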

vecter · 8y ago · 1 in thread

This is very dangerous and an awful way to compute the least squares fit due to potential numerical issues with calculating the inverse of the matrix. I wish he would put a warning in a huge bold header to never do this for actual production work.

ak_yo · 8y ago

This is right -- plus lm() is faster! Although, from a statistical perspective, if you can't invert X'X, that should first make you think "I have data quality issues" (i.e. multicollinearity) rather than "I need a different algorithm to compute the inverse".
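A small illustration of the numerical side, in the one-predictor case rather than the full matrix inverse. It is a sketch of floating-point cancellation in general, not a reproduction of the article's code. The textbook "raw sums" formula subtracts two huge, nearly equal numbers and can lose most of its digits, while the centered formula stays accurate. Function names and data are hypothetical. For the matrix case, library routines such as R's lm() use a QR decomposition instead of forming an explicit inverse.

```python
def slope_from_raw_sums(xs, ys):
    """One-pass textbook formula; subtracts large, nearly equal sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs) - sum_x * sum_x / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    # Cancellation can even drive sxx to zero; report NaN instead of crashing.
    return sxy / sxx if sxx != 0 else float("nan")


def slope_centered(xs, ys):
    """Two-pass formula: subtract the means first, then accumulate."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Points on the exact line y = 2x + 1, but with x values near one billion.
xs = [1e9 + i for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]

raw_slope = slope_from_raw_sums(xs, ys)
centered_slope = slope_centered(xs, ys)
print("raw sums:", raw_slope)
print("centered:", centered_slope)
```

The centered version recovers the true slope of 2; the raw-sums version has no reliable digits on this data, because the true centered sum of squares (82.5) is smaller than the rounding error of the sums it is computed from.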

wgyn · 8y ago

If you really want to do linear regression by hand, check out Chapter 1 of Stephen Stigler's History of Statistics: https://www.amazon.com/History-Statistics-Measurement-Uncert.... You can do least squares on astronomical data the way Legendre did it.

antirez · 8y ago

Very small fully connected neural networks are incredibly good at approximating functions even after one second of training with RPROP. That holds for complex nonlinear functions as well, of course.

Btw, doing linear regression with pencil and paper is trivial: just geometrically trace a line that appears to fit the points and then calculate the coefficients.

inlineint · 8y ago

I thought it was about doing a scatter plot on graph paper and trying to draw a line with a ruler so that “almost all points fit”, then empirically measuring the slope and the intercept. I had the impression that this was the way to go when the accuracy requirements were not strict and calculators were not around.

blt · 8y ago

No mention of gradient based solutions for huge data sets?
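For completeness, a minimal sketch of the gradient-based route: plain full-batch gradient descent on the mean squared error for a single predictor. The learning rate, step count, and data are arbitrary choices for this toy example. On genuinely huge data sets one would typically use stochastic or mini-batch updates rather than a full pass per step.

```python
def gradient_descent_fit(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y ~ slope * x + intercept by gradient descent on mean squared error."""
    n = len(xs)
    slope, intercept = 0.0, 0.0
    for _ in range(steps):
        grad_slope = 0.0
        grad_intercept = 0.0
        for x, y in zip(xs, ys):
            error = (slope * x + intercept) - y
            # Partial derivatives of mean((prediction - y)^2).
            grad_slope += 2.0 * error * x / n
            grad_intercept += 2.0 * error / n
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Hypothetical noise-free data on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
gd_slope, gd_intercept = gradient_descent_fit(xs, ys)
print(gd_slope, gd_intercept)
```

With these settings the iterates converge to the same slope and intercept as the closed-form solution. If the learning rate is set too high for the scale of x, the updates diverge instead, which is why features are usually standardized first.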
