With what I wrote, just use cross tabulation, justified as the best least squares estimate by my short derivation based on conditional expectation. No need to mention Bayes.
> assume that distribution of y has a specific form given x
With what I wrote, you don't have to do that. To be quite precise, you do need to assume that the random variable Y has an expectation, but in practice that is a meager assumption. Otherwise, in what I wrote, you essentially don't have to make any assumptions at all about distributions.
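For anyone who has not seen it, a minimal sketch of the standard argument (roughly the derivation I have in mind; to keep the mean squared errors finite, assume also that E[Y^2] is finite): for any function g of X,

    E\bigl[(Y - g(X))^2\bigr]
      = E\bigl[(Y - E[Y \mid X])^2\bigr] + E\bigl[(E[Y \mid X] - g(X))^2\bigr]
      \ge E\bigl[(Y - E[Y \mid X])^2\bigr]

since the cross term vanishes once we condition on X. So E[Y | X] is the best least squares predictor of Y from X, and for a discrete X the cross tabulation cell means are just the sample version of E[Y | X]. No Bayes, no distributional family.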
Here I have a remark to the statistics community, especially the beginning students and their teachers, a remark old for me: Yes, we have random variables. Yes, each random variable, even the multidimensional ones, has a cumulative distribution. Yes, it's often not too much to assume that we have a probability density. Yes, there are some important probability densities, exponential, Poisson, ..., especially Gaussian. Yes, some of these densities have some amazing properties -- e.g., for the Gaussian, sample mean and variance are sufficient statistics (see a famous 1940s paper by Halmos and Savage and a later paper on lack of stability by E. Dynkin).
So, from that, the suggestion implicitly accepted is that early on in work in applied statistics we should seek to find "the distribution". If the usual suspects don't fit, then we should try some flexible alternatives and try to make them fit.
If you want some spreadsheet program to draw you a histogram, okay. With a good program, you can get a histogram for two dimensional data. If you strain with your eyes and some tricky software, you have an outside shot at three dimensions. For more than three dimensions, you are in Death Valley in July out'a gas and walking -- not quite hopeless yet but not in a good situation.
Sorry, on finding the distribution, really, essentially NOPE -- don't try that, basically can't do that. Blunt reality check: Yes, distributions exist but, no, usually we can't find them and, actually, mostly we work without knowing them. Maybe all you know is that the variance is finite! And that is mostly the situation even for plain real valued random variables. For multivariate, e.g., vector valued, random variables, as the dimension rises above 2 you are getting into trouble; before the dimension gets to a dozen, you are lost at sea; anything above a few dozen and you are lost in deep outer space without a paddle and hopeless.
> We don't have to believe the world obeys a certain model for this to work, we only have to believe that the world will behave the same way in the future as it does now.
Yes, the classic texts on regression, and some of what J. Tukey wrote about how regression has commonly been used in practice, should have made this point clearer and gone on to be clearer about overfitting, etc.
Also, with this approach, given some positive integer n and some data (y_i, x_i), i = 1, 2, ..., n, maybe if we keep looking, e.g., fit with ln(x_i), sin(x_i), exp(x_i), ..., eventually we might get a good fit. Then do we believe that the system behaved this way in the past and so, with meager assumptions, will do so in the future? I confess, this is a small, philosophical point.
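A toy numerical illustration of that point, with made-up data that have nothing to do with any real system: y is pure noise, yet the in-sample fit keeps improving as we pile on transformed features of x.

    # Made-up data: y is independent of x, yet more features => better in-sample fit.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 30
    x = rng.uniform(1.0, 10.0, n)
    y = rng.normal(size=n)                      # no relationship to x at all

    def design(x, k):
        # columns: 1, x, ln(x), sin(x), exp(x/10), x^2, x^3, ...
        cols = [np.ones_like(x), x, np.log(x), np.sin(x), np.exp(x / 10.0)]
        cols += [x ** j for j in range(2, max(k, 2))]
        return np.column_stack(cols[:k])

    for k in (2, 5, 10, 20):
        X = design(x, k)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r2 = 1.0 - np.sum((y - X @ beta) ** 2) / np.sum((y - y.mean()) ** 2)
        print(f"{k:2d} features: in-sample R^2 = {r2:.3f}")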
Better points:
(A) Just from the course, for any model building, we have a dozen, maybe more, efforts at TIFO -- try it and find out. That's a bit wasteful of time -- right, automate that, have the computers do all the usual dozen (a rough sketch of what that might look like follows point (B) below). Hmm.
(B) We are so short on assumptions, we will have to be short on consequences of theorems about what we have or might have or might do. Again, we are like Ptolemy instead of Newton.
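For point (A), a rough sketch of what automating the usual dozen might look like; the data, the candidate list, and the split here are all made up for illustration:

    # Sketch of automated TIFO: fit each candidate functional form by least
    # squares, score it on a held-out split, and report the comparison.
    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(1.0, 10.0, 200)
    y = 3.0 * np.log(x) + rng.normal(scale=0.5, size=x.size)   # hypothetical truth

    candidates = {
        "linear": lambda x: np.column_stack([np.ones_like(x), x]),
        "log":    lambda x: np.column_stack([np.ones_like(x), np.log(x)]),
        "quad":   lambda x: np.column_stack([np.ones_like(x), x, x ** 2]),
        "sin":    lambda x: np.column_stack([np.ones_like(x), np.sin(x)]),
    }

    train, test = slice(0, 150), slice(150, 200)
    for name, f in candidates.items():
        beta, *_ = np.linalg.lstsq(f(x[train]), y[train], rcond=None)
        mse = np.mean((y[test] - f(x[test]) @ beta) ** 2)
        print(f"{name:6s} held-out MSE = {mse:.3f}")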
Yes, I neglected to mention the famous and appropriate "curse of dimensionality".
> What other methods would you have liked to see in the course?
As I wrote, there's a lot on the shelves of the research libraries. No one course can cover all that might be useful for solving real problems.
So, another approach is first to pick the problem and then pick the solution techniques. Even for some solution techniques that might be, or even have been, useful, it is tough to say that they should be in a course. So, maybe what should be in a course is a start: lots of powerful, general views that enable, given a particular problem, picking a suitable solution technique. And, even with this broad approach, it might be productive to have some specializations.
I have a gut feeling that some of the computer science community took regression a bit too seriously.
Sure, I really like and respect Leo Breiman and respect his work on classification and regression trees (I do have his book but never bothered to get or use the software), but I like his text Probability, published by SIAM, a lot more: have it, have studied it, along with Neveu, Chung, Loeve, Skorohod, etc.
So, to suggest a prediction technique not much related to regression? Okay: Blue has some ballistic missile submarine (SSBN) boats, i.e., submarines that while submerged can fire submarine launched ballistic missiles, each with several nuclear warheads that can be independently targeted.
These SSBNs hide in the oceans of the world. To help remain hidden, they execute some approximation to Brownian motion. E.g., even if Red gets some data on where a Blue SSBN is, a day or so later Red will have much less good data on where it is.
If Red can find an SSBN, then sinking it is easy enough -- just explode a nuke over it. But if Red sinks one, then that starts a nuclear war, and the remaining SSBNs might shoot. So, Red could try to find and sink all the SSBNs at once -- finding them all, all at the same time, is unlikely, and mounting a search effort that would promise to do that would be provocative.
But, for considering world war scenarios, maybe there would be a nuclear war but limited just to sea. Then Red could sink SSBNs one at a time as it finds them. In this scenario, how long might the SSBNs last?
Getting a good answer is your mission, and you have to accept it! Gee, we could try regression analysis, maybe neural networks?
Or: How to find things at sea was considered back in WWII -- that was an important question then, also. There was a report, OEG-56, by B. Koopman. The report argued, with some clever applied math, that to an okay approximation the encounters occurred as in a Poisson process, with the encounter rate a known algebraic function (given in the report) of the area of the ocean (for real oceans, neglect the details of the shape), the speed of the target, the speed of the searcher, and the detection radius, all IIRC.
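From memory, and hedged (this is the standard random search model, not a quote from OEG-56): with R the detection radius, A the area of the ocean, v the relative speed between searcher and target (a function of the two speeds), and t the time spent searching, the rate and its Poisson consequence are roughly

    \lambda \approx \frac{2 R v}{A},
    \qquad
    P(\text{at least one encounter by time } t) = 1 - e^{-\lambda t}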
So, ..., one can argue that for the war at sea, encounters are like various Poisson processes. Altogether the Red-Blue inventory declines like a multi-dimensional, continuous time, discrete state space (huge state space) Markov process subordinated to a Poisson process (for a good start, see E. Cinlar, Introduction to Stochastic Processes).
So, given the speeds, the radii, and what happens in one-on-one Red-Blue encounters -- one dies, the other dies, both die, neither dies -- we have all the data needed.
Now with that data, we can make some predictions. Sure, there's a closed form solution, but the "curse of dimensionality" gets in the way. But, Monte Carlo and the strong law of large numbers to the rescue: Write software for a few days and run it for a few minutes, run off 1000 sample paths, average, and get the prediction with some decently good confidence intervals.
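Just to make the flavor concrete, a minimal sketch of that kind of simulation; all the rates, outcome probabilities, and force sizes below are made up for illustration and are not from the real work:

    # Monte Carlo for a two-sided attrition process with Poisson pairwise
    # encounters.  Every number here is illustrative only.
    import numpy as np

    rng = np.random.default_rng(2024)

    PAIR_RATE = 0.002        # encounters per (Blue, Red) pair per day
    P_BLUE_DIES, P_RED_DIES, P_BOTH_DIE = 0.3, 0.4, 0.1    # else neither dies
    N_PATHS = 1000

    def one_path(blue=10, red=40, horizon=365.0):
        """Days until the last Blue boat is lost, capped at the horizon."""
        t = 0.0
        while blue > 0 and red > 0:
            rate = PAIR_RATE * blue * red             # total encounter rate
            t += rng.exponential(1.0 / rate)          # next Poisson encounter
            if t >= horizon:
                return horizon
            u = rng.random()
            if u < P_BLUE_DIES:
                blue -= 1
            elif u < P_BLUE_DIES + P_RED_DIES:
                red -= 1
            elif u < P_BLUE_DIES + P_RED_DIES + P_BOTH_DIE:
                blue -= 1
                red -= 1
        return t if blue == 0 else horizon

    times = np.array([one_path() for _ in range(N_PATHS)])
    mean = times.mean()
    half = 1.96 * times.std(ddof=1) / np.sqrt(N_PATHS)
    print(f"Blue lasts about {mean:.0f} days (95% CI +/- {half:.0f})")

The strong law says the average over the sample paths converges to the expectation, and the 1000 paths also give a decently tight normal-approximation confidence interval.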
I did that. Later the work was sold to a leading intelligence agency. I could tell you which one, but then I'd have to ...!
"Look, Ma, no regression analysis!".
I've got some more, but this post is getting long!
Lesson: There's a lot more to applied statistics than empirical curve fitting.