Introduction to Support Vector Machines in Machine Learning

(ranvir.xyz)

166 points · rangerranvir · 6y ago · 87 comments

astrophysician · 6y ago · 29 in thread

If you're new to ML or data science, I would recommend working to build a strong basis in Bayesian statistics. It will help you understand how all of the "canonical" ML methods relate to one another, and will give you a foundation for building on them.

In particular, aspire to learn probabilistic graphical models and the libraries used to train them (like Pyro, TensorFlow Probability, Edward, Stan). They have a steep learning curve, especially if you're new to the game, but the reward is great.

All of these methods have their place. SVMs have their place, but they aren't great for probability calibration, and non-linear SVMs, like every kernel method, can scale absolutely terribly. Neural networks have their place, sometimes as a component of a larger statistical model, sometimes as a feature selector, sometimes in and of themselves. They're also very often the wrong choice for a problem.

Don't fall into the beginner trap: people tend to mistake 'what is the hottest research topic' for 'what is the right solution to my problem given my constraints (data limitations, time limitations, skill limitations, etc.)'. Be realistic, don't use magical thinking, and have a strong basis in statistics to separate the beautiful non-bullshit from the bullshit that is frustratingly prevalent (everyone and their mother is an ML expert today).

EDIT: I want to also clarify: I don't mean to suggest the author is new to ML, I just mean this as general advice for anyone coming here who is new to DS/ML. The article looks great!

unishark · 6y ago

Personally I'd advise against both SVM's and Bayesian methods for a beginner. Bayesian statistics is very much the deep end of the pool. Graphical models and Bayesian methods generally may make a comeback but such approaches have been superseded by other methods for good reasons, i.e. scaling.

A strong basis in statistics is certainly a great thing, but that can be maximum likelihood plus Bayes' law (i.e. "MAP" estimation which is more of a hack to ML than an actual Bayesian method) and still provide the big picture for almost everything.

Meanwhile a strong basis in "deterministic methods", as an alternative way to spend that learning effort, has its own rewards. The training algorithms for deep learning are also the hottest algorithm research area in machine learning, and are certainly applicable beyond deep learning. For that matter a thorough understanding of SVM delves into convex optimization, an extremely powerful framework as well.

astrophysician · 6y ago

> Personally I'd advise against both SVM's and Bayesian methods for a beginner. Bayesian statistics is very much the deep end of the pool.

I don’t know, I think it depends on what you mean by Bayesian. I would say understanding loss functions and regularization requires some understanding of Bayesian stats (just knowing that it comes from log p(x|q) + log p(q) and what both of those terms mean).
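To spell out where those two terms come from: this is the standard MAP derivation, not something stated in the thread. Here q stands for the model parameters and x for the data, and the Gaussian prior with scale \tau is an illustrative assumption.

```latex
\hat{q}_{\mathrm{MAP}}
  = \arg\max_{q}\,\bigl[\log p(x \mid q) + \log p(q)\bigr]
  = \arg\min_{q}\,\bigl[\underbrace{-\log p(x \mid q)}_{\text{loss}}
      \;+\; \underbrace{-\log p(q)}_{\text{regularizer}}\bigr]

% Assumed prior: q \sim \mathcal{N}(0, \tau^2 I)
-\log p(q) = \frac{1}{2\tau^2}\,\lVert q \rVert_2^2 + \text{const}
```

Under that assumed prior the regularizer is the familiar L2 penalty, with strength set by 1/\tau^2.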

> Graphical models and Bayesian methods generally may make a comeback but such approaches have been superseded by other methods for good reasons, i.e. scaling

Can you be more specific here? It sounds like you’re talking about a particular problem or class of methods. PGMs/Bayesian methods can mean basically anything from logistic regression to running HMC on some hierarchical model using 10,000 CPU hours. I just mean aspiring to learn PGMs will force you to quickly learn and gain a deeper understanding of and appreciation for Bayesian stats, and then you can build on that for years and years. But it depends on what you’re interested in doing: there’s a difference between model building and inference. You can spend your whole life using the same loss function and just focus on making your NN architecture better; you don’t need much Bayesian stats to do that.

> i.e. "MAP" estimation which is more of a hack to ML than an actual Bayesian method

Huh? Maybe we mean different things by Bayesian — the mode of your posterior seems pretty Bayesian to me!

> Meanwhile a strong basis in "deterministic methods", as an alternative way to spend that learning effort, has its own rewards. The training algorithms for deep learning are also the hottest algorithm research area in machine learning, and are certainly applicable beyond deep learning. For that matter a thorough understanding of SVM delves into convex optimization, an extremely powerful framework as well.

Would agree that optimization is an important part of ML/DS, but since nowadays virtually all of the most popular optimization algorithms are available at our fingertips in e.g. PyTorch, I would still think it's better to start by trying to build a fundamental understanding of how to frame problems. But that’s colored by my own experience and background; people’s priorities should differ depending on what they want to do.

astrofinch · 6y ago

>The training algorithms for deep learning are also the hottest algorithm research area in machine learning, and are certainly applicable beyond deep learning.

The lore I've heard is that most new deep learning training algorithms (optimization algorithms) only work better on particular special cases, and it is hard to do better than the established algorithms in general.

I'm also not sure why you're saying they're applicable beyond deep learning--how do you plan to train a PGM or SVM using Adam?

rjvir · 6y ago

Different people learn in different ways, but personally I’ve had more success with the opposite approach, i.e. “top-down”.

As in, rather than learning in depth all the low level parts then finally putting it together at the end, start with a surface high-level understanding of a working prototype then expand into the details of how everything works inside.

In the case of ML, this could mean starting with a 5-line scikit-learn prototype of a random forest model, seeing some working predictions, then expanding knowledge from there: what data is going in and what is coming out? What’s a classifier? What’s a decision tree? Etc.
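A minimal sketch of the kind of prototype described above, assuming scikit-learn is installed. The iris dataset, the 75/25 split, and the fixed random seeds are illustrative choices, not anything from the comment.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a small built-in dataset (assumed here purely for illustration).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a random forest and check how it does on held-out data.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

From here the questions in the comment have something concrete to attach to: `X` is what goes in, `model.predict(X_test)` is what comes out.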

antipaul · 6y ago

I support this learning method. Having a map of the key concepts, as well as a visceral feel for them via code, will keep up the motivation.

This would be in contrast to picking up one of the plethora of “ML” textbooks that mostly only describe the math behind all the algorithms, which is not where you should begin, in my view (years of teaching experience). The use of such textbooks is as a reference to fill in details once you are curious about them.

And more than anything, the best way to learn practical ML is to “apprentice” to some experienced practitioners or team who are willing to act as mentors.

astrophysician · 6y ago

Agree: people should do what they want and have fun learning. It's just a suggestion that’s also colored by my own experience. I will argue that if you’re going for a job in data science it is a bit more of a serious suggestion, since you need to know how to answer statistical questions and understand your assumptions, and you really do need to understand Bayesian stats for that (nothing state of the art here, don’t worry if you don’t know what a PGM is, I just mean basics).

rangerranvir (OP) · 6y ago

Great point! I do read articles where someone builds cool stuff like that every now and then, to keep in mind what's coming.

Anyway, whatever works. The ultimate aim is to learn and have fun.

bitL · 6y ago

Starting with PGMs would kill 99.9% of aspiring ML practitioners. Classes related to PGMs at Stanford and MIT are considered to be some of the most difficult ones. I'd rather recommend starting with something they are enthusiastic about and, once they become sufficiently advanced, naturally learning (H)PGMs.

salty_biscuits · 6y ago

I think they said "aspire to", not start with. I would start with a naive Bayes classifier, understand why it is naive, and then go from there.

sjg007 · 6y ago

PGMs are graphs mapped into probability theory and vice versa.

uoaei · 6y ago

Great comments. I heartily agree and support the statement about probabilistic graphical models. Just to add a couple more facets to this perspective:

'State of the art' does not always mean 'best for your task', and in fact lately depending on your field SOTA sometimes simply means 'unaffordable' for anyone whose budget is under 1 million dollars.

Try linear methods first.

Ensembles of decent models are usually good models. The point above about probability calibration can be at least somewhat mitigated by using ensemble averages.

Don't just assume "the $MODEL will figure it out" if you give it shitloads of degrees of freedom. Machine learning efficiency all comes down to efficiency of representation, and feature engineering can achieve huge payoffs if/when you incorporate domain knowledge and expertise.

Once you gain a perspective into the "universality" of statistical methods, optimization, and Bayesian probability theory, your work will become a lot easier to reason about. As an example, try to see if you can explain why a least-squares fit results from the assumption that model residuals are normally distributed (and what connections this may have to statistical physics!).
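For anyone who wants to check their answer to that exercise, here is one standard way to write it down. The notation is assumed for illustration: f(x_i; \theta) is the model, and the residuals \varepsilon_i are taken to be independent with a shared, fixed variance \sigma^2.

```latex
y_i = f(x_i;\theta) + \varepsilon_i,
\qquad \varepsilon_i \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2)

-\log p(y \mid \theta)
  = \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - f(x_i;\theta)\bigr)^2
    + n\log\sigma + \frac{n}{2}\log(2\pi)

\hat{\theta}_{\mathrm{ML}}
  = \arg\min_{\theta}\,\sum_{i=1}^{n}\bigl(y_i - f(x_i;\theta)\bigr)^2
```

The last two terms of the negative log-likelihood do not depend on \theta, so maximizing the likelihood is the same as minimizing the sum of squared residuals.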

ska · 6y ago

> Try linear methods first

This bears repeating.

rangerranvir (OP) · 6y ago

Great points, really appreciated. I will have to put extra effort into learning about the feature engineering part of the problem.

Also, if you know a few things about the data it becomes a little easier to explain what your model is doing and why it is producing those results.

Found a good resource which explained the trust component: https://arxiv.org/pdf/1602.04938.pdf

tesseract2 · 6y ago

Thanks for this insight. Can you kindly also suggest a good book for someone to start with Bayesian statistics? I could really use a suggestion for a first and second book on this.

About Probabilistic Graphical Models, is there a book other than Daphne Koller's that you would suggest?

astrophysician · 6y ago

I think PGM's are covered by a lot of "standard" ML texts -- someone else mentioned Murphy's book, which is great and humongous, and is a good reference for pretty much every method under the sun.

Bishop's Pattern Recognition and Machine Learning has a chapter that's free online: https://www.microsoft.com/en-us/research/wp-content/uploads/...

uoaei · 6y ago

Introduction to Statistical Learning

https://faculty.marshall.usc.edu/gareth-james/ISL/

Elements of Statistical Learning

https://web.stanford.edu/~hastie/ElemStatLearn/

Machine Learning: A Probabilistic Perspective

https://mitpress.mit.edu/books/machine-learning-1

solar_perplexus · 6y ago

I took a course on Applied Bayesian Statistics taught by David Draper in grad school and we covered Bayesian Data Analysis (Gelman et al.) http://www.stat.columbia.edu/~gelman/book/ and Probability Theory: The Logic of Science by Ed Jaynes: https://www.amazon.com/dp/0521592712/ref=cm_sw_r_em_apa_i_v3...

The former is a much-recommended book since it's very comprehensive and builds everything from the ground up, and it was the basis for the entire course. The latter is a beast of its own, and we simply covered what was effectively the first chapter as part of the course.

phunge · 6y ago

For Bayesian stats, "Statistical Rethinking" by McElreath is a masterpiece.

tesseract2 · 6y ago

Thanks, everyone, for your kind suggestions. Much appreciated.

antipaul · 6y ago

Can you give a concrete example how a Bayesian concept provides practical help in ML model training or another ML task?

Want to whet my appetite for your suggestion.

bencw · 6y ago

A common type of example involves relatively small or uninformative datasets. Say you flip a coin a few times and only get heads. Your maximum likelihood (frequentist) estimate is that the coin will always land heads. In a Bayesian setting, if you put a prior on the probability that the coin lands heads, your posterior estimate of the probability of tails will be non-zero, but will continue to get smaller if you keep seeing only heads. (Strictly, with an exactly uniform prior the maximum a posteriori estimate coincides with the maximum likelihood one; it is the posterior mean, or the MAP under a slightly informative prior such as Beta(2, 2), that stays away from 0 and 1.)

The above example is contrived, but makes more sense in the case of language modelling. Since a bag-of-words vector, containing say counts of words seen in a document, is typically sparse (most documents only contain a limited portion of the full vocabulary), a frequentist estimate of word probability will say that certain words can never occur, just because it's never seen them. The Bayesian estimate will still assign some non-zero chance of seeing that word.

Practically speaking, this leads to the idea of "smoothing" in tf-idf (term frequency-inverse document frequency) vectors, by adding 1 to document frequencies. You don't need Bayesian statistics to do this, but maybe you never would have thought of it otherwise!
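The coin example can be sketched in a few lines of plain Python. The function names are made up for illustration, and the Beta(alpha, beta) prior is an assumption; with alpha = beta = 1 the posterior mean reduces to the add-one rule mentioned above.

```python
def mle_heads(heads, flips):
    """Maximum likelihood estimate of P(heads): the raw frequency."""
    return heads / flips


def posterior_mean_heads(heads, flips, alpha=1.0, beta=1.0):
    """Posterior mean of P(heads) under an assumed Beta(alpha, beta) prior.

    With alpha = beta = 1 (uniform prior) this is the add-one rule
    (heads + 1) / (flips + 2).
    """
    return (heads + alpha) / (flips + alpha + beta)


# Only heads observed: compare the implied probability of tails.
for flips in (3, 10, 100):
    p_tails_mle = 1.0 - mle_heads(flips, flips)
    p_tails_bayes = 1.0 - posterior_mean_heads(flips, flips)
    print(f"{flips:>3} heads in a row: MLE P(tails)={p_tails_mle:.3f}, "
          f"posterior-mean P(tails)={p_tails_bayes:.3f}")
```

The maximum likelihood estimate says tails is impossible after any all-heads run, while the smoothed estimate shrinks towards zero without ever reaching it.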

astrophysician · 6y ago

* understanding where loss functions and regularization terms come from allows you to reason about the right choice for your problem and possibly to extend/tweak them to suit your needs. Are you working directly with probabilities? Then maybe you don’t want to use L2 regularization (Gaussian prior) but a beta prior or something with the right support. Are you modeling a Poisson rate (e.g. how many people buy my product for every dollar I spend on advertising)? Then use a Poisson likelihood (the loss function would be the negative log likelihood).

* do you want to have your NN model your uncertainty as well as the mean? How do you incorporate that into the loss function? Hint: loss = (yhat- y)^2/sigma_hat^2 is missing a term but you wouldn’t know that if you don’t come from Bayesian stats.

* the rabbit hole goes as deep as you want. Understanding Bayesian stats removes a lot of the “ad hoc” and intuitive guesswork that goes into ML when you don’t have a solid statistical foundation for what you’re doing.
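One reading of the hint in the second bullet, sketched in plain Python: the Gaussian negative log-likelihood carries a log(sigma_hat) term, and without it the loss can be driven to zero by predicting an enormous uncertainty. The function names and the toy numbers are assumptions for illustration.

```python
import math


def incomplete_loss(y, y_hat, sigma_hat):
    """The loss from the comment: squared error scaled by predicted variance."""
    return (y_hat - y) ** 2 / sigma_hat ** 2


def gaussian_nll(y, y_hat, sigma_hat):
    """Negative log-likelihood of y under N(y_hat, sigma_hat^2).

    The constant 0.5 * log(2 * pi) is dropped; the log(sigma_hat) term is kept.
    """
    return 0.5 * (y_hat - y) ** 2 / sigma_hat ** 2 + math.log(sigma_hat)


# Toy case: the prediction is off by exactly 1.
y, y_hat = 1.0, 0.0
candidate_sigmas = [0.5, 1.0, 2.0, 10.0, 100.0]

best_incomplete = min(candidate_sigmas, key=lambda s: incomplete_loss(y, y_hat, s))
best_nll = min(candidate_sigmas, key=lambda s: gaussian_nll(y, y_hat, s))

print("sigma preferred by incomplete loss:", best_incomplete)
print("sigma preferred by Gaussian NLL:   ", best_nll)
```

Among these candidates the incomplete loss always prefers the largest sigma, while the full likelihood prefers a sigma that matches the size of the residual.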

sibmike · 6y ago

The most recent example has been supply failure detection in sales time series data with intermittent demand. Ended up using the approach described in The Longest Run of Heads by Mark F. Schilling, which is an outstandingly well-written stats paper and a pleasure to read.

mrslave · 6y ago

Requesting best book(s) on probability estimation: techniques, model accuracy, and strategies in applying them (e.g. markets, marketing, business operations)?

jononor · 6y ago

Do you have any learning resources to recommend for Bayesian ML? Especially interested in more applied stuff, and ideally temporal and spatial modeling.

rangerranvir (OP) · 6y ago

Thanks for the advice. Will definitely try to follow that. I was trying to learn the basics of statistics and went through most of An Introduction to Statistical Learning. Will complete the rest in a few days.

I am more of a book person, so if you have any other resources for probabilistic graphical models, please share them here.

astrophysician · 6y ago

Nice! There are many books that cover this, even the docs for Pyro/other libraries are useful, it just depends on your preference for how material is presented + your background.

Bishop's "Pattern Recognition and Machine Learning" has a chapter on PGM's that's free online: https://www.microsoft.com/en-us/research/wp-content/uploads/...

Murphy's "Machine Learning: A Probabilistic Perspective" is another behemoth that covers this stuff, but it's really just your preference.

I say "aspire" because (1) depending on your background, it will likely be something that takes a while to internalize and really understand, and you will probably realize many times over that you thought you understood something that you actually didn't, and (2) by learning PGM's, you learn a lot of Bayesian statistics as a side effect, which is why even learning a little bit about them is rewarding.

Once you learn a bit, I would use Pyro/other libraries and try to actually build PGM's for toy problems (or non-toy problems too..) because (1) it will force you to admit to yourself that you don't understand something, (2) the documentation for a lot of these libraries is also useful learning material, and (3) you will see once you learn these libraries that it is fairly easy to do something that would be astoundingly complex if you were to try and do it by hand.

You can basically build most standard ML algorithms as a PGM, so e.g. you can try to do logistic regression as a PGM and compare the results to scikit-learn.

sriram_malhar · 6y ago

Can you please suggest some resources for a math-comfortable beginner?

smbrian · 6y ago · 8 in thread

Stay away, in my opinion. I spent a year supporting an SVM in a production machine learning application, and it made me wish the ML research community hadn't been so in love with them for so long.

They're the perfect blend of theoretically elegant and practically impractical. Training scales as O(n^3), serialized models are heavyweight, prediction is slow. They're like Gaussian Processes, except warped and without any principled way of choosing the kernel function. Applying them to structured data (mix of categorical & continuous features, missing values) is difficult. The hyperparameters are non-intuitive and tuning them is a black art.

GBMs/Random Forests are a better default choice, and far more performant. Even simpler than that, linear models & generalized linear models are my go-to most of the time. And if you genuinely need the extra predictiveness, deep learning seems like better bang for your buck right now. Fast.ai is a good resource if that's interesting to you.

_pastel · 6y ago

100% agree. What's the use case for SVMs?

Linear models are simpler. GBMs are more powerful, more flexible, and faster.

Every ML course I took had 3 weeks of problem sets on VC dimension and convex quadratic optimization in Lagrangian dual-space, while decision tree ensembles were lucky to get a mention. Meanwhile GBMs continue to win almost all the competitions where neural nets don't dominate.

I suspect my professors just preferred the nice theoretical motivation and fancy math.

bearzoo · 6y ago

SVMs are, by default, linear models. The decision boundary in the SVM problem is linear, and since it’s the max margin we may enjoy nice generalization properties (as you probably know).

You probably also know that decision tree boundaries are non-linear and piecewise. It’s not so straightforward to find splits on continuous features.

I.e., if the data is linearly separable, then why not? Even using hinge loss with NNs is not uncommon.

You probably see GBMs winning a lot of competitions compared to SVMs because a lot of competitions may have a lot of data and non-linear decision boundaries. Some problems don’t have these characteristics.
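To make "linear model plus hinge loss" concrete, here is a small pure-Python sketch: a linear classifier trained by stochastic subgradient descent on the L2-regularized hinge loss. The toy data, the learning rate, and the regularization strength `lam` are all assumptions chosen for illustration; real solvers (for example the dual QP solvers behind most SVM libraries) work differently.

```python
def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def hinge_objective(w, b, X, y, lam):
    """Mean hinge loss plus an L2 penalty on the weights."""
    data_term = sum(max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y))
    return data_term / len(X) + 0.5 * lam * dot(w, w)


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Stochastic subgradient descent on the regularized hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (dot(w, xi) + b)
            # Subgradient of the L2 penalty: shrink the weights slightly.
            w = [wj - lr * lam * wj for wj in w]
            # Subgradient of the hinge term is non-zero only inside the margin.
            if margin < 1.0:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


# Linearly separable toy data with labels in {-1, +1}.
X = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
y = [1, 1, 1, -1, -1, -1]

initial_objective = hinge_objective([0.0, 0.0], 0.0, X, y, lam=0.01)
w, b = train_linear_svm(X, y)
final_objective = hinge_objective(w, b, X, y, lam=0.01)
predictions = [1 if dot(w, xi) + b >= 0 else -1 for xi in X]
print("weights:", w, "bias:", b, "predictions:", predictions)
```

Only points that fall inside the margin trigger an update, which is the mechanism behind the "support vector" name.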

Der_Einzige · 6y ago

Choosing the kernel function is simple: are you in a high-dimensional space? If so, choose a linear kernel. Else, choose the most non-linear one you can (usually a Gaussian or RBF). I suppose quadratic and the other kernels are useful if what you're modeling looks like that, but in practice that is rare.

Prediction is not that slow with linear SVMs especially not compared to something like K-NN. The main hyperparameters which matter are the "C" value and maybe class weights if you have recall or precision requirements. The C value is something that should be grid-searched, but you might as well be grid-searching everything that matters on every ML algorithm, and in this regard SVMs are fast to iterate over (because the C value is all that matters).

Applying categorical and continuous features is not difficult if you choose to do it in anything more sophisticated than sklearn. Also, pd.get_dummies() exists (though it may lead to that slow prediction you're concerned about).

You're most likely right with GBM or Random Forests, though they can have all sorts of issues with parallelism if you're not on the right kind of system. You talk about linear models, but SVMs are usually using linear kernels anyway and are a generalization of linear models (including lasso and ridge regression models).
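A minimal sketch of grid-searching C for a linear SVM, assuming scikit-learn is installed. The synthetic dataset, the candidate C values, and the 5-fold setting are illustrative assumptions, not recommendations from the comment.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Synthetic binary classification data, used here purely for illustration.
X, y = make_classification(
    n_samples=300,
    n_features=20,
    n_informative=2,
    n_clusters_per_class=1,
    random_state=0,
)

# C is the only hyperparameter searched; each candidate gets 5-fold CV.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
grid = GridSearchCV(LinearSVC(max_iter=10000), param_grid, cv=5)
grid.fit(X, y)

best_c = grid.best_params_["C"]
print("best C:", best_c, "mean CV accuracy:", round(grid.best_score_, 3))
```

Class weights, if needed, could be added to the same grid through the `class_weight` parameter of `LinearSVC`.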

smbrian · 6y ago

Agreed -- linear SVMs, especially in text processing applications, are the one case where SVMs are a natural fit. All their attributes complement the domain. Linear SVMs also have desirable performance characteristics.

But at that point, they also have a lot in common with linear models. Those also seem practical in that domain (though I have less experience here, tbh). And performant, when using SGD + feature hashing like e.g. vowpal wabbit.

My beef with non-linear kernels and structured data is a longer discussion, but I find kernel methods for structured data (which is usually high-dimension but low-rank -- lots of shared structure between features, shared structure between missingness of features) to be highly problematic.

snovv_crash · 6y ago

> Prediction is not that slow with linear SVMs especially not compared to something like K-NN.

Provided your structural dimensionality is below about 10 (i.e. 10 dominant eigenvalues for your features), then KNN can be O(log(N)) for prediction via a well-designed k-d tree.

KNN is also really simple to understand, and to design features for. It also never really tends to throw up surprises, which for production is the kind of thing you want. Most importantly, the failures tend to 'make sense' to humans, so you stay out of the uncanny valley.

exegete · 6y ago

I’d agree on the training time but your serialized model should be small on disk since only the support vectors are needed for inference. At least with my experience that has been true.

MaxBarraclough · 6y ago

So you're saying to stay away from SVMs, rather than to stay away from this particular tutorial?

smbrian · 6y ago

Sorry, I should've been clearer! Beginner to ML? Stay away from SVMs.

This tutorial looks good, and well written.

bitforger · 6y ago · 4 in thread

ITT: Whether SVMs are still relevant in the deep learning era. Some junior researchers will say neural networks are all you need. Industry folks will talk about how they still use decision trees.

Personally, I'm quite bullish on the resurgence of SVMs as SOTA. What did it for me was Mikhail Belkin's talk at IAS.[1]

[1] https://m.youtube.com/watch?index=15&list=PLdDZb3TwJPZ5dqqg_...

stu2b50 · 6y ago

I mean, NNs are still quite bad at low-n tabular data (and they may always be), which is honestly how a lot of real-life data is, so there is clearly a need for something other than a neural network.

I feel like I've seen more tree ensembles in the wild than SVMs, though.

rangerranvir (OP) · 6y ago

Anyway, the idea of NNs was introduced to work on data which a simple human brain couldn't make sense of.

For more general tabular data, models like trees, regression, and even rule-based models are more realistic.

ma2rten · 6y ago

I don't have time to watch the video. Could you summarize why you think SVMs will become SOTA, and on which problems?

rangerranvir (OP) · 6y ago

Thanks for sharing. You actually shared a full playlist. Won't be able to get up before finishing a few of them.

zetazzed · 6y ago · 2 in thread

Interestingly, a top Kaggler (Ahmet) just posted a nice contest solution with SVMs for the TReNDS Neuroimaging contest: https://www.kaggle.com/aerdem4/rapids-svm-on-trends-neuroima...

amrrs · 6y ago

Interestingly, they're also popularizing Nvidia's RAPIDS as part of it. Thanks for sharing. Never imagined SVMs in such a scenario!

rangerranvir (OP) · 6y ago

Nice, thanks for sharing.

starchild_3001 · 6y ago · 1 in thread

I've been an ML practitioner since 2009. I've used every method imaginable or popular, I think, with the exception of non-linear SVMs. Linear SVM => all good, just hinge loss optimization. Non-linear SVM: a bit of overkill with basis expansion. Just too slow, or too complex a model?

My impression: SVMs are more of theoretical interest than practical interest. Yeah, learn your statistics. Loss functions. Additive models. Neural nets. Linear models. Decision trees, kNNs etc. SVM is more of a special interest, imho.

rangerranvir (OP) · 6y ago

We can definitely learn a thing or two from such an experienced practitioner. Thanks for sharing; I think your intuition matches that of the other experienced ones in the comments.

rusty-rust · 6y ago

Large parts of this blog are straight copy-paste from “An Introduction to Statistical Learning” by Gareth James et al.

rangerranvir (OP) · 6y ago

Since I am here, I would like to ask for some feedback on the general structure of the website and how it feels.

If someone has a suggestion on how I can improve the user experience, feel free to hop in and let me know.
