Matlab, R, and Julia: Languages for data analysis

(strata.oreilly.com)

123 points · bugsbunnyak · 13y ago · 80 comments

57 comments · 16 top-level

mark_l_watson13y ago· 9 in thread

I wouldn't be surprised if Octave (open source version of Matlab) becomes very popular, because a lot of Coursera classes use it for homework assignments.

I thought that Octave was an ugly little language at first, now I really like it - a great tool for doing linear algebra, data visualization, machine learning, neural networks, etc.

pav3l13y ago

Many use Matlab because of its amazing IDE, a great collection of toolboxes and remarkable speed, not so much because of its language features. Last I checked Octave was still missing all of that. If you can afford it, Matlab is usually well worth paying for. If not, other alternatives (Python, R) are much better in my opinion.

Gravityloss13y ago

When I used Matlab daily, I never used the ide. The command line and editor were good.

Matlab is awesome above all else because the design is coherent. Both the syntax and the standard libraries.

It is extremely quick to whip up anything and then turn that into a script and then into a software with functions (since functions can return many variables, and they also have zero overhead, you don't need any includes or requires, you just call them). Type conversions are practically never a problem, since they are sane and automatic. None of this 1+1.5 giving syntax error. Real booleans. Data input and output libraries just simply work like you would expect them to. ( A=imread('/home/gravityloss/abc.png') creates a width x height x 3 matrix with all the rgb values. No requires, includes, plugins, hunting and compiling libraries.). You don't need libraries to do a huge amount of stuff, but if you need them for something experimental, they work extremely easily.

You also rarely need stuff like loops since mass operations on data are native. If you as a newbie create a custom function for a scalar, there's good chance it will work for vectors or n-matrices automatically. This reduces the amount of error-prone housekeeping code for indices and lengths immensely. It's also much much faster than some looping in another scripting language. As a result, the code is often very readable as well.

There's help which actually returns something sensible when you type help, you can type help help or help command or search this or that, the help texts are actually very thoughtful and helpful too and not at all like Linux man pages... I could go on for hours on features that don't really exist anywhere else, even though everything's been in plain sight for decades in Matlab.

Julia's an awesome thing though, I hope it gets more traction...

tomrod13y ago

IDE: QtOctave

Toolboxes: Exceed those of Matlab (http://octave.sourceforge.net/packages.php)

Speed: On par with or better than Matlab generally, unless you compile your Matlab with MEX (but then, Octave interfaces with C and Fortran for even more speedup).

Python is better generally than matlab/octave, and R I view as roughly on par.

*Source: 7 year veteran, octave and matlab user, python lover

keypusher13y ago

Agreed. Without the IDE, there is no reason not to use Python.

sandGorgon13y ago

In the real world, I mainly see people using R or SAS : especially in the web or ecommerce domain. I hardly ever see people using Matlab. Moreover, a lot of research papers publish R code.

If Octave's only claim to fame is being the poorer cousin of Matlab, I wonder why universities still use Octave to teach anything. I would much rather that they use R : which pretty much ensures that the students have an open source path to use it in the future.

cageface13y ago

The thing that most impressed me about R when I was dabbling with it was the quality of the graphing tools. It was very easy to create all kinds of very polished looking and expressive graphs. It was a lot more work to get comparable results in Octave and NumPy.

tomrod13y ago

Octave and Matlab are pretty easy for people with no programming whatsoever to pick up. Engineering and economics thrive in the Matlab world, especially in academia. It's a fairly good setup for them.

digitalzombie13y ago

I did coursera ML and Octave was whatever.

Doing coursera stat one and R is pretty easy. It reminds me of PHP. Syntax wise, I don't know why, it's just a feeling. This article made it a bit clearer. OOP was an afterthought...

I'm getting more and more into R now. Hopefully one day Python.

So I guess what I'm saying is I think it's R that I wouldn't be surprised to see become popular, and that I have to respectfully disagree with your Octave statement.

tomrod13y ago

I've found R to be a close cousin of C syntactically.

tikhonj13y ago· 8 in thread

I wonder if there is room for some smaller languages optimized specifically for data analysis. In particular, I wonder how a carefully designed non-Turing-complete language would fare.

That would be a really cool project to work on: design a minimal language for expressing most types of data analysis at a higher level. If the language is sufficiently small and simple, I could see some very powerful tooling being possible for it.

Perhaps it might make sense to go even more specific: have a small language designed not just for data analysis but for analysis in a very specific vertical (say finance or bioinformatics). It would be awesome to let people express their ideas in terms of the domain and not worry about low-level details like loops.

pav3l13y ago

It seems like a good idea, but I wonder how actually useful highly specialized programming languages would be. Why?

1) Most data analysis tasks boil down to roughly the same things: accessing the data source --> data cleaning -->simple transformations --> (optional)stats/fitting/ML/specialized procedures-->pretty pictures and reporting.

2) Not everyone wants programming to be the main component of their job.

People who can take advantage of the flexibility that programming offers can usually take advantage of existing technologies. People who don't enjoy coding will always look for off-the-shelf solutions that have pretty GUIs with magic buttons that solve all their problems. I just don't think there is a huge market in between to be filled... in the domains that I've been exposed to anyway.

tel13y ago

I disagree with your supposition. I think highly specialized languages exist and are highly useful to non-programming communities. I think there is plenty of proof of their usefulness and room for growth.

For instance, consider illustrator products or d3? Both of these are specialized ("deep") tools for creating pictures that I've used extensively in the "pretty pictures and reporting" stage you outlined.

Also of serious note are BUGS[1], JAGS[2] and (recently) Stan[3] as small semi-declarative languages for MCMC model building, fitting, and checking.

SQL is an obvious example of a component of the "simple transformations" step.

[1] BUGS http://www.mrc-bsu.cam.ac.uk/bugs/
[2] JAGS http://mcmc-jags.sourceforge.net/
[3] Stan http://mc-stan.org/

darkarmani13y ago

I think you are going in the wrong direction. It's too easy to paint yourself in a corner that way.

> It would be awesome to let people express their ideas in terms of the domain and not worry about low-level details like loops.

Yes! I think you want to build this functionality on top of a powerful language to easily handle the dirty ETL work too. This is the reason lots of financial companies use python with scipy, numpy, pandas, etc, on top of it.

kd0amg13y ago

I've been working on a guaranteed-terminating language in the vein of APL/J. Primitive recursion is possible, but with the looping all implicit due to array shape, infinite looping is impossible. The issue I'm not sure how to handle is that some algorithms take the form, "repeat the following until convergence: ..." with no obvious way for a machine to prove that convergence will eventually happen.
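(A minimal sketch of one common workaround for the "repeat until convergence" problem, not kd0amg's language or design: give the loop an explicit iteration budget, so termination is guaranteed by construction and convergence becomes a reported result rather than something the machine has to prove. The function name, tolerance, and budget below are all assumptions chosen for illustration.)

```python
from typing import Callable, Tuple


def iterate_until_converged(
    step: Callable[[float], float],
    x0: float,
    tol: float = 1e-12,
    max_iters: int = 1000,
) -> Tuple[float, bool, int]:
    """Apply `step` repeatedly; stop at convergence or when the budget runs out.

    The loop body runs at most `max_iters` times, so the function always
    terminates as long as `step` itself does.
    """
    x = x0
    for i in range(1, max_iters + 1):
        x_next = step(x)
        if abs(x_next - x) <= tol:
            return x_next, True, i
        x = x_next
    return x, False, max_iters


# Newton's iteration for sqrt(2): converges well within the budget.
root, converged, iters = iterate_until_converged(
    lambda x: 0.5 * (x + 2.0 / x), 1.0
)

# An oscillating step never converges; the budget still forces termination.
_, flip_converged, flip_iters = iterate_until_converged(
    lambda x: -x, 1.0, max_iters=10
)
```

The trade-off is that the caller has to handle the "budget exhausted" case explicitly, which is exactly the information a convergence proof would otherwise have supplied.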

tomrod13y ago

I thought this might be the APL language. Now I'm not so sure... But if it is, there look to be some loops here:

http://www.apl.jhu.edu/~hall/Lisp-Notes/Loop-Summary.html

tel13y ago

I'm working on this direction in healthcare. I want to make a little specialized, non-Turing complete language for handling evented data streams from patients.

I think there's a lot of power in certain kinds of non-Turing completeness. Email me if you want to talk about it.

beagle313y ago

Stata is close to your ideal - it is popular with business and economists. However, no one can ever resist Turing completeness - that would require shelling out to another language for that one small thing every project needs (a different thing for every project)

marshallp13y ago

Such a thing was already invented a long time ago and widely celebrated - SQL. It eliminated the need for loops, was designed for ease of use, had a lot of cool features added like ACID, took over the world and ran every kind of business and website. Sadly, because it was invented too long ago (1970s) many people thought of it as uncool and failed to realize its absolute awesomeness.

lorenzfx13y ago· 6 in thread

python fanboy here: "[python is] not as tuned to numerics as MATLAB": if you build numpy with ATLAS there is, in my experience, hardly ever any noticeable speed difference between numpy and MATLAB

aleyan13y ago

" Python a compelling alternative: not as tuned to numerics as MATLAB, or to stats as R, or as fast or elegant as Julia "

The part about python not being as fast as Julia jumped out at me. Wes McKinney's benchmarks show that python is faster than Julia for numerics: http://wesmckinney.com/blog/?p=475

EDIT: should not have said "python faster than Julia". They are comparable because the slow bits get done in BLAS anyway.

StefanKarpinski13y ago

A couple of nits...

Cython is actually what is faster than Julia in Wes' comparison, not Python. Cython looks kinda, sorta like Python, but it is actually a static language with C-like types (but quite different syntax for those types), no polymorphism, and, afaict, ill-defined semantics. The best answer I seem to get about Cython's semantics is that Cython's semantics are whatever it does. I'm not alone in this complaint – Travis Oliphant expressed a similar concern at this year's SciPy (in this panel [http://www.youtube.com/watch?v=7i2vhoQY-K4], if I recall correctly), which is part of his motivation to work on Numba [https://github.com/numba/numba].

If you look at the comments on Wes' post, when I used the dot(x,y) function, which ships with Julia and uses a BLAS to compute the inner product just like the fastest "Python" version does, Julia is equally fast. That stands to reason – they're both just calling a BLAS.

Finally, that blog post is months old – since then Julia passed the milestone of being no slower than 2x C++ on its microbenchmarks suite [http://julialang.org/]. That's not a guarantee that all code is that fast, but most things we see can be pretty easily tweaked to get there (counterintuitively for those coming from Matlab, Python or R, usually by devectorizing the code rather than vectorizing it). And of course, there's a lot of room for improving Julia's performance, the compiler is still quite young and there are many optimizations that we haven't implemented. Basically, there's nothing but work standing in the way of reaching C or Fortran's speed across the board.

timtadh13y ago

Testing in my lab at CWRU by Gary Doran has indicated that correctly written Numpy code often outperforms the Matlab equivalent. I don't know about R and Julia but there isn't usually a speed bottleneck which can be fixed by moving to Matlab from Python.

pashields13y ago

Not after a small function change. See Wes' remarks in comments: http://wesmckinney.com/blog/?p=475#comment-521757980

greyboy13y ago

It also doesn't "cost" anywhere near MATLAB. (I couldn't check the price of MATLAB as you have to have/create an account).

keypusher13y ago

Even then it wouldn't be correct. MATLAB is most often used in lab/academic settings where licenses are bought in bulk. Even students get a discount.

dbecker13y ago· 5 in thread

When introducing python, the author writes "Despite the obvious advantages of MATLAB, R, and Julia, it’s also always worth considering what a general-purpose language can bring to the table."

Even with thousands of hours of experience in Matlab, R and Python... I'm not sure what "obvious advantage" Matlab and R share over Python.

chubot13y ago

Mainly it's the immediacy of Matlab and R, and the libraries. I've used all 3 and consider Python my main and favorite programming language.

But you can just type "R", do read.table(), and very quickly slice and dice your data. In Python just evaluating what package to use, then getting the packages, dealing with versioning issues, etc. kind of breaks the whole thing. Then you need to figure out what plotting library to use, etc. Having stuff built-in as a common base which all your coworkers share is important. I know there are common distributions like SciPy but they are not as common as R is.

Probably the bigger issue, as mentioned above, is that R has higher-level stuff like time series libraries that Python doesn't.

The main thing that's needed is a shell to glue all these languages together, to ease integration pain. Everybody wants the "one true language", but that's a pipe dream. Python's close but not quite. Julia is kind of falling prey to this fallacy too. The programming world is becoming more heterogeneous, and the solution is to have tools to make multiple languages work nicely together. Not to pretend that heterogeneity doesn't exist.

You can work really hard to get homogeneity on your one little project. Maybe that's why language wars are so heated. But the second you have to borrow code from another lab, or you acquire a company, or get acquired, you have a heterogeneous mix. Matlab, R, Python, or Julia will never suffice for all tasks. Non-trivial problems will always require a mix of them. You have to pick the solution according to the problem, and Matlab and R definitely are superior to Python for certain problems.

rcthompson13y ago

The straightforward answer, and the one that seems to be implied by the contrast with Python as a "general-purpose language", is that Matlab, R, and Julia are specifically designed for data analysis, mathematics, calculation, statistics, etc., while Python is not. But that's not a very concrete answer, of course.

tomrod13y ago

For me, depends on how new a person is to the language. Numpy certainly has a learning curve after spending a long time in Matlab.

dbecker13y ago

I hope the "obvious advantage" the author is speaking of isn't "it takes a while to learn numpy if you are used to Matlab."

That would be a pretty weak argument in my opinion.

darkarmani13y ago

Especially considering the author was using octave and not Matlab.

scottfr13y ago· 4 in thread

Personally I'm in love with R's data.frame. It allows very concise, robust and elegant manipulation and subsetting of a data set.

I wish every language would have such a built-in object type, I definitely feel its loss when I manipulate data in other languages such as Javascript or Mathematica.

dj_axl13y ago

> Personally I'm in love with R's data.frame. It allows very concise, robust and elegant manipulation and subsetting of a data set.

The performance is terrible though. For data of more than ~10,000 observations SQL is much better performance wise, is more robust, and is as good at subsetting. Although it's maybe not as elegant for everyone's definition of elegant.
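(For readers who haven't tried SQL for this kind of subsetting, here is a rough sketch using Python's built-in sqlite3 module. The table name, columns, and rows are made up for illustration, and the example only shows what the query looks like; it makes no attempt to measure the performance difference claimed above.)

```python
import sqlite3

# Hypothetical in-memory table standing in for a data set of 20,000 observations.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (id INTEGER, grp TEXT, value REAL)")
rows = [(i, "a" if i % 2 == 0 else "b", float(i)) for i in range(20000)]
conn.executemany("INSERT INTO obs VALUES (?, ?, ?)", rows)

# Roughly the SQL counterpart of R's
#   df[df$grp == "a" & df$value > 19990, c("id", "value")]
subset = conn.execute(
    "SELECT id, value FROM obs WHERE grp = ? AND value > ? ORDER BY id",
    ("a", 19990.0),
).fetchall()

# A grouped count, similar in spirit to table(df$grp) in R.
counts = dict(conn.execute("SELECT grp, COUNT(*) FROM obs GROUP BY grp").fetchall())

conn.close()
```

Row filtering and column selection both live in one declarative statement, which is the sense in which SQL is "as good at subsetting"; whether it reads as elegantly as data.frame indexing is a matter of taste.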

minimax13y ago

What dataframe operations do you find to be slow? Usually I'm able to get huge performance wins by rewriting my slow R code in a loop free way (*apply and friends).

oddthink13y ago

If you haven't tried it, the data.table package is great. Mostly like a data.frame (in fact a subclass), but fast.

beagle313y ago

Python acquired that capability with Pandas in the last two years. Same nomenclature, tens of times faster.

travisoliphant13y ago· 3 in thread

I think the article tagline would be better "Domain Specific Languages for data analysis". Fortunately, the article does mention Python which is critical because new people might not recognize just how prevalent Python is for solving data analysis problems after reading this. The great work of the SciPy community has enabled Python to be used for all of the things that Matlab, R, and Julia can do. In addition, Python can integrate easily with these languages, so if you are a data analyst you need to learn Python.

xaa13y ago

> The great work of the SciPy community has enabled Python to be used for all of the things that Matlab, R, and Julia can do.

As much as I hate R and love Python, this is not entirely true (unless you count rpy2 as part of "Python"). R has many more statistical models and better plotting capability compared with Python. It also has a lot of domain-specific packages (for example, Bioconductor) that are not available in Python.

dbecker13y ago

Though Python doesn't have the library support that R has, it far exceeds what's available in Julia (and, depending what you are looking for, in Matlab as well)

achompas13y ago

Yeah, agreed. I'm a big, big fan of data analysis in Python, but there isn't a full-featured time series analysis library (statsmodels is almost there). R has at least two that I know of (and needed to hook into from Python using rpy2).

pav3l13y ago· 2 in thread

Here is a nice 4-year old still active discussion on pros and cons of different data analysis technologies: http://brenocon.com/blog/2009/02/comparison-of-data-analysis...

tomrod13y ago

HellMcFly--you're hellbanned so I couldn't reply directly. What is MPlus?

pav3l13y ago

Mplus is quite popular in social sciences. From what I understand its main functionality is fitting Latent Variable Models and structural equation modelling. I've never used it myself, but it can in fact do things for which it is hard to find R packages at this point.

myspy13y ago· 1 in thread

I have to create figures with Matlab and that's a pain in the ass. Changing XTickLabels kills another part of the figure, and in general it's very hard to do a little more with figures.

But the basic data analysis is fine. The IDE has awful code completion and lacks more refinement in the editor.

keypusher13y ago

Python has a wrapper for matlab-style graphs called matplotlib, if you are interested in something else.

elchief13y ago· 1 in thread

Everyone loves to shit all over Java, but Mahout, RapidMiner, Weka, Hive, HBase are all written in it.

lsiebert13y ago

I've used Weka, and RapidMiner once. As I recall, RapidMiner seemed to be general purpose, but lots of posts were for using it for data mining stock data to build a model.

I think it would be interesting to see breakdowns of different software, and where they are used. Oftentimes it seems to me that people just use the tools their peers and co-workers use, and people tend to learn to like whatever they use most.

tvladeck13y ago· 1 in thread

Thought I'd ask since I'm learning Clojure - are there experiences worth sharing re: using Incanter in these types of settings?

paulbunn13y ago

I'm also learning Clojure and have been playing with Incanter. Seems quite a decent statistical library/environment. I had a few issues with lazy evaluation with the dynamic charting functions, but I think that has more to do with my inexperience with Clojure than a problem with Incanter. Also, I'm not too sure how active the project is?

agentq13y ago· 1 in thread

no love for J?

zem13y ago

J is more of a general-purpose array language than one specialised for numerical work.

lsiebert13y ago

You know what will be popular? whatever runs reasonably fast and helps you import and clean data quickly from a variety of sources.

Because the analysis is often the quickest part of being a data scientist. Coursera, as I recall, apparently cleans its data, and also lets you easily import it.

In real life, data is messy, and messed up. You looking at birthdays from some website? expect a spike for whatever the default is... but that doesn't mean you can eliminate that data completely, because some people were presumably born Jan 1st.
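(One way to act on the birthday point, sketched in plain Python under loud assumptions: the records, the function name, and the "5x the median count" threshold are all invented for illustration. The idea is to flag a suspicious default date rather than delete it, since some of those people really were born that day.)

```python
from collections import Counter
from datetime import date


def flag_suspect_defaults(birthdays, ratio=5.0):
    """Return the (month, day) pairs that occur more than `ratio` times
    as often as the median (month, day) in the data."""
    counts = Counter((d.month, d.day) for d in birthdays)
    ordered = sorted(counts.values())
    median = ordered[len(ordered) // 2]
    return {md for md, n in counts.items() if n > ratio * median}


# Synthetic records: 50 on Jan 1 (the hypothetical form default),
# plus 2 on the 15th of each month.
birthdays = [date(1980, 1, 1)] * 50 + [
    date(1980, month, 15) for month in range(1, 13) for _ in range(2)
]

suspects = flag_suspect_defaults(birthdays)

# Keep every record and attach a flag instead of dropping the spike.
records = [(d, (d.month, d.day) in suspects) for d in birthdays]
```

Downstream analysis can then be run with and without the flagged rows to see how much the default value is driving the result.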

You looking at birth years? I recall dealing with them in SAS... remember if it's four digit that you check for births occurring in the current and past century.

And hey... do you have two or more elements of data for an individual? 2% to 5% will probably be missing some element, and some will have wrong data. a zip code off by one, an address not in the city you are looking to geocode for, whatever. If you are lucky, it will be obvious stuff like that.

The life of the data scientist is mostly cleaning, formatting, and transferring data, with the occasional sweeeet analysis. Of course your analysis will probably give you nothing useful, because despite several thousand usable records, it's not clear if any element has a significant effect on the dependent variable you are looking at. If you are smart, maybe you can finagle an analysis based on a non parametric distribution or logistic regression.

Oh, and often the speed of your analysis running is inversely correlated with how easy it is to code and enter your data. There is a reason people use SAS, and it's not because of its amazing IDE.

StefanKarpinski13y ago

This is a really excellent and well-balanced article. Very much captures the pluses and minuses of these various systems for data analysis.

rcthompson13y ago

One of my bioinformatics courses "required" MATLAB because the class project was based on a simulation framework called the COBRA Toolbox which was developed in MATLAB[1]. I didn't know who to ask about obtaining a MATLAB license, so instead I just got it to work in Octave and used that. I was pleasantly surprised at how little I had to tweak before the framework just worked in Octave, given that as far as I know everyone in the lab that develops the framework just uses MATLAB.

[1] http://opencobra.sourceforge.net/openCOBRA/Welcome.html

prakashk13y ago

Perl was mentioned in the article, but PDL (Perl Data Language) wasn't.

https://metacpan.org/module/PDL

PDL is the Perl Data Language, a perl extension that [...] includes fully vectorized, multidimensional array handling, plus several paths for device-independent graphics output.

PDL is fast, comparable and often outperforming IDL and MATLAB in real world applications. PDL allows large N-dimensional data sets such as large images, spectra, etc to be stored efficiently and manipulated quickly.

For integration with R, there are Statistics::R (https://metacpan.org/module/Statistics::R) and Statistics::useR (https://metacpan.org/module/Statistics::useR)

zem13y ago

surprising omission at the end - any mention of scipy should at least include a pointer to sage as well.
