PEP 450: Adding A Statistics Module To The Standard Library (opens in new tab)

(python.org)

185 pointspetsos12y ago83 comments

83 comments

53 comments · 15 top-level

cabalamat12y ago· 6 in thread

> For many people, installing numpy may be difficult or impossible. For example, people in corporate environments may have to go through a difficult, time-consuming process before being permitted to install third-party software.

I do not regard this as a good justification for putting something in the standard library! If you don't have root access, use vitualenv (which you might want to do anyway) and install the package somewhere under your home directory.

rkangel12y ago

Never under-estimate how much more helpful it can be to have something in a standard library. I do a lot of embedded development, and we have Python scripts all over the place on various embedded targets to perform all sorts of functions. Often it is incredibly inconvenient, expensive or even impossible to install extra packages on those machines, due to lack of permissions, lack of bandwidth, lack of time, or even a read only form of disk to name but a few reasons.

If your work involves a lot of scientific computing then yes, you're going to need numpy but if you're just updating an existing script that's doing some performance monitoring, having an accurate version of mean and standard deviation available seems like a great idea.

Python attempts to be batteries included, this is part of its philosophy and one of the reasons for its popularity. I'm glad this is being extended into a new area.

siddboots12y ago

NumPy is not nearly portable enough to do what you are describing as a user on a windows machine. You cannot simply do a `pip install numpy` into a virtualenv. Instead you must either install the package system-wide, or compile it yourself, which means getting a working MinGW environment or similar.

Edit: Although, I do agree that NumPy being difficult to install is not, on its own, a good justification for the PEP.

cdavid12y ago

numpy is quite portable, I am not sure what you mean by not nearly portable enough.

The reason why you can't do pip install numpy is pip's fault, there is nothing that numpy can do to make that work. Note that easy_install numpy does work on windows (without the need for a C compiler).

2 more replies

nknighthb12y ago

Having the technical capability to do something is not the same as having permission to do it. The PEP refers to corporate policies. Installing third-party software in your home directory would be just as much a violation as doing it in a location that requires root.

jimmytucson12y ago

Son, I hope you never have the displeasure of working in an enterprise environment.

First of all, as others mentioned, your personal machine runs on Windows so right off the bat there goes your instant virtualenv, pip install.

Even on the Unix app server, you're probably behind a firewall that's tight as a duck's ass so chances are you're downloading the tar ball and making the package yourself.

Third, wtf are you doing littering the app server with all these binaries? And what is Python? I'm sorry, no, rewrite this in Java, please.

If you do manage to convince your manager and the rest of your team that Python is not black magic, the first time numpy breaks or some small issue crops up or you have to migrate to a new server and reinstall numpy but now there's a new version and... GTFO of here with that black magic.

I agree with you, developing websites on your MacBook Pro there's no excuse not being able to install numpy. In the real world though, having basic necessities in stdlib does absolute wonders.

dagw12y ago

Not everybody using python is writing freestanding software for machines they control. Lots of software for example ship with a python install for scripting, automation or writing plugins for example.

bachback12y ago· 6 in thread

Nice proposal. I think the problem is numpy itself. If you could just do pip install numeric_package then nobody can complain. I don't quite understand why a package has to depend on LINPACK. I will probably switch to julia-lang, because numpy is (at least for me) not that great to work with.

stiff12y ago

NumPy is a full MAT-LAB for Python, not a simple drop-in statistics library. It has to depend on LINPACK because writing a full linear algebra library that performs well is damn hard and takes several researcher-years. Most serious scientific computing libraries and utilities depend on it, including Julia I think. There is certainly room for simpler libraries for people not seriously into numeric computations, as the document linked well indicates.

cdavid12y ago

Numpy does not depend on LINPACK, only scipy does (numpy only uses blas if you have one installed, it is optional).

The reason why scipy (and Julia BTW) need blas/lapack is because that's the only way to have decent performance and reasonably accurate linear algebra. The alternative is writing your own implementation of something that has been used and debugged for 30 years, which does not seem like a good idea.

bachback12y ago

This is what I don't understand at all. Imagine somebody in a different area of computing would say: oh, we solved that 30 years ago and now there is no room for improvement at all? why can't this be done at least in C?

1 more reply

omaranto12y ago

You'll be disappointed to learn Julia depends on BLAS, LAPACK and librmath. That shouldn't dissuade you from trying it, though: is a pretty cool language.

mvanveen12y ago

numpy has all sorts of awful C bindings which make it less than versatile in environments where you want pure Python. It's great from a performance point of view, but horrible for compatibility.

Google App Engine used to suffer because of this (more specifically, it still only restricts your runtime to pure Python, but now you can import numpy at least). I believe the PyPy folks have also had their own set of struggles with numpy compatibility, although I'm not sure what the state of that is at present.

In any case, I think these compatibility concerns alone make a strong argument for including simple Statistics tooling into the standard library.

asgard102412y ago

How does that work for SQLite 3, which _is_ part of Python library?

I would actually prefer to have numpy included before those statistics functions.

3 more replies

fiatmoney12y ago· 5 in thread

It's not a terrible idea to support the absolute basics like mean & variance, but anything beyond that (particularly things like models or tests) is not a good idea for a standard library. Once you hit even something simple like a linear regression you have issues of how to represent missing or discrete variables, handling colinearity, or whether to do online or batch modes which can give different results. Tests in particular are fraught because if you're going to make them available for general consumption they need a good explanation of when they're appropriate, which is basically a semester course in statistics and well out of scope for standard library docs.

Basically, the idea of "batteries included" should also mean that if something looks like you can put a D-cell in there, you're unlikely to blow your arm off.

dalke12y ago

The PEP suggests that the functionality would be comparable to that in a high school calculator or in Excel/LibreOffice/Gnumeric. The existence of that functionality suggests to me that a stats package can be useful even if it doesn't handle things like missing data.

Similarly, Excel/etc. support these functions without a "semester course in statistics." Instead, you'll find that there are many web pages from semester courses in statistics which end up teaching how to use Excel. The same would no doubt happen with Python.

I don't why a statistics standard library module needs to provide a "good explanation of when they're appropriate" to a higher standard than any other module. Python provides trigonometric and hyperbolic functions without teaching trigonometry. It provides complex numbers and cmath without teaching people about complex numbers. It provides several different random distribution functions without teaching anything about Pareto, Weibull, or von Mises distributions.

For that matter, data structures is a semester course as well, but the Python documentation doesn't teach those differences in its documentation of deque, stack, hash table, etc., nor describe algorithms like heapq and bisect.

"whether to do online or batch modes which can give different results". The PEP says it will prefer batch mode:

      Concentrate on data in sequences, allowing two-passes over the data,
      rather than potentially compromise on accuracy for the sake of a one-pass
      algorithm

Someone12y ago

By that logic, you should get rid of about everything in standard libraries. A simple RNG? Anything related to floating point? Multi-threading? Character encodings? Unicode normalization? Time and date handling? All fraught with danger for those who do not know, and "basically a semester course and well out of scope for standard library docs".

Surely, it would be better to supply good implementations of algorithms rather than refrain from doing that, and letting programmers write and use bad ones instead?

IMO, the discussion should be about what c/should end up in the _standard_ library, and what is better put in a separate product/download.

daniel-levin12y ago

Agreed. I'm studying statistics at the moment and I'm continually reminded of how easy it is to choose the wrong model / distribution and be incorrect because of some non-obvious and technical reason. For example, just the other day, I wanted to use the binomial distribution to solve a problem. To use this distribution, the trials must be independent of one another. In that particular problem, there was a subtle condition that made the trials non-independent. I arrived at correct-appearing answers (0 <= P <= 1) that were actually all wrong. Statistics is way too easy to break to be used naively.

lutusp12y ago

> Statistics is way too easy to break to be used naively.

Fair enough, but the same argument could be made about using an unskewed standard distribution on non-symmetrical datasets, a common error even among people who should know better.

I think binomial functions should be included, on the ground that they're very useful and their probability of misuse is only equal to the continuous statistical forms, not more so.

1 more reply

andrewflnr12y ago

  ...how to represent missing or discrete variables...

Don't. Just say no. Just give me the simple easy stuff. Most of us will be fine, and everyone else will know they need something better and won't bother.

ot12y ago· 5 in thread

Just out of curiosity, I submitted this yesterday:

https://news.ycombinator.com/item?id=6190603

The URL was

    http://www.python.org/dev/peps/pep-0450/

While this is

    http://www.python.org/dev/peps/pep-0450

That is, exactly the same except for a trailing slash. Doesn't the deduplication algorithm handle this case?

daGrevis12y ago

Technically speaking, they are separate URLs that may lead to separate resources. For example, Google engine treats them as separate URLs.

That's the reason why opening http://www.python.org/dev/peps/pep-0450 redirects to http://www.python.org/dev/peps/pep-0450/ . HN engine should follow redirect to avoid situations like this.

ot12y ago

Technically speaking, there are no equivalent URLs in general, different strings may lead to different resources.

Still, there are a number of common sense heuristics to normalize URLs, that HN applies to do de-duplication. I was wondering what is the rationale for not having trailing slash removal among them. I mean, is there any legitimate website that serves a different resource if you remove the trailing slash?

3 more replies

alexholehouse12y ago

I posted an ASK PG [1] about this last month ago and got an angry email from (presumably) a mod. His point was totally valid - that PG 'aint go time fo' that' which is true, but I was just hoping to draw attention to it, as opposed to demand PG drop what he's doing right now and fix it. To be honest, I was quite surprised by the tone of the email.

[1] https://news.ycombinator.com/item?id=5908075

pg12y ago

The email wasn't from one of us.

coldtea12y ago

Why should it?

For one, it provides the welcome ability to bring topics up in Hacker News again, where they might get accepted better the second or third time (e.g because more people are online at the time of the second submission).

If the "deduplication algorithm" had "handled this case", then we would only be left with the first submission (a dead discussion), whereas as it is, HN users have now caught on to this PEP news and we have a discussion going on.

clutchski12y ago· 4 in thread

Batteries included is a fine philosophy when starting a language to encourage early adoption, but at this point, I don't think it's worth adding new libraries to the stdlib. Here's why:

- It's very easy to find and install third party modules

- Once a library is added to stdlib, the API is essentially frozen. This means we can end up stuck with less than ideal APIs (shutil/os, urllib2/urrlib, etc) or Guido & co are stuck in a time consuming PEP/deprecate/delete loop for even minor API improvements.

- libraries outside of the stdlib are free to evolve. users of those libraries who don't want to stay on the bleeding edge are free to stay on old versions.

__e12y ago

The PEP acknowledges the existence of high-end statistics libraries. It also notes that the alternative to such libraries are DIY implementations - which are often incorrect in their implementation.

The PEP proposes adding simple, but correct support for statistics.

Apart from high-end libraries being an overkill and DIY implementations being incorrect, the PEP also cites resistance to third party software in corporate environments. This problem is more social than technical though, and I'm not sure what weight must be attached to it

pmr_12y ago

There is a different perspective on the frozen APIs. Using an API from the stdlib gives you the certainty that your program is not going to break with a minor python version bump. This might not matter for all software but is crucial for others.

dalke12y ago

Those are good reasons for rejecting any addition to the standard library. However, new libraries are sometimes added to the standard library, which means the reasons you listed can be overcome by even better reasons for inclusion.

What are those reasons for why a new library can be included, and why aren't those reasons appropriate justification for including this proposed statistics package?

Peaker12y ago

> However, new libraries are sometimes added to the standard library, which means the reasons you listed can be overcome by even better reasons for inclusion.

Or overcome by bad judgement.

tvst12y ago· 4 in thread

No mention of Pandas?

http://pandas.pydata.org/

goronbjorn12y ago

They probably left out pandas because it depends on numpy and also this point:

> For many people, installing numpy may be difficult or impossible.

that's as true, and arguably more, for pandas.

maaku12y ago

Although the solution, imho, is to make numpy easier to install across many platforms (ideally `pip install numpy pandas` should just work).

1 more reply

yati12y ago

I think Pandas is a great candidate for inclusion in the stdlib if this ever happens - and hopefully, numpy/parts of scipy will also be thrown in :)

lifeisstillgood12y ago

I think the idea is to include a small independent stats package not a full featured still developing third party. Any number of people need std dev easily and reliably available. If you need numpy on top of that, you know you do and can afford the effort.

For 99% of my work numpy and the associated compilation overhead is unneeded - fits my brain, fits my needs

1 more reply

lutusp12y ago· 3 in thread

Great idea, but while assembling this library, don't leave out permutations, combinations, and the binomial Probability Mass Function (PMF) and Cumulative Distribution Function (CDF). Small overhead, easy to implement, very useful. More here:

http://arachnoid.com/binomial_probability

enalicho12y ago

Permutations and combinations already exist within the itertools module.

megrimlock12y ago

There is something deep about software that thwarts our dreams of making it re-usable, and makes debates about what primitives to put into the stdlib difficult. For one thing, it seems like we should be able to collectively determine a DAG of concepts, starting from simplest primitives and progressively building up derived concepts, like the tech tree in a game like Civilization, or Principia Mathematica. But that tree is so vast, and the most-useful points so sparse along it (ex: I don't think anyone's asking for Church numerals in their stdlib) that we end up debating which are the most-useful derived concepts to surface. Moreover, there's no language or compiler that is sufficiently smart to efficiently map all those concepts to the hardware we have, we instead go with expertly-wrought implementations.

To take this example, you _could_ count enumerations and permutations by passing a range(n) list to itertools and then counting how many actual results you get back, but that's silly when you could also just use the binomial theorem to get there directly. A compiler that could generally perform such transformations would be miraculous -- well beyond the territory of automated proof assistants like mathematica or gcc -O3 that trundle along cultivated routes of expert system rules, into the realm of actually discovering deep linkages at the frontier of our knowledge.

Until then it seems like stdlibs will just fracture along lines of strain among the userbase. Presumably, most Python users don't need anything beyond what a financial calculator would provide, and anyone else should head to numpy.

1 more reply

lutusp12y ago

> Permutations and combinations already exist within the itertools module.

Not exactly. Given argument lists, Itertools provides result lists (actually, iterators for that purpose) with the original elements permuted and combined, but doesn't provide numerical results for numerical arguments, as shown here: http://arachnoid.com/binomial_probability

I was referring to permutation and combination mathematical functions, not generator functions.

2 more replies

aristus12y ago· 3 in thread

About damned time. Writing your own stats library is like writing your own crypto.

aidos12y ago

You wouldn't write your own - numpy / scipy have everything you'll need.

dalke12y ago

Some of the things I need are:

- fewer dependencies for my package

I've written the average() and standard_deviation() functions at least a couple of dozen times, because it doesn't make sense to require numpy in order to summarize, say, benchmark timing results.

- reduced import time

NumPy and SciPy were designed with math-heavy users in mind, who start Python once and either work in the REPL for hours or run non-trivial programs. It was not designed for light-weight use in command-line scripts.

"import scipy.stats" takes 0.25 second on my laptop. In part because it brings in 439 new modules to sys.modules. That's crazy-mad for someone who just wants to compute, say, a Student's t-test, when the implementation of that test is only a few dozen lines long. (Partially because it depends on a stddev() as well.)

Sure, 0.25 seconds isn't all that long, but that's also on a fast local disk. In one networked filesystem I worked with (Lustre), the stat calls were so slow that just starting python took over a second. We fixed that by switching to zip import of the Python standard library and deferring imports unless they were needed, but there's no simple solution like that for SciPy.

- less confusing docstring/help

Suppose you read in the documentation that scipy.stats.t implements the Student's t-test as scipy.stats.t.

    >>> import scipy.stats
    >>> scipy.stats.t
    <scipy.stats.distributions.t_gen object at 0x108f87390>

It's a bit confusing to see scipy.stats.distributions.t_gen appear, but okay, it's some implementation thing.

Then you do help(scipy.stats.t) and see

    Help on t_gen in module scipy.stats.distributions object:
    
    class t_gen(rv_continuous)
     |  A Student's T continuous random variable.
     |  
     |  %(before_notes)s
     |  
        ...
     |  
     |  %(example)s

Huh?! What's %(before nodes)s and %(example)s?

The answer is, scipy.stats auto-generates various of the distribution functions, including things like docstrings. Only, help() gets confused about that because help() uses the class docstring while SciPy modifies the generator instance's docstring. Instead, to see the correct docstring you have to do it directly:

    >>> print scipy.stats.t.__doc__
    A Student's T continuous random variable.
    
        Continuous random variables are defined from a standard form and may
        require some shape parameters to complete its specification.  Any
        optional keyword parameters can be passed to the methods of the RV
        object as given below:

1 more reply

coldtea12y ago

Well, one of the things I need is for it to be built in the standard library, so not, numby/scipy doesn't have everything I need.

rev12y ago· 2 in thread

Kudos to PHP for apparently being ahead of the curve among dynamic languages with regard to statistics. Another interesting, yet unmentioned option is Clojure/Incanter.

scribu12y ago

I upvoted this comment, then I realized that those PHP stats functions aren't in the standard library. They're in a PECL extension (equivalent to Python C extensions).

draegtun12y ago

And another option is PDL - http://pdl.perl.org

NB. And I believe Perl6 (spec) includes PDL - http://perlcabal.org/syn/S09.html#PDL_support

zokier12y ago

Reminds me of the story that made rounds here couple of years ago: The Python Standard Library - Where Modules Go To Die

https://news.ycombinator.com/item?id=3913182

andrewflnr12y ago

I'm in favor. I was surprised and annoyed to find there wasn't a standard library for doing excel-level statistics. If you throw basic least-squares linear regression in there too, I can eliminate Excel from my physics classes.

bthomas12y ago

One side effect is that this would accelerate the adoption of Python 3 in the scientific community

bayesianhorse12y ago

I'm against this. Either you have to create a new statistics module or you would have to include numpy/pandas/statsmodels into the standard library. In both cases it would essentially freeze the modules for further development outside the python release cycle...

Demiurge12y ago

I would like having these simple functions, but I think they can just go into 'math' library.

matiasb12y ago

Nice idea

j / k navigate · click thread line to collapse

83 comments

53 comments · 15 top-level

cabalamat12y ago· 6 in thread

rkangel12y ago

Python attempts to be batteries included, this is part of its philosophy and one of the reasons for its popularity. I'm glad this is being extended into a new area.

siddboots12y ago

Edit: Although, I do agree that NumPy being difficult to install is not, on its own, a good justification for the PEP.

cdavid12y ago

numpy is quite portable, I am not sure what you mean by not nearly portable enough.

2 more replies

nknighthb12y ago

jimmytucson12y ago

Son, I hope you never have the displeasure of working in an enterprise environment.

First of all, as others mentioned, your personal machine runs on Windows so right off the bat there goes your instant virtualenv, pip install.

Even on the Unix app server, you're probably behind a firewall that's tight as a duck's ass so chances are you're downloading the tar ball and making the package yourself.

Third, wtf are you doing littering the app server with all these binaries? And what is Python? I'm sorry, no, rewrite this in Java, please.

I agree with you, developing websites on your MacBook Pro there's no excuse not being able to install numpy. In the real world though, having basic necessities in stdlib does absolute wonders.

dagw12y ago

bachback12y ago· 6 in thread

stiff12y ago

cdavid12y ago

Numpy does not depend on LINPACK, only scipy does (numpy only uses blas if you have one installed, it is optional).

bachback12y ago

1 more reply

omaranto12y ago

You'll be disappointed to learn Julia depends on BLAS, LAPACK and librmath. That shouldn't dissuade you from trying it, though: is a pretty cool language.

mvanveen12y ago

numpy has all sorts of awful C bindings which make it less than versatile in environments where you want pure Python. It's great from a performance point of view, but horrible for compatibility.

In any case, I think these compatibility concerns alone make a strong argument for including simple Statistics tooling into the standard library.

asgard102412y ago

How does that work for SQLite 3, which _is_ part of Python library?

I would actually prefer to have numpy included before those statistics functions.

3 more replies

fiatmoney12y ago· 5 in thread

Basically, the idea of "batteries included" should also mean that if something looks like you can put a D-cell in there, you're unlikely to blow your arm off.

dalke12y ago

"whether to do online or batch modes which can give different results". The PEP says it will prefer batch mode:

      Concentrate on data in sequences, allowing two-passes over the data,
      rather than potentially compromise on accuracy for the sake of a one-pass
      algorithm

Someone12y ago

Surely, it would be better to supply good implementations of algorithms rather than refrain from doing that, and letting programmers write and use bad ones instead?

IMO, the discussion should be about what c/should end up in the _standard_ library, and what is better put in a separate product/download.

daniel-levin12y ago

lutusp12y ago

> Statistics is way too easy to break to be used naively.

Fair enough, but the same argument could be made about using an unskewed standard distribution on non-symmetrical datasets, a common error even among people who should know better.

I think binomial functions should be included, on the ground that they're very useful and their probability of misuse is only equal to the continuous statistical forms, not more so.

1 more reply

andrewflnr12y ago

  ...how to represent missing or discrete variables...

Don't. Just say no. Just give me the simple easy stuff. Most of us will be fine, and everyone else will know they need something better and won't bother.

ot12y ago· 5 in thread

Just out of curiosity, I submitted this yesterday:

https://news.ycombinator.com/item?id=6190603

The URL was

    http://www.python.org/dev/peps/pep-0450/

While this is

    http://www.python.org/dev/peps/pep-0450

That is, exactly the same except for a trailing slash. Doesn't the deduplication algorithm handle this case?

daGrevis12y ago

Technically speaking, they are separate URLs that may lead to separate resources. For example, Google engine treats them as separate URLs.

That's the reason why opening http://www.python.org/dev/peps/pep-0450 redirects to http://www.python.org/dev/peps/pep-0450/ . HN engine should follow redirect to avoid situations like this.

ot12y ago

Technically speaking, there are no equivalent URLs in general, different strings may lead to different resources.

3 more replies

alexholehouse12y ago

[1] https://news.ycombinator.com/item?id=5908075

pg12y ago

The email wasn't from one of us.

coldtea12y ago

Why should it?

clutchski12y ago· 4 in thread

Batteries included is a fine philosophy when starting a language to encourage early adoption, but at this point, I don't think it's worth adding new libraries to the stdlib. Here's why:

- It's very easy to find and install third party modules

- libraries outside of the stdlib are free to evolve. users of those libraries who don't want to stay on the bleeding edge are free to stay on old versions.

__e12y ago

The PEP acknowledges the existence of high-end statistics libraries. It also notes that the alternative to such libraries are DIY implementations - which are often incorrect in their implementation.

The PEP proposes adding simple, but correct support for statistics.

pmr_12y ago

dalke12y ago

What are those reasons for why a new library can be included, and why aren't those reasons appropriate justification for including this proposed statistics package?

Peaker12y ago

> However, new libraries are sometimes added to the standard library, which means the reasons you listed can be overcome by even better reasons for inclusion.

Or overcome by bad judgement.

tvst12y ago· 4 in thread

No mention of Pandas?

http://pandas.pydata.org/

goronbjorn12y ago

They probably left out pandas because it depends on numpy and also this point:

> For many people, installing numpy may be difficult or impossible.

that's as true, and arguably more, for pandas.

maaku12y ago

Although the solution, imho, is to make numpy easier to install across many platforms (ideally `pip install numpy pandas` should just work).

1 more reply

yati12y ago

I think Pandas is a great candidate for inclusion in the stdlib if this ever happens - and hopefully, numpy/parts of scipy will also be thrown in :)

lifeisstillgood12y ago

For 99% of my work numpy and the associated compilation overhead is unneeded - fits my brain, fits my needs

1 more reply

lutusp12y ago· 3 in thread

http://arachnoid.com/binomial_probability

enalicho12y ago

Permutations and combinations already exist within the itertools module.

megrimlock12y ago

1 more reply

lutusp12y ago

> Permutations and combinations already exist within the itertools module.

I was referring to permutation and combination mathematical functions, not generator functions.

2 more replies

aristus12y ago· 3 in thread

About damned time. Writing your own stats library is like writing your own crypto.

aidos12y ago

You wouldn't write your own - numpy / scipy have everything you'll need.

dalke12y ago

Some of the things I need are:

- fewer dependencies for my package

I've written the average() and standard_deviation() functions at least a couple of dozen times, because it doesn't make sense to require numpy in order to summarize, say, benchmark timing results.

- reduced import time

- less confusing docstring/help

Suppose you read in the documentation that scipy.stats.t implements the Student's t-test as scipy.stats.t.

    >>> import scipy.stats
    >>> scipy.stats.t
    <scipy.stats.distributions.t_gen object at 0x108f87390>

It's a bit confusing to see scipy.stats.distributions.t_gen appear, but okay, it's some implementation thing.

Then you do help(scipy.stats.t) and see

    Help on t_gen in module scipy.stats.distributions object:
    
    class t_gen(rv_continuous)
     |  A Student's T continuous random variable.
     |  
     |  %(before_notes)s
     |  
        ...
     |  
     |  %(example)s

Huh?! What's %(before nodes)s and %(example)s?

    >>> print scipy.stats.t.__doc__
    A Student's T continuous random variable.
    
        Continuous random variables are defined from a standard form and may
        require some shape parameters to complete its specification.  Any
        optional keyword parameters can be passed to the methods of the RV
        object as given below:

1 more reply

coldtea12y ago

Well, one of the things I need is for it to be built in the standard library, so not, numby/scipy doesn't have everything I need.

rev12y ago· 2 in thread

Kudos to PHP for apparently being ahead of the curve among dynamic languages with regard to statistics. Another interesting, yet unmentioned option is Clojure/Incanter.

scribu12y ago

I upvoted this comment, then I realized that those PHP stats functions aren't in the standard library. They're in a PECL extension (equivalent to Python C extensions).

draegtun12y ago

And another option is PDL - http://pdl.perl.org

NB. And I believe Perl6 (spec) includes PDL - http://perlcabal.org/syn/S09.html#PDL_support

zokier12y ago

Reminds me of the story that made rounds here couple of years ago: The Python Standard Library - Where Modules Go To Die

https://news.ycombinator.com/item?id=3913182

andrewflnr12y ago

bthomas12y ago

One side effect is that this would accelerate the adoption of Python 3 in the scientific community

bayesianhorse12y ago

Demiurge12y ago

I would like having these simple functions, but I think they can just go into 'math' library.

matiasb12y ago

Nice idea

j / k navigate · click thread line to collapse