Show HN: Python Tests That Write Themselves

(timothycrosley.github.io)

131 points · timothycrosley · 6y ago · 37 comments

mrfusion · 6y ago

This actually gave me another idea. What do you all think? I’d be up for trying to build it.

It would watch your program during execution and record each function call's inputs and outputs, and then create tests for each function from them.

You could always go over the created tests manually and fix them or review them but if nothing else it could be a good start.

OJFord · 6y ago

Instagram created 'MonkeyType' to do something similar for type annotations.

A lower barrier way of trying this out could be to run MonkeyType in combination with OP's hypothesis-auto (which uses type annotations to generate the tests).

westurner · 6y ago

pytype (Google) [1], PyAnnotate (Dropbox) [2], and MonkeyType (Instagram) [3] all do PEP 484 type annotation inference [4]; PyAnnotate and MonkeyType collect types at runtime, while pytype infers them statically.

[1] https://github.com/google/pytype

[2] https://github.com/dropbox/pyannotate

[3] https://github.com/Instagram/MonkeyType

[4] https://news.ycombinator.com/item?id=19454411

knubie · 6y ago

I came across something similar for Clojure recently. https://github.com/ahungry/determinism/

anonytrary · 6y ago

This sounds like snapshot testing. It is the responsibility of the person to browse the produced snapshots to ensure they are correct.

rcap · 6y ago

This is basically what they call snapshot testing

noobiemcfoob · 6y ago

I really like this idea. So, broken out, would the development process look like this?

1) Write code
2) Open interpreter (ipython)
3) Import code and call it manually
4) This magic library will have observed your execution and created a testcase module from it

If the library can use the observed and recorded testcase to extrapolate far beyond your manual call, this could be a really interesting way of testing python code.

ACow_Adonis · 6y ago

This is actually the idea behind a little library I developed for my own use in Common Lisp called testy.

Because the Common Lisp environment already defines special variables that contain the last form evaluated and the return value of the last form evaluated, when you evaluate code at the REPL and it succeeds, you call the (testy:test) function, which captures them and wraps them up as a test for the package you're working on, adding it to the database of tests for the package.

Not only was it useful for development (no need to specify your tests up front, you write your code until it works, then that state "is" the test), but it was also helpful for fixing bugs that you hadn't thought of up front (you fix the bug and then that state is captured as a new test as well).

ryall · 6y ago

I remember finding something like this a few years ago for testing REST APIs. It was very cool but for the life of me I can't recall the name :(

Izkata · 6y ago

Possibly vcr [0], except that's for mocking out APIs in unit tests. There's a whole bunch of ports to other languages midway down the page.

[0] https://github.com/vcr/vcr

timothycrosley (OP) · 6y ago

I'd use it!

ebg13 · 6y ago

Does this do anything other than assert that functions return the right basic type given extremely basic inputs?

Is naive type testing a thing that people actually bother doing vs testing that the functions actually do what they're supposed to?

If a function takes two ints and returns an int that is the integer division of the two, is the evaluation of exceptions and returned types not already implicit in the evaluation of the actual results?

If you know that divide(a, b) should return c for sufficient candidates a, b, and c, then you _know_ that divide returns the right type without explicitly checking. And knowing that the divide function happens to return ints when given ints doesn't actually tell you that it's doing anything even close to the right behavior. So this both doesn't reduce the number of tests you need to write and is also obsoleted by actually writing the tests that you need.
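A small sketch of that argument, using made-up `divide` and `broken_divide` functions (assumptions for illustration, not code from the library): a type-only check accepts an implementation that is completely wrong, while a check on the value rejects it and also implies the type.

```python
def divide(a: int, b: int) -> int:
    """Integer division, as described in the comment above."""
    return a // b


def broken_divide(a: int, b: int) -> int:
    # Deliberately wrong: multiplies instead of dividing, but still returns int.
    return a * b


def type_only_check(func) -> bool:
    """The kind of check that only looks at the return type."""
    return isinstance(func(7, 2), int)


def value_check(func) -> bool:
    """A check on the actual result; passing it implies the type is right too."""
    return func(7, 2) == 3


for candidate in (divide, broken_divide):
    print(candidate.__name__, type_only_check(candidate), value_check(candidate))
```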

progval · 6y ago

It's also useful to check the function doesn't crash.

I use hypothesis to test a DB abstraction, and it often caught bugs in edge cases I didn't consider: timestamps/timezones too large, strings containing null bytes, etc.

These would be caught by the tests generated by hypothesis-auto.

timothycrosley (OP) · 6y ago

By default, the biggest value from this is catching unexpected runtime exceptions with certain values (the edge cases you don't think of yourself). To get the most value out of it, you can further specify what you expect from the result using the _auto_verify=Callable() parameter, or by looking at the individual test cases.

Simple example:

  from my_library import add
  from hypothesis_auto import auto_pytest

  @auto_pytest()
  def test_add(test_case):
      add_result = test_case()
      kwargs = test_case.params.kwargs
      # For two positive addends, the sum must exceed each of them.
      if kwargs['number_1'] > 0 and kwargs['number_2'] > 0:
          assert add_result > kwargs['number_1']
          assert add_result > kwargs['number_2']

ebg13 · 6y ago

> By default, the biggest value from this is catching unexpected runtime exceptions with certain values (the edge cases you don't think of yourself).

So it's a fuzzer?

rcfox · 6y ago

I like the idea, but I think I would be more comfortable with it generating test files that I could keep beside my other tests.

Also, `auto_pytest_magic` doesn't seem to exist.

    ImportError: cannot import name 'auto_pytest_magic' from 'hypothesis_auto' (/home/.../.local/lib/python3.7/site-packages/hypothesis_auto/__init__.py)

timothycrosley (OP) · 6y ago

> I think I would be more comfortable with it generating test files that I could keep beside my other tests.

This is a very interesting idea I might explore more!

`auto_pytest_magic` is only made available if you have pytest installed. You can enforce this by doing:

  pip3 install -U hypothesis-auto[pytest]

rcfox · 6y ago

Oh man, for some reason I thought pytest was a synonym for Python's unittest module.

hchasestevens · 6y ago

This is neatly packaged, but it's not immediately clear what advantages it has over Hypothesis' native offerings for accomplishing this? ( https://hypothesis.readthedocs.io/en/latest/details.html#inf... and https://hypothesis.readthedocs.io/en/latest/data.html#hypoth... )

timothycrosley (OP) · 6y ago

It's an extension for hypothesis, and its value is 100% just convenience and accessibility. Really, it's meant to be a sort of gateway to doing full property-based testing, with the lowest barrier possible. I wrote a bit more about why I created it here: https://timothycrosley.com/project-5-hypothesis-auto

OJFord · 6y ago

Hypothesis' `infer` is inferring arguments to the test fn, that you'd then use to drive the fn under test.

OP's extension is inferring arguments to the fn under test, and then generating a test like `assert isinstance(fn_under_test(*a, **kw), expected_ret_type)` to go with them.

You can do more types of test with `infer` (i.e. not just 'does return' and 'returns correct type' but also 'returns correct value'), but this is an easy way to cover a lot of basic return checking ground.
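Roughly, the "infer arguments from annotations, then check the return type" approach can be sketched with the standard library alone. This is not how hypothesis-auto is implemented; `GENERATORS` and `check_return_type` are hypothetical stand-ins for Hypothesis strategies, and the sketch assumes every parameter is annotated with one of the few types listed.

```python
import random
from typing import get_type_hints

# Hypothetical, tiny stand-in for Hypothesis strategies: one generator per type.
GENERATORS = {
    int: lambda rng: rng.randint(-1000, 1000),
    float: lambda rng: rng.uniform(-1000.0, 1000.0),
    str: lambda rng: "".join(rng.choice("ab\x00 ") for _ in range(rng.randint(0, 5))),
    bool: lambda rng: rng.choice([True, False]),
}


def check_return_type(func, runs=100, seed=0):
    """Call func with annotation-driven random arguments and check its return type."""
    hints = get_type_hints(func)
    expected_return_type = hints.pop("return")
    rng = random.Random(seed)
    for _ in range(runs):
        # Build one keyword argument per annotated parameter.
        kwargs = {name: GENERATORS[annotation](rng) for name, annotation in hints.items()}
        result = func(**kwargs)
        assert isinstance(result, expected_return_type), (kwargs, result)
    return runs


def add(number_1: int, number_2: int = 1) -> int:
    return number_1 + number_2


completed_runs = check_return_type(add)
print(f"{completed_runs} generated calls returned the annotated type")
```

Any exception raised by `func` also surfaces here, which is the "does it crash on inputs I didn't think of" benefit discussed elsewhere in the thread.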

timothycrosley (OP) · 6y ago

Thoughts behind project creation live here: https://timothycrosley.com/project-5-hypothesis-auto

djaychela · 6y ago

Thanks for the write up - explains a lot (including what hypothesis is).

I think there's a typo in the last code box? It looks like you repeated "from hypothesis_auto import"? (that, or I understand even less Python than I thought!)

timothycrosley (OP) · 6y ago

You are correct! Thanks for catching this! I'll fix the example later today

iandanforth · 6y ago

I like this idea. One piece of feedback: a parameter with a leading underscore feels very odd. In Python I interpret leading underscores to indicate the programmer thinks of this as an internal / pseudo-private property. Exposing it through the API makes it "public", which means (to me) that it shouldn't have a leading underscore.

This is especially true if the use case is common enough to put in the top-level examples.

timothycrosley (OP) · 6y ago

While I agree that `_param` generally means private, it is also a common way to allow parameters to be passed into a function where all other parameters are passed directly along via *args and **kwargs, to avoid naming collisions. An example of this is NamedTuple: https://docs.python.org/3/library/collections.html#namedtupl....

duckerude · 6y ago

I believe Raymond Hettinger now considers that a mistake, and wishes he had gone with a trailing underscore (param_) instead. A trailing underscore is just as unlikely to lead to a collision, but less confusing.
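For anyone unsure what collision is being avoided, here is a minimal sketch. `call_and_verify` and `verify_` are hypothetical names chosen for illustration, not hypothesis-auto's API. The wrapper forwards everything to the wrapped function except its own option, so that option needs a name the wrapped function is unlikely to use.

```python
def call_and_verify(function, *args, verify_=None, **kwargs):
    """Forward args/kwargs to function; verify_ is consumed here, not forwarded.

    verify_ is a hypothetical parameter name with a trailing underscore.
    """
    result = function(*args, **kwargs)
    if verify_ is not None:
        verify_(result)
    return result


def scale(value, verify=1):
    # This function has its own 'verify' keyword, which would collide with a
    # plain 'verify' option on the wrapper.
    return value * verify


seen = []
# 'verify' reaches scale(); 'verify_' stays with the wrapper.
scaled = call_and_verify(scale, 3, verify=2, verify_=seen.append)
print(scaled, seen)
```

A leading underscore (`_verify`) would avoid the collision just as well; the difference is only in how the name reads to users of the API.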

kburman · 6y ago

Nice concept, but does it work with real-world applications? I fail to understand how it will work with methods like `authenticate_user(user)` or `load_permissions_from_db(user, db)`.

timothycrosley (OP) · 6y ago

Hi @kburman,

For cases like that it would be best combined with a mock:

  @auto_pytest()
  def test_add(test_case, mocker):
      mocker.patch('db.call')
      test_case()

Note that this example utilizes the following pytest extension: https://github.com/pytest-dev/pytest-mock

However, I would also note that even in most large code-bases, having some methods with potentially dangerous side effects doesn't mean that all functions have them. That is why it operates on a function-by-function basis. You can use this for new pure functions, while continuing to write other tests by hand as before.
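The same mocking idea can be sketched without pytest-mock, using only the standard library's `unittest.mock`. The `Database` class and the body of `load_permissions_from_db` below are made-up assumptions for illustration; only the function name comes from the question above.

```python
from unittest import mock


class Database:
    """Hypothetical stand-in for a real database client."""

    def fetch_permissions(self, user):
        raise RuntimeError("would hit a real database")


def load_permissions_from_db(user, db):
    """Function under test: normalizes whatever the database returns."""
    return sorted(set(db.fetch_permissions(user)))


def run_with_mocked_db(user):
    db = Database()
    # Replace the side-effecting method for the duration of the call only.
    with mock.patch.object(
        db, "fetch_permissions", return_value=["write", "read", "read"]
    ) as fake_fetch:
        permissions = load_permissions_from_db(user, db)
    # The mock keeps its call record after the patch is undone.
    fake_fetch.assert_called_once_with(user)
    return permissions


permissions = run_with_mocked_db("alice")
print(permissions)
```

With the side effect patched out, the remaining logic is a pure function of the generated arguments, which is the kind of code generated tests handle well.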

ben509 · 6y ago

On one job, I had to disable code coverage for a whole suite of tests that were simply making calls and completely ignoring the results.

I know, coverage is Yet Another Metric, but if you don't game it, it can help you track down branches you haven't written tests for.

So my hesitation is I can see people running this, getting 100% code coverage and thinking, "hooray, it's fully tested!"

somada141 · 6y ago

While I had really liked the idea of hypothesis in Python, I found that the edge cases it was uncovering were the ones that were obviously gonna break but at the same time cases I didn't care to guard against, e.g., using 3-mile long integers, or cases that wouldn't work with the underlying libraries, e.g., NumPy. Thus, I found myself spending more time adding constraints on the generated inputs than fleshing out my test suite. So my adventures with hypothesis were short-lived.

I don't mean to detract from this library. I think it's a great combination of strong typing and property-based testing, but has anyone had any experience employing property-based testing on complex functions outside of the whole add/subtract/multiply stuff? What kinda thing have you used it on?

StavrosK · 6y ago

This is great, thank you! Another great example of the benefits strong typing brings.
