It would watch your program during execution and record each function call's inputs and outputs, then create tests for each function from those recordings.
You could always go over the generated tests manually to review or fix them, but if nothing else it could be a good start.
A lower-barrier way of trying this out could be to run MonkeyType in combination with OP's hypothesis-auto (which uses type annotations to generate the tests).
[1] https://github.com/google/pytype
[2] https://github.com/dropbox/pyannotate
If the library can use the observed and recorded test cases to extrapolate far beyond your manual calls, this could be a really interesting way of testing Python code.
Because the Common Lisp environment already defines special variables that contain the last form evaluated and its return value, when you evaluate code at the REPL and it succeeds, you call the (testy:test) function, which captures them and wraps them up as a test for the package you're working on, adding it to the database of tests for that package.
Not only was it useful for development (no need to specify your tests up front, you write your code until it works, then that state "is" the test), but it was also helpful for fixing bugs that you hadn't thought of up front (you fix the bug and then that state is captured as a new test as well).
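For readers who don't use Lisp, here is a rough Python analogue of that capture workflow. It is only a sketch: `capture`, `run_captured`, and `CAPTURED_TESTS` are hypothetical names, the expression is passed as a string rather than read from REPL variables, and it makes no claim about how the Lisp tool is actually implemented.

```python
# Hypothetical store of captured (expression source, observed value) pairs.
CAPTURED_TESTS = []


def capture(expression, namespace):
    """Evaluate expression and remember its value as the expected result."""
    value = eval(expression, namespace)
    CAPTURED_TESTS.append((expression, value))
    return value


def run_captured(namespace):
    """Re-evaluate every captured expression; return those whose value changed."""
    failures = []
    for expression, expected in CAPTURED_TESTS:
        actual = eval(expression, namespace)
        if actual != expected:
            failures.append((expression, expected, actual))
    return failures


def divide(a, b):
    return a // b


env = {"divide": divide}
capture("divide(7, 2)", env)          # the state that "works" becomes the test
env["divide"] = lambda a, b: a / b    # a later change that alters behavior
print(run_captured(env))
```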
Is naive type testing a thing that people actually bother doing vs testing that the functions actually do what they're supposed to?
If a function takes two ints and returns an int that is the integer division of the two, is the evaluation of exceptions and returned types not already implicit in the evaluation of the actual results?
If you know that divide(a, b) should return c for sufficient candidates a, b, and c, then you _know_ that divide returns the right type without explicitly checking. And knowing that the divide function happens to return ints when given ints doesn't actually tell you that it's doing anything even close to the right behavior. So this both doesn't reduce the number of tests you need to write and is also obsoleted by actually writing the tests that you need.
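To make the second point concrete, here is a small sketch. `divide_buggy`, `type_only_check`, and `value_check` are invented for this example: a deliberately wrong implementation passes a return-type check but fails a check on the actual value.

```python
def divide(a: int, b: int) -> int:
    return a // b


def divide_buggy(a: int, b: int) -> int:
    # Deliberately wrong: multiplies instead of dividing, yet still returns an int.
    return a * b


def type_only_check(fn):
    """Passes whenever the function returns an int, whatever its value."""
    return isinstance(fn(6, 3), int)


def value_check(fn):
    """Passes only when the function returns the expected quotient."""
    return fn(6, 3) == 2


print(type_only_check(divide), value_check(divide))
print(type_only_check(divide_buggy), value_check(divide_buggy))
```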
I use hypothesis to test a DB abstraction, and it often caught bugs in edge cases I didn't consider: timestamps/timezones too large, strings containing null bytes, etc.
These would be caught by the tests generated by hypothesis-auto.
Simple example:

    from my_library import add
    from hypothesis_auto import auto_pytest

    @auto_pytest()
    def test_add(test_case):
        add_result = test_case()
        # When both inputs are positive, the sum must exceed each of them.
        if test_case.params.kwargs['number_1'] > 0 and test_case.params.kwargs['number_2'] > 0:
            assert add_result > test_case.params.kwargs['number_1']
            assert add_result > test_case.params.kwargs['number_2']

So it's a fuzzer?
Also, `auto_pytest_magic` doesn't seem to exist.
    ImportError: cannot import name 'auto_pytest_magic' from 'hypothesis_auto' (/home/.../.local/lib/python3.7/site-packages/hypothesis_auto/__init__.py)

This is a very interesting idea I might explore more!
`auto_pytest_magic` is only made available if you have pytest installed. You can ensure this by doing:

    pip3 install -U hypothesis-auto[pytest]

OP's extension infers the arguments to the function under test, and then generates a test like `assert isinstance(fn_under_test(*a, **kw), expected_ret_type)` to go with them.
You can do more types of test with `infer` (i.e. not just 'does return' and 'returns correct type' but also 'returns correct value'), but this is an easy way to cover a lot of basic return checking ground.
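A stdlib-only sketch of that kind of return-type check follows. It is not how hypothesis-auto or Hypothesis generate inputs. `check_return_type` is a hypothetical helper that assumes every parameter is annotated as `int` and simply draws random integers.

```python
import random
from typing import get_type_hints


def add(number_1: int, number_2: int = 1) -> int:
    return number_1 + number_2


def check_return_type(fn, runs=50, seed=0):
    """Call fn with random ints for each int-annotated parameter and
    assert that every result matches the annotated return type."""
    hints = get_type_hints(fn)
    expected_ret_type = hints.pop("return")
    rng = random.Random(seed)
    for _ in range(runs):
        kw = {}
        for name, annotation in hints.items():
            if annotation is not int:
                raise NotImplementedError("this sketch only generates ints")
            kw[name] = rng.randint(-10**6, 10**6)
        assert isinstance(fn(**kw), expected_ret_type)
    return runs


print(check_return_type(add))
```

A real property-based tool would also shrink failing inputs and cover many more types; the point here is only the shape of the generated assertion.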
I think there's a typo in the last code box? It looks like you repeated "from hypothesis_auto import"? (That, or I understand even less Python than I thought!)
This is especially true if the use case is common enough to put in the top-level examples.
For cases like that it would be best combined with a mock:

    @auto_pytest()
    def test_add(test_case, mocker):
        mocker.patch('db.call')
        test_case()

Note that this example uses the following pytest extension: https://github.com/pytest-dev/pytest-mock
However, I would also note that just because some methods have potentially dangerous side effects doesn't mean all functions do; in most large codebases, many don't. That is why it operates on a function-by-function basis. You can use this for new pure functions while continuing to write your other tests by hand as before.
I know, coverage is Yet Another Metric, but if you don't game it, it can help you track down branches you haven't written tests for.
So my hesitation is that I can see people running this, getting 100% code coverage, and thinking, "Hooray, it's fully tested!"
I don't mean to detract from this library; I think it's a great combination of strong typing and property-based testing. But has anyone had any experience employing property-based testing on complex functions, beyond the usual add/subtract/multiply examples? What kind of thing have you used it on?