1) Write a googletest
2) Write a googlebenchmark
3) Run all unit tests under AddressSanitizer, ThreadSanitizer, and g++ UB sanitizer
4) Tidy up with clang-format
5) Run cppcheck
So I feel pretty confident I'm not doing something braindead if I can get this stuff through CI.
But for Python, I don't really have a good idea when I'm doing something that'll cause me agonizing pain in the future. The only tool I use is flake8, which is awesome, but I can't see memory leaks or performance profiles.
What strategies do you adopt (and what tools do you use) to keep all hell from breaking loose in large Python projects?
2) `from collections import namedtuple` reinforces 1
3) nice lightweight tests, py.test or nose
4) integration tests so you can actually refactor w/o having to recode 100 unit tests
5) write as little code as possible
http://rbcs-us.com/documents/Why-Most-Unit-Testing-is-Waste....
You can write integration tests with the same tools as unit tests.
Personally, I try to make sure all major paths are covered, but I won't test every little UI detail. It's faster to test those manually. But YMMV.
2) From the doc [1]:
> Named tuples assign meaning to each position in a tuple and allow for more readable, self-documenting code. They can be used wherever regular tuples are used, and they add the ability to access fields by name instead of position index.
They do act as immutable containers for related data. They are usually used to replace classes that don't have any behavior/methods.
They avoid confusion by having a name, and named attributes.
[1] https://docs.python.org/2/library/collections.html#collectio...
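A minimal sketch of what the docs describe; the `Point` type is just a hypothetical example:

```python
from collections import namedtuple

# Hypothetical record type with two named fields.
Point = namedtuple("Point", ["x", "y"])

p = Point(x=3, y=4)
print(p.x, p.y)     # access by name: 3 4
print(p[0] + p[1])  # still indexable like a plain tuple: 7

# Named tuples are immutable; _replace returns a modified copy.
q = p._replace(x=10)
print(q)            # Point(x=10, y=4)

try:
    p.x = 5
except AttributeError:
    print("fields cannot be reassigned")
```

If you want defaults or type annotations, `typing.NamedTuple` gives the same behavior with a class syntax.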
The only way I could manage it was to write tiny functions, so I could literally eyeball the scope and keep all the details in my short term memory. I would not recommend using this language for larger projects.
It's true that there are limits to what they can do, since Python will let you do some pretty funky stuff, but you were nowhere near those limits.
Regarding the missing API documentation: Python programmers will often have a Python shell open and just experiment with stuff in there with instant feedback.
For instance, I know one project where someone decided to write a relatively large Python plugin for Excel. One essential component was a button that put up a shell so you could experiment with the Excel COM API live. So you could access the sheets and write stuff into cells and see Excel update them.
If you write Python like you write .NET or Java code, you'll probably only see the downsides. Getting a taste of the upside requires adjusting your coding style. It's going to take some time. And yes, there are of course downsides, language design is a trade-off.
Starting from here, unit tests will take you a long way. Tox + pytest + coverage.py is the de facto standard for tests now, and will give you peace of mind when editing your code. Tox can run flake8 as well, so that's often done too.
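For anyone new to pytest: a test is just a function whose name starts with `test_` and uses a plain `assert`. A small sketch, with a hypothetical `slugify` function standing in for real code (kept in the same file for brevity):

```python
# test_slugify.py: hypothetical code under test plus its tests.
# pytest collects functions named test_* and, on failure, shows the
# values involved in the failing plain `assert`.
import re


def slugify(title: str) -> str:
    """Lowercase a title and join its alphanumeric words with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def test_basic():
    assert slugify("Hello World") == "hello-world"


def test_punctuation_is_dropped():
    assert slugify("Large Python projects: tips?") == "large-python-projects-tips"


def test_empty_string():
    assert slugify("") == ""
```

Run it with `pytest test_slugify.py`, or with `coverage run -m pytest` followed by `coverage report` to see which lines the tests never touched. Tox then runs the same commands in isolated virtualenvs, one per configured environment.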
After that, everything is a luxury. You can use mypy to get static typing, and you can use a very good editor that checks things for you, such as PyCharm or Sublime Text + anaconda. You can use CI with something like Travis or buildbot.
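A rough idea of what mypy buys you: you annotate signatures, and it checks the call sites without running anything. The function here is hypothetical:

```python
from typing import Dict, Optional


def find_user_id(users: Dict[str, int], name: str) -> Optional[int]:
    """Return the id registered for `name`, or None if there is none."""
    return users.get(name)


uid = find_user_id({"alice": 1}, "bob")

# Writing `uid + 1` here would be flagged by mypy, because uid may be None.
# Narrowing with an explicit check satisfies the type checker:
if uid is not None:
    print(uid + 1)
else:
    print("unknown user")
```

Running `mypy example.py` would complain about an unguarded `uid + 1`; the annotations themselves have no effect at runtime.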
I usually make sure to have a .editorconfig file and a clear style convention to ease teamwork. And I like to use Sphinx to write the project's documentation, which you really, really need to do. This includes docstrings for modules, classes and functions (Google style, in my case), comments, and also some handwritten rst files.
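For reference, a Google-style docstring looks roughly like this (the `mean` function is a made-up example). Sphinx can render this style through its `sphinx.ext.napoleon` extension, and the `>>>` example doubles as a doctest:

```python
def mean(values):
    """Compute the arithmetic mean of a sequence of numbers.

    Args:
        values: A non-empty sequence of int or float.

    Returns:
        The mean as a float.

    Raises:
        ValueError: If ``values`` is empty.

    Example:
        >>> mean([1, 2, 3])
        2.0
    """
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Run the examples embedded in the docstrings as tests.
    import doctest
    doctest.testmod()
```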
Last, but not least, experience matters a lot. You learn how to organize stuff in your directory tree. Because Python is such an expressive language, I like to split any file bigger than 500 lines. Have one module for exceptions. Handle Unicode properly from the start. Etc.
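One way to read "one module for exceptions": a single file with a base class for everything the project raises on purpose. The names below are hypothetical:

```python
# exceptions.py: a hypothetical home for all of the project's exceptions.


class AppError(Exception):
    """Base class for every error this project raises deliberately."""


class ConfigError(AppError):
    """Raised when configuration is missing or invalid."""


class NotFoundError(AppError):
    """Raised when a requested resource does not exist."""

    def __init__(self, kind: str, key: str) -> None:
        super().__init__(f"{kind} {key!r} not found")
        self.kind = kind
        self.key = key


# Callers can catch one specific error, or everything the project raises.
try:
    raise NotFoundError("user", "bob")
except AppError as exc:
    print(type(exc).__name__, exc)  # NotFoundError user 'bob' not found
```

Catching `AppError` at a boundary (a request handler, a CLI entry point) handles every expected failure without also swallowing genuine bugs such as `TypeError`.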
I'd probably agree, but I would love to have more mature static typing, and especially an accurate "find all usages" function. Refactoring has been major hell because you pretty much have to grep for names if you want to find all usages of a variable (or, more specifically and problematically, a model property in Django).
One benefit of grep is that it lets me spot things in different languages (Python, Javascript, HTML, CSS, documentation).
Have you tried out the Silver Searcher? http://geoff.greer.fm/2011/12/27/the-silver-searcher-better-...
I've also heard that at Google they require assert isinstance for every parameter of every function.
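I can't confirm that rule, but the pattern itself looks like this (hypothetical `scale` function). Keep in mind that `assert` statements are removed when Python runs with `-O`, so they work better as executable documentation than as real input validation:

```python
def scale(values: list, factor: float) -> list:
    # Runtime type checks in the style described above. These asserts are
    # skipped entirely under `python -O`, so they document intent rather
    # than guarantee it.
    assert isinstance(values, list), f"values must be a list, got {type(values).__name__}"
    assert isinstance(factor, (int, float)), f"factor must be a number, got {type(factor).__name__}"
    return [v * factor for v in values]


print(scale([1, 2, 3], 2))  # [2, 4, 6]

try:
    scale("123", 2)
except AssertionError as exc:
    print("rejected:", exc)  # rejected: values must be a list, got str
```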
New Relic has very good tools for profiling python code if you're running a service.
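If you're not running a service, or just want a quick local look, the standard library's `cProfile` and `pstats` modules give you a basic profile. A sketch with a deliberately slow, hypothetical workload:

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    # Deliberately naive loop so it shows up in the profile.
    total = 0
    for i in range(n):
        total += i * i
    return total


def main() -> int:
    return sum(slow_sum(10_000) for _ in range(50))


profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Print the five entries with the highest cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
print(stream.getvalue())
```

For a whole script, `python -m cProfile -s cumulative your_script.py` does the same without touching the code.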
For example, it's throwing a "dodgy" warning on my Django project because of the SECRET_KEY variable in settings.py.
I like dodgy and I actually pull out my secret key from settings.py (I pull it from the system environment). Otherwise you can just disable dodgy by using a custom profile (http://prospector.landscape.io/en/master/profiles.html#enabl...)
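A minimal sketch of pulling the key from the environment; `DJANGO_SECRET_KEY` is just an assumed variable name, use whatever your deployment sets:

```python
import os
from typing import Mapping


def read_secret_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the secret key from the environment, failing loudly if unset."""
    key = env.get("DJANGO_SECRET_KEY")  # hypothetical variable name
    if not key:
        raise RuntimeError("Set the DJANGO_SECRET_KEY environment variable")
    return key


# In settings.py you would then write:
# SECRET_KEY = read_secret_key()
```

Failing at startup is deliberate: a missing key shows up immediately instead of the app quietly running with some fallback value, and the key itself never lives in the file.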
Not exactly the 'microservices' approach, but similar ideas.
One of the most useful things related to this is to focus on interface design. It's easy to scrap a bad implementation and re-implement the same interface later, but it's harder to fix a bad interface that you're using all over the place. Making some implementations pluggable up-front will also make it easier to swap things out later.
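A sketch of "design the interface, keep the implementation pluggable" using `typing.Protocol` (Python 3.8+). `KeyValueStore`, `InMemoryStore` and `remember_visit` are hypothetical names:

```python
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """The interface the rest of the code depends on."""

    def get(self, key: str) -> Optional[str]: ...

    def set(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """A first, simple implementation; could later be replaced by Redis, SQL, etc."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def remember_visit(store: KeyValueStore, user: str) -> int:
    # Only the interface is used here, so any conforming store works.
    count = int(store.get(user) or 0) + 1
    store.set(user, str(count))
    return count
```

Because `Protocol` is structural, `InMemoryStore` never has to inherit from `KeyValueStore`; mypy checks that it matches. Swapping in another backend later means writing one new class without touching `remember_visit`.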
Another thing that ended up causing the most pain in the long run was building in too much functionality directly instead of leaving things up to plugins. Plugins can more easily be enabled or disabled to compose specific functionality. The alternative is tons of code and tons of configuration options to handle every little corner case.
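A bare-bones version of the plugin idea: the core only knows how to run whatever plugins the configuration enables. The registry and plugin names are hypothetical; larger projects often discover plugins through packaging entry points instead, but a dict keeps this self-contained:

```python
from typing import Callable, Dict, Iterable, List

# Each plugin is a function that transforms a list of lines.
Plugin = Callable[[List[str]], List[str]]
PLUGINS: Dict[str, Plugin] = {}


def register(name: str) -> Callable[[Plugin], Plugin]:
    """Decorator that adds a plugin to the registry under `name`."""
    def decorator(func: Plugin) -> Plugin:
        PLUGINS[name] = func
        return func
    return decorator


@register("strip")
def strip_lines(lines: List[str]) -> List[str]:
    return [line.strip() for line in lines]


@register("drop_blank")
def drop_blank(lines: List[str]) -> List[str]:
    return [line for line in lines if line]


def run_pipeline(lines: List[str], enabled: Iterable[str]) -> List[str]:
    # Which plugins run, and in what order, is purely configuration.
    for name in enabled:
        lines = PLUGINS[name](lines)
    return lines


print(run_pipeline(["  a ", "   ", "b"], ["strip", "drop_blank"]))  # ['a', 'b']
```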
As far as lower level tools, the 'coverage' tool integrated with your test suite is a must have.
Tangentially, I don't like how people conflate writing small, self-contained libraries with "microservices". It's like people forgot that you can write libraries without sticking a huge mess of networking and glue code in between them.
(https://www.stavros.io/posts/microservices-cargo-cult/ for a related rant)
* No duck typing
* Document all input/output parameters to functions.
* Avoid fancy meta-programming.
* Try to break it up into smaller pieces with well-defined interfaces that are ideally not Pythonic. Think about the interfaces as something you'd potentially want other programming languages to be able to talk to.
* Don't use eval/exec or more generally don't pass Python code around.
In general as the scope of a module/unit gets larger you want to stick with simpler stuff. If the module/unit is small, the interface is small, and there are good tests around it, you can do anything you want inside that smaller bit.
[EDIT: These comments are mostly Python-specific. On top of that you'd apply what you would in any other language: organize your code properly, keep a consistent style throughout (which in Python includes following the relevant standards such as PEP 8), do code reviews, write tests, etc.]
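On the eval/exec point: when you need to turn a string into data, parse it as data. A small sketch with the standard library's `ast.literal_eval` and `json` (the input strings are made up):

```python
import ast
import json

user_input = "[1, 2, {'a': 3}]"

# eval(user_input) would execute arbitrary code if the string contained any.
# ast.literal_eval only accepts Python literals (numbers, strings, lists, dicts, ...).
data = ast.literal_eval(user_input)
print(data)  # [1, 2, {'a': 3}]

try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", exc)

# For data exchanged with other systems, JSON avoids passing Python
# code (or even Python-specific literals) around at all.
print(json.loads('{"a": [1, 2, 3]}'))  # {'a': [1, 2, 3]}
```

JSON also fits the advice above about interfaces other languages should be able to talk to.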
Trying to make your parts as self-contained as possible is always a good idea.
We must have different definitions of "all hell from breaking loose", since none of those tools would help me avoid that in C++ at all. Hell has broken loose when you have an unmaintainable ball of mud, which IMO has little to do with what those tools help with.
But for those tools:
Many of them you just don't need in Python due to language differences. You need to make sure all branches of your code are somehow tested, but memory management is easier with (mandatory) reference counting, you don't have pointers that are not valid, and so on.
As for performance profiles, well, since you're thinking of using Python, you already decided IO is your bottleneck and not CPU. The moment you find yourself thinking "I should profile this code and speed it up" is the moment you consider using another language for the job.
Very much agree about maintainability. When the business situation allows it, you need to get around to fixing the annoyances that pop up and bug you.
You eventually wind up asking yourself "is it safe to change this behavior?" pretty often. If you do something weird you know needs to be preserved (or know can be fixed after some other fix), comment why. If a piece of code is depended on by some other thing in a potentially unexpected/important way, note that. Of course, you can't always predict what info will be most useful, but at least think of it as you write.
You're probably on a team on a project of this size. You should make as much outside-the-code info as you can accessible and searchable by as many folks as possible. It can be reasonable to make bugs tracking your work/plans/intentions even if you're the only one working on a change. Write internal docs on how large new chunks of code work. Sometimes writing forces you to sort things out that you didn't even realize were messy when they were just in your head.
You can wind up with accidental "ownership" of old and more-difficult-to-safely-update code. Fight this. Try to get everyone working on everything as much as feasible. If you worked on the old code, be clear you don't think it's perfect and encourage improvements to it. When you're looking at other folks' code and need to ask silly-sounding questions to figure it out, ask them.
Relatedly, new folks will need help, especially if they are actually junior, not just new to the product. Do help, and when you help them with a specific problem also help them fish (like, show where they'd go to look for the info you're pointing them towards, and if there's no such place, maybe there should be one). When they think things are strange, pay attention; they aren't inured to your project's weirdness like you inevitably are.
Release often, have real QA, and be prepared to be responsive as bugs come up. Releasing often tends to find bugs while the change that introduced them is fresh on the mind. Splitting up large features can help achieve that (we're still figuring this out ourselves, though).
It's probably worth running your unit tests in parallel (I've heard of a very large Ruby SaaS product using a cluster). Depending on what kind of stuff you test, you may have to deal with flaky tests. They're one of those annoyances worth working on; you want a test failure to be meaningful.
Look for ways to shrink the project. Maybe something can be done by an outside product or library, and let you focus on what only you do.
Getting things basically right across a wide range of areas is more important (and harder) than zooming in on a single area like how the code looks. In our world, that means getting operational processes, monitoring, support, feature prioritization, how well folks work together, etc. right, and minimizing walls within the org that would impede that.
But my $.02: documentation. Python code itself is easy to read, but if you have a large project broken into many small code bases for libraries, services and front ends, then you need solid documentation. Not only text but also diagrams showing how all the parts work together, the kind you make in MS Visio or Draw.io.
They start off with a pretty decent amount of unit tests (84% coverage) and make sure it's visible to developers using:
- Travis [2] (has to pass on pull requests too before contributions are accepted)
- Coverage [3]
There's also Code Climate [4] for some more introspection.
[1]: https://github.com/pydata/pandas
[3]: https://codecov.io/
Unfortunately, it is very hard and brittle.
You definitely need rigorous testing to keep it all in one place, but I would steer away from Python for a large project.
Also take a look at mypy or some other sort of typing to see if it can aid in a large project. [1]
[0] https://www.quora.com/Why-does-Google-prefer-the-Java-stack-...
[1] http://code.tutsplus.com/tutorials/python-3-type-hints-and-s...
* Measure test coverage to be aware of what is not tested, but don't just chase an exact coverage percentage; doing that leads to many integration tests and only a few unit tests. Both kinds of tests are important.
* Extract libraries from the main code to make the main project smaller; write docs and tests for these libraries. Docs are especially important for them. Try hard to maintain boundaries: a library should have a single purpose, and it shouldn't be tied to the rest of the code. If you find writing the docs complicated, then maybe the library does too much, or maybe its API is too hard to use. Fix that.
* Don't write all the code yourself; consider using open-source libraries. But don't use open-source libraries if you're not comfortable contributing to them; there will be issues (like in any code). If the library you're going to use is not an industry standard, read its source: if your reaction is "ah, yeah, this is almost how I'd have written that", use it; otherwise, try to find another library or write your own.
I'd say the trick to handle large Python projects is to resist making them large. Don't be sloppy in code organization, be pedantic about which part "knows" about which part, extract non-specific utilities to libraries. Often projects can be kept under 20-50K lines of code after a few years of development by a small team if a team tries to maintain code quality and moves non-specific features to external libraries.
flake8 and similar linters help with consistency; that is important, but far from the main problem. The main problem to fight is non-locality: if one can reason about a piece of code just by looking at it, without checking lots of other components, the overall project size doesn't matter much.
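One common source of non-locality is hidden global state. A toy contrast (both functions are hypothetical):

```python
# Non-local: to know what this returns you have to find every place
# that might reassign TAX_RATE.
TAX_RATE = 0.2


def price_with_tax_global(price: float) -> float:
    return price * (1 + TAX_RATE)


# Local: everything the function depends on is passed in explicitly.
def price_with_tax(price: float, tax_rate: float) -> float:
    return price * (1 + tax_rate)


print(price_with_tax(100.0, 0.2))
```

The second version can be understood, and tested, from its signature alone.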
* Run unit tests and integration tests automatically on every commit (i.e., use tox, Jenkins or some such).
* Depending on the software you are creating deploy a chaos monkey[1] kinda approach for disaster/HA testing.
* Read up on good Python practices:
http://python-guide.readthedocs.io/en/latest/
http://python.net/~goodger/projects/pycon/2007/idiomatic/han...
-----
Am I wrong?