While I am asking myself this question, the only one that pops into my mind would be Laravel: https://github.com/laravel/laravel (PHP)
One could think that a codebase as popular as React's (https://github.com/facebook/react) would be a perfect example of "clean code", but at a glance, I personally don't find it very expressive.
This may all be very subjective, but I would love to see examples of codebases that members of this community have enjoyed working with.
Anyone have an example of a consumer application that has a good codebase? Chromium, GitLab, OpenOffice, etc.? I feel like such applications inherently have more spaghetti because the human problems they're aiming to solve are less concretely scoped. Even something as simple as "Take the data from this form and send it to the project manager" ends up being insanely complex and nitpicky. In what format should the data be sent? How do we know who the project manager is? Via what channel should it be sent? How should we notify the project manager? When should we send the report? Some of these decisions are inherently inelegant, so I feel like you get inelegant code.
#ifdefs are themselves spaghetti; windows.c [0a] and files.c [0b] have tons of platform-related ifdefs (but, mercifully, not too much nesting), and hacklib.c [1] has some deep-ish ifdef nesting (though, again mercifully, well commented).
Ok on re-reading some of this, I guess it was worse in my mind than it actually is. Nethack was the first large codebase I ever made any changes to and tried to understand, so maybe the relative enormity at the time made a negative impression.
Check out the source for Brogue [2] for what I consider to be pretty readable game code.
[0a] https://github.com/NetHack/NetHack/blob/NetHack-3.6/src/wind... [0b] https://github.com/NetHack/NetHack/blob/NetHack-3.6/src/file... [1] https://github.com/NetHack/NetHack/blob/NetHack-3.6/src/hack... [2] https://sites.google.com/site/broguegame/
Also curious about some good, not too hugely sized game code (preferably something not written in C/C++, maybe an indie game from the past decade or so). Does anyone know of any?
Keldon was contracted a few years later to develop the RftG apps on iOS and Android, which are easily worth $4.
Source: https://github.com/bnordli/rftg
Precompiled binaries: http://keldon.net/rftg/
But yeah, Postgres also gets my vote. I guess there’s a bit of a bias there because devs are likely to read the code of the tools they use, either to track down bugs or just to understand how it works.
I found them so under-tested and buggy that finally I just gave up. They're now blacklisted in my mind -- I'll never use one of their products again.
I guess the lesson here is that good code doesn't always have a strong relationship to a usable or bug-free product.
Just to drive the point home: I was developing in Python and, with no knowledge of Ruby, I was able to go through the code using just GitHub, and I got what I wanted every single time.
sqlite is much less complex, but similarly approachable.
In more recent examples, I think you see a lot of this same reader-centric pragmatic ethos in many Go projects. The Kubernetes codebase comes to mind as a very large tome that remains approachable. And the Go stdlib, of course.
Java generally falls on the opposite side, but there are counterexamples. A lot of Martin Thompson's code eschews Java "best practices" in favor of good code. Seeing competent people in the Java space "break the rules" helps, though of course Java is forever hampered by having internalized illegible patterns as best practices in the first place.
It's a shame because at least the OpenJDK implementation of the standard library in Java is generally quite good, especially around the concurrency parts. Clean, easy to follow, reasonable comments. But of course that's Java written by C developers, mostly.
I'm a fervent believer in "good code is self-documenting", so I was curious to be proven wrong, clicked randomly until I found code and I saw this.
/*
* Round off to MAX_TIMESTAMP_PRECISION decimal places.
* Note: this is also used for rounding off intervals.
*/
#define TS_PREC_INV 1000000.0
#define TSROUND(j) (rint(((double) (j)) * TS_PREC_INV) / TS_PREC_INV)
Usage of acronyms is one of the worst offenders in bad code. The context makes it clear that TS means timestamp, so that's not too bad (still bad though), but I'm still not sure what INV means; luckily I presume it's the only place it's used. If it were named TIMESTAMP_ROUND, I wouldn't need to know "Round off to MAX_TIMESTAMP_PRECISION decimal places." Now that I've copy/pasted that, it seems like the comment is wrong too: it's rounded off based on TS_PREC_INV, so if I were to believe the comment, I wouldn't get the right behaviour.
I'm not saying the Postgres codebase isn't good code, just that "good code is self-documenting" is still true. That code was pretty much self-documenting except for the acronyms, but considering it was all used together, it was fine and I was able to understand what they meant.
For me, comments should only be needed when something isn't clear. Defining what isn't clear is hard to determine for sure, but that's one thing for which code review helps quite a bit.
Man I hate dogma like that. My "common sense" comment style is always, "code tells me how, comments tell me why." The only exception is in hand optimized code where I'll non-doc comment what the reference implementation would be above the optimized version, which is _sometimes_ necessary when tests aren't in the same translation unit.
I still think comments like these are super redundant and annoying.
A special mingw tool to create import libs is/was broken on 64-bit. I think it was called dlltool. Normally you'll just need to add a flag to the linker to create that.
So no, not PostgreSQL.
xsv: https://github.com/BurntSushi/xsv
ripgrep: https://github.com/BurntSushi/ripgrep
His code typically has extensive tests, helpful comments, and logical structure. It was fun trying to imitate his style when writing a PR for xsv.
The Quake 2 engine was also pretty interesting: It was almost totally undocumented, and it had plenty of weird things going on. But I could count on the weird things being there for a reason, if only I thought about it long enough.
Also -- the source code to Doom. Read it, marvel at its clarity and efficiency -- and then laugh when you realize that the recent console ports were completely rewritten in fucking Unity. And the Switch version chugs, despite the original running well on 486-class hardware.
I wonder why they didn't just write an emulator, then. Especially on the Switch if there are performance issues.
I found the source very approachable. Source was well laid out and fairly clear. Some of it was subjectively a bit ugly to just look at, but when you read it, it was very clear.
Couldn't use glibc as a reference because this is a closed-source commercial product and, well, GPL.
For Python, I really like how SQLAlchemy is written and designed.
For Rust, ripgrep stands out as a sterling example of how to write a powerful low-level utility like that.
Strongly agree with this one about Redis.
Overall it's still definitely on the more readable side compared to other C code I've seen, I like the thorough comments, and it's generally decent, but I'm not particularly coming away from it in awe like everyone else seems to have.
[1] https://github.com/antirez/redis/blob/unstable/src/server.c#...
[2] https://github.com/antirez/redis/blob/unstable/src/server.c#...
[3] https://github.com/antirez/redis/blob/unstable/src/server.c#...
[4] https://github.com/antirez/redis/blob/unstable/src/server.c#...
[5] https://github.com/antirez/redis/blob/unstable/src/server.c#...
[6] https://github.com/antirez/redis/blob/unstable/src/server.c#...
[7] https://github.com/antirez/redis/blob/unstable/src/server.c#...
[8] https://github.com/antirez/redis/blob/unstable/src/server.c#...
[9] https://github.com/antirez/redis/blob/unstable/src/server.c#...
* hyperloglog.c
* rax.c
* acl.c (unstable branch, the Github default)
* even cluster.c
Anything you pick will likely be a lot better than server.c
Other things you mentioned are a matter of taste. For instance things like:
if (foo) bar();
is my personal taste, and I enforce it everywhere I can inside the Redis code, even modifying PRs received. The line breaks are to stay under 80 cols. And so forth. A lot of the things you mentioned about the "style" are actually intentional. The weakness of server.c is in the overall design, because it is the part of Redis that evolved by "summing" stuff into it over the course of 10 years, without ever getting a refactoring for some reason (it is one of the places where you usually don't have bugs or the like).
Windows is quite an engineering achievement. We didn't prioritize readability or "clean code". All the variables used hungarian notation, so you had horrible names like lpszFileName (lpsz = long pointer to a zero terminated string) or hwndSaveButton (window handle). You also had super long if(SUCCEEDED(hr)) chains that looked like your code was spilling down a staircase. Oh yeah, and pidls (pronounced "piddles" and short for "pointer to an id list") used for file operations.
What made the code base beautiful was the extreme lengths we went to to be fast and keep 3rd party software working. WndProcs seem clunky, but they are elegant in their own way and blazingly fast. All throughout the code base you would find stuff like "If application = Corel Draw, don't actually free the memory for this window handle because Corel uses it after sending a WM_DESTROY message."
The fact that thousands of people worked on the code base was mind boggling.
1. I think I counted 5 string implementations in active use and code at the boundary had to convert between them all.
2. The SUCCEEDED macro is a mask against HRESULT but who the hell actually uses non-zero HRESULTS to communicate domain-specific success codes? And don’t forget that posix APIs return 0-for-non-error ints and COM APIs can use S_TRUE (0 to be a non-error) and S_FALSE (1) so you have to flip them for real bools. Or have if (bResult == S_TRUE)
3. Nobody wanted to touch old codebases. I fixed an assert in Trident layout code because a whole library used upper-left, lower-right input (and params called ul, lr) but one function (contrary to docs) used upper-left, width, & height. When I fixed the library and 2/3 call sites I was called arrogant, to revert changes in the library, and change the last 1/3 to also have the inverse bug in its call-site.
4. Another Trident API (written by an intern) had a tree where fastInsert() could only be called after slowLookup(), but nothing in the API enforced this.
5. Every COM object decides whether it’s faster or thread-safe by whether the refcount uses atomic ops or just --/++
6. Saw parallel arrays in files where a struct held an object which might have suffered the slicing problem on insert. Another struct field held an index into the array of the sliced-off parts, and users rehydrated the full object from the two. This wouldn’t happen with an object pointer, but indirection was unacceptable because the author didn’t trust the small-allocation heap’s locality.
7. My codebase included a whole C++ runtime because my core-OS team didn’t trust msvcrt.dll because the shell team wrote it.
> ...like lpszFileName (lpsz = long pointer to a zero terminated string)
I remember those.
AIUI hungarian gives you some kind of typing. The typing is done by humans using the names. The humans have to get it right; they are the typecheckers.
The first thing I'd do is offload the typechecking onto an automatic framework; the idea of letting people do a computer's job is madness. It would not have been too hard to do (relatively very cheap for a large codebase like an OS), I think, and it would have allowed the Hungarian prefixes to be dropped because they'd become redundant, and strengthened and sped up typechecking. So where is the flaw in my thinking?
(aside: one of my first contract jobs was working in Pascal (Delphi actually). The company I worked for had coding standards, cos you need standards, don't you. It was to prefix every integer with i_, every float with f_, every int array with ai_, et cetera. As Pascal was strongly typed, this was totally pointless.)
[1] https://www.joelonsoftware.com/2005/05/11/making-wrong-code-...
The codebase has 34 years worth of code written already, 100 million LOC or more if you count Office, VS etc. The cost of typechecking is trivial but the cost of rewriting this much code to be consistent with any new convention is in the hundreds of millions of dollars. This legacy cost then of course becomes higher every year...
As an author of Sciter Engine that works on Windows, MacOS and Linux/GTK I have first hand experience working with all three API sets.
The Windows API is the most logical, complete, and stable of the three.
It has everything you really need to create a performant and manageable UI.
MacOS is good, but less good. It uses reference counting (which is not bad by itself) but in a very strange manner: the name of the function determines the need for [obj retain] / [obj release], and not all the names they use are consistent in that respect. Yet Apple changes the API quite frequently and dramatically.
GTK, while built on top of the quite reliable GLib foundation, is a mess to be honest. You have GtkWindow and GdkWindow; you have gtk_window_resize(), gtk_window_set_default_size(), and six more functions that should allow you to set the size of the window, but they may or may not work in particular situations.
But hell did it take me a while to figure out how that worked, because it's so poorly documented and auto-magical. Reverse engineering a COM DLL just to find out how the hell it has a stable ABI is not fun.
I liked that it was actually possible to read it and understand what was going on.
In a similar vein, P. J. Plauger's version of The Standard C Library is nice because, even if it might not be especially optimized(?), you can actually read the code and understand the concepts the standard library is based on.
Software Tools by Kernighan and Plauger would also be great except that you have to translate from the RatFor dialect of Fortran or Pascal to use the code examples.
Even so, I used its implementation of ed to create a partial clone in PowerShell that let me do remote file editing on Windows when that was the only access available.
So even over 4 decades and various operating systems removed, there are still concepts in there that are useful.
Jonesforth is also a great and mind blowing code base although I'm not sure where the canonical repository is currently.
I think a common misconception amongst mid-experienced programmers is that they confuse looks with quality. Reading cleanly written code gives you a feeling of control, and also the feeling that someone must have thought about that program. It's reassuring. You have in front of you code that you trust.
When in fact, that code can be complete garbage.
The look of the code doesn't matter; what matters is the program, in the abstract meaning of the term. You don't judge code by reading it, but by running it in your head. Granted, you have to understand it in order to do that. Once you understand the code, you run it in your head, and that's when quality enters the scene, because running code in your head is what you do all day when you code. Some say that you spend most of your time reading code. That's simply not true: the effort is definitely not in reading but in running the code in your head.

Basically what I'm describing is a 2-by-2 matrix: one column for looks bad, one for looks good, one row for runs badly in the head, and one for runs smoothly in the head. Granted, the best may be when the code both looks right and runs right, but don't be mistaken: the real important and difficult part is whether or not it runs well in the head.
A poor-quality program may look good but not run well in the head. It's too complex or too confusing (in terms of logic, not presentation), or convoluted, or simply wrong in terms of what it's supposed to do. On the other hand, good-quality code is code that surprises you by the way it runs. It's beautiful in its simplicity, it delivers a lot, and it's small, so it fits well in the coder's head. And it may look like garbage, which is not so important.
You may wonder how to quickly gauge the quality of a codebase. Run part of it in your head. Contemplate the machinery. Try not to think too much about the language and how the code is constructed in that language; try instead to contemplate it in an abstract manner. Be critical, and criticize your own criticisms.
This is probably related to a factor called 'local reasoning': procedural programming tried to encourage it through procedures, OO through encapsulation, and FP through purity.
Basically the goal is that when anyone looks into a function, the reader can easily make sense of the code without moving around.
For pure functional programming, to make sense of a function is to make sense of its branch of the acyclic call tree. The caller and sibling branches are always completely irrelevant.
That makes it much easier to run in people's heads.
I have encountered far too many people who don't even realize this is a thing. I figure they must be doing it in some limited form and just aren't recognizing it as a skill that can be trained up, otherwise I'm not sure how they can do any development, but...
In my career, I've seen peer programmers laugh at me many times when looking at my code, and subsequently keep laughing, but in the other direction, as we were rolling over our competitors. Once, at a FANG, I had the perfect setup where our team was doing the same project as another team. We did it in 12 months with 8 people; it took them 4 years with 18. Both projects started from scratch.
What follows is an example of my own code. It's the perfect example because it looks like complete utter garbage, and you will think I'm a beginner who has never seen well-designed code. You are gonna laugh. While it is certainly not flawless, I'm pretty sure you would like to be on my team with that code if you knew more. All the features, down to the tiniest nice subtlety, are there; it has almost no bugs and it's dead simple. Really dead simple. Please note that I code with Sublime Text, which has multi-cursor editing, making duplicated code not so bad. So there is a lot of duplicated code. Again, I'm not saying the following code is flawless; I'm saying that despite its flaws, or even because of them, from my experience you'd prefer to be on my team.
https://github.com/IndieRobert/example_code/blob/master/src/...
On GitHub, rather than seeing what has changed, it would be interesting if there were a comment that told you what the folder contained.
edit: Relevant here because the best codebase for me is one where I can understand the folder structure, but that is a sort of 0th order effect that should be equalized with some tool.
See for example this package comment: https://github.com/golang/go/blob/master/src/net/http/doc.go...
Turns into this documentation (the beginning only): https://godoc.org/net/http
Out of the box.
Having a sensible folder structure and good folder names is nice, but taking a few minutes to write individual READMEs can make a repo even easier to understand.
It rarely gives you what is in each folder, and what part of the functionality each folder handles, although perhaps we should try to change the conventions of readme files to include file structure.
edit: I mean the root readme might contain what is in each folder so you don't have to click on each one to see which one you want to start with.
It takes a lot of investment from a developer before they can appreciate the beauty of the code... To make matters more confusing, a lot of developers tend to become extremely attached to even horrible code if they spend enough time working with it; it must be some kind of Stockholm syndrome.
I think the problem is partly caused by a lack of diversity in experience; if a developer hasn't worked at enough different kinds of companies and on enough different projects, their understanding of coding is limited to a very narrow spectrum. They cannot judge whether code is good or bad because they don't have clear values or a philosophy to draw from to make such judgements. If you can't even separate what is important from what is not important, then you are not qualified to judge code quality.
If you think that the quality of a project is determined mostly by the use of static vs dynamic types, the kind of programming paradigm (e.g. FP vs OOP), the amount of unit test coverage and code linting, then you are not qualified to judge code quality.
I think that the best metric for code/project quality is simply how much time and effort it takes for a newcomer to be able to start making quality contributions to the project. This metric also tends to correlate with robustness/reliability of the code and also test quality (e.g. the tests make sense and they help newcomers to quickly adapt to the project).
As developers, we are familiar with very few projects. If a developer says that they like React or VueJS or Angular, etc... they usually have such limited view of the whole ecosystem that their opinion is essentially worthless; and that's why no one ever seems to agree about anything. We are all constantly dumbing down everything to the lowest common denominator and regurgitating hype. Hype defies all reason.
It's the same with developers; most developers (especially junior and mid-level) are incapable of telling who is actually a good developer until they've worked with them for about 6 months to a year.
If you are not a good developer, you will not be able to accurately judge/rank someone who is better than you at coding until several months or years of working with them. Sometimes it can take several years after you've left the company to fully realize just how good they were.
While I agree when evaluating a codebase by the broad architecture (which I often judge by cohesion and coupling), I feel evaluating details first requires learning to read code as well as prose. Then “bad” or “ugly” code is code that reads arcanely like olde English.
Tellingly, Marijn Haverbeke, Codemirror's creator, is also the author of the excellent 'Eloquent Javascript' [1].
The author is unwilling to change a handful of lines of code to make the package compatible with a strict style-src.
Why does this matter? You can exfiltrate data such as CSRF tokens using inline styles from an HTML injection vulnerability: https://medium.com/bugbountywriteup/exfiltration-via-css-inj...
[1] Interesting Codebases: https://news.ycombinator.com/item?id=15371597
[2] Show HN: Awesome-code-reading - A curated list of high-quality codebases to read https://news.ycombinator.com/item?id=18293159
Small summary of the features I liked:
- Simple documentation
- Intuitive structure
- Lots of JS best practices, but still simple
- Event-driven architecture
- A simple API gateway that will just fire events to workers
- Properly divided workers (kind of microservices but with lots of shared code)
- Monorepo
It was recently bought by GitHub (1) and was discussed here (2).
The author has talked in his blog about some decisions he got wrong. Super interesting post (3).
0. https://github.com/withspectrum/spectrum
1. https://spectrum.chat/spectrum/general/spectrum-is-joining-g...
2. https://news.ycombinator.com/item?id=18570598
3. https://mxstbr.com/thoughts/tech-choice-regrets-at-spectrum/
Carmack said that he was "the best developer I ever worked with"
https://github.com/tornadoweb/tornado/blob/master/tornado/io...
For most web apps my default choice is Django, but for special purpose web servers Tornado or Flask are still useful.
Requests for Python https://github.com/psf/requests
Comments like the one cited are fantastic, but we're interested in exemplars of good (i.e., elegant and readable) code, not ancillary matter.
[1] https://fosdem.org/2019/schedule/event/kubernetesclusterfuck...
If you can comprehensively test, then the docs can live there, and you don't need quite so many warnings about introducing bugs.
(d3 and three.js are also very interesting to read, but they're not quite in the same class as the former.)
A good example of creating a DSL and then efficiently using it. "Never had a memory leak from day 1" (Roger Hui). Written in C.
Beautiful codebase, rock solid, and way better option than ActiveRecord IMHO
I like the Linux kernel codebase: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...
Same here, I was looking at linux kernel network code lately and I was surprised by how clean and easy to follow it was.
https://github.com/robinhood/faust
Also voted for Postgres, Redis, and NetBSD.
Further, all internal classes also include a PIMPL (d-pointer or data pointer) to hide internal details from API customers.
IMO, the d-pointer makes stuff much more difficult to read, and the Qt idioms are probably only useful if you are a Qt developer. So it might not be useful for you if you are on a non-Qt project.
Splitting it up into libraries is good engineering practice as it ensures boundaries. (Whether it is packaged into their own dll/so/a/dynlib is a deployment question, you can compile all parts statically together)
The licensing is a political/business question independent from code being nice, good and clean. (Except one has to be careful during refactorings, as code might change license, but since afaik the Qt Company has CLAs ensuring copyright ownership they are able to do that)
React frontend app: https://github.com/mattermost/mattermost-webapp
Kudos also to Postgres, SQLite, Lean (theorem prover), and the containers library for Haskell.
libavformat is rather difficult to use and difficult to fix bugs in - you'll never find the bugs. Same with the ffmpeg frontend, which makes it easy to ask for something it's near impossible to get right, like copying an mkv file to an avi, it'll just corrupt your data silently.
Everything about the video decoders is great, but encoding never worked as well, which is why nobody uses ffmpeg2/4/etc and x264 is a separate project.
Some examples?
> encoding never worked as well, which is why nobody uses ffmpeg2/4/etc and x264 is a separate project.
Most users use x264 _via_ ffmpeg since they may need to filter the video and/or filter/process/mux audio and other streams.
1. Jersey - https://github.com/eclipse-ee4j/jersey
2. Jetty - https://github.com/eclipse/jetty.project
3. Guava - https://github.com/google/guava
The common theme: easy to follow, clean documentation, and consistent patterns.
It borrows all the best practices from PostgreSQL, and the naming of variables and functions is more self-explanatory in general.
I also believe that the practices around PRs and code reviews are also good examples.
Not only that, but Laravel is in a mature space where the problems are already solved. It's basically reinventing the wheel.
I'm not surprised that Laravel is written cleanly, but I hate its API. It reminds me of the bloat of Zend but with an obnoxious artsy style added to it.
I'm an engineer, not an artisan.
That was mostly branding: "I'm not a code monkey banging out the same thing as has been done 500 times before, I'm an artisan."
Meanwhile the actual framework breaks backwards compatibility regularly and frequently, and only just, with version 6, picked a damn versioning system.
In my opinion, if you want to see a good framework that solves its problems mostly well and is properly decoupled, then Symfony kicks the shit out of Laravel on documentation, religious adherence to deprecation and backwards compatibility, as well as genuinely useful, genuinely decoupled components.
The author of Laravel knew that, as a massive chunk of Laravel depends on Symfony components; in fact, the earlier versions were basically Ruby on Rails implemented via Symfony.
No. PSR does not at all ensure good code, only standardized code style and some of the interfaces.
This Python code is responsible for the fairly recent imaging of the black hole (i.e., imaging, analysis, and simulation software for radio interferometry).
It's extremely easy to digest despite the complexity involved.
- (BOOL) doYouKnowTheMuffinMan:(TheMuffinMan *)theMuffinMan;
Also, lots of the Objective-C runtime code was clear enough to explain concepts like ARC hacks well enough that I could learn about and give a talk on the Objective-C runtime with a month’s notice.
It's got some interesting usage of custom Swift operators to create almost diagrammatic code, like here: https://github.com/kickstarter/ios-oss/blob/master/Kickstart...
_ = self.cardholderNameTextField
|> formFieldStyle
|> cardholderNameTextFieldStyle
|> \.accessibilityLabel .~ self.cardholderNameLabel.text
And it's the first iOS codebase I've seen that puts test files right next to the files that define the things being tested. It's all there together. Tons of other goodies to find.
https://medium.com/@012parth/what-source-code-is-worth-study...
"The world's first operating-system kernel with an end-to-end proof of implementation correctness and security enforcement is available as open source."
for instance, look at strnlen:
word_t strnlen(const char *s, word_t maxlen)
{
    word_t len;
    for (len = 0; len < maxlen && s[len]; len++);
    return len;
}
http://sel4.systems/

I also enjoy diving into it when I hit a breakpoint that calls one of its methods.
https://github.com/seattlerb/minitest
Apparently, some people think my toy channel implementation called Normandy is somehow good:
C: redis
Python: scikit-learn
_______
Edit: formatting
That thing is the epitome of a framework for framework's sake.
Pretty sure most of Martin's talks begin by complaining about this sort of thing?
We're not talking usefulness here we are talking about clean code.
Architecture is not only correlated but causal in this situation.
Its over-architecture is the problem. Its nano-functions are a positive side effect but don't change the indirection problems you face.
Go on
PHP already is the "framework" and every time you load a page it's executing the script from scratch. You're wasting a lot of time loading a framework to handle control flow for your program which doesn't have any control flow in the first place.
And hopefully I can get some constructive criticism :)
For something not from me but still in Java, the Proguard source code is very clean: https://sourceforge.net/p/proguard/code/ci/default/tree/