I'm looking for examples which fall along the lines of "fail to see the forest for the trees".
Tip: If you ever end up in a situation where you have to copy-paste code with minor changes then there is something that you are doing wrong. In this case using arrays and loops would be a much better solution.
(addShapedRecipe is just begging to have ASCII art as its canonical form)
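To make the "arrays and loops" tip concrete, here is a minimal sketch of recipes kept as data, with the grid written as ASCII art and a single loop doing the registration. Everything in it (the `RECIPES` table, `parse_grid`, `build_registry`, the item names) is made up for illustration and is not the real `addShapedRecipe` API.

```python
# Hypothetical recipe table: (output item, ASCII grid, legend for the grid).
# The names and shapes are illustrative assumptions, not a real game's data.
RECIPES = [
    ("pickaxe", ["###", " | ", " | "], {"#": "iron", "|": "stick"}),
    ("sword",   [" # ", " # ", " | "], {"#": "iron", "|": "stick"}),
]


def parse_grid(grid, legend):
    """Turn an ASCII grid into rows of ingredient names (None for empty cells)."""
    return [[legend.get(ch) for ch in row] for row in grid]


def build_registry(recipes):
    """Register every recipe in one loop instead of one pasted call per item."""
    registry = {}
    for output, grid, legend in recipes:
        registry[output] = parse_grid(grid, legend)
    return registry


REGISTRY = build_registry(RECIPES)
```

Adding a new item then means adding one row to the table rather than pasting and tweaking another call.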
Simple, clear structures with little repetition are ideal.
Copy pasted data is painful to work with but still relatively easy to maintain and fix.
Overly complex and unintuitive abstractions are the most difficult to maintain and improve.
The top pick goes to WooCommerce. Although it is an open source e-commerce solution built on top of WordPress, it has some terrible decisions under the hood.
The top pick would go to mixing presentational logic with business logic. For example, to render a table, instead of exposing an array of objects to allow the developer to loop through it as they see fit, WooCommerce will force you to use a PHP function that renders a table for you, and there's actually no way to modify the presentation logic if you wanted to.
It's a really fundamental programming paradigm that even top open source companies fail to adhere to.
Again, I'm not saying this to attack them or the maintainers behind the code. It's just my opinion of why I think the code quality is bad, and I respect the fact that developers put in time and effort so that we can enjoy something freely and at zero cost.
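For anyone unsure what separating presentation from business logic looks like in practice, here is a rough sketch in Python rather than PHP. It is not WooCommerce code; `OrderLine`, `get_order_lines`, `render_table` and `render_text` are hypothetical names, and the sample data is hard-coded.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class OrderLine:
    product: str
    quantity: int
    unit_price: float


def get_order_lines():
    """Business logic: return plain data, no markup (hard-coded sample data)."""
    return [OrderLine("Mug", 2, 7.5), OrderLine("T-shirt", 1, 20.0)]


def render_table(lines):
    """One possible presentation: an HTML table."""
    rows = "".join(
        f"<tr><td>{escape(line.product)}</td><td>{line.quantity}</td></tr>"
        for line in lines
    )
    return f"<table>{rows}</table>"


def render_text(lines):
    """Another presentation built from the same data, no changes to the logic."""
    return "\n".join(f"{line.quantity} x {line.product}" for line in lines)
```

Because `get_order_lines` only returns data, a caller who dislikes `render_table` can write their own renderer without touching anything else.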
WordPress and its plugins are most often cited as examples of bad code and to top it off it is written in PHP - a programming language hated by a lot of programmers.
Yet when it comes down to it, WP powers 33.6% of all websites on the internet. Just think for a second how big that number is!
So if the software gets the job done and the end-user can easily understand it, it really doesn't matter what language you write it in or what code patterns you use.
Revenue for the end-user? No way. WordPress sites are high maintenance due to their enormous attack surface and dubious code quality. I would never even consider it, as it would just be a liability I don't want to deal with.
You could say that about any language. PHP is loved by many programmers too.
And that is the exact reason why it is hard to make the case for writing more tests and addressing technical debt.
There is potential for change though. There is a project backed by the core devs (can't remember the name) which will score plugins by their code standard and show the score in the plugin directory.
I'd argue the only reason PHP still remains at the top is that no one has made another language work out of the box with cPanel installations and run, let's say, Django or RoR apps with little to no modification. Maybe if Python or any other language spent more time on improving that type of accessibility in the realm of web applications, we could see PHP's share fragmenting in later years...
I was reminded of that short-sighted decision every time I ripped a bunch of CDs and saw how importing song titles was not automatic because a dozen different discs had the same hash ids which resulted in collisions[1]. It ended up creating needless friction for millions that depended on that discid.
What's sad is I'm not even sure if one can extract any useful "lessons learned" from it! The programmer that wrote it was not an amateur script kiddie; he had a computer science degree from Uni California. Apparently, he didn't realize he was writing a flawed hash algorithm as he wrote it.
One could say that hash algorithms should be "peer reviewed". Well, he got unsolicited peer review that pointed out how his homegrown hashing computation was flawed, but he ignored the suggestion to improve it.
[0] >Ti Kan wanted to use a hash. He could have chosen something like CRC32, which would have given him a 32 bit number, yielding 4 billion unique IDs. Instead he wrote his own hash. [...] Ti Kan was made aware (not by me) of this problem back in 1994, and given a script to convert this format into a CRC32-based format, but he rejected it because the deployed base was too big. At that point it was probably in the high dozens. -- excerpt from http://quimby.gnus.org/circus/notes/cddb.html
[1] https://forums.macrumors.com/attachments/multiple-matches-jp...
The point was that CDDB's not-invented-here, home-grown hash algorithm was worse than CRC32. He didn't extract the maximum entropy from the discs' metadata of song times to minimize future collisions.
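To show why a digit-sum style ID collides so easily, here is a small sketch. `cddb_style_id` follows the disc ID calculation as it is commonly described (digit sums of the track start times, total length, track count); I am writing it from memory, so treat it as an assumption rather than the exact deployed code. `crc32_id` is just one hypothetical way to feed the same numbers into CRC32, and the two discs are invented.

```python
import zlib


def digit_sum(n):
    """Sum the decimal digits of a non-negative integer."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total


def cddb_style_id(track_starts, leadout):
    """Digit-sum based ID; track_starts and leadout are in whole seconds."""
    checksum = sum(digit_sum(s) for s in track_starts) % 255
    length = leadout - track_starts[0]
    return (checksum << 24) | (length << 8) | len(track_starts)


def crc32_id(track_starts, leadout):
    """Hypothetical alternative: CRC32 over the same numbers as text."""
    payload = ",".join(str(s) for s in [*track_starts, leadout]).encode("ascii")
    return zlib.crc32(payload)


# Two made-up discs: same track count and total length, one track moved by 9 s.
disc_a = ([2, 150, 300], 500)
disc_b = ([2, 141, 300], 500)

same_cddb = cddb_style_id(*disc_a) == cddb_style_id(*disc_b)
same_crc = crc32_id(*disc_a) == crc32_id(*disc_b)
```

The digit sums of 150 and 141 are both 6, so the two discs get the same digit-sum ID even though their track layouts differ, while the CRC32 values are different.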
Or examples of projects that did things one way, but later refactored, and why they refactored.
the real meat of the matter comes when you are trying to make a change. is the structure robust? is there convenient tooling that helps you do what you need to do? does the system require extensive boilerplate to do simple things? does the system come crashing down in some unrelated area when you are trying to make simple changes?
it may be surprising, but large old codebases usually have huge hunks that serve no real purpose whatsoever except to glue together two pieces that would be much happier talking directly to one another.
I really wish as a community we could abandon the 70s business notion that software is a concrete artifact that one invests in and sells. it's a really poor model. software is a process. code that is not being maintained is largely just dead. as developers we should be evaluating software as a living thing that responds to its environment...not as a shrink wrapped item we unbox and review on youtube.
And refactoring is almost always a must... unless you are stuck in a legacy support project where you are just hacking fixes away despite the glaring flaws and the client doesn't want to spend any more money on improving things.
http://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/usr....
https://github.com/coreutils/coreutils/blob/master/src/true....
That being said, the worst I ever saw was the in-house business nonsense I was paid to deal with as a Java consultant. In my experience the worst code isn't open source; subjecting it to public scrutiny would mean suicide for the companies involved.
As you say, I doubt many businesses would be willing to put this kind of code out there.
On the other hand, I've been tinkering with curio for a while now and it's a breath of fresh air compared to that.
My trouble is that I still don't understand what makes the html2text thing "bad". What particular thing there caused me to not like working with it? I'm trying to understand that.
I've been book hunting + figuring out if it's something that I did not know which would have made the code a lot easier to work with (stuff got a lot easier after I cleaned up my set theory understanding)
- https://github.com/dabeaz/curio
- https://github.com/Alir3z4/html2text
https://github.com/TheVamp/Terraria-Source-Code
It is another example of how even inelegant code full of hardcoded values can be successful.
https://github.com/vatt849/LibMinecraft/blob/master/LibMinec...
The whole library is a trip if you want to read a bunch of bad C#. Highlights:
- Generated documentation
- Giant switch/case instead of a more organized dispatch map
- Large swaths of commented code instead of using version control
- try...catch statements that just eat the errors
- Inconsistent code style
- This thing:
https://github.com/vatt849/LibMinecraft/blob/master/LibMinec...
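For the "dispatch map" item in the list above, this is roughly what I mean, sketched in Python rather than C#. The packet IDs, handler names and `UnknownPacketError` are invented for the example and have nothing to do with the real protocol or with LibMinecraft.

```python
class UnknownPacketError(Exception):
    """Raised instead of silently swallowing an unrecognised packet."""


def handle_keep_alive(payload):
    return ("keep_alive", payload)


def handle_chat(payload):
    return ("chat", payload.decode("utf-8"))


# Hypothetical packet IDs mapped to handlers; a new packet type is one new row.
HANDLERS = {
    0x00: handle_keep_alive,
    0x03: handle_chat,
}


def dispatch(packet_id, payload):
    """Look the handler up in the table instead of walking a giant switch."""
    try:
        handler = HANDLERS[packet_id]
    except KeyError:
        raise UnknownPacketError(f"no handler for packet id {packet_id:#04x}") from None
    return handler(payload)
```

Note that the unknown case raises a specific error rather than eating it, which also speaks to the try...catch item.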
I've written something similar from scratch since, which I'm still not entirely satisfied with, but is much better for reference:
https://github.com/ddevault/TrueCraft
The client-side networking code lives here:
https://github.com/ddevault/TrueCraft/blob/master/TrueCraft....
https://github.com/ddevault/TrueCraft/blob/master/TrueCraft....
https://github.com/ddevault/TrueCraft/tree/master/TrueCraft....
Notable improvements:
- Handwritten docs only where necessary
- Uses a stream implementation for decoding this particular wire format
- Has a different and better abstraction for reading packets out
Still has bad error handling though.
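To illustrate the stream idea, here is a sketch of a reader that wraps a byte stream and exposes typed reads, so packet definitions become a sequence of reads. It is Python, not the TrueCraft C# code, and the wire format (big-endian integers, strings with a 16-bit length prefix, the fields of `read_login`) is an assumption made up for the example.

```python
import io
import struct


class PacketReader:
    """Typed reads on top of any object with a read(n) method."""

    def __init__(self, stream):
        self._stream = stream

    def _read_exact(self, n):
        data = self._stream.read(n)
        if len(data) != n:
            # Fail loudly on truncated input instead of decoding garbage.
            raise EOFError(f"expected {n} bytes, got {len(data)}")
        return data

    def read_u8(self):
        return self._read_exact(1)[0]

    def read_i32(self):
        return struct.unpack(">i", self._read_exact(4))[0]

    def read_string(self):
        # Assumed layout: unsigned 16-bit byte length, then UTF-8 bytes.
        (length,) = struct.unpack(">H", self._read_exact(2))
        return self._read_exact(length).decode("utf-8")


def read_login(reader):
    """Hypothetical packet: protocol version followed by a username."""
    return {"protocol": reader.read_i32(), "username": reader.read_string()}
```

With an in-memory stream, `read_login(PacketReader(io.BytesIO(raw)))` decodes one packet; a real socket would need a loop in `_read_exact`, since a single `read` may return fewer bytes than requested.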
https://github.com/angelXwind/OpenSyobonAction/blob/master/m...
Having said that:
1) The iText PDF library used to have some fairly poor & duplicated code. Column layout was a highlight. It also had strange ideas that overemphasized subclasses, e.g. for paragraph styles. (Correct approach: use values rather than types.)
2) The Tomcat webserver, back around 2007, used to have some amazing 'clustering' code to deploy your webapp across multiple servers. But it lacked proper knowledge & hence control of what it was doing. IIRC there was no clear master, and a server couldn't tell what had been started on it versus what had been replicated because a peer was seen to be running it. Effect: replication would be additive only, contexts would just replicate everywhere uncontrolled, and there was no good way to stop/undeploy an app across the cluster.
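On point 1, "values rather than types" can be sketched like this. It is Python, not iText, and `ParagraphStyle` with its fields is a hypothetical stand-in: each style is an instance with different field values, not a new subclass.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ParagraphStyle:
    """One type for all styles; the differences live in the field values."""
    font_size: float = 11.0
    bold: bool = False
    indent: float = 0.0


# Styles are derived by copying a value and overriding fields,
# not by declaring HeadingParagraph / QuoteParagraph subclasses.
BODY = ParagraphStyle()
HEADING = replace(BODY, font_size=16.0, bold=True)
QUOTE = replace(BODY, indent=24.0)
```

A new style is then one more line of data, and code that lays out a paragraph only ever needs to understand a single type.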
OpenSSL is still an excellent example of very messy code where even maintainers / frequent contributors regularly get lost. It is also a good example of many badly designed APIs and poor docs. libsodium is a good counterexample: although the internal structuring of the code base is a bit atypical, it is logical and consistent. (It does have some API idiosyncrasies which cater specifically to dynamic bindings, like always providing a constant as a #define/macro but also as an exported function; and it has a bit of an issue where you have both legacy APIs and newer APIs, but the docs are pretty clear on which is which.)
BorgBackup is an example of how you don't want to mix C and Python code, and also contains various bits that only 1-2 people on the planet really bothered to understand, besides demonstrating other issues of organically grown code bases.
You have to turn that equation around. Even if all you know right now is the negative thing you don’t want, you have to figure out how to reframe that into the positive thing you do want. Only then can you make positive progress towards that thing you want, and by the way, you will naturally avoid the things you don’t want by focusing on the things you do want.
Sure, examples of bad stuff can be instructive, but only so far as it helps you further clarify the good stuff you’re actually trying to achieve.
I think I'm going to need an example to understand this?
Having said that, one of the most informative programming books I've ever read was C Traps And Pitfalls. It flags common, easily made errors and explains them, which in turn fixes misconceptions about the language. I feel most languages could do with one.
All of us are stronger than any one of us. Long live Open Source!