Ask HN: Examples of bad open-source code to learn what to avoid?

66 points · theSage · 7y ago · 72 comments

What are some of the bad examples of code that you have seen? Something you would want to avoid?

I'm looking for examples which fall along the lines of "fail to see the forest for the trees".

53 comments · 27 top-level

AnaniasAnanas · 7y ago · 7 in thread

Here you go https://github.com/progwml6/Natura/blob/1.7.10/src/main/java...

Tip: If you ever end up in a situation where you have to copy-paste code with minor changes then there is something that you are doing wrong. In this case using arrays and loops would be a much better solution.

pjc50 · 7y ago

That looks nasty, but surprisingly hard to fix with loops because everything is of different types. If I were fixing it I'd look to some kind of code generation solution, even if just a hacky python script parsing a CSV.

(addShapedRecipe is just begging to have ASCII art as its canonical form)
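A rough sketch of the "hacky python script parsing a CSV" idea, for illustration only. The CSV columns, the item names, and the exact shape of the emitted `addShapedRecipe(...)` line are all assumptions made up for this example; they are not taken from the Natura source.

```python
import csv
import io

# Hypothetical CSV layout: output item, count, up to three pattern rows,
# then "symbol=ingredient" pairs separated by semicolons.
RECIPES_CSV = """\
output,count,row1,row2,row3,keys
planks_button,1,#,,,#=planks
planks_door,3,##,##,##,#=planks
"""


def render_recipe(row):
    """Turn one CSV row into one line of (assumed) Java registration code."""
    pattern_rows = [row[name] for name in ("row1", "row2", "row3") if row[name]]
    pattern = ", ".join('"' + r + '"' for r in pattern_rows)
    keys = []
    for pair in row["keys"].split(";"):
        symbol, ingredient = pair.split("=")
        keys.append("'" + symbol + "', " + ingredient)
    stack = "new ItemStack(" + row["output"] + ", " + row["count"] + ")"
    return "addShapedRecipe(" + stack + ", " + pattern + ", " + ", ".join(keys) + ");"


def generate(csv_text):
    """Render every recipe row in the CSV text."""
    return [render_recipe(row) for row in csv.DictReader(io.StringIO(csv_text))]


if __name__ == "__main__":
    print("\n".join(generate(RECIPES_CSV)))
```

The point of the design is that adding a recipe becomes a one-line data change, and the repetitive Java is never edited by hand.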

AnaniasAnanas · 7y ago

Most of them are of the types button and item though.

thih9 · 7y ago

But: avoid premature optimisation, it could lead to overly complex abstractions.

Simple, clear structures with little repetition are ideal.

Copy pasted data is painful to work with but still relatively easy to maintain and fix.

Overly complex and unintuitive abstractions are the most difficult to maintain and improve.

ilaksh · 7y ago

But it's very repetitive code and uses identifiers like "var3" and "var4".

allenskd · 7y ago

The function addShapedRecipeFirst(List recipeList, ItemStack itemstack, Object... objArray) is really doing a number on me... It's gonna be fun not touching that code for a few months, and then when you come back you don't even know what the heck it is doing.

ksaj · 7y ago

I've never seen so many typographic rivers in source code before. Almost reminds me of a cross-eyed holograph.

thrower123 · 7y ago

Good god, my eyes, they burn.

neya · 7y ago · 6 in thread

I feel like this could lead to very opinionated, non-constructive comments or flames, but if I were to give you an example, I'd suggest taking a look at the WordPress ecosystem. While WordPress core's codebase has improved significantly over the years, some of the plugins haven't.

The top pick goes to WooCommerce: although it is an open source e-commerce solution built on top of WordPress, it has some terrible decisions under the hood.

The worst of these is mixing presentational logic with business logic. For example, to render a table, instead of exposing an array of objects that the developer can loop through as they see fit, WooCommerce forces you to use a PHP function that renders the table for you, and there's no way to modify the presentation logic if you want to.

It's a really fundamental programming paradigm that even top open source companies fail to adhere to.

Again, I'm not saying this to attack them or the maintainers behind the code, just my opinion of why I think it's bad quality code while respecting the fact that developers still do take time and effort for us to enjoy something with freedom and zero cost.
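To make the separation concrete, here is a minimal sketch in Python rather than PHP. The function names and the order data are hypothetical and have nothing to do with WooCommerce's actual API; the only point is that the data is exposed on its own and the rendering is an optional layer on top of it.

```python
from html import escape


def get_order_rows():
    """Business logic: return plain data, no markup (sample data is made up)."""
    return [
        {"product": "T-shirt", "qty": 2, "price": 15.0},
        {"product": "Mug", "qty": 1, "price": 8.5},
    ]


def render_table(rows):
    """One possible presentation; callers are free to ignore it and loop themselves."""
    body = "".join(
        "<tr><td>" + escape(r["product"]) + "</td><td>" + str(r["qty"]) + "</td></tr>"
        for r in rows
    )
    return "<table>" + body + "</table>"


if __name__ == "__main__":
    rows = get_order_rows()
    print(render_table(rows))                        # default rendering
    print(sum(r["qty"] * r["price"] for r in rows))  # same data, different use
```

Because `get_order_rows` returns data instead of HTML, a theme author can replace `render_table` entirely without touching the business logic.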

superasn · 7y ago

An indirect thing you can learn from this example is how little code quality matters when it comes to product popularity or revenue.

WordPress and its plugins are most often cited as examples of bad code and to top it off it is written in PHP - a programming language hated by a lot of programmers.

Yet when it comes down to it, WP powers 33.6% of all websites on the internet. Just think for a second how big that number is!

So if the software gets the job done and the end-user can easily understand it, it really doesn't matter what language you write it in or what code patterns you use.

Nextgrid · 7y ago

Revenue for the company behind WordPress itself? Maybe.

Revenue for the end-user? No way. WordPress sites are high maintenance due to their enormous attack surface and dubious code quality. I would never even consider it, as it would just be a liability I don't want to deal with.

regularjack · 7y ago

> to top it off it is written in PHP - a programming language hated by a lot of programmers.

You could say that about any language. PHP is loved by many programmers too.

sebazzz · 7y ago

> An indirect thing you can learn from this example is how little code quality matters when it comes to product popularity or revenue.

And that is the exact reason why it is hard to make the case for implementing more tests and addressing technical debt.

sleavey · 7y ago

Agreed. The core code is actually pretty well written considering the concrete backwards compatibility standards enforced by the devs, but most plugins are a total mess (and I'm saying that as a plugin author). I guess it is mainly due to PHP's low bar for entry and PHP's historical (but arguably not now) lack of enforcement of good programming practices.

There is potential for change though. There is a project backed by the core devs (can't remember the name) which will score plugins by their code standard and show the score in the plugin directory.

allenskd · 7y ago

I feel it's not that PHP is low bar but rather widely accessible out of the box on almost all web hosting services... you don't have to set up a lot of stuff before running your web application.

I'd argue the only reason PHP still remains at the top is that no one has made language X work out of the box with cPanel installations and run, let's say, Django or RoR apps with little to no modification. Maybe if Python or any other language spent more time on improving that type of accessibility in the realm of web applications, we could see PHP fragmenting in later years...

jasode · 7y ago · 4 in thread

An example of bad code that always stuck with me was the flawed CDDB disc id hash algorithm.[0]

I was reminded of that short-sighted decision every time I ripped a bunch of CDs and saw how importing song titles was not automatic because a dozen different discs had the same hash ids which resulted in collisions[1]. It ended up creating needless friction for millions that depended on that discid.

What's sad is I'm not even sure if one can extract any useful "lessons learned" from it! The programmer that wrote it was not an amateur script kiddie; he had a computer science degree from Uni California. Apparently, he didn't realize he was writing a flawed hash algorithm as he wrote it.

One could say that hash algorithms should be "peer reviewed". Well, he got unsolicited peer review that pointed out how his homegrown hash computation was flawed, but he ignored the suggestion to improve it.

[0] >Ti Kan wanted to use a hash. He could have chosen something like CRC32, which would have given him a 32 bit number, yielding 4 billion unique IDs. Instead he wrote his own hash. [...] Ti Kan was made aware (not by me) of this problem back in 1994, and given a script to convert this format into a CRC32-based format, but he rejected it because the deployed base was too big. At that point it was probably in the high dozens. -- excerpt from http://quimby.gnus.org/circus/notes/cddb.html

[1] https://forums.macrumors.com/attachments/multiple-matches-jp...

[2] wiki: https://en.wikipedia.org/wiki/CDDB#How_CDDB_works

the8472 · 7y ago

That advice is wrong too, a 32bit number would have been insufficient due to the birthday problem.

jasode · 7y ago

I don't think any reasonable person would expect zero collisions. The weakness of CRC32 for even distribution of hash values was well-known. (CRC32's goal was a fast "checksum" instead of strong cryptographic hash.)

The point was CDDB's non-invented-here home-grown hash algorithm was worse than CRC32. He didn't extract the maximum entropy from the discs' metadata of song times to minimize future collisions.
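As an illustration of what "use an off-the-shelf checksum over all of the track timing data" could look like, here is a small Python sketch. To be clear about the assumptions: this is not the CDDB algorithm, and it is not the conversion script mentioned in the quoted notes. The function name, the input layout (track start offsets plus a lead-out offset), and the sample numbers are invented for the example; only `zlib.crc32` and `struct.pack` are real library calls.

```python
import struct
import zlib


def disc_id_crc32(track_offsets, leadout):
    """Hash every track offset plus the lead-out with CRC32 (hypothetical scheme)."""
    # Pack all values as little-endian unsigned 32-bit integers so that
    # every bit of the timing data feeds into the checksum.
    values = list(track_offsets) + [leadout]
    data = struct.pack("<%dI" % len(values), *values)
    return zlib.crc32(data) & 0xFFFFFFFF


if __name__ == "__main__":
    disc_a = disc_id_crc32([150, 18000, 36500], 52000)
    disc_b = disc_id_crc32([150, 18001, 36500], 52000)  # one offset differs by one
    print("%08x %08x" % (disc_a, disc_b))
```

The design point is simply that all of the available metadata goes into the hash, so discs that differ anywhere in their track layout are very unlikely to share an ID.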

planteen · 7y ago

Exactly. Assuming 10 songs per CD, you should see your first collision after around 6500 CDs. If he did CRC64, it would be after 400 million CDs.
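For anyone who wants to check the arithmetic: the figures above look like the square-root rule of thumb for the birthday problem, divided by the assumed ten songs per CD. That reading is a guess, not something stated in the comment. The sketch below computes both the rule of thumb and the slightly more precise 50% threshold, sqrt(2 * N * ln 2), for an N-value hash; the function names are made up for the example.

```python
import math


def rule_of_thumb(bits):
    """Square root of the hash space: the usual back-of-envelope collision point."""
    return math.isqrt(2 ** bits)


def items_for_collision(bits, probability=0.5):
    """Approximate number of random items needed to reach the given collision probability."""
    space = 2 ** bits
    return math.sqrt(2 * space * math.log(1 / (1 - probability)))


if __name__ == "__main__":
    for bits in (32, 64):
        print(bits, rule_of_thumb(bits) // 10, round(items_for_collision(bits)))
```

For 32 bits the rule of thumb gives 65,536, or about 6,500 after dividing by ten; for 64 bits it gives 2^32, or roughly 430 million after dividing by ten. The 50% thresholds are somewhat higher: about 77,000 items for 32 bits and about 5 billion for 64 bits.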

justaj · 7y ago

Why not use a SHA256 (or SHA512) then?

spion · 7y ago · 2 in thread

How about examples of open source code to learn what's really, really good, together with why it was designed that way? Seems like that would be way more useful.

Or examples of projects that did things one way, but later refactored, and why they refactored.

convolvatron · 7y ago

i dont think you really get very deep into it by reading code without working with it. sure, there are surface syntactic niceties one can bikeshed.

the real meat of the matter comes when you are trying to make a change. is the structure robust? is there convenient tooling that helps you do what you need to do? does the system require extensive boilerplate to do simple things? does the system come crashing down in some unrelated area when you are trying to make simple changes?

it may be surprising, but large old codebases usually have huge hunks that serve no real purpose whatsoever except to glue together two pieces that would be much happier talking directly to one another.

I really wish as a community we could abandon the 70s business notion that software is a concrete artifact that one invests in and sells. its a really poor model. software is a process. code that is not being maintained is largely just dead. as developers we should be evaluating software as a living thing that responds to its environment...not as a shrink wrapped item we unbox and review on youtube.

allenskd · 7y ago

I think someone starting, or intermediate, or perhaps advanced won't be able to tell what's good or bad... even I have problems sometimes identifying what would be the best approach of implementing a good design and could end up implementing a bad one easily.

And refactoring is almost always a must... unless you are stuck in a legacy support project where you are just hacking fixes away despite the glaring flaws and the client doesn't want to spend any more money on improving things.

ddebernardy · 7y ago · 2 in thread

OpenSSL (before HeartBleed) springs to mind.

https://news.ycombinator.com/item?id=7640378

blattimwind · 7y ago

OpenSSL code and docs are still a huge mess.

shoo · 7y ago

banter and patches from OpenBSD here: https://opensslrampage.org/

ben_bai · 7y ago · 1 in thread

Always a crowd-pleaser: OpenBSD true.c vs. GNU's true.c

http://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/usr....

https://github.com/coreutils/coreutils/blob/master/src/true....

k4ch0w · 7y ago

Simple is better imo, openbsd wins.

codr7 · 7y ago · 1 in thread

It doesn't work, energy follows thought and it makes no sense to focus it on what you want to avoid. Take or leave.

That being said, the worst I ever saw was the in-house business nonsense I was paid to deal with as a Java consultant. The worst code isn't open source from my experience, subjecting it to public scrutiny would mean suicide for the companies involved.

SmellyGeekBoy · 7y ago

This matches my experience. Open source projects tend to get the worst parts fixed. It's in-house applications, usually written in VB / Delphi / Java, that have been supported and added to over the past 20 years where the true horrors lie.

As you say, I doubt many businesses would be willing to put this kind of code out there.

kissgyorgy · 7y ago · 1 in thread

That's a really bad idea. You need a good counterexample to compare against, otherwise it's a waste of time at best. A lot of people learn from really bad codebases and pick up the same style, which is terrible. You should look at GOOD codebases instead!

theSage (OP) · 7y ago

I did some work on html2text when I started off and while the little parts did make sense, the whole library was confusing for me. I couldn't change anything without breaking tests.

On the other hand, I've been tinkering with curio for a while now and it's a breath of fresh air compared to that.

My trouble is that I still don't understand what makes the html2text thing "bad". What particular thing there caused me to not like working with it? I'm trying to understand that.

I've been book hunting + figuring out if it's something that I did not know which would have made the code a lot easier to work with (stuff got a lot easier after I cleaned up my set theory understanding)

- https://github.com/dabeaz/curio

- https://github.com/Alir3z4/html2text

otras · 7y ago · 1 in thread

If you’re interested in a case of unnecessary optimization and effort, the infamous left-pad npm library has been refactored to only add to the string O(log(n)) times. It is short but not sweet.

https://github.com/left-pad/left-pad#readme
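For readers wondering what "only add to the string O(log(n)) times" means in practice, here is a minimal Python sketch of the repeated-doubling idea. It is written from scratch for illustration and is not a port of the left-pad source; the function name and the single-character assumption for `ch` are mine.

```python
def left_pad(text, length, ch=" "):
    """Pad text on the left to the given length, building the padding by doubling.

    Assumes ch is a single character. The loop runs once per bit of the
    number of missing characters, so it appends O(log n) times, not n times.
    """
    missing = length - len(text)
    if missing <= 0:
        return text
    pad = ""
    chunk = ch
    while missing:
        if missing & 1:      # this power of two is part of the total
            pad += chunk
        missing >>= 1
        chunk += chunk       # double the chunk: 1, 2, 4, 8, ... characters
    return pad + text


if __name__ == "__main__":
    print(left_pad("5", 3, "0"))
```

In everyday Python you would just write `ch * missing + text` (or use `str.rjust`), which is rather the point of the comment above: the clever version buys very little.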

ksaj · 7y ago

I think it needs more // comments. Hilarious.

jstarfish · 7y ago · 1 in thread

The source for Terraria is notoriously terrible.

https://github.com/TheVamp/Terraria-Source-Code

It is another example of how even inelegant code full of hardcoded values can be successful.

czr · 7y ago

(Note that this is decompiled source – the original probably at least has comments here and there)

Sir_Cmpwn · 7y ago

Here's some old code of mine:

https://github.com/vatt849/LibMinecraft/blob/master/LibMinec...

The whole library is a trip if you want to read a bunch of bad C#. Highlights:

- Generated documentation

- Giant switch/case instead of a more organized dispatch map

- Large swaths of commented code instead of using version control

- try...catch statements that just eat the errors

- Inconsistent code style

- This thing:

https://github.com/vatt849/LibMinecraft/blob/master/LibMinec...
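As a generic illustration of two of the highlights above (a dispatch map instead of a giant switch/case, and errors that are surfaced instead of eaten), here is a short Python sketch. The packet IDs, handler names, and payload format are hypothetical; they are not taken from LibMinecraft or from the real wire protocol.

```python
def handle_keep_alive(payload):
    """Hypothetical handler: nothing to decode."""
    return "keep-alive"


def handle_chat(payload):
    """Hypothetical handler: payload is assumed to be UTF-8 text."""
    return "chat: " + payload.decode("utf-8")


# The dispatch map: adding a packet type means adding one entry here,
# not another case in a several-hundred-line switch statement.
HANDLERS = {
    0x01: handle_keep_alive,
    0x02: handle_chat,
}


def dispatch(packet_id, payload):
    """Look up the handler and fail loudly for unknown packets."""
    try:
        handler = HANDLERS[packet_id]
    except KeyError:
        # Report the problem to the caller instead of swallowing it.
        raise ValueError("unknown packet id 0x%02x" % packet_id) from None
    return handler(payload)


if __name__ == "__main__":
    print(dispatch(0x02, b"hello"))
```

The handlers stay small and independently testable, and an unknown packet ID produces a clear error instead of silently doing nothing.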

I've written something similar from scratch since, which I'm still not entirely satisfied with, but is much better for reference:

https://github.com/ddevault/TrueCraft

The client-side networking code lives here:

https://github.com/ddevault/TrueCraft/blob/master/TrueCraft....

https://github.com/ddevault/TrueCraft/tree/master/TrueCraft....

Notable improvements:

- Handwritten docs only where necessary

- Uses a stream implementation for decoding this particular wire format

- Has a different and better abstraction for reading packets out

Still has bad error handling though.

slezyr · 7y ago

See Syobon Action: its source code is as bad as the game itself.

https://github.com/angelXwind/OpenSyobonAction/blob/master/m...

twhitmore · 7y ago

Libraries can be useful despite imperfections, and poor design decisions can occur in overall good libraries. So we can't judge too harshly.

Having said that:

1) iText PDF library used to have some fairly poor & duplicated code. Column layout was a highlight. Also strange ideas overemphasizing subclasses, eg. for paragraph styles. (Correct approach: use values rather than types.)

2) Tomcat webserver back around 2007 used to have some amazing 'clustering' code to deploy your webapp across multiple servers. But it lacked proper knowledge & hence control of what it was doing. IIRC there was no clear master, and a server couldn't tell what had been started on it versus what had been replicated since a peer was seen to be running it. Effect: replication would be additive only, contexts would just replicate everywhere uncontrolled, and there was no good way to stop/ undeploy an app across the cluster.
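On the "use values rather than types" remark in point 1, a small Python sketch of the idea. The class and field names are hypothetical and are not iText's API; the example only contrasts one style class with many instances against one subclass per style.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ParagraphStyle:
    """A style is just a value: a bundle of settings, not a new type."""
    font_size: float = 11.0
    bold: bool = False
    indent: float = 0.0


# New styles are new values derived from existing ones, not new subclasses.
BODY = ParagraphStyle()
HEADING = replace(BODY, font_size=16.0, bold=True)
QUOTE = replace(BODY, indent=24.0)


if __name__ == "__main__":
    print(HEADING)
    print(QUOTE)
```

With this shape, styles can be compared, stored in configuration, or created at runtime, none of which is convenient when every style is its own subclass.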

type0 · 7y ago

"How to Make Mistakes in Python" - https://www.oreilly.com/library/view/how-to-make/97814920482...

blattimwind · 7y ago

Drupal 6/7 would be an example for "widely used [at the time] but pretty bad". I don't know how many of the issues across all layers were addressed in later versions.

OpenSSL is still an excellent example for very messy code where even maintainers / frequent contributors regularly get lost. Also a good example for designing many bad APIs and poor docs. libsodium is a good counterexample, although the internal structuring of the code base is a bit atypical, it is logical and consistent. (It does have some API idiosyncrasies which cater specifically to dynamic bindings, like providing a constant always as a #define/macro but also as an exported function; and it has a bit of an issue where you have both legacy APIs and newer APIs, but the docs are pretty clear on which is which).

BorgBackup is an example of how you don't want to mix C and Python code, and also contains various bits that only 1-2 people on the planet really bothered to understand, besides demonstrating other issues of organically grown code bases.

bradknowles · 7y ago

There are an infinite variety of things that you don’t want. If you focus all your energy on those things, then you won’t have anything left to do the positive things you do want.

You have to turn that equation around. Even if all you know right now is the negative thing you don’t want, you have to figure out how to reframe that into the positive thing you do want. Only then can you make positive progress towards that thing you want, and by the way, you will naturally avoid the things you don’t want by focusing on the things you do want.

Sure, examples of bad stuff can be instructive, but only so far as it helps you further clarify the good stuff you’re actually trying to achieve.

superpermutat0r · 7y ago

I've always found Calibre to be a huge mess.

arthev · 7y ago

While usually focusing on smaller snippets, thedailywtf.com is a site about (mostly) code wtfs. A fair number of management wtfs too, though.

pjc50 · 7y ago

> fall along the lines of "fail to see the forest for the trees"

I think I'm going to need an example to understand this?

Having said that, one of the most informative programming books I've ever read was C Traps And Pitfalls. Flags common easily-made errors and explains them, which in turn fixes misconceptions about the language. I feel most languages could do with one.

ksaj · 7y ago

Here is something atrocious I wrote in Lisp. You actually can make Lisp ugly! Who knew?

https://github.com/ksaj/Capitalize.Lisp

RickJWagner · 7y ago

Just a side note-- this is the beauty of Open Source. If there is some bad code (we all write it), it can be improved with a little help from Open Source "friends".

All of us are stronger than any one of us. Long live Open Source!

craftoman · 7y ago

Many JavaScript libraries like Fastify (Node.js) for example. You always get a nice & clean API but if you look under the hood you would be amazed at how much spaghetti code can be written in a project.

frostburg · 7y ago

Praat: https://github.com/praat/praat I'm not sure that this is a good way to learn anything, however.

sam_lowry_ · 7y ago

JGit is amazingly bad for a piece of software built on top of the well-thought-out data structures of git. Some of its flaws could be attributed to Java IO design, though.

peterwwillis · 7y ago

Anything related to OpenStack, but particularly jenkins-job-builder is rather horrible.

jxub · 7y ago

Many, if not most OSS packages which are released by academics or universities.

yamann · 7y ago

https://github.com/mholt/caddy Not only is the code mediocre, but the guy behind it received lots of money from Mozilla as an innocent, promising open source project author, and then turned it into a paid product.
