In this case there's, I think, a better alternative; the equivalent-ish code in Ruby for the example code here would be something like this:
values = s
  .partition('?')[-1]
  .split('&')
  .map { |key_value| key_value.partition('=')[-1] }
You can write these nice functional pipelines where you just read the code top-to-bottom and see step-by-step what is being done to the data on each line. You don't have to jump up-and-down around the code when reading it, and you don't have to keep too much context in your head when reading it.
This is one of the reasons why I vastly prefer Ruby over Python for most data processing tasks. I wish more languages would support this style of programming.
The idea of the technique is to split out code at a different level of abstraction with a clear name communicating what it does, while hiding the details of the how, because you don't need to care about that detail at all to fully grok the code in the calling function.
Where this breaks down is when the code you're trying to split out is not at a different level of abstraction, and how it works is meaningful to the surrounding code in the calling function.
So I think the issue you are seeing isn't with the technique, it's with the technique being misapplied. I think this is likely the only difference between when it 'often works' and 'just as often doesn't' in the code you're working in :-)
Someone that applies "MORF" to their code winds up nearly inventing their own language in the file that they're writing. All that takes up more memory when you're reading their code, because due to leaky abstractions the actual implementation of whatever the function name that you replace it with is often important.
I have an actual track record of taking code that someone had MORF'd to hell and rewriting it, making it about 40% shorter, with far fewer concepts to process.
Inventing a term like "MORF" is probably illustrative of the problem itself. Without looking at the blog post what exactly was that acronym again? That is just one more thing for you to try to memorize. The author is riffing on things like "DRY" and "YAGNI" that are well-known, but it isn't really helping with readability when you lift it out of that context.
It's also definitely related to testability for me. If I'm pulling out the right details, then I'll often get a nice cluster of tests that pin down the higher-level concept in a way where it's both the production and test code that gets more readable.
In practice, pithy adages don't get us any closer to sanity.
Really the only news you can use is
1. Try to modify things with code you didn't write by not simply throwing parts away
2. If you think it's really difficult to deal with, figure out if other people agree with you.
3. Understand why everybody thinks this.
4. If you are doing that in your own code, then stop doing that.
You have to viscerally understand why a practice is bad and how doing it affects other people.
This is how you can intuitively avoid such practices in the future. Not through things that rhyme or acronyms that spell words but through social intelligence. It's fundamentally behavior.
* great ability at naming methods - which isn't very common
* massive discipline at RE-naming methods when they get changed even slightly to do something more
Regarding the latter, I've seen so many times methods which originally described what they were doing, but now they don't any more, that I never just trust the name to tell me what a method really does.
Maybe not relevant in simple toy examples, but you don't have to look far until you find a function that isn't so easy to name.
On the whole I think such functions are valuable. But they do have downsides.
.map { | *String* key_value | key_value.partition... }
where String shows up in light grey text indicating that IJ knows `key_value` is a String
I have far less issue with the piping operator in OCaml (|>) which works exactly like a shell pipe because the code is in sequential order and that makes a huge difference.
Python and JS weren’t designed as functional languages and it shows in their syntax. Still grateful for the added functionality however.
The basic argument is that any time you do an extract refactor you're creating a new layer of abstraction that the next reader will have to learn and understand. This can also get worse over time as the abstractions drift away from their original purpose.
The solution she provides is to be okay with a little bit of duplication, then as patterns naturally arise in the codebase you can refactor when you know a few use cases and can clearly define the concept.
[1]: https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction
import std.algorithm, std.array, std.stdio;
// Print sorted lines of a file.
void main()
{
    auto sortedLines = File("file.txt") // Open for reading
                       .byLineCopy()    // Read persistent lines
                       .array()         // into an array
                       .sort();         // then sort them
    foreach (line; sortedLines)
        writeln(line);
}
class Motor:
    def __init__(self):
        self.max_speed = 10.0
        self.clockwise = True
        self.controller = Controller()

    def set_max_speed(self, max_speed: float = 10.0) -> 'Self':
        self.max_speed = max(0.0, max_speed)
        return self

    def start(self) -> 'Self':
        self.controller.start()
        return self

    # etc.
Then you can use it as follows:
motor = Motor().set_max_speed(5.0).start()
There is nothing stopping you from building iterators that work the same way.
values = (
    key_value.partition('=')[-1]
    for key_value in
    s.partition('?')[-1].split('&')
)
.map { |key_value|
  key_value.partition('=')[-1] }
Reading this literally makes me sick to my stomach. Language design is much more important than language popularity, although it will be popularity that wins. (Yay downvotes for pointing out things everyone can see - highschool dynamics)
This sounds like a Windows user who can’t stand macOS because they don’t know where anything is.
Your post downvote edit assumes your opinion here is objective. It isn’t.
Here, "query_params" means "extract the last three query parameters, raw (i.e. not unescaped and not broken into key-value pairs)." The transformation shown makes precisely nothing more readable or easy to understand. "The second argument to map()" is just as easy for your brain to group into a black box to be analyzed later as a call to an opaque "query_params" function that you need to read the implementation of to really understand what the code is actually doing.
Of course sometimes it's the best solution to just extract local helper functions, especially if the actual function just becomes too unwieldy and/or the helpers are called from more than one place, but in general I try to extract things that do something more general than the thing I'm extracting it from and have an interface / a purpose that's easy to understand and describe on its own.
To stay with the example, actually extracting the query parameters would be a generic, extractable utility. Half-extracting the last three parameters because the function I'm writing needs precisely that for some reason, is a local helper function, and I'd only extract it if there's a good reason, certainly not to make an already trivial function no easier to read.
It is hard to know whether the principle is valuable with such a weird example. In this example there are lots of other ways it could have been done more meaningfully but, again, don't know if the example is real.
I suppose the function to parse the query string could have been better, its name isn't very descriptive, and the method with which it parsed wasn't very obvious either (I'd expect to get back a dict or a list of key/value tuples, not a list of strings)
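For what it's worth, Python's standard library can already return both of the shapes mentioned here. A minimal sketch, assuming the input is a full URL string; the example URL is made up for illustration and is not the one from the article:

```python
from urllib.parse import parse_qs, parse_qsl, urlsplit

# Hypothetical example URL, not taken from the article.
url = "https://example.com/search?q=readable+code&page=2&sort=new"

# Everything after the "?" (and before any "#fragment").
query = urlsplit(url).query

# A list of (key, value) tuples, in order, with escapes decoded.
pairs = parse_qsl(query)

# A dict mapping each key to a list of values, since keys can repeat.
params = parse_qs(query)
```

Unlike the hand-rolled split, this also unescapes the values, which may or may not be what the original function wanted.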
I know a lot of programmers are against comments, but I also think this is exactly the kind of code where a comment is handy..., the purpose of the [-3:] part wasn't obvious to me at all.
You don't really need short names. I wouldn't advocate going full Java naming but trying to compress names just to save a bit of typing is unnecessary. Your IDE will help you out. Just learn to press tab when you've entered enough of the name instead of typing the whole thing.
It's not a trivial problem but it's not a hard one. It's just that most people don't even try to dedicate a sliver of active brain power to the task because they don't deem it worth it even if they claim to agree on the importance of readability.
TFA missed the point of splitting complex expressions into separate lines: naming the single-use vars clearly makes the whole calculation easy to follow. In the example given, nothing about the one-line `map split over split of split` tells a reader that it's parsing a query string – just splitting it in two and naming the temp var `query_params` makes it clear.
Although `last_3_query_params` would be more precise, and something that explains why TF you'd want that would be better... ツ
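Roughly what I mean, as a sketch. The function name and the example URL are my own inventions, and I'm assuming the original one-liner took the last three `key=value` pairs after the `?`:

```python
def last_three_values(url: str) -> list[str]:
    """Return the raw values of the last three query parameters.

    Hypothetical helper: the name and signature are assumptions,
    not something taken from the article.
    """
    # Everything after the first "?" is the query string.
    query_string = url.partition("?")[-1]

    # Naming this step is what tells the reader we are parsing a query.
    last_3_query_params = query_string.split("&")[-3:]

    # Keep only the part after "=" of each "key=value" pair.
    return [param.partition("=")[-1] for param in last_3_query_params]


values = last_three_values("http://example.com/?a=1&b=2&c=3&d=4")
```

Same three splits as the one-liner, but each intermediate name says what the data is at that point.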
> “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”
It clicked and instantly disabused me of the notion that smart people write code that's any smarter than the minimum required to solve the problem at hand.
Comments being required are also another smell that the code doesn't explain itself. I know this is said so often it's a cliche, but it really is true.
I think both of these things point back to the same problem: out of control data structures.
There's a real danger that comments aren't updated when code is, particularly if 3rd parties make those changes. This is one of the corners where there will never be a single answer which is right in all circumstances.
As an example, given:
const sales = [ {month: "Jan", day: 1, total: 120 }, ... ]
You could determine, say, the highest sales day of a given month as follows:
const highestSalesDayByMonth = _.chain(sales)
  .groupBy("month")
  .mapValues((salesForMonth) => _.maxBy(salesForMonth, "total"))
  .mapValues("total")
  .value()
// highestSalesDayByMonth = { Jan: 140, Feb: 90, ... }
Naturally, minimizing the complexity of the iteratee functions and carefully naming their arguments is very important to ease debuggability.
Concision can be used for emphasis as verbosity can be used to obscure.
> Instead, I find myself having to re-write ultra-dense blobs of code in order to debug or even simply understand what's going on.
Is this problem because their method is inherently more complex or is it due to lack of familiarity?
Perhaps they debug that code differently than you do and it's incompatible with your previous mental model?
I believe, definitely, that they are quite intelligent people that know a lot about the language or maths, but definitely they are not usually making the smartest choice, because you should use “languages” in order for the people to understand you.
So you can call those: “50 cent expressions”
They help no one but their ego…
Typically more concise code takes advantage of core language abstractions.
It is actually simpler unless you are unfamiliar with core language abstractions, but I'd argue that's a you problem.
He was a much beloved professor.
I have learned to take a step back when I find myself having a hard time to get the code "clean". Often I put that thing to rest if possible, maybe for days, months, or even years. There could be a simple solution that solves 80% of the problem, and that can ease the pressure coming from the stakeholders. If it kind-of-works and can be produced in a short time, that is much better than going down a rabbit hole for months, coming out at the other side (probably burnt out) with a solution you can't deploy because it's too complicated.
-- Fred Brooks, The Mythical Man-Month, 1975
> a computer language is not just a way of getting a computer to perform operations but rather that it is a novel formal medium for expressing ideas about methodology. Thus, programs must be written for people to read and only incidentally for machines to execute.
-- Abelson & Sussman, The Structure and Interpretation of Computer Programs, 1984
In fact, Gauss Jordan is also not obvious, just seeing a matrix.
In that context, I don't think this is superficial at all. If your code is hard to read, I suspect your design is hard to read too.
Sure, for super important architecture decisions you have to make a few top-down (spatial partitioning structures, database decisions, network architecture, etc.), but I think it's generally better to late-bind on those decisions if you can.
I know the semantic compression post, and I don't think it makes a point for bottom-up design. I find top-down and bottom-up to be quite misleading anyway. Someone told me, they don't like to think of things at the "top" and the "bottom". It's data transformations, maybe more like "left to right".
If you design bottom-up, you end up with lots of artifacts you never needed (and likely still missing the ones you can make use of). If you design top-down, you end up with lots of code you don't need. (this is where semantic compression comes in, in my understanding).
I suspect that if you like to think bottom-up, maybe that's because you like it more at the bottom (you are a low-level type of guy, or like to make libraries). If you like to think top-down, maybe you like it more at the top.
I like the semantic compression term because it reduces the act of design to the essentials, without introducing fluff terms or opinions. I find myself doing this compression no matter what kind of code I'm writing.
a) 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1
b) 20
If all the code in your project was written like a), how would you feel? Does it make your job easier or harder?
I'll tell you how most people feel when they read code that looks like a):
- The author didn't care about other maintainers.
- The author is selfish and does not have empathy for others.
- The author ruined my fucking day.
- Team members are competing by sabotaging each other's productivity.
- If I clean this, by the time I am done, the author would have pushed 10 more commits that look exactly like this and eventually become my boss.
- The author is wasting everyone's time.
- The author is forcing others to volunteer to clean up after them.
- Why does management tolerate code following the a) style? A simple intervention would make it go away and my job would be so much better.
- It's sad that everyone is too busy looking at Jira and nobody cares about the actual fucking product.
- This is slowing everyone down and I have stuff to do.
- The code is error prone, one day I'll break it.
- Why should I contribute quality code if low quality is acceptable?
As you can see, it objectively fucking sucks. It's draining, demoralizing to read, it's frustrating, it wastes people's time, it gives people the perception that nobody fucking cares and the code is everyone's toilet with no trip lever.
And while it's "superficial", it's the surface that all engineers interact with. If I spread superglue over the surface of your kitchen counter and every dish and utensil in your kitchen every day around lunch time, that problem will also be "superficial", but it will ruin your life.
So, the conclusion is: Just fucking write clean code. Shitty code ruins the morale of people who care, who are the people that want to build great things not the ones cashing a paycheck and resting and vesting.
You are not a full-time architect, you are not in business development/marketing/finance or whatever, you are in the fucking engineering department. Your contribution to the business are your deliverables. The "superficial" stuff you talk about is your job. Do it.
"Ah ah ah, you didn't say the magic word!! ah ah ah!" Don't be the fucking Dennis Nedry of the team. Format your code, make it readable by your team and your future self.
Do you want everyone to love you? Write code like this:
# Add the left & right margins
width = calculatedWidth + 1 + 1
The "+1+1" might indeed be more readable than "+2".
- The size of the short-term store is normally said to be 7 plus or minus 2 (the 'magic' number 7)
- The Working Memory model has somewhat overtaken the 'short term' memory model, and it is unusual to see them being presented alongside each other like this (though 'short term memory' remains a useful, good-enough metaphor for explaining certain key aspects of memory)
- Chunking is typically viewed as a memory-supported division of stimuli (what you're reading, hearing etc.) into meaningful units based on LTM memory representations. A good example is a chess expert 'chunking' the layout of a chess board with many pieces in perhaps one or two units (e.g. 'It's the mid game configuration of [famous players] in [famous game], except the king's position is different'). We would expect more expert programmers to 'chunk' increasingly large units, I think (e.g. 'Oh, this is just the [famous sorting algorithm]').
- A single chunk is usually considered to take up a 'slot' in short term memory
If anyone wants papers/sources for the above, let me know.
I'm traditional wide comp sci by academic training, but spend my day job as a low-code enabler for non-programmers with varied backgrounds.
The working memory model explains and fits well with what I see them get and struggle with in day to day work, and I'd welcome references I could use to optimize my approach.
The Wikipedia entry for WM is also very good: https://en.wikipedia.org/wiki/Working_memory
It's a bit tricky to tell what you're doing exactly - perhaps drop me an email if you have any queries (using this account; my original parent post was on a throwaway account because I had login problems).
1. Only One Level Of Indentation Per Method
2. Don’t Use The ELSE Keyword
3. Wrap All Primitives And Strings
4. First Class Collections
5. One Dot Per Line
6. Don’t Abbreviate
7. Keep All Entities Small
8. No Classes With More Than Two Instance Variables
9. No Getters/Setters/Properties
Gah. I've seen the other side of this, a few people far too trigger happy to make FivePlusVeryLongNounVO/DTO for every little thing, and it gave me some new appreciation towards tuples and primitives. Sometimes you really don't want to go into another new file for an object type which is used in only one specific place. Especially with
>Don’t Abbreviate
Meaning the variable name will end up long anyway. With tuples, you get deconstruction without the hassle, too.
The rules are an exercise for a toy project. Like all similar 'rules' they are just hints to make you think. When you are done with the exercise and see a string holding a social security number in production code, you may consider creating a dedicated SocialSecurityNumber class. The class will guarantee a well-formed social security number according to the official rules. The class may even offer the Area, Group and Serial parts of the number as separate fields. The class may decide to use a string or integers internally, but that would never be exposed to the class's consumers. None of the code that uses SocialSecurityNumber will have to guess whether the string is valid, whether it has dashes, etc. It's the same reason you use built-in types like an Integer (as opposed to a tuple of 4 bytes or 32 bits).
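A rough sketch of what such a class could look like in Python. Everything here is illustrative: the `parse` method name is my own choice, and the validity checks are a simplified reading of the official rules as I understand them (area not 000, 666 or 9xx; group not 00; serial not 0000), so treat them as an assumption rather than a reference implementation:

```python
import re
from dataclasses import dataclass

# Accepts "123-45-6789" or "123456789"; dashes are optional (simplification).
_SSN_PATTERN = re.compile(r"([0-9]{3})-?([0-9]{2})-?([0-9]{4})")


@dataclass(frozen=True)
class SocialSecurityNumber:
    """Immutable value object; consumers never see the raw input string."""

    area: str
    group: str
    serial: str

    @classmethod
    def parse(cls, text: str) -> "SocialSecurityNumber":
        match = _SSN_PATTERN.fullmatch(text.strip())
        if match is None:
            raise ValueError(f"not a well-formed SSN: {text!r}")
        area, group, serial = match.groups()
        # Simplified validity rules (assumption, see the note above).
        if area in ("000", "666") or area.startswith("9"):
            raise ValueError(f"invalid area number: {area}")
        if group == "00" or serial == "0000":
            raise ValueError("group and serial must be non-zero")
        return cls(area, group, serial)

    def __str__(self) -> str:
        # One canonical rendering, regardless of how the input was written.
        return f"{self.area}-{self.group}-{self.serial}"


ssn = SocialSecurityNumber.parse("123456789")
```

Code that receives a `SocialSecurityNumber` no longer has to ask whether the dashes are there or whether the value was checked; that question was answered once, at the boundary.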
> 1. Only One Level Of Indentation Per Method
One level of indentation just leads to an explosion of tiny one-use methods with weird names, and now you can't read the code linearly. You will almost certainly never reuse these tiny methods, especially since you're likely consigning them to an instance of a class instead of a free function, so all you've done is forced people to jump around a lot.
> 2. Don’t Use The ELSE Keyword
Not using the else statement just obscures the fact that there's a branch in the code. Obscuring something important seems to be the opposite of what you should do.
> 3. Wrap All Primitives And Strings
Ugh, that seems verbose and clunky, especially in a language like Java without operator overloading. I'm all for type aliases or typedef's, or, creating a class if the builtin primitives don't work (I think a Money class makes sense because you don't exactly want to use a float, for instance). But just putting wrappers all over the place sounds grotesque.
> 4. First Class Collections: Any class that contains a collection should contain no other member variables
Why even have a class then? Why not just have functions that operate on a collection? It's much more generic that way, since if you're using iterators or an abstract collection interface, you can potentially allow the user to choose the exact data structure, and you avoid the ceremony of creating a new type that's again just a wrapper.
> 5. One Dot Per Line... Basically, the rule says that you should not chain method calls.
This is the first one I roughly agree with, but I wouldn't consider it a hard rule. Chaining .map and .filter together for instance is a very common pattern.
> 6. Don’t Abbreviate
min/max is just as clear as minimum and maximum. I'm not using "index" in my for loop when "i" will do. "n" is perfectly well understood as a count of things. Abbreviations when used properly make code easier to read, not harder.
> 7. Keep All Entities Small... No class over 50 lines and no package over 10 files
Ok, assuming the problem can't be simplified, all you've done is now fractured all that functionality into tens/hundreds of files. How is that easier to follow? Sure, there's balance in all things, but I'd probably rather read a 1000 line class than 20 small files split over 2 packages.
> 8. No Classes With More Than Two Instance Variables... I thought people would yell at me while introducing this rule, but it didn’t happen
They were being polite. I'll do it for them. What the fuck?
The example he gives is also awful, where instead of using a string for name, he makes Name a type (ugh) with FirstName and LastName. Not only is that overly ceremonial, but it's wrong, there are plenty of names from various cultures that do not fit cleanly into FirstName and LastName. Also, what happens if he wants to store a MiddleName? That's three instance variables! Ohno! OR what if the person has like 10 middle names (this shit happens). Are we going to have 5 nested data types for that?
> 9. No Getters/Setters/Properties ... My favorite rule. It could be rephrased as Tell, don’t ask.
My brain feels like it's going to explode.
> It is okay to use accessors to get the state of an object, as long as you don’t use the result to make decisions outside the object.
Why else would you want to get the state of an object?
> Any decisions based entirely upon the state of one object should be made inside the object itself.
If your classes are 50 lines long, I guarantee you that other classes will be making decisions on other objects behalf.
> Then again, they violate the Open/Closed Principle.
I think the industry is largely realizing that this is a bad principle, as it implies inheritance. I think most people outside the enterprise java world now realize that using interfaces or free functions is largely better.
You seem to be criticizing the 'rules' as if they were suggested for production code. You can't seriously think someone is suggesting maximum-of-2-fields as some sort of guideline for the real world.
This is what most of the "easy to read" articles forget.
Show me why it is easier to fix a bug, add a feature or make a non-functional improvement to the code with their style than without.
For example, if you've extracted something into its own function, are you then sharing this function and using it in other places as well? If you then change the body of that function, are you now possibly breaking other parts of the code that relied on its old behavior?
If you've introduced a local mutable variable in between two lines, are you then mutating that variable prior/later? Is the query_params different at the end of the function than in the middle? Can you safely use it again?
How easily can you now introduce new behavior before, in the middle, after, and anywhere in-between?
When you modify the behavior to fix a bug, add a feature or make a non-functional improvement, is it an isolated change? How many tests break? Did it require major refactoring to make or very few things had to change? How easy was it to add a test for your new behavior? Was it easy to find the most appropriate place in the code to make the change? Etc.
Sure sometimes maybe you just read code for the fun of understanding what it does, but almost always in practice when you're working on a code base, you only care to understand and reason about the code because you're looking to deliver that next sprint task that involves changing something about it.
I wish more people focused on "easy to change/modify" rather than simply on "easy to read/understand".
You absolutely are. Which is why I think that DRY was pushed too hard, and for the wrong reasons. Everyone said "you only have to change it once!" rather than "consider how many callers depend on this". There's a reason the "rule of 3" came around; and those reasons I've found were mostly experience and empirically based.
"should" is a word loaded with authority.
Why?
If you believe a unit test is for turning an impure function into a pure function (so you can just test what it's doing and no other effects), then in many cases tweaking will break existing unit tests. If the function exists, it's assumed it's used by more than the tests for it. Changing the signature or even the internal dependencies necessarily breaks the known contracts with other units.
Unless I'm writing throwaway prototype code (famous last words, lol), I try to write code such that I will be able to figure out what my intention was 6-18 months from now when I'm staring at a piece of code in a panic trying to debug a production issue.
That doesn't mean I'm going to get it right when I write this code. Instead, I'll be able to better ascertain what my assumptions were, how they fell apart in practice, and what a minimal, correct fix that doesn't make things worse might be.
Edit: Incidentally, this also applies to my commit messages. I’m writing them primarily for my future self so that I can figure out WHY I made a change, not WHAT the change was.
One thing is not in contradiction with the other.
It could lower the reusability of the code, by not having many abstractions, but it will be easy to understand, concise, and it will do what it was written for very well.
If that's too slow, then it can be optimized so it's fast, if a bit less readable.
9 times out of 10 it won't be too slow in the first place these days, unless you know beforehand that you need maximum speed for valid reasons.
You make it sound like you came up with that all by yourself
I didn’t claim my opinion was novel, just that it’s mine. I hope others share my opinion, because I’d find codebases that fit my criteria easier to maintain than many other types.
Also, do you agree or disagree with any of the ideas I put forth?
1) fits entirely on my screen and
2) doesn't involve much state modification
Every intermediate variable is a chance for me to miss some modification (e.g. it was passed to a func that modifies its arguments) and consequently misunderstand what is happening.
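A small Python illustration of the kind of thing I mean. Both helper functions are hypothetical, made up just to show the difference between a helper that mutates its argument and one that doesn't:

```python
def top_three(scores: list[int]) -> list[int]:
    """Hypothetical helper that sorts its argument in place as a side effect."""
    scores.sort(reverse=True)  # mutates the caller's list
    return scores[:3]


def top_three_pure(scores: list[int]) -> list[int]:
    """Same result, but works on a sorted copy and leaves the input alone."""
    return sorted(scores, reverse=True)[:3]


readings = [3, 9, 1, 7]
best = top_three(readings)
# `readings` has silently been reordered by the call above.

untouched = [3, 9, 1, 7]
best_pure = top_three_pure(untouched)
# `untouched` is still in its original order.
```

Nothing at the call site tells you which of the two you are dealing with, so every named intermediate that gets passed around is something you have to go and check.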
I've been experimenting in Python with the function chaining style of coding enabled by the toolz library. So while not at all idiomatic, the example in the original article would come out as something like this:
https://gist.github.com/ZeroBomb/8ac470b1d4b02c11f2873c5d4e0...
I would say that function-chaining this example would constitute over-engineering, but I have found that writing in this style has really helped me express pretty complex function composition in a way that is still concise without using a bunch of intermediate variables.
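For anyone who can't open the gist, here is a stdlib-only sketch of the general style. It is not the gist's code: the `pipe` helper below is a hand-written stand-in for the toolz-style piping function, and the example URL is made up:

```python
from functools import reduce
from typing import Any, Callable


def pipe(value: Any, *funcs: Callable[[Any], Any]) -> Any:
    """Thread `value` through `funcs` left to right (hand-rolled stand-in)."""
    return reduce(lambda acc, func: func(acc), funcs, value)


values = pipe(
    "http://example.com/?a=1&b=2",                         # hypothetical input
    lambda url: url.partition("?")[-1],                    # keep the query string
    lambda query: query.split("&"),                        # one item per key=value
    lambda pairs: [p.partition("=")[-1] for p in pairs],   # keep only the values
)
```

Each step reads top to bottom like the Ruby version, with no intermediate variables left lying around afterwards.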
And have you looked at code that's over a year old and modified by other people to see how poorly those comments now match the code?
In actual code I would have put most of those on one line.
If you want logging, I've added an example of how to auto-log the composed functions to the gist.
Whereas the refactoring issues I'd love to learn more about are the equivalent of a sinkhole in your living room. Stuff like "you have data dependencies that go in, out, left, right, and through the code", or "the codebase is a mishmash of React combined with Vanilla JS that is hooked up to a custom PHP MVC". Basically refactoring that involves issues that cannot be cleaned up all at once, that involve deep architectural decisions and that require some amount of buy-in from the team.
Mostly I'd like to know more about this because I've realized that I'm not very good at it. My inclination is to just refactor everything and that's not a feasible strategy. I also struggle to balance it with getting feature work done. Definitely something I plan on reading more about.
You're right that other code problems can be worse. But that's no excuse to avoid doing the basics.
To clean up system design issues, you must first know what a better system design would be. It's not enough to realize that what you have is bad.
I do this a lot, and part of my approach is to always be incremental. Improve one detail/aspect at a time. The worst, very tempting, idea in this field is to throw everything away and start over...
> I also struggle to balance it with getting feature work done
FWIW, I like to spend 1/3 of my time cleaning up and refactoring.
I’ve also (more common) encountered engineers that don’t actively try to be clever by being terse, but also don’t put their mind to writing clearly.
Put differently, I think one has to actively try to write easy-to-read code.
I agree with your point that this sort of micro-style point isn’t as big as architectural questions, but it’s definitely something you want to teach junior engineers so that it’s second nature by the time they are at the level where they are thinking about architecture.
For that you can try reading Bob Martin, Martin Fowler, Kent Beck, Domain Driven Design, Hexagonal, etc. - but you also just need to build for a decade while thinking about that stuff to really master it. Sadly architecture often seems more craft than formal engineering at this level.
I still don't have a good solution other than try to "keep things simple" from the beginning, then as soon as new features are introduced and everything becomes messy, mercilessly refactor the architecture itself to make things "make sense" again. I fully realize that this is not very helpful because what does "make sense" even mean in a code base? But that's the best I can come up with and I don't think there's anything more specific that can be said :(
If I'm writing it for myself, and only ever myself, I'll use the more clever powershell ways of doing things. Expressions like:
1..10 | % {$_}
If you're coming from another language, you're going to have to run it to understand it, or look it up. That is time lost.
Where-Object is going to let you cut down on the number of lines of code compared to foreach() and for(), and in my opinion will make the code more readable.
$vms | Where-Object -Property Name -match "sql"
vs
$vmOutput = @()
for($i = 0; $i -lt $vms.count; $i++) {
    if($vms[$i].Name -match "sql"){
        $vmOutput += $vms[$i]
    }
}
vs
$vmOutput = @()
foreach($vm in $vms){
    if($vm.Name -match "sql"){
        $vmOutput += $vm
    }
}
For the Foreach-Object point, that cmdlet also gives you the option to use begin{}, process{} and end{} blocks. So with begin{} you can do something before any of your objects are processed, process your objects with process{}, and do something with end{} after all objects have been processed. With for and foreach, this logic would have to come before and after the for and foreach statements.
I don't see this as "PowerShell being clever" but more as PowerShell being a shell that uses pipelines like *nix shells, but with everything as an object, unlike *nix shells. So you get to take advantage of that.
That was one of my favourite PWSH features when I was using it regularly. I’m a UNIX CLI-and-filter guy from way back and after using PWSH for a while I longed for the same power in bash (my shell for reasons of history, availability, and muscle memory, I’m unlikely to change).
That kind of index notation really isn't mysterious if you write a lot of python in my experience.
- Some of this is the coding equivalent of "6 rules for financial freedom" or "6 ways to find your dream soulmate". Generic advice that doesn't reflect highly nuanced reality.
- These rules are guidelines at best. There are justifiable reasons to break them; which I do often. Albeit this requires experience (and dare I say, wisdom). For example, refactoring code into a separate function levies a cost (of indirection) on the reader. Therefore copy-paste is sometimes fine.
- Code "cleanliness" is a moving target. For a coder's mental health and value proposition for his project, he/she should know what code can afford to stay dirty.
PS: I love Jonathan Blow's opinions on coding/programming. Here are a few: https://www.youtube.com/watch?v=21JlBOxgGwY https://www.youtube.com/watch?v=ubWB_ResHwM https://www.youtube.com/watch?v=KcP1fXQv0iU
You can do your absolute worst and you will still find someone claiming there aren't enough comments, or the naming is bad, or the code is too dense, or the code isn't dense enough, or you should use typed objects instead of tuples and anonymous classes, or your code should be more functional, or your code should be more imperative, or it should be more event-driven, or it requires more logging, etc.
And it turns out, there is almost no research to tell you who is right and who is wrong. The only thing I can safely tell others, is all these discussions and additions will add 900% more work all things considered, and there's no guarantee it will be less bug free or more.
What irritates me the most are the long, non-linear comments full of distracting noise. It's like reading a choose your own adventure novel.
I think most APL programmers would disagree with this take. Dense code has real advantages, and naming everything has real costs that are hard to see. There's nothing magic about a "line" that suddenly allows for chunking. You have to build a parse tree in your head in any case.
I'm reminded of Doug McIlroy's challenge to Knuth.[1] It's worth a read. Would you rather have 6 lines of dense shell, or 10 pages of Fabergé egg? I'll take the shell, thanks.
Look at the source code for J (an APL derivative)[2]. It's written in C, but that C was written in APL style by APL programmers. Lines leverage macros and 1–2 character names, making them extremely dense. Some files have a comment on nearly every line. For an average C programmer, this code looks absolutely insane. But it's not. The J devs find this perfectly readable and maintainable. It's clean code! If written with the typical C idioms, it could easily be 10x as long, and therefore harder to maintain. Your first impression is a snap judgement due to a difference of culture. You can learn to read this style with practice. Whatever your current style, that took practice too.
[1]: http://www.leancrew.com/all-this/2011/12/more-shell-less-egg...
A few approaches got lucky in the rapid inflationary period of the personal computer revolution (C), and the advent of the Web (Javascript), and became deeply entrenched in industry, while superior alternatives that had been known for decades missed the boat. Industry languages still haven't caught up to where Lisp, Prolog, Smalltalk, and APL were in the 1970's, but they are clearly (if slowly) trending in that direction.
APL and derivatives are still used extensively in finance, a highly competitive field, to say the least. That's where you find the jobs.
That's just the whole "popularity means it must be good" argument, which I disagree with.
So, it's easier to remember the three numbers 34, 765, 812 than the eight numbers 3, 4, 7, 6, 5, 8, 1, 2.
Refactoring code into separate functions with descriptive titles is probably a lot like combining numbers.
---
† Well, the article says 4–6; I'd always heard the average was around 7.
My initial prejudices have largely held. I do find the code harder to read and follow. Having to jump around, follow variables that change name as they are passed through functions, keeping track of state that was moved to a class member rather than in a function body (because breaking into pure functions resulted in too many function parameters). I can't fit as much code on screen because of all the additional function definitions.
Lastly, the success of the method relies heavily on how well you name your functions, which is often considered one of the hardest parts of programming. A name that makes perfect sense to me may not be as clear to others, or even to myself in two months. And the devil is in the details - there are so many implied semantic preconditions and postconditions with every function you write, there is no way to fit that into a function signature no matter how well chosen, and if you tried to document them all your comments would be larger than the code itself at this level of granularity. So you still end up having to read all the called code to understand the details of what is happening anyway, which is easier to do with more flat code.
On the other hand, I've found the process of "extract till you drop" to be very helpful in forcing me to find ways to clean up my code. It naturally tends towards maintaining separation of concerns, finding ways to DRY when the initial structure wasn't conducive to it, and generally disentangling things even more so than when I try to refactor to meet these goals directly. If I had all the time in the world on other projects, I think I would apply "extract till you drop" on my code, then after it is disentangled, recombine it back into reasonable size chunks.
As an aside, when I see samples like this, it makes me itchy. I hope and assume that they're being used as made-up snippets just to illustrate a point, and aren't being lifted from an actual codebase.
Because... ugh... isn't it obvious? Attacker-controlled input such as URLs should never be manipulated with naive string processing! Always use a proper parsing library. Not to mention that complexities of URL encoding, character escapes, etc...
The problem is that the author is using abstractions at the wrong level, with or without his fixes. The correct solution would be something like:
var uri = new Uri( "http://foo/demo?test=a&blah=b%20c" );
var map = System.Web.HttpUtility.ParseQueryString( uri.Query );
Console.Out.WriteLine( "is blah equal to 'b c'?\n{0}", map["blah"] == "b c" );
The above example is C#, but similar code can be written in any language. It's simple, direct, and doesn't violate the "rule of six". It can be read like English:
1. Construct a URI from a given string.
2. Parse the query part of the URI into a map.
3. Test if the 'blah' value in the query is "b c" as expected, with the escaped space decoded properly.
The example of how to apply the "MORF" rule in the article still has low-level operations involved, which doesn't make the code more readable. It doesn't describe the intent, which is the key thing to writing code that doesn't need comments every second line.
A few months later though, I started with QBASIC. BASIC of course gets an awful rap, but it was so much more intuitive for me at the time. I started out with just global variables and GOTO's everywhere. Over time, I worked up to loops, and subroutines, etc. etc. However, the simplicity of "program runs one line at a time, each line does something obvious" was incredibly important to beginner-me.
Even once I moved to C, when I was an amateur I still had a tendency towards one line per thing happening. I really hated code like
while(i++ < 10) { doSomethingWith(i); }
(Actually, I still do).
As I got more sophisticated in my 20s, I started packing a lot more ideas into a single line. If I'm being perfectly honest, I think some of it was just showing off. You certainly look clever if you can put 3 list comprehensions on one line or use some of the more advanced collections APIs. However, besides understandability, I found that style of code had two really big problems:
1) It's a lot harder to debug. Either you can't get a breakpoint in the precise place you want, or you can't insert a print statement easily into a complex expression, or iteration variables become implicit and you lose context.
2) It's hard to add error handling to that type of code. When a lot of things happen in a complex expression, you're depending on the entire expression working.
Luckily I've grown out of that phase, although ironically now my much more mature code looks a lot like the very simplistic code I wrote as a teenager.
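To make those two problems concrete, here is a minimal Python sketch of the "one line per thing" style applied to the article's query-string example. The helper name `last_param_values`, its `count` parameter, and the sample URL are made up for illustration; the point is only that each step has its own line where a breakpoint, a print, or an error check can go.

```python
def last_param_values(url, count=3):
    """Return the values of the last `count` query parameters of `url`.

    Hypothetical helper: one step per line so each can be inspected.
    """
    # Step 1: isolate the query string, failing loudly if there is none.
    _, separator, query = url.partition('?')
    if not separator:
        raise ValueError(f"no query string in {url!r}")

    # Step 2: keep only the last `count` key=value pairs.
    pairs = query.split('&')[-count:]

    # Step 3: pull the value out of each pair, checking each one.
    values = []
    for pair in pairs:
        key, equals, value = pair.partition('=')
        if not equals:
            raise ValueError(f"parameter {pair!r} has no '=' in {url!r}")
        values.append(value)
    return values


print(last_param_values('https://example.com/item?id=1&a=x&b=y&c=z'))
```

In the dense one-liner, a missing `?` or `=` surfaces as a bare IndexError with no hint about which part of the input was wrong.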
You know this, but for those unaware, in the previous example, x is not initialized to 3. Similarly, in "int* p1, p2;", p2 is an int, not an int*. Easy to misread.
I tend to really hate the UX of bottom-up designed API's and find them incoherent.
If you don't understand that, multiple smaller lines won't help you, because you just don't know what you are doing.
In addition, that code example is easily testable. Testability is more important than readability in modern programs that follow modern CI/CD principles -- and the readability is not really that bad either. Also, modern debuggers don't have issues with nested/lambda statements like these.
If the article's author had a legitimate bone to pick, they would have better examples.
I'm skeptical, because typically a line like that is embedded in the middle of a larger function.
Extracting the logic into a dedicated, pure function helps with testing.
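As a rough sketch of what that extraction could look like in Python: the expression is the one from the article, while the function name `extract_last_three_values` and the test URL are assumptions made for this example.

```python
def extract_last_three_values(s):
    """Pure function: the result depends only on `s`, with no side effects."""
    return list(map(lambda x: x.split('=')[1], s.split('?')[1].split('&')[-3:]))


def test_extract_last_three_values():
    # The line can now be exercised without running the larger function
    # it used to be buried in.
    url = 'http://foo/demo?first=1&test=a&blah=b&last=c'
    assert extract_last_three_values(url) == ['a', 'b', 'c']


test_extract_last_three_values()
```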
Does anybody really think that e.g. sregex[1] is better than just learning and using the regex language directly? Because that's where this kind of thinking leads.
[1]: https://github.com/jwiegley/emacs-release/blob/master/lisp/o...
from stdlib:
from urllib.parse import urlparse, parse_qsl
url = 'https://www.example.com/some_path?some_key=some_value&foo=bar'
parsed_url = urlparse(url)
values = [v for _, v in parse_qsl(parsed_url.query)]
print(values)
which I guess you could oneliner back to this..
[v for _, v in parse_qsl(urlparse(url).query)]
You are saving a few lines on the surface, but adding a potential backtracking bug in the future.
s.split('?')[1]
.split('&')[-3:]
.map(lambda x: x.split('=')[1])
Unfortunately, that's not how Python's map(), len() and such were designed.
(x.split('=')[1] for x in s.split('?')[1].split('&')[-3:])
Removing the lambda cuts down on the noise considerably.
And honestly, with this many splits with fixed indexes, I'd probably use a regex. Now there's a dense language for you.
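For what it's worth, one way the regex version might look in Python. The pattern and the sample URL are my own guess at what was meant, not something from the article, and like the split version it does no percent-decoding.

```python
import re

s = 'http://foo/demo?a=1&b=2&c=3&d=4'

# Each match starts at '?' or '&', skips the key up to '=',
# and captures the value up to the next '&' (or end of string).
values = re.findall(r'[?&][^=&]*=([^&]*)', s)[-3:]

print(values)
```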
In fact, for people unfamiliar with python, this expression is even more strange - you read expression starting from the middle (the _in_ ... part), and then return to the beginning. It makes your eye dart forward and backwards on the text.
>A line of code containing 6+ pieces of information should be simplified.
he put it in bold.
Try actually reading instead of skimming next time. Or at least skim more slowly.
A short line length, while yes imperfect, forces complex lines to be decomposed into individual concepts. And it allows the code to read like a book rather than <there is literally no other media format that you read sideways>. Ultra-wide monitors be damned.
And auto-formatters suck because they don’t split concepts onto individual lines. They can’t. They just mangle code and scrunch it into whatever space is allowed without regard to how the code reads. The idea of them is great and intensely alluring, but the implementation leaves much to be desired. If an auto-formatter could make my code look and read like a LaTeX document, I’d shut up already.
So if you want people to implicitly start structuring their code as advised in this post, set a 80 or 100 char line length. And adopt a fuzzy “one statement per line” philosophy.
To be fair considering such visual details is a complex task and probably hell to implement.
If I've only helped one person today, it was worth it ;-)
DoThis();
DoThat();
DoTheOtherThing();
Instead you usually have something like:
x = DoThis(a, b, c);
y, z = DoThat(c, x, a);
w = DoTheOtherThing(a, z, x, y, b);
…and on top of that have to add error handling for those calls.
Naming things is hard, but it's also important.
Clean code is not just a few rules about how to write a line. You can write nice lines that still don't make sense and amount to shit code
query_params = s.split('?')[1].split('&')[-3:]
map(lambda x: x.split('=')[1], query_params)
The calculation of query_params, having no dependency on the lambda parameters or anything being mutated, has been lifted out of the lambda, and thus spared from repeated execution by map. The compiler for that language won't do this automatically.
The query params were never in the lambda to begin with. Python function calls have strict (not lazy) semantics, i.e. "applicative order", i.e. both expressions passed as arguments to map() are evaluated before the map body gets them as parameters, thus the query params would only be evaluated once, even when inlined as they were originally.
Same with the lambda definition: it's evaluated only once. It's just the lambda body that gets reevaluated each loop, and only evaluated for the first time on the first loop.
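This is easy to check with a couple of counters. A small sketch, where the helper names `query_params` and `value_of` and the `calls` dict are made up for the demonstration; note that in Python 3 map() itself is lazy, so list() is needed to force the lambda body to run.

```python
calls = {'query_params': 0, 'lambda_body': 0}


def query_params(s):
    # Stands in for the inlined s.split('?')[1].split('&')[-3:] argument.
    calls['query_params'] += 1
    return s.split('?')[1].split('&')[-3:]


def value_of(pair):
    # Stands in for the lambda body.
    calls['lambda_body'] += 1
    return pair.split('=')[1]


s = 'http://foo/demo?a=1&b=2&c=3'

# The second argument is evaluated once, before map() ever sees it.
values = list(map(lambda x: value_of(x), query_params(s)))

print(values, calls)
```

The argument expression runs once; only the lambda body runs once per element.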
but I really wish debugger evolution hadn't stopped at the line.
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."
Taking a subset of query params smells in its own right so, again, might be a bad example.
life ← {⊃1 ⍵ ∨.∧ 3 4 = +/ +⌿ ¯1 0 1 ∘.⊖ ¯1 0 1 ⌽¨ ⊂⍵}
Is that shorter than essentially every other language's implementation? Yep!
However, to even begin to understand it you have to read an article from the original writer:
https://aplwiki.com/wiki/John_Scholes%27_Conway%27s_Game_of_...
To me, that is objectively, not subjectively, less clear than the longer implementations.
The fact that the site uses a consent solution that fakes a loading screen when trying to configure (read disable) tracking/advertising is an instant bounce for me.
Edit: To clarify a bit more, in contrast to the rule of six, i'd definitely keep a line that is more complex than usual but conveys the intention of my thinking, rather than splitting it to multiple lines and losing that important information, losing the original intention.
I also remember reading about a study where chess masters and non-experts were asked to memorize chess boards. Average people could only remember 5-7 piece locations where chess masters could remember the entire board. But when the piece layout was random (rather than from real chess matches) the experts weren’t much better than the non-experts. It speaks to the abstractions our brain creates to deal with limited working memory.
That cumbersome line of python is a good example. As an experienced python person, I immediately found myself giving names to the chunks to understand it.
Overall very good advice. Your code should explain the steps it takes to solve a problem (or in a more functional language, explain the solution), not be as terse and clever as possible. Keystrokes are cheap; thinking is expensive.
See, the first split use made me think it always returned the left and right part after splitting at the first match - 0 being left and 1 right. This is not the case. The code implicitly relies on this being a url.
But then the second split is accessed with the weird [-3:] which I have to assume to mean the last 3 elements. I assumed then that split must return a list but started wondering: why 3 elements only? I still don’t know. I wasn’t helped by the single letter named variables either.
I think people might want to focus on the basics before venturing into grand consideration about splitting lines and putting code in function. The one liner with proper names is too long but understandable:
last_three_url_param_values = map(lambda query_string: query_string.split('=')[1], url.split('?')[1].split('&')[-3:])
Obviously, that isn't always possible. I find this approach especially useful in writing e2e browser tests. You write an abstraction over the testing framework's (playwright, puppeteer etc) interaction and then use that in your tests.
So instead of writing:
await page.click(".play-button");
You do:
await app.play();
This also has the benefit of extreme reusability. Doesn't work for everything though.
Yet when it comes to programming languages, the majority of "advice" seems to be about absolute dumbing-down and propagating an attitude of "it's too hard, you can't possibly learn, just give up"?
I've always wondered about this dichotomy. Some languages like the APL family appear to have gone far into the "it's a language, to be learned like any other" territory, while more "mainstream" ones are drifting further in the opposite "don't even bother trying harder" direction.
Kernighan's Lever: http://www.linusakesson.net/programming/kernighans-lever/ind... (look at the rest of his site; he has clearly leveraged that attitude with great success)
(I looked at the one-line example in Python and, despite having very little experience in the language, it was actually faster to read and understand as a whole than the 3-line version.)
But I do agree on keeping individual lines, if not entire statements, small and simple. The ruby chainsaw is a great tool for that. I find a chain of simple statements arranged in a flow to be more readable than using lots of intermediate variables.
The real answer is write code like you write words: Rework it to make sense to the reader. How many newlines you need as a hint for your editor to wrap and where you put them will fall out of that.
Or autoformat! I love autoformatters!
Edit: edited to make easier to parse mentally.
I'm inclined to believe that there are probably many gradations of long vs. short term memory in the structure of the brain. In fact, I bet the gradations even vary by topic and of course depend on what sorts of tasks a person is accustomed to performing from day to day.
I imagine that the "4 to 6" figure fell out of a study that aggregated a large amount of data collected across subjects and that the figure itself can't capture much of the nuance or even the nuance of cohorts.
In other words, it may very well be that a large percentage of people who work professionally as software developers are capable of keeping more than 4 to 6 "facts" about code they're looking at in their head. But that they would also appear to have the same capacity as random people when it comes to arbitrary facts that one would be asked to memorize in a psychological study.
I'm not going to nitpick the incredibly bullshitty term "brainpower" and what is less and if that's actually advantageous, but if you write short lines of code, you're going to write more lines, which requires "more brainpower" to understand. You don't simply "chunk" lines in memory. If that were true, you could just as easily chunk function calls.
That memory plays a role is fairly certain. There is a pretty hard finding from psycholinguistics: it's hard to understand nested structures. The sentence "the rat the cat the cook hit chased escaped" is much harder to understand than its right-branching equivalent "the cook hit the cat that chased the rat that escaped". However, reading code is not the same as reading natural language.
If you want to know if what you wrote is understandable, try reading your code without falling back to remembering why you wrote it. Try to read what you wrote. Wait a few days if your recollections get in the way.
Let's say you have a long code block that includes the revised snippet:
> query_params = s.split('?')[1].split('&')[-3:]
> mylist = map(lambda x: x.split('=')[1], query_params)
> ...
> ...
> (some more complex transformations, that only depends on mylist)
When you're reading the later stages of the code, you still have to maintain a memory of what "query_params" does, even though it's no longer relevant. That actually increases the burden on your working memory. The one-liner is more complex to understand initially, but it self-documents that the only info that is relevant to the downstream is the result of the map(...).
In general, the more variables that are declared in a code block, the more effort it is to understand, and the effect is probably superlinear with the number of variables. I'd say if you have to declare more than 5-6 variables, you should split into a separate function.
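A small Python sketch of that split, using the snippet above. The function names `last_three_values` and `process`, and the upper-casing step standing in for the "more complex transformations", are placeholders I made up.

```python
def last_three_values(s):
    # query_params lives and dies inside this function, so a reader of
    # process() never has to keep it in mind.
    query_params = s.split('?')[1].split('&')[-3:]
    return [x.split('=')[1] for x in query_params]


def process(s):
    mylist = last_three_values(s)
    # Placeholder for the later, more complex transformations,
    # which depend only on mylist.
    return [value.upper() for value in mylist]


print(process('http://foo/demo?a=p&b=q&c=r'))
```

This keeps the named intermediate step for readability while still signalling, as the one-liner does, that only the result matters downstream.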
Functions come with abstraction overhead. You don't know who will consume them, so you may have to put up type checks, null checks, and other BS.
Also - functions split up the logic all over the place, it's confusing.
I think what we need are 'nested functions' which serve to kind of create a scope pushed to the stack - with an implicit 'return' - which we can then 'collapse' in the GUI etc..
I mean, it's purely cosmetic from a CS point of view, but it might help to organize things a bit better and hand off abstractions in long function implementations.
Huge projects with 1 or 2 line functions drive me crazy - you have to constantly jump around all over place to figure out what's going on. I actually believe it's a historic anti-pattern.
I make functions when they are 1) used in different places or 2) a meaningful abstraction.
Otherwise, well documented longer functions for me.
The solution isn't to extract every token from the expression to separate lines, but to document the "why" of the unexpected token. That can take many forms: a new variable with a meaningful name, a new function with a meaningful name, or a meaningful comment that warns the reader about the upcoming reason for getting just the last 3 parts.
Sometimes my one thing is “turns a Json file into an in memory dictionary” which might be three operations on one line.
I think the mantra for all names to be short can be counterproductive here. If the code span of a variable is short, a long name can be fine (and very clarifying, perhaps even resulting in a comment not being needed).
Shorter names for longer spans are much better. But you’d hope they’re the very obvious subject of that span.
Overall a lot of this boils down to minor style issues. I care very little if you give me 5 short lines with named intermediate steps versus a dense one-liner, however I do care very much if your code leverages pure functions, minimizes cyclomatic complexity, encapsulates messy bits, and has some form of test coverage. The former might take me a minute or two longer to grok (depending on my personal context), but the latter compounded over a wide surface area can lead to a completely unmaintainable system and a pathological fear of touching anything.
I've found that dividing software into layers, and making sure that each file relies on the same set of invariants from its dependencies, and also maintains a (different) consistent set of invariants for its callers works much better.
For instance, I'd prefer a function that takes a string and confirms it is a valid URL.
That would delegate to URL character escaping logic and DNS validation. (Are & or ? valid DNS name characters? Will they be in the future? I neither know nor care.)
On top of that, there would be a parser for key=value config file lines.
Then, the example in the article becomes something like:
keyvalue = parseConfLine(input)
URL(keyvalue.value).params[-3]
Plus a few more lines to confirm key is as expected and that value has enough query parameters.
Alternatively, I'd use a perl oneliner with a regexp. I see no purpose for code that lands in the middle ground between these extremes.
Applying rules like these to your code may or may not result in cleaner code, but that's a testable hypothesis.
I've seen all too often some clean code recommendation or other applied to code and it gets harder to understand. And the person doing the refactoring (often myself) gets caught in sunk cost.
Now my recommendation is always:
1. Use your intuition to predict if a change makes code cleaner.
2. Try to make that change, and be open to doing things a little differently than you first imagined.
3. Test your hypothesis to see what others think. Decide what to do, but be mentally willing to throw it away.
4. Repeat
Articles like this are good resources to help train your intuition, but there is no substitute to developing your personal and team "flavor profile" for what styles suit your way of thinking.
I think the general idea is that clean code is short code, that's the base guideline. Generally shorter code does fewer things, reducing cognitive load. It may also have performance benefits. It also takes less space on-screen, which is also a good thing: less scrolling, ability to use bigger, more readable fonts, etc... And as explained, short-term memory is limited. Short code also tends not to repeat itself, another common piece of advice.
But that's the baseline, all the art is in appropriate breaking of that guideline, to have short code that doesn't look like it came out of a minifier.
Splitting lines makes longer code, bad, but sometimes it is justified. So what is your justification? The article focuses on "one liners" being hard to understand, but really, it depends on many things. For example you may use a longer form if you think that it is an essential part of your code and it is critical that you should pay attention to it. On the other hand, you can use a shorter form if it is a common pattern, what is "common" depends on who is going to read your code, or the project you are working on. For example, bit manipulation can make a good part of your code base, or be a one-off thing and it will have an influence on how you write that code.
Moving code into functions is generally a good thing if that function is used often (shorter code). I think it is the origin for the term "refactoring": factoring ax+bx+cx+dx becomes x(a+b+c+d), only a single "x" remains and it is shorter. But if that function is only called once, or if the operation is hard to extract from its context, it can lead to longer, harder to understand code, and again you have to exercise judgment. For example you may want to write a specific function because it is a tricky, specific part that you want to separate from the boilerplate. There are interesting considerations to using functions, because it actually reorders code, for example "a(){do_x}; do_y; a(); do_z" is written as x,y,z and does y,x,z, which is often, but not always unintuitive.
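The reordering effect is easy to see in a few lines of Python. This is just the "a(){do_x}; do_y; a(); do_z" shape from above, with a `trace` list (my addition) standing in for the actual work:

```python
trace = []


def a():
    trace.append('x')  # do_x: appears first in the file


trace.append('y')      # do_y: actually runs first
a()                    # do_x runs here, second
trace.append('z')      # do_z: runs last

print(trace)  # ['y', 'x', 'z']
```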
I'm sorry but this is such an asinine statement. Your brain doesn't store "things" and the number of working items depends on so many factors: the complexity of the information, the level of association between items, the attention span of the individual, which can be trained, and a multitude of other things.
Neuroscience is a useful tool for self-programming but you must be careful peddling absolutist statements like this which can do more harm than good.
Sure, it might be relevant if all variables were named "x", "xx", "xxx" - but they're not.
Admittedly the example below is not a perfect solution, but that's where I thought the article was heading when splitting that code over multiple lines for readability.
map(
lambda x: x.split('=')[1],
(url
.split('?')[1]
.split('&')[-3:]
)
)
Is this still too unreadable or more messy?
# URL with params https://news.ycombinator.com/item?id=32963021&something=valu...
map(lambda x: x.split('=')[1], s.split('?')[1].split('&')[-3:])
"Simple, put code in small methods"
Oh great, now I do a nav jump or a string search as a context switch rather than scroll.
Comments? increase vertical screen pollution.
Proper chunking is hard.
Maybe APL was right.
Lame.
This is pretty weak. Not knowing what a function or language feature does, doesn't make it inherently unreadable.
But, Miller states that limit is valid only for unrelated items.