On top of that, consider making every bit of clever code a reusable library snippet instead. If it becomes part of the vocabulary of your codebase it'll be easier for others to understand it by looking at call sites.
eg:
# Capitalize the first letter of every word
@sentence = @sentence.split(' ').map!{|x| x = x[0..0].upcase << x[1..-1]}.join(' ')Could this be done with one gsub regex?
why use "x = x[0..0].upcase << x[1..-1]", instead of "x[0]=x[0].upcase; x" ?
And why not use the already-written "titleize" function, which capitalizes the first letter of each word?
Frankly, I think your darling needs to be put out of its misery. :-)
Which is a function provided by ActiveSupport. It's not part of Ruby itself.
@sentence.split(' ').map(&:capitalize).join(' ')
More terse and more descriptive (imo)
The original line raises surprisingly many questions:
- The default for `split` is to split on whitespace. Is it the intent of the author to only split on spaces? (I guess so) What about tabs?
- Why is the author using #map! (with exclamation mark)? I have to admit this put me on the wrong foot for a while. My best guess right now is that the reason is speed; Mutating the array that split produced is (slightly) faster than creating a new one.
- Why is the capitalized string assigned to x again? I can think of no good reason at all. Am I missing something?
- Why is the author using x[0..0] instead of x[0] or x.first? I know why: The difference is that the latter two return nil if x is the empty string; but it's far from obvious and one could easily break the code by trying to improve this.
- Why are the parts of x concatenated with << instead of with +? For speed? Is the author not aware of String#capitalize ?
My variant would be this:
@sentence = @sentence.split.map(&:capitalize).join(' ')
... and I would put it in a one-line method called titleize to better communicate my intent.For instance your 'titleize' method, in order to properly title case any string, ought to support a second option of words to ignore, such as "a, an, the, ...". If you wrote it as a one-liner at first, then you've got to go into your mapper and add conditionals if an ignore list is passed, and you've got more work than if you left it as a more expanded piece of logic.
Again, this is somewhat of a trivial example, but building on the original post's point of "how easily can I understand this later on" can also include "how easily can I extend this later on". Avoiding one-liner cleverness can help as a general principle.
Write the simple version first, wait for the need to extend the logic arise naturally, factor out the function and extend.
capitalize downcases the rest of the each sub-string, the original code did not.
titleize = ->(sentence) { sentence.split(' ').map!{|x| x = x[0..0].upcase << x[1..-1]}.join(' ') }
titleize.call("my lovely lovely lovely horse")
["a clause", "another clause"].map(&titleize)
:'Here's the Perl version:
$_ = join ' ', map ucfirst, split / /; @sentence = @sentence.
split(' ').
map!{ |x|
x = x[0..0].upcase << x[1..-1]
}.
join(' ')
[Edit: Fixed code, thanks knodi123] map s {(\w+)} {\u$1}g, @sentence;
I'll give a dollar to anyone who already knows Perl and can tell me what the fuck is going on there.Compared to a language like Python that has few neuroses, the same developer writes much less readable code.
The post may have been better if the author included a Python sample that does the same thing:
long_string = 'ajix mxozl xoap'
new_string = []
for word in long_string.split(' '):
new_string.append("{}{}".format(word[0].upper(), word[1:]))
new_string = ''.join(new_string)
Admittedly there are a few annoying symbols in here, mostly due to the non-intuitive operation of the join function and the well-meaning but less-than-ideal string formatting syntax (which could've been avoided if we used conventional strong concatenation, and which PEP 0498 attempts to improve). Also admittedly, coders who want to show how smart they are would try to use a list comprehension to do this, which is slightly less readable and imo shouldn't be used over this format without a good reason. But it's still much easier to parse than the Ruby version because everything is explicitly spelled out in the loop, and you generally shouldn't have to consult the language docs to look up 4 different rarely-seen operators.Remember, debugging is twice as hard as authoring, so if you write the most clever code possible, you are by definition not intelligent enough to debug it. ;)
I understand that people can't be expected to stick to that on their own, so they need languages that promote function, uniformity, and ease of use over showmanship.
The simplicity of the code is one of the main things I look for in interviews. If you could've done something with the conventional, simple language construct that doesn't require someone to go back and refer to the docs, even if it takes more lines, but instead you used the super-arcane construct to prove how well you know the language or to "get it all on one line", I'm going to look on that pretty dubiously. The last thing I want to deal with on my projects is the residue of someone's ego impacting our ability to read their code and get things done quickly and easily.
P.S., in Python, there are a couple of shortcut functions for capitalizing each word in a sentence: str.title() and str.capwords().
capitalize = lambda word: "{}{}".format(word[0].upper(), word[1:])
new_string = ' '.join(map(capitalize, long_string.split(' ')))
or: new_string = ' '.join(["{}{}".format(word[0].upper(), word[1:]) for word in long_string.split(' ')])
is much clearer and simpler to me, because it explains the intention of the code: split the string into words, map each word to it's capitalized version, then join it again. With the loop, I have to mentally "run" the code to realize that the for loop and append are to go through the split version and add the converted words to a new list.Obviously, this is a pretty subjective topic, and which version you prefer will depend what you're familiar with/the languages you use.
I dislike your first example because
a) imo, the lack of spacing makes it harder to see what's going on. it's more obvious that something is being iterated in a for loop with a new indentation level than in the map function.
b) it depends on the Python-specific implementation of lambda. The behavior of a lambda varies substantially from language to language and lambdas are used rarely enough that it's pretty likely someone who doesn't spend all day every day in Python is going to have to go back and look up the specific behavior. The syntax is much less clear than a full function definition.
In my opinion, it's much easier to read code like:
def capitalize(str):
return "{}{}".format(str[0].upper(), str[1:])
for word in long_string.split(' '):
captizalize(word)
...
This makes it much more obvious what's going on, and it should be readable to anyone with a passing knowledge of Python, and possibly anyone with a knowledge of programming languages in general. Invocation of map and lambda in this case only make the intent of the program more obscure.I'm not saying that map or lambda are never appropriate to use; sometimes they are. But I don't think it's wise to use them when more basic, universal language constructs do an equally adequate job, especially if the only benefit is "fewer lines of code".
The second example is just a list comprehension form of my original example, which IMO is less readable for much the same reasons. If you're not super familiar, you'll need to go back and look up list comprehensions. There is no spacing to make it obvious that something particular is being iterated or branched.
I understand that ultimately, ease of reading comes down to what style one is most familiar with, which makes it subjective, as you said. But I think there is a stronger rational basis for always preferring the simplest construct that adequately performs the function, which is that in the general case, there is less need to refer back to docs, less possibility of unexpected behavior, and less possibility of strange performance issues.
1) .format, which isn't immediately familiar and doesn't resemble similar string formatters in other languages 2) [1:], which I believe is a string slicing syntax that doesn't resemble similar syntax in other languages 3) Bug: nothing is done with the capitalized words, since the return value is thrown away 4) Bug: the name of the function was misspelled when used.
The last one may seem like a nitpick, but it is true that when you add a name to your code for the sake of clarity, you also take on the additional burden of ensuring the name is used consistently and accurately everywhere. This can be a particular pain in cases where you are generating a lot of uninteresting temporary values -- which is precisely why people end up writing chained function or method calls.
new_string = "".join(word.title() for word in long_string.split(" "))It's exactly because people pull this sort of thing "Hey, it was really easy to understand for me, how about you?" that I have seen developers feel compelled to put clever oneliners in codebases. Clever oneliners that later end up causing problems for whatever unluckly newbie has to troubleshoot their edge cases.
Just stop it, and unless you're on a word budget or this drastically improves performance, do something sensible that looks like the pseudocode:
for word in sentence.split(' '):
word.capitalize_first_letter
(If you want to be all functional, by all means use a map instead.)So I don't have to deal with the case where your clever function barfs on (hypothetical example, don't think this happens with the current code) the special case where there is a space between the last word and a period that ends the sentence; and you're trying to capitalize the period. [Edit: As expected, looks like this code fails with even common graphemes. Which is fine, but is an argument towards at least trying to make it more comprehensible.]
The code is not written for newbies. It never will be. That's why they're 'newbies' and not 'professionals'.
irb> @sentence = "\u00e6sir are gods"
=> "æsir are gods"
irb> @sentence = @sentence.split(' ').map!{|x| x = x[0..0].upcase << x[1..-1]}.join(' ')
=> "æsir Are Gods"
(Expected output being "Æsir Are Gods")If you want to understand why this is failing, the code I gave above will make it way simpler. Of course, viewpoints differ – you might claim that troubleshooting it is only a job for "professionals".
I've used Ruby somewhat actively for almost 10 years now and I still hate it when people barf out a bunch of Ruby-specific shorthand and think they're clever for doing so. Write it in the simplest form that works. I don't care if it takes 10 more keystrokes.
I've seen people write fizzbuzz with lambdas and such thinking that it would signal they're super-serious Ruby all-stars, but all it really does is let me know I should keep my distance, because they're going to be writing needlessly convoluted code.
It's easy to make it more readable by aligning the code with the dots, so that it becomes a pipeline:
@sentence = @sentence.split(' ')
.map!{|x| x = x[0..0].upcase << x[1..-1]}
.join(' ')
I definitely prefer that to half a screen page of crappy imperative code, where people over time will add lots of side effects etc.Besides, the middle part is clearly a strawman because
@sentence = @sentence.split(' ')
.map!(&:capitalize)
.join(' ')
but yeah. ;)Compare the output of both when run against "foo bar WIBBLE"
We all have large monitors -often rotated 90degrees to portrait. Extra lines are not a bad thing (especially if it helps visually break things into multiple smaller steps), and more work on a single line does not == elegance.
Optimizing my source for incompetent programmers isn't something I care about.
Additionally, optimizing the source to aid understanding for competent programmers is only something I care about to the extent that it doesn't otherwise impair their productivity.
Code spread over five lines instead of one might be easier to read, but not necessarily easier to edit.
Code should be written to be easy to edit first, and comments used to patch up where that conflicts with making it easy to understand.
Target semi-skilled developers and up, format the code for domain readability and ignore rules that get in the way.
@sentence = capitalizeEveryWordInThe(@sentence)
Then you can both unit test (if you're into that kind of thing) and refactor without breaking a lot of things.
Edit. I'm saying I find the one-liner form frequently more understandable than one embodied in a function.
a.map { x in x * 3 } is probably intelligible enough (in Swift) so no need for util.multiplyEveryElementBy3(a)..
I do it because it reduces physical code size and helps me "chunk" code in my brain. Since I wrote it, I know what I intended it to do and how it does it, possibly with the aid of a small comment.
This helps me have a more compact complete representation of the surrounding code, both in my editor and in my brain.
Is it bad for verifying correctness and allowing others to understand the code? Sure. But there is absolutely a well-considered reason why I do it, and it's not to show off.
And aren't the number of bugs loosely related to the number of lines of code? A darling has fewer chances of a typo bug.
This basically means that these keywords and operations are always OK:
* function definitions
* if conditionals
* full for loops
* full while loops
* return statements
and these things should only be used when it would be unreasonably intrusive or problematic to perform the same operation using only the above flow control mechanisms: * classes
* lambdas/closures
* language-specific shorthands like list comprehensions
* language-specific operator aliases, like << for .append in Ruby
* built-in functions that are likely to behave substantially differently from the intuitive expectation
The goal is to keep the program as simple and straightforward as possible. Of course you're going to need some of the things in the second category to do that sometimes. But my opinion is that one should start with the basic building blocks, and if one finds that it will save a lot of effort and time in both reading and writing the code to use one of the secondary things, only then should those things be deployed.For example, if you're using a lambda, you should have a good explanation for why a function definition wouldn't have worked. "Lambdas take one line and defs take two" isn't a good reason.
So where does it end? Dumbing down code until its so wordy, so bland, that it takes minutes to digest every phrase? To rewrite it in your minds eye until its back to that 'darling' concise expression. I recall taking a contractors 2 pages of code, and reducing it to a line. Not even a very complex line. In my view that was all improvement.
Long long ago when I encountered the string copy idiom in C 'while (s++ = t++)' for the first time, it left me scratching my head for a bit. Then it becomes a part of your vocabulary.
Native idioms make communication better.
People often misunderstand 'basic' for 'simple'.
@sentence = @sentence.split(' ').map!{|x| x = x[0..0].upcase << x[1..-1]}.join(' ')
Nobody sensible would write this code. The nearest sane version is: @sentence = @sentence.split(' ').map{|x| x[0].upcase + x[1..-1]}.join(' ')
But, although capitalize doesn't do the same thing, it's almost certainly what is actually wanted, so it would be @sentence = @sentence.split(' ').map(&:capitalize).join(' ')
Having said that, it's much more likely that a "clever one-liner" would be @sentence.gsub!(/\b\w/){|x|x.upcase}
Code like the article's would generally not be the result of cleverness, but of code that has organically mutated into something silly you'd never have written from scratch that way.That said, a block isn't necessary here and the one-liner can be even shorter:
@sentence.gsub!(/\b\w/, &:upcase)
&: call syntax is actually a little faster post 1.9, too.The most you could could say about this example, IMHO, is that it could be made more readable by breaking it up into more than one line.
The last thing we need is more people writing C code in dynamic languages that have much better and more concise programming constructs. If a programmer does not wish to learn how to use these constructs, that's their business. But I intend to use all the power available to me in whatever language I am programming in -- when it is appropriate, of course.
Just because some programmers don't understand how a particular language construct works, that does not make its use the product of a self-absorbed hacker who thinks he has "superpowers".
Yes, we should all guard against writing code that is difficult to read, but I think such code is rarely caused by the language features we use (although some languages make that easier than others -- I'm looking at you, C++). In fact, you could argue that it is more commonly caused by the features we don't use. Unnecessary verbosity is often the product of someone who is new to a particular programming language, and I think most of us are happy to discover shorter, simpler, clearer ways of expressing things.
In fairness to the author of the article, I think there is a point when you can chain a few too many things together without any intertwined temporary variables, separate function calls, or at least comments, and the reader of your code can get lost. However, in my mind, the code example given hasn't quite reached that threshold yet.
Here's a quote from Brian Kernighan: "Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?"
https://en.wikipedia.org/wiki/Brian_Kernighan
I've come to appreciate simplicity especially when spending a lot of time dealing with other people's code. If you spend most of your time in other peoples code, simplicity will save you endless amounts of time.
In fact, I actually coined a code metric for this - Jones Complexity, which is the mean operations-per-line (My thesis is that "good" code has a Jones Complexity of less than 8.) For some reason, it looks like code density of this type wasn't ever really considered before. I value readability and maintainability of code even more than I value efficiency and speed (in most cases.)
flake8 extension PoC: https://github.com/Miserlou/JonesComplexity
I think virtuosic displays like this are important for the ego of the creators at times, and should be commended as it represents a deep understanding of the craft and leads to even deeper and more interesting and perhaps profitable explorations later on.
But as in music, virtuoso's often, as they mature, realize that they work in ensembles, and that requires a tempering of their prodigious abilities in order for the "whole" to be its best.
In other words, its good to show off, but in the end the true master knows how to temper it for the greater good.
It is also the case that we often overestimate how understandable things are to others - I write above discussing goals before any compensation for this is applied.
Here's the discussion: https://news.ycombinator.com/item?id=1041500
Here's a (different) one-line Life implementation, linked to in that discussion: http://catpad.net/michael/apl/
x().and_then(|foo| exp).and_then(|foo| exp)
to handle errors. The imperative form requires a match statement after every function call that can return an error. Or "try!()", which bails with a return. It looks like we just have to get used to this Haskell-like style.