On the other hand, though, this sounds like a theoretical/academic article to me. I've been using Clojure for 15 years now, 8 of those developing and maintaining a large complex SaaS app. I've also used Clojure for data science, working with large datasets. The disadvantages described in the article bothered me in the first 2 years or so, and never afterwards.
Laziness does not bother me, because I very rarely pass lazy sequences around. The key here is to use transducers: that lets you write composable and reusable transformations that do not care about the kind of sequence they work with. Using transducers also forces you to explicitly realize the entire resulting sequence (note that this does not imply that you will realize the entire source sequence!), thus limiting the scope of lazy sequences and avoiding a whole set of potential pitfalls (with dynamic binding, for example), and providing fantastic performance.
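To make the transducer point concrete, here is a minimal sketch of that style. The names are illustrative; the point is that the transformation is defined independently of any concrete sequence, and `into` realizes the whole result eagerly while `take` still stops consumption of the (unbounded) source early:

```
;; A composable transformation, defined with no reference to the
;; kind of sequence it will run over:
(def xform
  (comp (map inc)
        (filter even?)
        (take 3)))

;; `into` realizes the entire result eagerly, but `take 3` stops
;; pulling from the infinite source, so (range) is never fully realized:
(into [] xform (range))
;; => [2 4 6]
```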
I do like laziness, because when I need it, it's there. And when you need it, you are really happy that it's there.
In other words, it's something I don't think much about anymore, and it doesn't inconvenience me in any noticeable way. That's why I find the article puzzling.
Sounds like that is precisely the point the article is making: the best way to use lazy sequences is not to. Lazy sequence bugs make for a miserable experience. Clojure already has an onboarding problem where every new learner has to discover all the obscure dos and don'ts and go through the lessons of which parts of the language are more of a gimmick vs. the parts that do real work. Attempting to do tricks with lazy sequences is part of that, but it is polite to warn people before they try rather than when they get to Stack Overflow after hours of head-to-desk work.
Although I will put in a small plug for lazy sequences because they work well in high latency, high data i/o bound situations like paged HTTP calls or reading DB collections from disk. When memory gets tight it can be helpful to be processing partially realized sequences. But the (map println [1 2 3]) experience that everyone has is a big price to pay.
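A hedged sketch of that paged-API case, assuming a hypothetical `fetch-page` function that returns a map like `{:items [...] :next-cursor ...}`:

```
;; Pages are fetched only as the consumer realizes elements, so a
;; partially realized sequence means partially fetched data.
(defn all-items [fetch-page cursor]
  (lazy-seq
    (let [{:keys [items next-cursor]} (fetch-page cursor)]
      (if next-cursor
        (concat items (all-items fetch-page next-cursor))
        items))))

;; e.g. (take 100 (all-items fetch-page nil)) pulls only as many
;; pages as are needed to produce 100 items.
```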
I disagree — I do use lazy sequences, I just rarely pass them around. Very few functions in my code return lazy sequences, and those are usually the "sources": functions that can return database data, for example.
Most of the code does not return lazy sequences, and thanks to transducers can be abstracted away from the entire notion of a sequence.
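As an illustration of how that abstraction plays out, the same transducer can be applied in several different contexts, so most code never commits to a concrete sequence representation (names here are illustrative):

```
(def xform (comp (filter odd?) (map #(* % %))))

(into [] xform (range 6))        ;; eager vector:  [1 9 25]
(sequence xform (range 6))       ;; lazy sequence: (1 9 25)
(transduce xform + 0 (range 6))  ;; direct reduce: 35
```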
> In other words, it's something I don't think much about anymore, and it doesn't inconvenience me in any noticeable way.
Interesting, because it still does bother me, that is, when I actually use lazy sequences and the functions on them. Sure, if I consciously avoid them, then it doesn't bother me anymore, but that's the point of the article :D.
There's always new minutiae to learn. Plus I get a handy link that I can just paste next time the topic of laziness comes up in a code review.
I'd simply point out that there are a few sub-tribes within the Clojure world. Some are very attracted to formalism and correctness; others pride themselves in rejecting them.
I am using Clojure for my side projects & hustles. If the project is quick and dirty, who cares how it's implemented. If the project evolves into a more serious product, I should rewrite it anyway and optimize the critical code paths.
It might also be good to mention Injest
https://github.com/johnmn3/injest
It makes transducers more ergonomic to use if you are like me and use threading macros everywhere.
Would be curious to hear how others feel about it
I realize the maintainers likely would not even be interested in such a thing, of course, just daydreaming.
Clojure is the only language where it is baked in that prominently though.
Writing custom transducers, especially stateful transducers is really difficult. But that's not something you'll do often. My 10kLOC complex app has three stateful transducers that I wrote.
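For readers who haven't written one, here is a minimal sketch of a stateful transducer (a running sum), keeping its accumulator in a `volatile!` the way the stateful transducers in clojure.core do:

```
(defn running-sum []
  (fn [rf]
    (let [sum (volatile! 0)]       ;; per-process mutable state
      (fn
        ([] (rf))                  ;; init arity
        ([result] (rf result))     ;; completion arity
        ([result input]            ;; step arity: emit the running total
         (rf result (vswap! sum + input)))))))

(into [] (running-sum) [1 2 3 4])
;; => [1 3 6 10]
```

The three arities and the careful handling of completion are a big part of why writing these by hand is fiddly.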
I think transducers are an under-appreciated aspect of Clojure. They are an extremely valuable and flexible tool, and have allowed me to write reusable and composable code and tackle significant complexity, all with great performance.
Obviously in languages that can reliably perform stream fusion transparently, maybe you care less, but the abstraction isn’t just about the speedup.
You can't hide from complexity. It will lurk somewhere anyway.
I was drawn to Clojure because it looked like a lisp for getting stuff done. But a few things put me off. This article puts me off more. I want to get the semantics down before I have to think about what's going on under the hood.
There is the issue of startup time with the JVM, but you can also do AOT compilation now so that really isn't a problem. Here are some other cool projects to look at if you're interested:
Malli: https://github.com/metosin/malli
Babashka: https://github.com/babashka/babashka
Clojure is fun enough that people get to know all the edge cases. And put up with the stack traces.
If you do decide to try it, don't use deps.edn, go straight to Leiningen for build tooling. Even just for playing around in the REPL.
Clojure is a fantastic language, and probably the best lisp you could start out with due to the fact that you have the entire Java ecosystem at your fingertips.
I wonder if a Scheme dialect would be a better fit for you? They tend to be smaller and might let you focus on semantics more.
Full disclosure: I haven’t spent nearly as much time with any of the Scheme/Scheme-inspired dialects as I have with Clojure. I’m basing this off of their design philosophy and others’ observations.
Is this just a personal goal you’re setting? I’m curious because I’m in a similar position, so I’d love to hear your plan!
And Clojure also doesn't give an error or warning when lazy sequences are never realized.
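A sketch of that silent-drop behavior (the function name is made up): the result of `map` is never realized, so nothing is printed, and no warning of any kind is issued.

```
(defn notify-all [users]
  (map #(println "notifying" %) users)  ;; lazy seq, value discarded
  :done)

(notify-all ["a" "b"])
;; => :done, with no output at all
```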
GHC would be a better example, I think. It performs stream fusion. This means it can turn 'map f (map g xs)' into 'map (f . g) xs', and of course it gets more complex than that, but that's the basics. It directly optimises lists (which, this being Haskell, are lazy sequences).
Is it only for the built-in map, or would it work in a general way for, say, `myMap f (myMap g xs)`?
I'm typically using it like so:

```
(defn realize [v] (doto v pr-str))

(binding [*some* binding]
  (realize (f some-nested-lazy-seq)))
```

Actually, be very careful with side effects. Some functions like `map` and `for` take things in chunks, typically in steps of 32, since most of the underlying structures are trees with 32-wide nodes.
```
(let [printing-range (map (fn [i] (print "debug: " i) i) (range))
first-10 (take 10 printing-range)]
first-10)
debug: 0
debug: 1
debug: 2
debug: 3
debug: 4
debug: 5
debug: 6
debug: 7
debug: 8
debug: 9
debug: 10
debug: 11
debug: 12
debug: 13
debug: 14
debug: 15
debug: 16
debug: 17
debug: 18
debug: 19
debug: 20
debug: 21
debug: 22
debug: 23
debug: 24
debug: 25
debug: 26
debug: 27
debug: 28
debug: 29
debug: 30
debug: 31
(0 1 2 3 4 5 6 7 8 9)
```

Suppose we make a sequence of numbers which grows very rapidly, so that by the time we hit the 17th one, we have a bignum that is gigabytes wide.
You probably don't want this to be chunked in batches of 32.
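One hedged sketch of that situation in Clojure: `iterate` happens to produce an unchunked lazy seq, so only the elements you actually take are computed, whereas a chunked source would eagerly build up to 32 of these rapidly growing numbers.

```
;; Repeated squaring doubles the bit-length at every step, so elements
;; become enormous very quickly.
(def huge (iterate #(*' % %) 2N))

;; `iterate` is unchunked: only the first three elements are computed.
(take 3 huge)
;; => (2N 4N 16N)
```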
Another situation might be if we have some side effect: the lazy sequence is connected to some external API somehow or foreign code. You might want it so that the observable behaviors happen only to the extent that the sequence is materialized.
The advice to be careful with side effects is good in general; not sure why you're downvoted.
```
1> (len
     (with-stream (s (open-file "/usr/share/dict/words"))
       (get-lines s)))
** error reading #<file-stream /usr/share/dict/words b7ad7270>: file closed
** during evaluation of form (len (let ((s (open-file "/usr/share/dict/words")))
                                    (unwind-protect
                                      (get-lines s)
                                      (close-stream s))))
** ... an expansion of (len (with-stream
                              (s (open-file "/usr/share/dict/words"))
                              (get-lines s)))
** which is located at expr-1:1
```
The built-in solution is that when you create a lazy list which reads lines from a stream, that lazy list takes care of closing the stream when it is done. If the lazy list isn't processed to the end, then the stream semantically leaks; it has to be cleaned up by the garbage collector when the lazy list becomes unreachable.
We can see with strace that the stream is closed:
```
$ strace txr -p '(flow "/usr/share/dict/words" open-file get-lines len)'
[...]
read(3, "d\nwrapper\nwrapper's\nwrappers\nwra"..., 4096) = 4096
read(3, "zigzags\nzilch\nzilch's\nzillion\nzi"..., 4096) = 826
read(3, "", 4096) = 0
close(3) = 0
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
write(1, "102305\n", 7102305
) = 7
exit_group(0) = ?
+++ exited with 0 +++
```
It is possible to address the error issue with reference counting. Suppose that we define a stream with a reference count, such that it has to be closed that many times before the underlying file descriptor is closed. I programmed a proof of concept of this today. (I ran into a small issue in the language run-time that I fixed; the close-stream function calls the underlying method and then caches the result, preventing the solution from working.)
```
(defstruct refcount-close stream-wrap
  stream
  (count 1)
  (:method close (me throw-on-error-p)
    (put-line `close called on @me`)
    (when (plusp me.count)
      (if (zerop (dec me.count))
        (close-stream me.stream throw-on-error-p)))))

(flow
  (with-stream (s (make-struct-delegate-stream
                    (new refcount-close
                      count 2
                      stream (open-file "/usr/share/dict/words"))))
    (get-lines s))
  len
  prinl)
```
With my small fix in stream.c (already merged, going into Version 292), the output is:

```
$ ./txr lazy2.tl
close called on #S(refcount-close stream #<file-stream /usr/share/dict/words b7aecee0> count 2)
close called on #S(refcount-close stream #<file-stream /usr/share/dict/words b7aecee0> count 1)
102305
```
One close comes from the with-stream macro, the other from the lazy list hitting EOF when its length is being calculated. Without the fix, I don't get the second call; the code works, but the descriptor isn't closed:
```
$ txr lazy2.tl
close called on #S(refcount-close stream #<file-stream /usr/share/dict/words b7b70f10> count 2)
102305
```
In the former we see the call to close in strace; in the latter we don't.
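For comparison, the same stream-lifetime pitfall exists in Clojure, and the usual workaround is the opposite of TXR's: realize the lazy seq eagerly inside the scope that owns the stream. A sketch:

```
(require '[clojure.java.io :as io])

;; The pitfall: the lazy seq escapes the scope that owns the reader,
;; so realization happens after the reader is already closed.
(defn broken-lines [path]
  (with-open [r (io/reader path)]
    (line-seq r)))

;; (count (broken-lines "/usr/share/dict/words"))
;; throws "Stream closed" once realization passes the first line.

;; The usual fix: force the whole seq before leaving the scope.
(defn all-lines [path]
  (with-open [r (io/reader path)]
    (doall (line-seq r))))
```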