1.) The in-memory DB .exe was around 500 KB. Imagine that.
2.) The Q language syntax, while consistent, is fairly arcane and throwback to decades past.
3.) The documentation and driver support is abysmal.
4.) It's supposedly extremely fast, but I can't help but wonder if this is a lot of successful PR and hype (like hedge fund bosses insisting on Oracle because it's the only db that 'scales')
Based on that experience...
1) Yes, but that's not huge by modern standards.
2) Q is a DSL version of K. As others have commented, K is a pretty clean implementation of APL, and Q makes K more approachable.
3) I have to agree here, but Q for Mortals makes up for it.
4) It is really fast. As we all know, a vast majority of us actually don't have terabytes and terabytes of data, especially after a reasonable cleanup / ETL / applying common sense. I suppose it helped that I worked in finance, which meant my desktop had 16GB of memory in 2009 and 128GB of memory on a server shared by 4-5 traders.
Finally, Q was never intended for general-purpose computing or for widespread adoption. At least when I was an active user, the mailing list had the same 20-30 people asking questions and 3-4 people answering them, including a@kx.com (= Arthur Whitney, the creator). Back then, I'd say there were at most 2-3k active users of Q/K in the world. Now that Kx Systems is part of First Derivatives and has been working on expanding its customer base, perhaps they have more...?
The machines that $dayjob-1 used to build dominated the STAC-M3 for a few years (2013-2015) because we paid careful attention to how kdb liked to work, and how users liked to structure their shards. Our IO engine was built to handle that exceptionally well, so, not only did in-memory operations roar, the out of memory streaming from disk ops positively screamed on our units (and whimpered on others).
I miss those days to some degree. Was kind of fun to have a set of insanely fast boxen to work with.
OP could have phrased it better, but I presume his point was that 500KB is extremely small by modern standards. The whole executable fits comfortably in L3, so you'll probably never have a full cache miss for instructions. On the other hand, while it's cool that it's small, I'm not sure that binary size is a good proxy for performance. Instruction cache misses are rarely going to be a limiting factor.
It still is, but the hot path is much smaller than that.
> 2) The Q language syntax, while consistent, is fairly arcane and throwback to decades past.
I can't comment about this. I don't mind the syntax. I prefer k syntax though.
> 3) The documentation and driver support is abysmal.
This is getting a lot better. The fusion API[1] goes a long way towards better "drivers", and the new documentation site[2] shows a lot of energy being put into organization. There's also Q for Mortals[3], which is linearized for people who like that.
[1]: http://code.kx.com/q/interfaces/fusion/
[3]: http://code.kx.com/q4m3/preface/
> 4) It's supposedly extremely fast, but I can't help but wonder if this is a lot of successful PR and hype
There are a lot of good benchmarks of kdb[4] unlike Oracle :)
J has JDB[2] and Jd[3] for things somewhat similar to qdb with Jd being the commercial offering similar to qdb rather than JDB.
I would probably choose APL over Q if that were a choice. In J you can always make your definitions (verbs, nouns, etc...) plain words if you like the way Q reads.
[1] jsoftware.com
[2] http://code.jsoftware.com/wiki/JDB
[3] http://www.jsoftware.com/jdhelp/overview.html
k variants:
kona (C, interpreter): https://github.com/kevinlawler/kona
klong (C, interpreter): http://t3x.org/klong/
kuc (C++, JITted): http://althenia.net/kuc
oK (JS, interpreter): https://github.com/JohnEarnest/ok (see also iKe by John Earnest)
cousins:
J (C, interpreter) http://jsoftware.com/
A+ (C, interpreter, unmaintained): http://www.aplusdev.org/index.html
Gnu APL (C, interpreter): https://www.gnu.org/software/apl/
apl.js (JS, interpreter): https://github.com/ngn/apl
There's also NARS2000 and a few other APL interpreters
related:
Numpy and R provide similar functionality, albeit with more verbose (and less fluent) composability. They are usually slower.
Nial (C, GPL interpreter): https://github.com/danlm/QNial7
The author of Nial, Mike Jenkins, has recently released v7 of Nial. Nial is akin to Q in that many of the operators are keywords rather than symbols. Its computational model is slightly different due to its roots in Trenchard More's array theory.
Also, I wonder whether this could be built on Fortran-like arrays or something similar.
Maybe I'm reading too much into this, but it seems like you expect the answer "nothing much" to the first part of your question. I have done nothing more than read lots of articles on KDB's lineage and play around a bit with J, so take my answer with a considerable lump of salt, but my impression is that the answer to the second part of your question is "because there's something 'so special' about KDB"; my understanding is that it provides blazing-fast access to memory-compact databases, from a tiny codebase, building on the APL/J legacy. Why can't this be done in an open sourced way? Well, surely there's no inherent reason, but the fact that it hasn't been is probably evidence that it's not just a problem of trivially cloning existing work (or else someone would have done it).
There’s also a JS implementation that aims to implement K6:
http://johnearnest.github.io/ok/index.html
I don’t believe either implement all the “database” side of kdb+ though.
Edit: The J language is similar in some respects, and is GPLed. They've also got a columnar database... http://www.jsoftware.com/jdhelp/overview.html
see here:
However, I couldn't get used to Q. I understand that it is fast, and I also started to like the functional programming aspect of it. But oh boy, there are no proper error messages (no, "`type" is not helpful). The short versions of various map and apply were handy, but there are no equivalent versions with longer names, so there is no way to write code that is readable to a non Q-expert. The strange evaluation order made it impossible for me to read other people's code; I often started to add parentheses and checked whether the behavior changed. There is no debugger (I used KdbStudio). Once I used more than 16 local variables, which gave a runtime error with some strange unhelpful error message.
I'm not questioning its usefulness, but I think that it could be much more developer friendly without compromising speed.
I think a lot can be done to improve teaching q/Kdb.
The current “best practice” is to change you until it makes sense but this takes time- anywhere from 6 months to a couple years based on your other experiences. And you still have to want to “get it”.
I want that to be better.
However this is alien technology: the terseness is a feature. Limited locals are a feature. The evaluation order is a feature. These things actually help it go fast (as surprising as that is!)
We only just recently got “a debugger” and good error messages because of repeated complaints and wishes for it from newbies, but there is a reason experienced Kdb/q programmers never wanted it. Why doesn’t that reason prompt those newbies to figure out why?
Yes, ideally you would learn the way and the why we do things, but there are a number of serious hurdles to overcome: To you q/Kdb seems merely unpolished even if maybe “fast”. Is it worth it? “Maybe”, you suggest, but then there are crazy people like me telling you something preposterous; that q/Kdb is actually incredibly well crafted, highly readable and an absolute pleasure to use.
I'm also trying to say that everything is worth learning: Nim and Python and JavaScript are all basically the same thing. You learned one of them, you kind of learned them all, so adding another one feels like these things are easy to learn. Alien technology is alien though, you haven't learned any of it. How can we even talk to each other?
I'm hopeful: Lambda was tricky, but it snuck in to things. Can we get tables, views, and high code density?
We need to find something better- some better way to talk about it, but the peanut gallery is loud.
Thank you for the time to respond. I understand that speed is very important and I'm happy that Q takes it seriously. I was never questioning that part. I also understand the beauty of functional programming paradigms (working with maps and applys), I wrote many lines in Mathematica without "for" loops. I also understood many code snippets written by developers working only with Q (I guess they are experts). I don't know the full power of Q, but I can imagine what is achievable.
I haven't worked with Q for 18 months, the debugger sounds cool. Forgive me if my knowledge is not up-to-date. Also I wanted to use the language immediately without learning it for months, which might be the main source of my frustration.
I believe that there could be a better developer environment without making the system slower, and that could reduce the time required to use the language efficiently from several months to several days or hours.
The evaluation order is not clear to me. I understand this expression:
q){x+2*y} scan 2 3 5 7
(the result is: 2 8 18 32). However, I often had to read more complicated code which contained 1 or 2 character long operators. Without fully understanding their syntax, I was unable to tell whether those operators take values from both sides, which stopped me from understanding the evaluation order. It would be great if a tool could convert this expression to
q)scan[{x+2*y};2 3 5 7]
and if that tool would also replace cryptic 2 character long operators with long readable names (e.g., MapForTables).
I think that the error messages need to be as verbose as possible. You mentioned that experts don't ask for the debugger and verbose error messages. I think that they simply got used to not having these useful features, but they would use them eventually. I'm okay with the limited number of variables as long as it gives a proper error message when the limit is exceeded.
People working with Q told me that it was hard for them to restart working with Q after a one-month break, because they forgot the lexical knowledge needed for efficient work.
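For readers who don't know q, here is a rough Python analogue of the scan expression quoted above. It is only a sketch: the name scan is borrowed from q, and the assumption is that q's scan behaves like a running fold seeded with the first item, which matches the 2 8 18 32 result given in the comment.

```python
from itertools import accumulate

def scan(f, xs):
    """Running fold: the first item seeds the accumulator, then f(acc, item)."""
    return list(accumulate(xs, f))

# Python counterpart of {x+2*y} scan 2 3 5 7, where x is the
# accumulator and y is the next item of the list.
result = scan(lambda x, y: x + 2 * y, [2, 3, 5, 7])
print(result)
```

The bracketed form scan[{x+2*y};2 3 5 7] asked for above is the same call, just written prefix-style.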
Peter
This is an understatement. It helps explain why a person who has never programmed will pick up APL/J/K easier than someone who already knows Python or C.
So, what about building a transpiler? Make a more verbose variant and transpile it to q/kdb.
As it is, I give any tool which I can only trust to spy on me a hard pass.
It's quite strictly evaluated right to left... Which I think is one of the best features over most other languages.
If you could, FD wouldn't be able to justify charging £1.5k/day per head consultancy fees for someone they just sent on a 1 week training course.
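To make "strictly right to left" concrete, here is a small Python sketch of an evaluator with no operator precedence. The function name and the token-list input are made up for illustration; this is not how q parses anything, it only mimics the rule that everything to the right of an operator is evaluated first.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def eval_right_to_left(tokens):
    """Evaluate a flat infix token list right to left, with no precedence."""
    acc = tokens[-1]              # start from the rightmost operand
    i = len(tokens) - 2           # index of the rightmost operator
    while i > 0:
        op, left = tokens[i], tokens[i - 1]
        acc = OPS[op](left, acc)  # left operand combined with everything to its right
        i -= 2
    return acc

q_style = eval_right_to_left([2, "*", 3, "+", 4])  # read as 2*(3+4)
python_style = 2 * 3 + 4                           # read as (2*3)+4
print(q_style, python_style)
```

Under this rule 2*3+4 means 2*(3+4), so adding parentheses around the right-hand side never changes the result, while adding them on the left can.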
// remove more clutter
#define O printf
#define R return
#define Z static
#define P(x,y) {if(x)R(y);}
#define U(x) P(!(x),0)
#define SW switch
#define CS(n,x) case n:x;break;
https://github.com/KxSystems/kdb/blob/master/c/c/k.h#L96
Or this wall of code:
This code is basically obfuscated by hand. Absolutely unapproachable. Only the original author(s) can understand it.
Judging by other comments, it seems to work well. So they seem to be good programmers producing working code. It's just not intelligible by other human beings, which is a pretty bad thing, but not the only factor in software quality/health.
The Vim code base is in some parts straight batshit insane, but it's one of the most polished programs I've ever used.
All that being said, I would find it infuriating to work with such code. Nope!
And it is not for all people.
But it's just a foreign language. You could look at Japanese text[0] and make similar statements, and they would be just as valid (or rather, invalid) as your statement.
You expect to be able to read it because you're used to a class of languages which are all similar enough at the surface level -- perhaps you are even familiar with more than one fundamentally different class, say, "lispish and algolish" or "germanic and latin". But that doesn't make APLish or Japanese[0] horrible.
This is just APL using C syntax.
[0] assuming, the proverbial you does not know Japanese
It's not in the "k.h" header, though that references it also.
Maybe the build system injects it through the compiler command line or a forced-include header. Though the gcc command line alluded to in the block comment header shows no evidence of that.
Edit: Found it! It's a typedef for a pointer to a struct k0:
typedef struct k0{signed char m,a,t;C u;I r;union{G g;H h;I i;J j;E e;F f;S s;struct k0*k;struct{J n;G G0[1];};};}*K;
This approach helps me find bugs and repetition in my programs which makes my code shorter and faster as a result.
Not only that, it is the only software I have ever seen that used a GUI and then ditched it in a subsequent version.
Few programs are so aligned with my own software sensibilities.
Only complaint is that they used to have a FreeBSD port and now only have Linux and macOS but no BSD.
Unfortunately Linux compat in BSD is being perceived by some as a potential security issue these days.
I agree, this is admirable!
Even if it is under different names (which I think is a far better approach), Sustrik does this too: http://250bpm.com/blog:50 He went from AMQP (not his own creation) -> ZeroMQ -> nanomsg -> Libmill (essentially Go in C/UNIX style)
Also, OpenBSD comes to mind (LibreSSL for example).
Whenever I have suggested on HN that there is such a thing as "finished" software that is free of serious bugs, I get some resistance. There is a consistent knee-jerk reaction citing the same tired, old meme, "All software has bugs", and "Software is never finished."
Sustrik's post proves I am not the only one perplexed by this strange belief that no software is ever finished.
IMO, it is not a question of being infallible or being available to fix bugs. The point is that there are programs that are not continuously growing in size and complexity. They are not "dead". They are "finished".
As for OpenBSD, certainly some programs I would consider "finished" but overall the size of the kernel and base distribution are in fact growing. Not only new drivers, but new programs and new libraries continue to be added. More code means more probability for bugs and vulnerabilities.
Anyway, less code, more terse syntax means it can be easier to find problems. Not everyone will agree with this of course. But I agree with Whitney and others. Less code makes it easier for me.
In that respect it's interesting to compare libmill to Plan 9 libthread [1], which is arguably an ancestor of Go channels/goroutines.
With earlier versions, e.g. k2.8, there is a `show command to trigger a pop-up window that reminds me of Tcl/Tk, containing the values in editable fields.
The interpreter is terminal friendly and works without the GUI but it has no formatted output of tables in ASCII like in k4. x11 libraries are a dependency.
The license already changed once (for the worse, IMnsHO), so what's to say that won't happen again. Too risky to touch.
No, can't use it (32bit version) for commercial, so this is a non-starter. Sorry.
kOS? really? https://news.ycombinator.com/item?id=8475809
Some languages are worth learning to expand your horizons. Lisp is one of them, even if you never use it, and the APL family (of which K/Q are members) is another. My C code has become faster, simpler, shorter and less buggy after I dabbled in K. YMMV, but using it in a commercial setting is not the only reason to look at it.
> kOS? really? https://news.ycombinator.com/item?id=8475809
Really. What exactly is your "really?" question?
Interesting. Can you explain why you think that has happened?
> Some languages are worth learning to expand your horizons. Lisp is one of them, even if you never use it, and the APL family (of which K/Q are members) is another.
Exactly.
I have more than one hammer.
life:{3=a-x*4=a:2(sum -1 0 1 rotate\:,'/)/x}
life←{↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵}
(I am not certain if my comment follows HN's guidelines, apologies if I offended anyone)
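For anyone who can't read the one-liners above, here is a long-hand Python sketch of the same idea as I understand the APL version: rotate the board by -1, 0 and 1 in both directions, sum the nine copies, and keep a cell if the sum is 3, or if it is 4 and the cell is currently alive. The helper names are my own, and the board is assumed to wrap around at the edges because rotation is cyclic.

```python
def rotate(grid, dr, dc):
    """Cyclically shift a 2-D list by dr rows and dc columns (toroidal wrap)."""
    rows, cols = len(grid), len(grid[0])
    return [[grid[(r + dr) % rows][(c + dc) % cols] for c in range(cols)]
            for r in range(rows)]

def life(grid):
    """One generation: sum the 9 shifted copies, then apply the 3-or-4 rule."""
    rows, cols = len(grid), len(grid[0])
    shifted = [rotate(grid, dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
    total = [[sum(s[r][c] for s in shifted) for c in range(cols)]
             for r in range(rows)]
    # total counts the cell itself: 3 means alive next turn either way,
    # 4 means alive only if the cell is alive now.
    return [[int(total[r][c] == 3 or (total[r][c] == 4 and grid[r][c] == 1))
             for c in range(cols)] for r in range(rows)]

blinker = [[0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 1, 1, 1, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0]]
next_gen = life(blinker)
```

It is roughly twenty lines against one, which is more or less the trade-off being argued about in this thread.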
* If your code doesn't spend most of its time in primitive verbs operating on large vectors, it's gonna be more or less as slow as any other interpreted language. Q and kdb+ can be fast and beautiful if you can arrange your problem in the right way, but it's not magic.
* The internals are locked away. If you don't like the way something fundamental works, tough. I've known some folks to go to heroic lengths with debuggers and hacked up shared objects to get Q to do what they want. You could also get Kx to add the stuff you need (they're pretty reasonable and responsive). But, you can't really take it apart and put it back together again like you can with, say, Lua, Ruby, or Python.
* Relating to the above point, one of the weaknesses of the language is that there are a lot of useful (even necessary) features packed into weird corners. There's little room for abstractions beyond the basics, so you get stuff like CSV parsing controlled by the structure of lists passed to a function called "0:". It's getting better documented lately, but it's still not pretty.
* Various annoyances (no real module system, no lexical scoping, etc...)
In many of those cases, I'm not even sure what could be done without compromising some other aspect of the language. Most of the time (at least for me), it's really a joy to use.
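To illustrate the "0:" point for people who haven't seen it: as far as I understand, the CSV loader is driven by a string of one-letter column types paired with a delimiter. Below is a hypothetical Python analogue of that style of interface. The function name load, the CASTS table and the three type letters are assumptions for illustration only, not the real q type codes or API.

```python
import csv
import io

# Hypothetical subset of one-letter type codes (illustration only).
CASTS = {"S": str, "F": float, "I": int}

def load(types, delimiter, text):
    """Parse delimited text with a header row into a dict of typed columns."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    columns = {name: [] for name in header}
    for row in reader:
        # One type letter per column, in column order.
        for name, code, cell in zip(header, types, row):
            columns[name].append(CASTS[code](cell))
    return columns

table = load("SFI", ",", "sym,price,size\nAAPL,101.5,200\nMSFT,55.25,100\n")
print(table)
```

The whole parsing schema lives in the shape of the arguments, which is compact once you know it and opaque until you do.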
(Some context: http://archive.vector.org.uk/art10501320)
For those not already in the loop with respect to k6, the reference card (http://kparc.com/k.txt) provides a good overview. Note that it is neither exhaustive nor completely representative of the current state of the language.
Unlike SQL, which pays lip service to Codd's relational model but breaks it with things like TOP, ORDER BY, LIMIT and others, the Q language embraces the order between tuples to great effect. That makes e.g. "as-of" queries, which are quite common, trivial; in SQL and the relational model, as-of queries are inefficient in either execution time or storage space (usually both), and schemas that give reasonable execution speed cannot, in fact, guarantee their integrity (which is often quoted as one of the main advantages of the relational model).
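A rough Python sketch of why ordered rows make as-of lookups cheap: if the time column is already sorted, "the last row at or before t" is a single binary search. The function and variable names are invented for this example, and it says nothing about how kdb+ actually implements its as-of join.

```python
from bisect import bisect_right

def asof(times, values, t):
    """Return the value in effect at time t: the last row with time <= t."""
    i = bisect_right(times, t)   # number of rows with time <= t
    return values[i - 1] if i else None

# Quotes stored in time order, the way an ordered table would keep them.
quote_times = [100, 105, 110]
quote_prices = [10.0, 10.5, 10.2]

# For each trade time, find the prevailing quote.
trade_times = [99, 100, 107, 200]
joined = [asof(quote_times, quote_prices, t) for t in trade_times]
print(joined)
```

Without a guaranteed row order, the same question needs a correlated subquery or a stored validity interval per row, which is the time-versus-space cost mentioned above.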
As another example, Q implements "foreign key chasing" also called "reference chasing", which is also implemented in the web2py DAL and surely others; compare[0] the equivalent tpc-h query:
in q:
select revenue avg supplier.nation=`BRAZIL by order.date.year from lineitem
where order.customer.nation.region=`AMERICA, order.date.year in 1995 1996, part.type=`STEEL
in sql:
select o_year,sum(case when nation = 'BRAZIL' then revenue else 0 end)/sum(revenue) as mkt_share
from(select year(o_orderdate) as o_year,revenue,n2.n_name as nation
from part,supplier,lineitem,orders,customer,nation n1,nation n2,region
where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_orderkey
and o_custkey = c_custkey and c_nationkey = n1.n_nationkey
and n1.n_regionkey = r_regionkey and r_name = 'AMERICA' and s_nationkey = n2.n_nationkey
and o_orderdate between date('1995-01-01') and date('1996-12-31') and p_type = 'STEEL')
as all_nations group by o_year order by o_year;
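A toy Python sketch of what "reference chasing" buys you in the comparison above: once a column is declared to be a key into another table, a dotted path can hop from row to row without spelling out the join conditions. The tables, the FOREIGN_KEYS mapping and the chase function are all made up for illustration; they are not q or web2py APIs.

```python
nation = {"BR": {"name": "BRAZIL", "region": "AMERICA"},
          "FR": {"name": "FRANCE", "region": "EUROPE"}}
customer = {1: {"nation": "BR"}, 2: {"nation": "FR"}}
order = {10: {"customer": 1, "year": 1995}, 11: {"customer": 2, "year": 1996}}
lineitem = [{"order": 10, "revenue": 5.0}, {"order": 11, "revenue": 7.0}]

# Each foreign-key column name maps to the table it points into.
FOREIGN_KEYS = {"order": order, "customer": customer, "nation": nation}

def chase(row, path):
    """Follow a dotted path such as 'order.customer.nation.region'."""
    value = row
    for field in path.split("."):
        value = value[field]
        if field in FOREIGN_KEYS:    # a key column: hop to the referenced row
            value = FOREIGN_KEYS[field][value]
    return value

american = [li for li in lineitem
            if chase(li, "order.customer.nation.region") == "AMERICA"]
print(american)
```

The SQL version has to restate every one of those hops as an explicit equality in the where clause.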
[0] bottom of http://kparc.com/d.txt
The columnar structure of the DB as well as its IPC layer make it very good at creating chains of processes that can be used to stream row updates and branch them out to different processes with different responsibilities. Likewise, its on-disk database is great for running complex (time series) queries.
This speed and terseness come at the cost of being fairly "old school" in its approach. It's only recently, for instance, that we got stack traces, and readability is definitely not for the faint of heart, though this depends on how the code is written.
In my view the biggest thing that is needed right now is better tooling and libraries. Some attempts have been made to do this, and I am hearing that the new initiatives by kx will be addressing this in the coming months. The lack of a standardized testing library/framework can also be problematic, as every team that I have seen does it slightly differently, and a "best practice" would be beneficial.
Another useful intro to the language
1. adding more syntactic sugar to the language to improve readability
2. open-sourcing the thing or building an open-source compatible Q interpreter with all the nice features of KDB (functional programming, vector-based data structures, "scripting language within a database" approach, web features: HTTP server and websockets, etc.. etc.. )
Smaller Code, Better Code: https://news.ycombinator.com/item?id=13565743
AMA: Explaining my 750 line compiler+runtime designed to GPU self-host APL: https://news.ycombinator.com/item?id=13797797