The fact that your DSL is represented as an immutable datastructure by the host is really not important. Any code you write is going to be turned into a datastructure at some point. The immutability should be read as "the context in which this result is valid has been made fully explicit", where "context" includes "things that mutate over time". You don't loose in expressivity, you just can't forget to handle special cases.
Checker (Java 8's pluggable intersection type-system framework) is able to do that (define side-effect types), but isn't a single general type system. I.e. you need to write a different type system for each property (of course, they can all work together). Maybe that's the only known way to do that. Searching for more general solutions, I've come up empty handed (the problem is well known, but the solutions are all in early stages of research).
The idea of such a type system needs to be something like: each function has a set of types associated with it, and a function's type interacts with the types of the functions it calls in some ways. So the most primitive example is, of course, checked exceptions, where the type of a function is a simple union of the types of its callees. But the interactions can, and should be more interesting, and take into account the order in which the callees are called etc.
Well, to describe a side effect, you need to, uh, describe the side effect. Which requires understanding the compiler/interpreter, and a description of the things the compiler/interpreter does, expressed as machine-readable data.
It sounds like what you're after is something like a standard library where all the effects are carefully enumerated? i.e. something like Haskell, but where rather than the IO "sin bin", every effectful function would be very explicit about its effects? A bit like how Ada(IIRC) shipped with quite high-level concurrency operations as part of the language spec (which then lead users to reimplement low-level concurrency primitives on top of them, but maybe we can hope that history wouldn't repeat)?
Some libraries are taking us in that direction, but it's slow going. E.g. doobie seems to be gaining some momentum, as a principled way of expressing database access. Maybe in 10 years' time (if anyone is still using scala...) this will be "the" standard way of writing any program that uses a database, and the sort of thing that deserves to be in the standard library (whatever that means - IMO with modern dependency management tools a language standard library should only contain those primitives that need direct compiler support, anything else can be a regular library. But that's a whole 'nother argument). But right now it would be very premature to say that the algebra doobie uses is the correct one. Often the appropriate interpreter for a particular problem depends on that particular problem; just as it's good to construct your datatypes bottom-up, writing a program can feel like writing a succession of interpreters that get closer to your domain and capture more of the laws - basic Haskell gives you "thing that does any kind of I/O", then you have a library for "thing that sends complete messages to sockets", a higher-level library on top of that for "thing that does database operations", and so on.
And honestly standardization isn't that important, as long as the language makes it easy to let your code handle it generically. Doobie itself allows a couple of different implementations for your lower-level effect-capturing monad (scalaz Task, IO), or allows you to provide a simple adapter. So if two libraries use the equivalent of a different checker, a different "type system", that's not really any worse than e.g. two C++ libraries using two different String classes - that is to say, it's annoying, but not impossible to work with. User-level code can do the same thing - I've got a generic "report" process that can use a doobie database call, an async call to a web service, or something else. As and when the community reaches a consensus on the right way to express a particular kind of effect, I'd expect more higher-level libraries to standardize on that. But in the meantime, as long as types are first-class values and you can manipulate them as easy as any other values (which these kind of languages get you), multiple "type systems" are no more of a problem to work with than multiple types.
> a type system limitation that every type describing side effects [...] can only be used in the context of a compiler/interpreter.
A type system is just a logic description of a property that a program might have. It specifies how the language constructs interact with this property []. The fact that compilers use a type system is independent of your ability to write your custom type system, or to use it in a different context (IDE, toolchain, ..).
[
] Are you saying that a type system shouldn't be concerned by the AST of the language?An external type checker for Haskell: http://goto.ucsd.edu/~rjhala/liquid/haskell/blog/blog/2013/0... And the various type checkers for the not statically typed languages (python, ..)
Also, I don't understand why you distinguish "typing side effects" as something different from the rest?
> you need to write a different type system for each property
But you do have to define your type system at some point? Are you thinking of something like Mezzo, where some userland properties can be defined without going down the type-system-definition road? https://protz.github.io/mezzo/
> But the interactions can, and should be more interesting, and take into account the order in which the callees are called etc.
That's why type systems are defined with respect to the language construct, by rules on the AST (? I'm confused because of the earlier [*] here).
To go back to Agda significance for a custom type system, it's not enough to define a type system. You generally want to prove that your definition is valid: If your "checked exceptions" says that a program `P` can't fail due to an exception, you want to check that you didn't forget anything by verifying the type checker with respect to the actual semantic of the language.
You can also embed your custom property into Agda type system (isn't that what you want?).
That's the key idea ;)
If I write some Java like `DB.connect(credentials).select("users").where("name", "Kevin")`, am I using "some new language"? As far as Alan Kay is concerned, yes; that was one of his main inspirations for OOP:
> My math background made me realize that each object could have several algebras associated with it, and there could be families of these, and that these would be very very useful.
From http://www.purl.org/stefan_ram/pub/doc_kay_oop_en
The only difference between that Java example and my earlier Haskell/Agda example is that the functional version operates in two phases: calculating which operations to perform (building a `Program` containing `Ops`) is separate to performing those operations (applying an interpreter function to a `Program`).
However, keep in mind that we're not doing imperative programming, so there's is no inherent notion of time: we get the same result no matter which order we evaluate stuff in (that's the Church-Rosser Theorem). Hence these two "phases" are logically distinct, but not necessarily temporally distinct. In practice, we tend to define our `Program` lazily, so it gets constructed on-demand by the interpreter.
This is a bit like having a `main` loop containing a `switch`, for example (forgive mistakes; I've never used function pointers in C):
void main(op* instruction, int* length, void* callback) {
socket* s = set_up_socket();
int close = 0;
char* data;
while (!close) {
switch(*instruction) {
case READ:
// Read logic
data = read_from_socket(s, *length);
*callback(instruction, data); // Update instruction based on data
break;
case WRITE:
// Write logic
data = *callback(instruction); // Update instruction and return data
write_to_socket(*length, data);
break;
case CLOSE:
// Close logic (ignores any other instructions)
close = 1;
break;
}
close_socket(s);
}
This main function is basically the same as our interpreter; it's keeping the dangerous socket-related stuff in one place, providing some safety that the socket will be closed after use, etc. The values of `data` and `instruction` are being computed by arbitrary C code living at `callback`, just like our `Program` can be computed by arbitrary Haskell/Agda/whatever code. In this case the interleaving of choosing and executing instructions is explicit, but a lazily-evaluated `Program` will behave similarly in practice.Does this mean that those callbacks are in "some different language" to C? No; everything is regular C except that we just-so-happen to be labelling some ints as `READ`, `WRITE` and `CLOSE`. Likewise, in Haskell/Agda/etc. we have the full breadth of the whole language available to us; we just-so-happen to be returning values of type `Op` and `Program`. All of our logic, concurrency, etc. is done in the "host" language; we only define `Op`s for the things we want to restrict, like `Read` and `Write`; there's absolutely no point defining operations for boolean logic, arithmetic, string manipulation, arrays/lists, etc. because that's available already. Of course, once we've finished defining our socket library, that too will be available to everyone (if we release it); just because it might use some crazy `Op`, `Program` and `runSocket` things internally doesn't mean anyone has to care about that; it's just an implementation detail.
Notice that this is very similar to an OOP dynamic dispatch system, where `READ`, `WRITE` and `CLOSE` are method names and the callbacks just-so-happen to be kept together in a struct which we call an "object". Just like Alan Kay said.
(forgive mistakes; I've never used function pointers in C)
You mostly got it right; the function declaration would look like this (I moved the * for pointers onto the parameter name for stylistic reasons related to how C declares types):
void main(op *instruction, int *length, void (*callback)(...))
Also, your function pointer calls should omit the * in front (it would be applied to the return value of the callback, which is void in this case, so you'd get an error during compilation).