I would never really consider them alternatives though? To me Zapier is a low/no code tool that offers a bazillion integrations and Airflow is a workflow orchestration tool.
So comparing to both of them confuses me, and I guess choosing one would give you a more niche audience, but one that you can connect with better as well.
As of now these are the broad categories that abuse the term workflow.
1. State Machines
For example, in Jira a bug/ticket moves through different states to reach a final stage. This type of state machine can be found in a lot of different software - most CMSs/bug trackers/CRMs, where only the entity differs (document/bug/lead).
The motive of these systems, which call themselves workflow engines, is to provide structure to an otherwise ad-hoc movement of entities, so that a lead/manager is able to ensure a process is followed and collect statistics.
2. Automations
For example the "apple workflow" app, Zapier or Zoho Flow. These tools define a sequence of steps that is triggered when an event occurs in the system.
The motive of these systems is to enable automation and integrations between different software components with zero code (thus, no-code).
3. Process Designers (Bad term but can't think of anything better at the moment)
For example Airflow/Camunda. These systems are not necessarily low-code, but they mostly deal with arranging individual components of code so that a process can be assembled as quickly as possible. These systems are usually accompanied by a visual designer like the one Zapier has, but the intention is mostly to ease the process rather than to be a complete no-code tool for creating automations. However, their marketing tries to sell them as no-code platforms for business folks.
The motive is not yet very clear to me but from my initial intuition they can be used to initiate some data-processing pipeline, I guess? If anybody can throw more clarity, please leave a reply.
Now as you can see, much like how a "Process" can mean many things in many different contexts, the term "Workflow" can mean a lot depending on the context. Any software that calls itself the ultimate workflow solution is just a lie. It's like calling something an "ultimate process engine" - doesn't make sense.
Huh? I've never seen Airflow described as no code, or seen it try to sell itself that way. In fact, all the pipelines are written in Python and you can do some really complex orchestration.
I get you're not saying Airflow is no code but that the category you've put it in is typically low code or marketed as low code, but then I don't think Airflow belongs in that category or rather, and maybe more accurately, no/low code is not really a major defining quality of the bucket you're calling "Process Designers".
I've also been calling them "Process Schedulers" because typically it involves translating a more manual, but well defined process into its automated phase.
The no-code is an illusion in the enterprise realm - before you know it, you are waist deep in the custom code.
No-code can really work only for small businesses imo.
I come from an enterprise background and that is one of the reasons I built Titanoboa - to make something that makes it easy to rapidly prototype new integrations on the fly.
I summed up some of my thoughts on this topic here: https://www.titanoboa.io/repl.html
The main point I am trying to test with Titanoboa is this however:
State Machines <-> Process Designers is a spectrum and one product could handle the entire spectrum (or part of it).
Titanoboa makes it possible to pre-define workflow steps and make it "no code" while also making complex custom integrations possible from the same environment with the same concepts. Plus also distributed data handling is in the mix.
I guess now the challenge is how to market this versatility or whether it could create more confusion...
My current line of thinking is: A workflow is a workflow is a workflow. Titanoboa is built in such a way that pretty much everything it shuffles is data - and either it can be big (Airflow) or the steps can have side effects (iPaaS/Zapier).
The "no code" is achievable since adding more predefined steps is not that hard (there aren't that many at the moment though) - see for instance https://github.com/mikub/titanoboa/wiki/Getting-Started-with... - you don't code, you just fill in properties, so it is very straightforward, pretty much like Zapier.
My aim in the iPaaS space is more at the enterprise level, where the predefined steps won't do anymore and you have to custom develop them anyway - so in that area I think Titanoboa shines, since you really can rapidly prototype steps on the fly, a bit like Repl.it and Zapier together...
I agree that it might be good to pick one audience. At this stage I am just experimenting to see if I can somehow market to both without messing up and confusing both groups.
I have been experimenting with combinations of Huginn, Camunda, Airflow and others to try and achieve an integrated workflow/state/process management.
There is a gap in the enterprise space between all these tools, and it has been further exacerbated by the disruption COVID introduced to many industries. There is a real opportunity for small and medium businesses to be able to access tooling that goes beyond what a baseline Zap, etc. can accomplish.
Would be happy to connect offline.
How does it compare to n8n? [0]
n8n is the closest OSS alternative to Zapier I've seen so far.
[0] https://n8n.io/
n8n is pretty neat as a low-code or no-code tool and comes close to Zapier in many cases.
As someone with zero Java experience, this tool seems to have a very steep learning curve.
Zapier requires zero coding knowledge for the most part and is a great no-code tool.
- make rapid prototyping of new steps possible during runtime, without any need to restart/redeploy
- focus on distributed processing, where in a master-less Titanoboa cluster you can have pretty much any number of nodes
I just picked up the more restrictive license at the beginning - being a sole funder and not working on this full time etc. I simply did not want somebody (e.g. a big company with a big team) grabbing my code along the way and running away with it.
Now that Titanoboa has reached the shape I envisioned for it, I am starting to focus more on adoption, so yes, I am definitely thinking about switching to a less restrictive license since it will probably help.
Also at the beginning I was not aware how badly AGPL is perceived (I always thought if it was good for Mongo it could work for me, but I may have been wrong).
So far I've stumbled upon using dual AGPL + commercial for those who don't like the copyleft; using something like Mongo's SSPL; MariaDB's BSL; and now, Commons Clause.
On the surface (I haven't yet carefully studied the intricacies of all the options) all of these look to me like a great way to publish code and contribute to the whole of our common knowledge, while at the same time maybe being able to make a living from it, for which it's important to prevent bad actors from bundling it and profiting from it at your expense. Otherwise that code wouldn't really exist at all to start with...
I might write an Ask HN because this topic is complex.
The JVM is more underrated/unknown than it should be in the startup space, tools like what you're making make it far more accessible.
So if we break that level please don't be mad if you don't get your instance :)
Instead give me a star on github and come back later :)
Cheers Miro
Apologies for the inconvenience folks, but I am sure people rarely play with each instance for the whole 3 hours anyway at this stage.
Honestly did not expect this kind of "public beta" rollout, still surprised how well the service holds so far under the load.
5 syllables.
Agreed that this is unwieldy to say in normal speech.
I also am probably not good with names as my other alternative was Megalodon (4 syllables) - I just wanted to have some megafauna name.
Happy to hear what you folks would suggest :)
I especially love your web UI. It makes it very easy to start experimenting with workflows without the overhead of having to set up a local development environment.
I am curious how you store secrets (e.g. AWS access key id, secret access key). There isn't a login wall and it's not clear that the values will be protected from you, so I am loath to put my credentials on there.
Great work overall. Wish you the best.
Yes, I would still be careful with passwords in the public beta.
It goes via SSL and you get a unique UUID URL so all-in-all it is kinda secured (plus the time-to-live of the instance is only 3 hours, which limits the time available to hack it), but still this is not 100% secure and I would not recommend it for any kind of production use (including any use of passwords you don't want exposed).
The free instances are not (yet) password protected (other than the UUID) - this on the other hand is useful if you want to share the environment with somebody, just send them the link... This is just the beginning of the public hosting, so I will need to think through further improvements that would go into the hosting, security-wise and in other aspects as well.
My main objective at this stage for the free hosted instances was to give people a way to play with Titanoboa or quickly test something, especially if something is not working in their local environment (say dependencies) and they want to quickly re-test in a vanilla environment.
If you want to add AWS secrets or what not, I would suggest just downloading/installing your own local version, it is super easy: https://github.com/mikub/titanoboa#installation
Or if you need a public IP for an integration, just grab the Docker instance and spin it up in your AWS ECS or something.
Very cool to see how you use clojurescript for this.
Feel free to check out Titanoboa on Github: https://github.com/mikub/titanoboa
Also: This is an early beta so please do let me know if something breaks or you spot a bug. Atm I have load-balancers set up only on West coast and in Europe, so apologies to folks from down under & similar locations, let me know if it's too laggy :)
The main message I wanted to convey was that this runs on the JVM and multiple JVM languages (Java and Clojure) can be used.
I have a deep interest in DAG-structured ETL tooling and had a couple of questions that the documentation didn't seem to address...
1. Can I execute workflows without a server running? Something like...
    $ java -jar titanoboa.jar MyWorkFlowName arg1 arg2 ...
...and then my workflow executes, as a program, on my machine, until it's done and then exits? Or does every workflow always execute within the context of a running server?
2. Is there any notion of resuming a partially failed workflow? As a point of comparison, Luigi structures its DAG concept using Tasks which create Targets, and invoking a Task whose Target already exists is a no-op, so if you have a big execution graph that gets 80% finished and then dies, you can easily restart it. I find that many competing tools are missing this concept.
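To make the question concrete, here is a minimal sketch of the "a task whose target already exists is a no-op" idea in plain Python. It does not use Luigi's actual API; `run_task`, `run_pipeline` and the three step names are hypothetical, and the "target" is just a file on disk.

```python
import os
import tempfile

def run_task(name, target_path, action, log):
    """Run `action` only if its target file does not exist yet."""
    if os.path.exists(target_path):
        log.append(f"skip {name}")
        return
    result = action()
    # Write to a temporary name and rename, so a crash never leaves a half-written target.
    tmp_path = target_path + ".tmp"
    with open(tmp_path, "w") as fh:
        fh.write(result)
    os.replace(tmp_path, target_path)
    log.append(f"run {name}")

def run_pipeline(workdir, fail_at=None):
    """Run three hypothetical steps in order, stopping at the first failure."""
    log = []
    for name in ["extract", "transform", "load"]:
        def action(step=name):
            if step == fail_at:
                raise RuntimeError(f"{step} failed")
            return f"output of {step}\n"
        try:
            run_task(name, os.path.join(workdir, name + ".out"), action, log)
        except RuntimeError:
            log.append(f"fail {name}")
            break
    return log

workdir = tempfile.mkdtemp()
first_attempt = run_pipeline(workdir, fail_at="load")   # dies on the last step
second_attempt = run_pipeline(workdir)                  # only redoes what is missing
print(first_attempt)
print(second_attempt)
```

On the second attempt the first two steps are skipped because their output files already exist, which is the behaviour I am asking about.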
1. Based on the Clojure REPL example in the main README, I think the answer is YES, though not exactly the way I had imagined. It seems what you would need to do is write a top-level script (or java "main") that starts a "system", starts "workers", runs your job using that system, then stops the system and exits. A little clunky compared to how Luigi does it, but usable.
2. Best as I can tell the answer is NO. Neither the documented API, nor the implementation of the API in src/clj/titanoboa/handler.clj contains any hint of an ability to operate on a job id, beyond retrieving the result of its execution.
Additional commentary:
1. Resuming failed jobs
As implied by my question above, the ability to resume a failed job is essential. One of the major reasons to adopt DAG-structured code is parallel execution, and Titanoboa has that. But the OTHER major reason is to allow partially-failed computation to retry/resume without repeating already-completed work. In particular in the ETL space, we often have job graphs composed of hundreds of nodes, with total runtimes measured in hours. If my 100-node job graph fails due to an error in node #78, preventing an additional 15 downstream nodes from running, I don't want to run all 100 nodes again after I fix the problem. I want to resume executing my graph at #78, and expect only the 16 total affected nodes to execute, since everything else ran correctly the first time (and presumably persisted their outputs). Luigi gets this one right. Airflow sorta tries but it's clunky and you can tell it's not a priority.
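As a rough illustration of what "only the affected nodes" means, the set to re-run is the failed node plus everything downstream of it. The sketch below uses a made-up six-node graph rather than a real 100-node one, and `nodes_to_rerun` is a hypothetical helper, not something from Titanoboa, Luigi or Airflow.

```python
from collections import deque

def nodes_to_rerun(downstream, failed):
    """Return the failed node plus every node reachable downstream of it."""
    affected = {failed}
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for child in downstream.get(node, []):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected

# Hypothetical graph: each key maps to the nodes that consume its output.
downstream = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": ["f"],
    "e": [],
    "f": [],
}

rerun = nodes_to_rerun(downstream, "c")
print(sorted(rerun))
```

Here a failure in "c" means re-running c, d, e and f, while a and b keep the outputs they already persisted.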
2. Flow/Dependency direction
When designing a workflow, either in the GUI or as EDN, you tell Titanoboa what jobs are "next". This is intuitive because it comports with our notion of execution flow through a graph of jobs, but it gets things backwards. That is, when we write A->B->C, we are thinking that A will execute, and then B, and then C (perhaps results will be passed from step to step). It is often better though to describe this as A<-B<-C, which reads as C depends on B, B depends on A, and A depends on nothing. Structuring our thinking in this way focuses the mind on what inputs a node requires in order to perform its effect or compute its output, rather than on what operations should follow it in time. Luigi and Airflow both get this one right.
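The two descriptions carry the same information, so one can be derived from the other. A small sketch in plain Python, assuming Python 3.9+ for the standard-library `graphlib` module; the `invert` helper and the A/B/C graph are just illustrations, not how any of the tools mentioned store their graphs.

```python
from graphlib import TopologicalSorter

def invert(next_steps):
    """Turn 'what runs next' edges into 'what does this step require' edges."""
    requires = {step: set() for step in next_steps}
    for step, followers in next_steps.items():
        for follower in followers:
            requires.setdefault(follower, set()).add(step)
    return requires

# Flow direction: A -> B -> C
next_steps = {"A": ["B"], "B": ["C"], "C": []}

# Dependency direction: A <- B <- C
requires = invert(next_steps)

# TopologicalSorter takes a mapping of node -> predecessors, i.e. the "requires" form.
order = list(TopologicalSorter(requires).static_order())
print(requires)
print(order)
```

The execution order falls out of the dependency form for free, which is part of why I prefer declaring it that way.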
3. Properties
The way Titanoboa defines workflow-level "properties", into which job-level properties are merged, and the way properties flow along the path of execution, is very nice. A constant problem with Luigi is how to flow values from one Task to the next without using an excessive number of Parameters. I can't say for sure that Titanoboa's properties construct doesn't have the same problems, without taking the time to actually use it to build a large project, but on the surface it looks good.
4. Logging
I noticed that when a step's function returns a map, to be integrated into "properties", that return value is not logged. The message in the log is like "Step [my-cool-step-name] finshed with result []" which is both unhelpful, and not even literally true, as it most certainly did have a result! When a step returns a scalar value, it does get logged. I found this inconsistency frustrating.
Also, the stdout/stderr of each step function apparently goes to /dev/null. I find this odd as the placeholder function when you build a new workflow is (println "Hello World!") but if you actually execute that you'll discover that our classic greeting vanishes into the void. This is a major shortcoming. As a point of comparison, one of the biggest value-adds of using Jenkins as a job scheduler is how it automatically captures the output of anything you run, saves it in a durable log file, AND lets you view it in real time. Job orchestration systems that don't match that level of log-friendliness drive me nuts.
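For what it's worth, the behaviour I'd expect looks roughly like this sketch in plain Python: the runner captures whatever the step prints and keeps it next to the step's result. `run_step` and `hello_step` are hypothetical names, and this says nothing about how Titanoboa's Clojure workers would do it. Note that `contextlib.redirect_stdout` only catches Python-level writes to `sys.stdout`, not output from child processes.

```python
import contextlib
import io

def run_step(name, step_fn, properties):
    """Run one step, capturing anything it prints alongside its return value."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = step_fn(properties)
    # A real system would also append this to a durable per-job log file.
    return {"step": name, "result": result, "stdout": buffer.getvalue()}

def hello_step(properties):
    print("Hello World!")
    return {"greeted": True}

record = run_step("my-cool-step-name", hello_step, {})
print(record)
```

The record keeps both the printed greeting and the returned map, so neither vanishes.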
5. Versioning
The built-in versioning system is great. Two thumbs up. I don't know how it would work if I were writing my jobs in proper Clojure or Java code in their own repo, but I kinda don't care because the value of storing and versioning what I do in the UI is so great.
6. UI -> data
I love the way the interactive UI is just there to generate EDN. In a way this mirrors how Jenkins' UI builds its job XML files, but you have to go hunting for those and they're hard to read (because XML). Being able to see what EDN is generated by your actions in the UI, _right there in the UI_, is fantastic.
7. UI issues
The UI is great but it has quite a bit of low-hanging-fruit improvements that could be made.
- the run job popup forces you to choose a system every time, even if there is only one
- being able to draw arrows in the visualization is cool, but I could not figure out how to delete them there. Needs work.
- the UI doesn't lay out well on small screens (I'm on an old 13" Air), I had to zoom to 80% just to be able to see the X to delete a property. It would help if the Workflows panel on the left (the least important UI element by far!) could be collapsed (edit: it can be collapsed, but the collapse button is on the other side of the screen which makes no sense)
- the box that pops up after starting a job has nothing clickable in it. I have to close it and go to the jobs tab
- the jobs tab doesn't refresh when it loads, even if I just started a job, which needlessly adds clicks to the main workflow
- the jobs tab has an "archived" sub-tab but no apparent way to actually move a job to the archive
Overall, there's a lot of promise here, and it's amazing to me that you built this by yourself. Still, it has a long way to go. I recommend spending some time with Luigi, which I still think is the best general way of DAG-structuring real world workflow code, and with Jenkins which remains far and away the best UI-driven job orchestration system. It seems you're already familiar with Airflow, but I would recommend you treat it mainly as an example of what not to do.
Obviously this is for the JVM, plus I am not sure how Prefect addresses the following two points I mainly focus on:
- make rapid prototyping of new steps possible during runtime, without any need to restart/redeploy
- focus on distributed processing, where in a master-less Titanoboa cluster you can have pretty much any number of nodes
Congratulations on reaching public beta and for submitting to HN! Looks very promising!
But I would upvote you 100x or send you a medal or something if I could, this is really super nice of you letting me know!
But is it really an alternative to Zapier? I mean, while Zapier is obviously about automation, it always felt that their big selling point was the sheer number of integrations they provide.
Target audience might be slightly different ultimately, but we'll see - I would envision this to be more useful in an enterprise environment where you pretty much always have to customize the integrations that were provided out of the box.
There are a few blog posts I wrote about some sample workflows, e.g.:
https://github.com/mikub/titanoboa/wiki/Getting-Started-with...
https://www.titanoboa.io/using-titanoboa-for-it-automation.h...
https://www.titanoboa.io/using-titanoboa-for-cloud-integrati...
I understand that the cloud hosted version should cost more for these features but without a way to get pricing for the self-hosted HA and clustered version it will be a tough sell for me to deploy it, use it and have to scrap it in the future.
So the title should be "Host your own aPaaS" instead.