I'm not knocking Caravel (it looks amazing), just curious why build vs. buy in this case.
[1] Tableau, Looker, Periscope, Chartio, Qlikview, Gooddata are just some that come to mind.
Free as in beer is one incentive as licenses are not cheap, and vendors know when they have you locked down and tend to milk everything they can.
More importantly, software for which we don't have control over the source is a risk. In this day and age anyone who cares enough should be able to push a bugfix/hotfix overnight. What if you had to wait entire quarters or years for Tableau to parallelize their "live mode", or to get connectivity to Presto to work?
What if you want to integrate a new type of visualization that isn't supported? What if you want to integrate with your anomaly detection framework or your A/B testing framework or other internal or external facing applications?
Since this is a common need for most companies, it makes sense to have an open source solution that we can all use and collaborate on.
Free as in beer is never the answer. A project like this takes multiple engineer-years to build and maintain. That's hundreds of thousands of dollars, at least. How much is a site license? Are you sure? Have you negotiated the rate?
Even for "expensive" services, buying it from someone else is almost always cheaper than paying someone to maintain it in-house, because expensive services are usually expensive for a good reason: they're niche, and finding someone with the expertise to build it is expensive. Having the source for a product so that you can customize it is certainly a better answer, but it rarely happens in practice. It's why we have gobs of open-source Apache-foundation products that nobody in their right mind wants to host in-house, unless they absolutely have to.
Developers have a real, well-documented resistance to paying for things, and it sucks. Because in reality, most development of open-source tools happens when someone gets paid to maintain the tool. If they don't, the tool falls into disrepair. Open-source software isn't free -- it's just paid for by someone else.
Anyway, the ones where you bring your own database can scale as far as the database can take you.
Disclaimer: I work on Metabase.
Larger, data-driven companies with significant engineering teams prefer not to rely on 3rd-party, closed-source vendors. That can represent a significant risk and an obstacle to deeper integration with other internal applications when needed.
Not that building always wins over buying, but the balance shifts relative to the size of the company.
Also, when using open source on the receiving end of the equation, you want to be a good citizen and contribute back to the ecosystem. It ties to pride and passion, and reflects a strong engineering culture, which can help with recruiting.
71% of those who chose to build the BI tools said they built because "We can customize the functionality better"
51% of those who buy say "Buying enables us to provide best-in-class BI functionality"
The study: http://www.jaspersoft.com/sites/default/files/confirmation_f...
[0] https://en.wikipedia.org/wiki/Online_analytical_processing#M...
https://camo.githubusercontent.com/c22acad6c1302c5da3236cb8e...
Here is the original demo[1] from Mike Bostock, D3's author.
If you're ok pulling the data out of Postgres into memory locally and mostly care about manipulation and beautiful dataviz, then look at Tableau.
If you're mostly interested in more data sciency/ML stuff, then Shiny or something else that's R-based is a good option.
If you're interested in being able to embed your business logic into the tool so that non-SQL folks can build their own queries and everybody's relying on the same data definitions, that's where Looker (disclosure: where I work) excels.
Disclaimer: I am one of the founders.
Our site is here: https://www.periscopedata.com/ and if you have any questions, shoot me an email at jon@periscopedata.com.
I did a fair bit of Ruby a few years ago, but I'm new to Python CRUD apps and trying to improve my knowledge here. Is defining all models in the same file[1] conventional in Python apps? Rails used to have separate files for each model. And most Ruby apps that I have seen advocate the one-class-one-file convention.
[1] https://github.com/airbnb/caravel/blob/master/caravel/models...
If you use multiple apps within one Django project or the equivalent in Flask (Blueprints), that extends to one models.py per app (where a "project" is a collection of "apps").
Sometimes you'll see one file per model (with a models/__init__.py that imports them for use). While I think it keeps dependency imports for each model very cleanly separated, you end up with a lot of redundancy, importing the same basic pieces in every model file.
For example, a comment app could contain several models: Comment, Thread, Report, etc., and those can be in the same file. To continue with the Django example, I would personally prefer having a models folder in the comments app and one file per model, as some can get really big.
I also do 1 file / model in Flask, minus some specific cases where it just makes sense to have them in the same file
If you have a small number of models (e.g. <= 5), then it's fine to have them all in one file, as you will not benefit from multiple files, really.
When your application is growing, you have to split the models into multiple files, grouped by feature (e.g. users.py, content.py, etc.).
I prefer this as models are usually very small, and switching from one file to another can quickly become annoying when working on related models. However, it may be different for large classes.
Written documentation is vastly superior to videos in my opinion.
Got it up and running easily enough, and connected to Redshift. But seemed like creating a new "slice" required custom JSON params to define it. Unless I missed something?
edit: yep, missed something. Can "explore" a table by clicking its link in the table listing.
Database Support
Caravel was originally designed on top of Druid.io, but quickly broadened its scope to support other databases through the use of SQLAlchemy, a Python ORM that is compatible with most common databases[1].
Grouping imports into: standard lib, third party, local is a strong pattern that I don't see done consistently in many repos. Likewise with your use of wrapping long imports with ()s and a single tab.
Any chance of sharing your Python style guide? My startup is Python based (Django and Flask) and would really appreciate it!
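For anyone who hasn't seen the pattern being described, here is a minimal sketch of it, not taken from the Caravel repo. The third-party and local groups are commented out so the snippet runs without installing anything, and the module names in them (flask, sqlalchemy, myapp) are only illustrative. It also uses four spaces for the wrapped import rather than a tab.

```python
# Group 1: standard library
import json
import os
from collections import (
    OrderedDict,
    defaultdict,
    namedtuple,
)

# Group 2: third-party packages (commented out so this sketch runs
# without extra installs; the names are only examples)
# from flask import Flask
# from sqlalchemy import create_engine

# Group 3: local application modules (hypothetical names)
# from myapp import config
# from myapp.models import User

# A little code so that every import above is actually used.
Point = namedtuple("Point", ["x", "y"])
counts = defaultdict(int)
counts[Point(1, 2)] += 1

summary = OrderedDict(path_sep=os.sep, total=sum(counts.values()))
print(json.dumps(summary))
```

Each group is separated by a blank line, and the long `from ... import` is wrapped in parentheses with one name per line, which keeps diffs small when a name is added or removed.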
On your other point, though, to echo the build vs. buy discussion from above, I think it's a bit misleading to say "oh, we'll just use an open-source solution and that'll be cheaper." Because if open source means a couple of internal developers and an analyst, that's easily $300k+/year in salaries that you might not spend if you were using a vendor.
Anyway, given your particular statement of the problem you're facing, I'd humbly suggest you take a look at Looker. The data modeling layer that's core to Looker is meant to solve EXACTLY that problem, by leaving your data where it lives and then embedding your business logic in the layer that sits between end users and the data.
<yourPythonInstallDir>\Lib\site-packages\caravel\bin
then run as
python caravel db upgrade
A tutorial on how to link it to a MySQL database would be greatly appreciated :)
So, you just need to set the config param SQLALCHEMY_DATABASE_URI like this:
https://github.com/airbnb/caravel/blob/1b4e750b2aa111445703d...
The configuration guide explains it further:
https://github.com/airbnb/caravel/blob/master/docs/installat...
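For MySQL specifically, a rough sketch of what that setting could look like is below. This assumes the config lives in a caravel_config.py module on your PYTHONPATH (check the configuration guide for the exact location), and every credential, host, and database name here is a placeholder I made up. The plain mysql:// scheme relies on SQLAlchemy's default MySQL driver being installed, so you may need a different dialect+driver prefix depending on your setup.

```python
# caravel_config.py (assumed file name; see the configuration guide)

# All values below are placeholders, not real defaults. Use your own.
DB_USER = "caravel"
DB_PASSWORD = "change-me"
DB_HOST = "localhost"
DB_PORT = 3306
DB_NAME = "caravel"

# SQLAlchemy connection URIs follow the general shape:
#   dialect://user:password@host:port/database
SQLALCHEMY_DATABASE_URI = "mysql://{user}:{password}@{host}:{port}/{name}".format(
    user=DB_USER,
    password=DB_PASSWORD,
    host=DB_HOST,
    port=DB_PORT,
    name=DB_NAME,
)
```

If the password contains characters like @ or /, it needs to be URL-escaped before it goes into the URI.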
This is just a data visualization platform. You need to bring your own data store and data.