Writing a data science pipeline with command line tools really just means writing a data science pipeline in your terminal's shell programming language. If that sounds like a good idea to you, I'd be curious to hear why.
Well, CSVs are ubiquitous. That being said, people tend to either buy into the command line or not. It's basically a matter of opinion. To me, the ability to automate the use of a tool is a massive force multiplier, and that is what the CLI is built for. To other people, the ease of use of a GUI is preferable to googling for CLI syntax or reading man pages.
> Writing a data science pipeline with command line tools really just means writing a data science pipeline in your terminal's shell programming language. If that sounds like a good idea to you, I'd be curious to hear why.
If by my terminal's shell programming language you mean my terminal's shell programming language, any program that can be called from the CLI, any program that can export CSV, any program that can produce $format that can be converted to CSV, and virtually limitless combinations of other tools and methods, then yes!
RStudio is an IDE and IPython is basically a fancy REPL, they're not GUI tools at all.
curl | grep | awk > .csv
patterns I've used in real life, and having a way to train a model from the command line is a great complement. Currently I would call a Python training script from the command line and orchestrate everything in bash. I'm not sure why that is undesirable.
Two major benefits:
* Approachability. If the first step to trying ML on a problem is “download RStudio and learn how to use a GUI and unfamiliar language” that is a HUGE barrier to entry.
* Confidence. If I’m going to put a model into production and all I have to go on is the output of a self-described novice who “played around in RStudio” a bit, well, that’s not going to happen. There are a million ways they might have messed up.
Tangram nails both of these and I’m excited to see where it goes. Happy to talk to you more (ex startup VP Product, current ML researcher) - DM me on Twitter (yieldthought).
The reference in the name and the monospaced font work well to establish a sense of being a developer’s company, and not for just any developer but for someone with a background in math(?). Perfect marketing to get me, although I’m not a lucrative prospect. Is it possible that you’d catch bigger fish by going corporate? Possible, yes. But it’s my impression that this has been and still is shifting drastically.
One meta-issue for now: somehow, I always read the imprint when I am checking out some software or service. I can’t really make a case why it’s helpful, and you are obviously free to maintain an air of mystery. But if it’s just an oversight or you haven’t gotten around to it, just naming a city and country goes a long way to convincing me that you exist. Short bios are also useful, or links to your Twitter accounts, although I found those with only minimal stalking. (This preference may just be a result of my home country of Germany requiring such data and making it possible to get into the habit.)
David: https://www.linkedin.com/in/david-yamnitsky Isabella: https://www.linkedin.com/in/isabella-tromba/
I promise we exist and are real humans! But that is exactly what the aliens would say.
> I promise we exist and are real humans! But that is exactly what the aliens would say
That… is a go-to joke I keep using on OKCupid, only with axe murderers instead of aliens.
I’ll stop hitting on you now, but might write up some impressions of the actual product in the next few days.
`tangram train --file data.csv --target price --date date`
This would predict the "price" column, using the date in the "date" column.
Internally, we can generate features from the date column like "hour of day", "day of week", "season", etc. that can help the model learn periodic patterns.
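To make the idea concrete, here is a minimal sketch of that kind of date feature extraction. It is not Tangram's implementation: the helper name `date_features`, the ISO timestamp format, and the northern-hemisphere meteorological seasons are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical mapping from month number to season.
# Assumes northern-hemisphere meteorological seasons.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def date_features(timestamp: str) -> dict:
    """Derive periodic features from an ISO-formatted timestamp string.

    Hypothetical helper for illustration only.
    """
    dt = datetime.fromisoformat(timestamp)
    return {
        "hour_of_day": dt.hour,          # 0-23
        "day_of_week": dt.weekday(),     # Monday = 0, Sunday = 6
        "season": SEASON_BY_MONTH[dt.month],
    }


if __name__ == "__main__":
    print(date_features("2021-07-04 15:30:00"))
```

A model that sees these columns instead of a raw timestamp can pick up on patterns like "prices are higher on weekends" or "demand peaks in summer" without having to learn calendar arithmetic itself.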
Thanks for the great suggestion.
Like the website design too.
Not sure how its modeling techniques have kept up with the state of the art.
Over the past few months we have been working with a handful of early users. A team at a small company had a TensorFlow model deployed as a Flask service consumed by their Elixir app. They replaced it with a Tangram model because they didn't want to maintain a server separate from their monolith. A team of front-end engineers at a large company was looking for a way to train and deploy models on their own, without the overhead of involving their data scientists, machine learning engineers, or backend engineers. They trained a model on their own and embedded it directly in their React front end with the Tangram JavaScript library, which makes predictions with WebAssembly.
Tangram is written entirely in Rust, from the core machine learning algorithms, to the bindings for each language, to the front and back end of the web application. We have benefited from Rust's fast performance, strong typing, convenient tooling, and high quality libraries (serde, tokio, hyper, sqlx, and more).
We hope to make Tangram a sustainable business with the open core business model. The CLI and language libraries are MIT licensed, while the web application is source available, free to use for testing, but requires a paid license to use in production.
We would love to hear your feedback. Give it a try and let us know what you think!
I've been running ML in production for the last 4 years in the field of biology and need more than a million predictions per hour. Load-balancing across multiple servers (and staying with HTTP-request-based predictions) is not an option, as the latency would kill the application.
Am I in such a niche? Or, to put the question another way: are there many more people who run fine-tuned TF Serving/TensorRT servers to keep up with production needs?
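For a sense of scale, the throughput figure above can be turned into a per-prediction time budget. This is only back-of-the-envelope arithmetic: it takes one million per hour as the lower bound and assumes predictions are served one at a time on a single worker, which is an assumption and not something stated in the comment. The function name is hypothetical.

```python
def per_prediction_budget_ms(predictions_per_hour: float) -> float:
    """Milliseconds available per prediction if they are served serially.

    Hypothetical helper; assumes a single worker with no batching.
    """
    milliseconds_per_hour = 3_600_000
    return milliseconds_per_hour / predictions_per_hour


if __name__ == "__main__":
    rate = 1_000_000  # lower bound from the comment above
    print(f"{rate / 3600:.1f} predictions per second")
    print(f"{per_prediction_budget_ms(rate):.1f} ms per prediction")
```

At that rate a single network round trip per prediction can easily use up the whole budget, which is consistent with the concern about HTTP-based serving.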