Show HN: Tangram – Train a model from a CSV file on the command line

(tangram.dev)

94 points · nitsky · 4y ago · 41 comments

38 comments · 12 top-level

civilized · 4y ago · 11 in thread

Why would it be desirable to train a model from a CSV on the command line? Is opening RStudio or IPython and running a couple commands the hard part of data science?

Writing a data science pipeline with command line tools really just means writing a data science pipeline in your terminal's shell programming language. If that sounds like a good idea to you, I'd be curious to hear why.

epr · 4y ago

> Why would it be desirable to train a model from a CSV on the command line?

Well, CSVs are ubiquitous. That being said, people tend to either buy in to the command line or not. It's basically a matter of opinion. To me, the ability to automate the use of a tool is a massive force multiplier which the cli is built for. To other people, the ease of use of a gui is preferable to googling for cli syntax or reading man pages.

> Writing a data science pipeline with command line tools really just means writing a data science pipeline in your terminal's shell programming language. If that sounds like a good idea to you, I'd be curious to hear why.

If by my terminal's shell programming language you mean my terminal's shell programming language, any program that can be called from the CLI, any program that can export to CSV, any program that can produce $format that can be converted to CSV, and virtually limitless combinations of other tools and methods, then yes!

cwyers · 4y ago

> To me, the ability to automate the use of a tool is a massive force multiplier which the cli is built for. To other people, the ease of use of a gui is preferable to googling for cli syntax or reading man pages.

RStudio is an IDE and IPython is basically a fancy REPL, they're not GUI tools at all.

civilized · 4y ago

Yeah, I mean, isn't "python script.py" or "R script.R" a command line tool?

version_five · 4y ago

I'm confused about the ipython reference, using a notebook for eda or results presentation doesn't have much to do with a data pipeline. I love the idea of being able to use shell scripts to orchestrate data prep and training. There are lots of

  curl | grep | awk > .csv

patterns I've used in real life, and having a way to train a model from the command line is a great complement.

Currently I would call a Python training script from the command line, and orchestrate everything in bash. I'm not sure why that is undesirable.
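
For concreteness, the kind of training script described here could look like the sketch below. It is a stdlib-only illustration, not how Tangram trains models: the function name `fit_line`, the column names `sqft` and `price`, and the single-feature least-squares fit are all assumptions made for the example.

```python
import csv
import io


def fit_line(stream, feature, target):
    """Fit target = slope * feature + intercept by ordinary least squares.

    `stream` is any text stream containing CSV data with a header row.
    """
    rows = list(csv.DictReader(stream))
    xs = [float(row[feature]) for row in rows]
    ys = [float(row[target]) for row in rows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of squared deviations of x, and the x/y cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical in-memory data standing in for the output of a shell pipeline.
sample = io.StringIO("sqft,price\n1000,200\n1500,300\n2000,400\n")
slope, intercept = fit_line(sample, "sqft", "price")
print(f"slope={slope:.2f} intercept={intercept:.2f}")
```

In a shell pipeline, the same function could be given `sys.stdin` instead of the in-memory sample, so that a `curl | grep | awk` chain feeds the script directly.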

isabellat · 4y ago

Yes! We want to make the whole ML process from training to deployment to monitoring radically more convenient. The CLI interface makes it easy to quickly train ML models without ever opening a Jupyter notebook, learning what pandas and numpy are etc. And after you’re done training, we want you to be able to incorporate your models into your existing stack: we have language libraries for programming languages traditionally underserved by the current ml tools: ruby, golang, elixir, nodejs, rust. (We also have python). All prediction happens in process with no network requests, meaning you don’t have to deploy a separate service to host your ml models :)

nitsky (OP) · 4y ago

You're right, CLI vs RStudio or IPython is not a big improvement. The real benefit here is that you can use your model from Go, JavaScript, Ruby, and other languages without having to include the Python or R ecosystems in your tech stack.

moconnor · 4y ago

I actually disagree, CLI is a big improvement. Vastly more developers are familiar with the command line than RStudio or IPython, including almost all of those who don’t already use some kind of ML.

Two major benefits:

* Approachability. If the first step to trying ML on a problem is “download RStudio and learn how to use a GUI and unfamiliar language” that is a HUGE barrier to entry.

* Confidence - if I’m going to put a model into production and all I have to go on is the output of a self-described novice who “played around in RStudio” a bit, well, that’s not going to happen. There are a million ways they might have messed up.

Tangram nails both of these and I’m excited to see where it goes. Happy to talk to you more (ex startup VP Product, current ML researcher) - DM me on Twitter (yieldthought).

civilized · 4y ago

Blindly running a CSV through a black box program is not the way to get confidence in your modeling.

civilized · 4y ago

Having a format that can be used in any language is great, and I agree this is the core problem. How does your .tangram format compare to existing attempts at standardized machine learning model formats? The only one I can think of off the top of my head is PMML and it seems like it never got much traction. Do you think tangram has a better approach to the problem?

nitsky (OP) · 4y ago

Unfortunately, tangram does not help here, it just adds yet another format to the list. The reason it needs its own format is because .tangram files include all the model reporting information that shows up in the web application.

simonw · 4y ago

If you run something on the command line, automating it will be trivial.

IAmEveryone · 4y ago · 3 in thread

Really like the initial impression. I’ve been playing around with Apple’s CreateML tool, which seems to be similar in capability, maybe(?), but is a GUI app that forces me into a process where I need to switch from Ruby to GUIs to Xcode and Swift to the CLI…

The reference in the name & the monospaced font work well to establish a sense of being a developer’s company, and not just any developer but someone with a background in math(?). Perfect marketing to get me, although I’m not a lucrative prospect. Is it possible that you’d catch bigger fish by going corporate? Possible, yes. But it’s my impression this has been, and still is, shifting drastically.

One meta-issue for now: somehow, I always read the imprint when I am checking out some software or service. I can’t really make a case why it’s helpful, and you are obviously free to maintain an air of mystery. But if it’s just an oversight or you haven’t gotten around to it, just naming a city & country goes a long way to convincing me that you exist. Short bios are also useful, or links to your Twitter accounts, although I found those with only minimal stalking. (This preference may just be a result of my home country of Germany requiring such data and making it possible to get into the habit.)

nitsky (OP) · 4y ago

Hey, you are right, we should definitely add an "about" page to our site. In the meantime...

David: https://www.linkedin.com/in/david-yamnitsky Isabella: https://www.linkedin.com/in/isabella-tromba/

I promise we exist and are real humans! But that is exactly what the aliens would say.

IAmEveryone · 4y ago

Yeah, I’ve avoided LinkedIn precisely because I abhor the smell of corpses/corporates it gives off.

> I promise we exist and are real humans! But that is exactly what the aliens would say

That… is a go-to joke I keep using on OKCupid, only with axe murderers instead of aliens.

I’ll stop hitting on you now, but might write up some impressions of the actual product in the next few days.

nitsky (OP) · 4y ago

Awesome, please let us know where you write them, we would love to read it!

Bostonian · 4y ago · 3 in thread

You could give the user an option to specify that the CSV file represents equally-spaced time series data, in which case you fit a time series model to predict the target column.

nitsky (OP) · 4y ago

Great idea! We definitely want to do this. We have an open issue to track it: https://github.com/tangramdotdev/tangram/issues/28.

Bostonian · 4y ago

Good. A further thought is that if you allow time series models, you could allow the user to specify a time column in addition to the column of the response variable. Then you can fit seasonal time series models to the data.

nitsky (OP) · 4y ago

Yes! What we have in mind is this:

`tangram train --file data.csv --target price --date date`

This would predict the "price" column, using the date in the "date" column.

Internally, we can generate features from the date column like "hour of day", "day of week", "season", etc. that can help the model learn periodic patterns.

Thanks for the great suggestion.
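
The date-derived features mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not Tangram's actual implementation: the function name `date_features`, the ISO 8601 input format, and the Northern Hemisphere mapping from month to season are all choices made for the example.

```python
from datetime import datetime


def date_features(timestamp: str) -> dict:
    """Derive periodic features from an ISO 8601 timestamp string."""
    dt = datetime.fromisoformat(timestamp)
    # Assumed convention: Dec-Feb winter, Mar-May spring,
    # Jun-Aug summer, Sep-Nov autumn (Northern Hemisphere).
    seasons = ["winter", "spring", "summer", "autumn"]
    season = seasons[(dt.month % 12) // 3]
    return {
        "hour_of_day": dt.hour,
        "day_of_week": dt.weekday(),  # Monday is 0, Sunday is 6
        "month": dt.month,
        "season": season,
    }


features = date_features("2021-07-14T09:30:00")
print(features)
```

Columns like these let a model that has no built-in notion of time still pick up daily, weekly, and yearly patterns.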

anigbrowl · 4y ago · 3 in thread

This is very cool, finally the blue sky science of 30 years ago has descended to my humble command line.

Like the website design too.

isabellat · 4y ago

We're so happy you mentioned the command line! We are really focused on building ML tools with an interface that programmers are most comfortable with -- a lot of devs don't want to use jupyter notebooks. We've considered building a whole TUI showing all kinds of cool UI like CPU utilization, memory usage, training loss etc while training :)

anigbrowl · 4y ago

I'm really more a GUI person tbh but when I'm figuring out a new thing it's nice to play with a working binary before trying to write any code. I don't need to deal with ML so while I find it interesting and cool I've avoided getting very deep into it because it looked like a huge work/resource commitment just to indulge my curiosity. I'm really happy to have something simple I can just mess around with and figure out if/how I want to explore further. Thanks!

nitsky (OP) · 4y ago

Thanks for the kind comment. We spent a lot of time trying to make the website feel hacker friendly, so programmers know that tangram is not made only for someone with a PhD.

drewcoo · 4y ago · 3 in thread

name collision: https://mathigon.org/tangram

jcoder · 4y ago

Isn’t this a bit like saying Ruby collides with a gem, python with a snake, or Java with an island?

isabellat · 4y ago

Yes, the name collision was very intentional :).

jhgb · 4y ago

Personally I'm much more confused by the existence of https://github.com/tangrams/tangram .

tdeck · 4y ago · 1 in thread

Reminds me of vowpal-wabbit, which also has a command line. Their website looks very corporate now but it's a free tool: https://vowpalwabbit.org/tutorials/cmd_first_steps.html

Not sure how its modeling techniques have kept up with the state of the art.

nitsky (OP) · 4y ago

There are a lot of similarities with vowpal-wabbit. The biggest difference is that tangram has libraries for a lot of popular languages (js, go, ruby, elixir, etc.), and has a web application for learning about your model and monitoring it in production.

ujeezy · 4y ago · 1 in thread

I love the simplicity - this tickles my brain in the same way that Firebase did when I first saw it :) Well done! Looking forward to playing with it.

nitsky (OP) · 4y ago

Cool! Let us know how it works for you, and open an issue on GitHub if you run into any trouble!

waterfirezero · 4y ago · 1 in thread

Great work! Really impressed with your software. However, how do I convert a tangram model to a scikit model and vice versa?

nitsky (OP) · 4y ago

Thanks! At the moment, we do not have any ability to convert to/from scikit, but this is definitely something we would like to consider in the future. This would give users of scikit the ability to make predictions in production from their preferred language, and access to tangram's reporting and monitoring app.

nitsky (OP) · 4y ago

Hi HN! We are Isabella and David, and we're excited to share Tangram, our attempt to make ML easy for programmers who are not experts. With Tangram, you train a model from a CSV file on the command line, use your model from one of many languages (so far we have libraries for Elixir, Go, JavaScript, Python, Ruby, and Rust), and learn about your models and monitor them in production from a web app. There's a video on our homepage (https://www.tangram.dev) and we're on GitHub at https://github.com/tangramdotdev/tangram.

Over the past few months we have been working with a handful of early users. A team at a small company had a TensorFlow model deployed as a Flask service consumed by their Elixir app. They replaced it with a Tangram model because they didn't want to maintain a server separate from their monolith. A team of front end engineers at a large company was looking for a way to train and deploy models on their own, without the overhead of involving their data scientists, machine learning engineers, or backend engineers. They trained a model on their own and embedded it directly in their React front-end with the Tangram JavaScript library that makes predictions with WebAssembly.

Tangram is written entirely in Rust, from the core machine learning algorithms, to the bindings for each language, to the front and back end of the web application. We have benefited from Rust's fast performance, strong typing, convenient tooling, and high quality libraries (serde, tokio, hyper, sqlx, and more).

We hope to make Tangram a sustainable business with the open core business model. The CLI and language libraries are MIT licensed, while the web application is source available, free to use for testing, but requires a paid license to use in production.

We would love to hear your feedback. Give it a try and let us know what you think!

tobiasks · 4y ago

This is very nice and I really like the CLI aspect. I would have one feature request: serve the model via gRPC.

I've been running ML in production for the last 4 years in the field of biology and need more than a million predictions per hour. Load-balancing across multiple servers (and staying with HTTP-request-based predictions) is not an option, as the latency would kill the application.

Am I in such a niche? Or, another question: are there many more people who run fine-tuned TF Serving/TensorRT servers to keep up with production needs?

pavlovskyi · 4y ago

Really like the simplicity! At my current workplace we have been dealing with the curse of model monitoring for a long time, because there was always an issue with interpretability and the ease of adding new features (indicators, etc.). Also, expectations for monitoring tools are much higher than their usability in real-world scenarios. Great work!

surajs · 4y ago

this is the kind of tool we need, not gpt3 or copilot
