FDS is an open source command line wrapper around Git and DVC (https://dvc.org), meant to minimize the chances of human error, automate repetitive tasks, and provide a smoother landing for new users.
Quickstart: run `pip install fastds`, then `fds -h`.
Full blog post on the motivations and goals of the project:
https://dagshub.com/blog/fds-fast-data-science-with-git-and-dvc
Summary:
Why is it called "fds"?
Just take a look at your keyboard - it's so silky smoove to type "fds"! This is important for a command line tool that exists to improve ease of use and delight users.
*In fact, due to popular demand, you can also type "sdf" instead of "fds" for an even more epic experience!
Why did we do this?
As we were developing Open Source Data Science projects using DVC, we often found ourselves making the same mistakes over and over again, and constantly repeating pairs of commands like "git status" and "dvc status".
So, we set about creating FDS with these goals in mind:
1. Automate common tasks when working with Git, DVC, and potentially other tools that work well together later on.
2. Provide a more interactive and opinionated UI and UX. Git and DVC are low level utilities which need to work well in scripts and support all possible use cases - this means interacting with them feels like interacting with a command line API, rather than a wizard or app. FDS orients itself to be used by humans, for convenience rather than total flexibility. This also means that FDS can afford to introduce workflows and features faster than DVC (and definitely faster than Git), since it doesn't have to be stable infrastructure.
3. Provide a smoother landing for new users by making things easy by default and explaining what's going on.
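To make the "repeating pairs of commands" point concrete, here is a minimal Python sketch of the idea of running `git status` and `dvc status` together. It is not FDS's actual implementation; the names `STATUS_COMMANDS` and `run_status` are hypothetical and exist only for this illustration.

```python
import subprocess
from typing import Callable, List, Sequence

# The pair of commands the post describes typing over and over.
STATUS_COMMANDS: List[List[str]] = [["git", "status"], ["dvc", "status"]]


def run_status(
    commands: Sequence[Sequence[str]] = STATUS_COMMANDS,
    runner: Callable[..., object] = subprocess.run,
) -> List[List[str]]:
    """Run each status command in order and return the commands that were run.

    `runner` is injectable (hypothetical design choice) so the function can be
    exercised without git or dvc installed; by default it is subprocess.run.
    """
    executed: List[List[str]] = []
    for cmd in commands:
        # check=False: still show the second status if the first tool
        # exits with a non-zero code.
        runner(list(cmd), check=False)
        executed.append(list(cmd))
    return executed
```

Calling `run_status()` inside a repository that has both tools installed prints the two statuses one after the other, which is roughly the kind of repetition a wrapper can take off your hands.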
Pull requests are Extremely Welcome!
One thing we kept running into is users struggling to set up the cloud environment needed to push their data to AWS S3, GCS (which we support), and other DVC remotes. Since our focus is on community projects, it was clear that this is a lot of friction that we don't want, as it gets in the way of users actually working on data science.
To fix this, we built DAGsHub Storage. It's the easiest DVC remote you can set up, and we made the process of setting it up as close as possible to the process of setting up a Git remote. That means you don't need a credit card, and you don't need to define storage access permissions, IAM, etc. Just type in `dvc remote add origin https://dagshub.com/<user>/<project>.dvc`.
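If you script your project setup, the remote URL pattern above is easy to build programmatically. The following is a small Python sketch under that assumption; the helper names `dagshub_remote_url` and `remote_add_command` are hypothetical and not part of DVC or DAGsHub, and the only real CLI call assumed is `dvc remote add <name> <url>`.

```python
from typing import List

DAGSHUB_HOST = "https://dagshub.com"


def dagshub_remote_url(user: str, project: str) -> str:
    """Build the DVC remote URL following the pattern quoted in the post."""
    if not user or not project:
        raise ValueError("both user and project must be non-empty")
    return f"{DAGSHUB_HOST}/{user}/{project}.dvc"


def remote_add_command(user: str, project: str, name: str = "origin") -> List[str]:
    """Return the argument list for `dvc remote add <name> <url>`.

    The list can be handed to subprocess.run; it is returned rather than
    executed so the caller decides when (and whether) to run it.
    """
    return ["dvc", "remote", "add", name, dagshub_remote_url(user, project)]
```

For example, `remote_add_command("alice", "demo")` (placeholder user and project names) yields the same command you would type by hand with `<user>` and `<project>` filled in.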
Since our goal is to spur community projects and collaboration, DAGsHub Storage is free for open source projects, and we also gave some free storage to private projects, because, heck, why not.
After you push your data to DAGsHub Storage, you'll be able to view it, compare versions and share it with collaborators easily, as well as control team access. This means that you can contribute data to projects, not just code. I hope this makes ML work better and promotes Open Source Data Science.
Unlike GitLFS, we're offering significantly more storage, basing it on the much more data science oriented DVC, and actually letting you browse, compare and share the data files and artifacts via the UI.
As usual, I'd love to hear your feedback!
https://dagshub.com/blog/dagshub-storage-zero-configuration-dataset-model-hosting