Before beginning my PhD, I worked at DagsHub fine-tuning vision models for domain-specific deployments. I wanted to use ML models to help label the data, building on Label Studio's ML Backends. The goal was to use a model registered and tracked on MLflow.
I found it tedious: piping an MLflow-registered model into Label Studio's ML backend involved a lot of boilerplate code. Part of the challenge was setting up the web server, adapting the model outputs, and reading through a lot of documentation on all three tools (MLflow, Label Studio, and DagsHub). So I spent some time streamlining the process.
The project is finally merged so I wanted to share it with you! Since DagsHub integrates both MLflow & Label Studio, it sets up an end-to-end pipeline for active learning.
Overview of functionality:
- Connects MLflow-registered models to Label Studio.
- Allows inference and labeling for your models with a single function call change.
- Includes pre-configured models for common tasks across vision / audio / text domains.
- Makes it easy to customize with user-defined hooks.
- Integrates cleanly with DagsHub, making it straightforward to set up an active learning pipeline.
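To give a feel for what "adapting the model outputs" means, here is a minimal sketch of the kind of post-processing hook you might write for a classifier. The function name, its signature, and the default `from_name` / `to_name` values are hypothetical and not taken from the repo. Only the shape of the returned dict follows Label Studio's prediction JSON format, and the two names must match the tags in your own labeling config.

```python
from typing import Dict, Sequence


def to_label_studio_prediction(
    class_probs: Sequence[float],
    class_names: Sequence[str],
    from_name: str = "choice",   # assumed name of the <Choices> tag
    to_name: str = "image",      # assumed name of the <Image> tag
    model_version: str = "example-v1",  # hypothetical version string
) -> Dict:
    """Turn one classifier output into a Label Studio prediction dict."""
    if len(class_probs) != len(class_names):
        raise ValueError("class_probs and class_names must have equal length")

    # Pick the most probable class as the pre-annotation.
    best = max(range(len(class_probs)), key=lambda i: class_probs[i])

    return {
        "model_version": model_version,
        "score": float(class_probs[best]),
        "result": [
            {
                "from_name": from_name,
                "to_name": to_name,
                "type": "choices",
                "value": {"choices": [class_names[best]]},
            }
        ],
    }


if __name__ == "__main__":
    print(to_label_studio_prediction([0.1, 0.7, 0.2], ["cat", "dog", "bird"]))
```

A detection or segmentation model would need a different `type` and `value` payload, so treat this as an illustration of the idea rather than the hook interface the project actually exposes.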
I wanted to make auto-labeling easy for ML engineers without needing to learn web development stuff. The setup is simple:
1. Clone the repo and build the Docker container
2. Run the container or use the orchestrator
3. Use DagsHub’s Python client to connect your MLflow model to Label Studio
Would love for you all to try it out and share your thoughts. If anyone's interested in making it work independently of DagsHub, PRs are welcome!
Repo: https://github.com/DagsHub/ls-configurable-model
Docs: https://dagshub.com/docs/use_cases/auto_labeling/
Cheers :)
Hey everyone -
I've been working to reproduce [CheXNet](https://arxiv.org/pdf/1711.05225.pdf) - a fantastic paper describing research on a model capable of radiologist-grade pathology classification!
CheXNet uses Class Activation Mappings (CAMs for short) to generate heatmaps that show which parts of the image the model bases its classification on. In my case, I'm struggling to reproduce them - most of our classifications are derived from the diaphragm instead of regions within the lung. Curiously, we are still attaining a reasonable AUROC, with .773 on training and .749 on validation data - the paper reports .8062 AUROC.
My current model is being trained on a subsample of the main dataset, and I'm basically treating this as a way to validate the architecture. I'd love to know if anyone has experienced similar issues and solved them, or has any other input here.
If you have a moment to spare - I'd be super grateful for some help from the Hacker News community in solving the inaccurate localization issue: https://dagshub.com/nirbarazida/Pneumonia-Classification/issues/58
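For anyone unfamiliar with CAMs, here is a minimal NumPy sketch of the standard computation: the heatmap for a class is the sum of the last convolutional layer's feature maps, each weighted by that class's weight in the final fully connected layer. The function name and array shapes are my own assumptions for illustration, this is not the code from our repo, and upsampling the map to the input resolution is left out.

```python
import numpy as np


def class_activation_map(
    feature_maps: np.ndarray,  # shape (K, H, W): last conv layer output
    fc_weights: np.ndarray,    # shape (C, K): final linear layer weights
    class_idx: int,
) -> np.ndarray:
    """Return an (H, W) heatmap scaled to [0, 1] for the given class."""
    weights = fc_weights[class_idx]  # (K,) weights for the chosen class

    # Weighted sum over the K channels -> (H, W).
    cam = np.tensordot(weights, feature_maps, axes=([0], [0]))

    # Min-max normalize so the map can be overlaid on the image.
    cam = cam - cam.min()
    peak = cam.max()
    if peak > 0:
        cam = cam / peak
    return cam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fmaps = rng.random((1024, 7, 7))   # e.g. 1024 channels on a 7x7 grid
    weights = rng.normal(size=(14, 1024))
    heatmap = class_activation_map(fmaps, weights, class_idx=1)
    print(heatmap.shape, heatmap.min(), heatmap.max())
```

With this formulation, two things worth double-checking when a heatmap looks wrong are that `class_idx` selects the weight row for the pathology being visualized, and that the feature maps come from the layer feeding the global pooling step.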
An incorrect localization, despite a correct classification: https://dagshub.com/nirbarazida/Pneumonia-Classification/raw/4c1414d3cd5cf8c693a4f7931843495bd4d96751/evaluating/heatmap_eval/00005532_000_Cardiomegaly.png