I'm curious about the ML stacks you actually run in production. What has failed, and what has brought joy?
Have you managed to set up a reliable "MLOps" environment with a small(!) team? What are the ingredients?
To what extent do you monitor your model inference performance? Is there automated KPI tracking in place to make sure a new model architecture or a new set of weights performs as expected?
How much of your deployment has moved to an "ML Cloud", whether that's AWS, GCP, or Azure ML-specific services? Which services make up the mix?