- compute a huge number of features, many of which are quite complex (drawing on data collected throughout the payment process), in real time: e.g. how many distinct IP addresses have we seen this card used from over its entire history on Stripe, how many distinct cards have we seen from this IP address over its history, and do payments from this card usually come from this IP address?
- train custom models for every Stripe user with enough data to make this feasible, which requires the ability to train large numbers of models in parallel,
- provide human-readable explanations as to why we think a payment has the score that it does (which involves building simpler “explanation models”—which are themselves machine learning models—on top of the core fraud models),
- surface model performance and history in the Radar dashboard,
- allow users to customize the risk score thresholds at which we take action on payments in Radar for Fraud Teams,
- and so forth.
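To make the first bullet concrete, here is a toy sketch of the card/IP aggregate features described above. Everything here is illustrative and hypothetical: a production system would back these counters with a streaming pipeline and a low-latency store rather than in-process dicts, but the features computed are the same three questions the text poses.

```python
from collections import defaultdict

class CardIPFeatures:
    """Toy in-memory tracker for card/IP aggregate features.

    Illustrative only: the class, its storage, and its method names are
    invented for this sketch, not Stripe's actual implementation.
    """

    def __init__(self):
        self.ips_by_card = defaultdict(set)   # card -> distinct IPs seen
        self.cards_by_ip = defaultdict(set)   # IP -> distinct cards seen
        self.pair_counts = defaultdict(int)   # (card, IP) -> payment count
        self.card_counts = defaultdict(int)   # card -> total payment count

    def record_payment(self, card, ip):
        """Update all aggregates as each payment streams in."""
        self.ips_by_card[card].add(ip)
        self.cards_by_ip[ip].add(card)
        self.pair_counts[(card, ip)] += 1
        self.card_counts[card] += 1

    def features(self, card, ip):
        """Answer the three questions from the text for one payment."""
        total = self.card_counts[card]
        return {
            "distinct_ips_for_card": len(self.ips_by_card[card]),
            "distinct_cards_for_ip": len(self.cards_by_ip[ip]),
            "share_of_payments_from_ip": (
                self.pair_counts[(card, ip)] / total if total else 0.0
            ),
        }

tracker = CardIPFeatures()
tracker.record_payment("card_A", "1.2.3.4")
tracker.record_payment("card_A", "1.2.3.4")
tracker.record_payment("card_A", "5.6.7.8")
print(tracker.features("card_A", "1.2.3.4"))
```

At real scale, exact distinct counts over a card's entire history get expensive, which is one reason such counters are often approximated with sketches like HyperLogLog.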
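The "explanation models" bullet can also be illustrated with a much simpler stand-in. The real explanation models are separately trained ML models; the occlusion-style attribution below only loosely imitates the idea, ranking features by how much replacing each one with a typical baseline value moves the score. The scoring function, feature names, and baselines are all made up for the sketch.

```python
def explain(score_fn, payment, baselines):
    """Rank features by how far the score falls when each feature is
    replaced with a baseline value. A toy attribution, not Stripe's
    actual explanation models."""
    base = score_fn(payment)
    deltas = {}
    for name, baseline in baselines.items():
        perturbed = dict(payment, **{name: baseline})
        deltas[name] = base - score_fn(perturbed)
    # Largest absolute delta first: the most "blameworthy" feature.
    return base, sorted(deltas.items(), key=lambda kv: -abs(kv[1]))

# Toy fraud score: high when the card has been seen from many IPs
# and this payment comes from an unusual IP for the card.
def toy_score(p):
    return min(1.0, 0.1 * p["distinct_ips_for_card"]
                    + 0.5 * (1 - p["share_of_payments_from_ip"]))

payment = {"distinct_ips_for_card": 6, "share_of_payments_from_ip": 0.05}
baselines = {"distinct_ips_for_card": 1, "share_of_payments_from_ip": 0.9}
score, ranked = explain(toy_score, payment, baselines)
print(score, ranked[0][0])
```

The output of `ranked` reads directly as a human-readable explanation ("this score is high mainly because the card has been seen from many distinct IPs"), which is the product goal the bullet describes.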
We found that getting the interactions between data, ML, and product exactly right required building most of the stack ourselves.
That said, we do use a number of open source tools: TensorFlow and PyTorch for our deep learning work, XGBoost for training boosted trees, and Scalding and Hadoop for our core data processing, among others.
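The boosted-trees idea behind a tool like XGBoost can be sketched in a few dozen lines of plain Python: each round fits a small tree (here, a single decision stump) to the residuals of the ensemble so far, scaled by a learning rate. This is a minimal gradient-boosting sketch with squared loss on one feature, not the library's actual algorithm, and the toy data is invented.

```python
def fit_stump(xs, residuals):
    """Find the single-feature threshold split minimizing squared error.
    xs: 1-D feature values; residuals: this round's regression targets."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

def boost(xs, ys, rounds=20, lr=0.3):
    """Gradient boosting with squared loss: each stump is fit to the
    current residuals, and predictions accumulate with a learning rate."""
    stumps = []
    preds = [0.0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

xs = [1, 2, 3, 10, 11, 12]
ys = [0, 0, 0, 1, 1, 1]   # toy labels: "fraud" for large feature values
model = boost(xs, ys)
```

After training, `model(2)` is near 0 and `model(11)` is near 1; real boosted-tree libraries generalize this to many features, deeper trees, regularization, and arbitrary differentiable losses.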