- Where does it sit in relation to other data science components?
- What does it integrate with, and what is it agnostic to?
- Smallest self-contained use case? I see some examples in the repo README, but they are not self-contained, so it's harder for me to imagine how it would be used.
You could not be more wrong about my perspective though.
My question is from this perspective: I use TensorFlow, Keras, NumPy, scikit-learn, and pandas, and I'm looking to understand how I would integrate it with these.
My assumption is that this would be helpful to other readers and thus to the author. Do you prefer having key information buried?
One more question: why is treating feature engineering and ML models the same way important?
* Apply a (say, SVM) classification model to each object (row) by producing a new column
* Generate a new column as a difference between its values and its average over all rows.
In both cases, you produce a new column (= feature) by applying some transformation. Also, in both cases, you need to find the parameters of this transformation from the data: in the first case, by training an SVM model; in the second case, by finding the average value.
Conclusion: there is no essential difference between defining/training a feature and a ML model.
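The two cases above can be written with one shared shape: `fit` learns parameters from the data, `apply` produces a new column. This is a minimal sketch in plain Python, not Lambdo's API; all class and method names are hypothetical, and a nearest-centroid classifier stands in for the SVM so the example needs no dependencies.

```python
from statistics import mean


class MeanDiff:
    """Feature: difference between a value and the column average."""

    def fit(self, xs):
        # The only learned parameter is the average over all rows.
        self.avg = mean(xs)
        return self

    def apply(self, xs):
        return [x - self.avg for x in xs]


class NearestCentroid:
    """Model: label each row with the class whose mean value is closest.

    Stands in for an SVM to keep the example dependency-free.
    """

    def fit(self, xs, labels):
        # Learned parameters: one centroid (mean value) per class label.
        self.centroids = {
            lbl: mean(x for x, y in zip(xs, labels) if y == lbl)
            for lbl in set(labels)
        }
        return self

    def apply(self, xs):
        return [
            min(self.centroids, key=lambda lbl: abs(x - self.centroids[lbl]))
            for x in xs
        ]


rows = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["low", "low", "low", "high", "high", "high"]

# Both "train" on the data, then produce a new column.
diff_col = MeanDiff().fit(rows).apply(rows)
pred_col = NearestCentroid().fit(rows, labels).apply(rows)
print(diff_col)
print(pred_col)
```

Seen this way, the "feature" and the "model" are interchangeable steps in a pipeline; only the number of learned parameters differs.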
Very intriguing and thoughtful statement. I hadn't ever thought of it that way.
E.g., if you have an N-layer neural network, N-1 layers are doing feature learning, and the Nth layer is a simple {logistic, multinomial/softmax, Gaussian, Poisson, ...} model.
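A tiny illustration of that view, assuming a 2-layer network with hand-picked (not trained) weights: the hidden ReLU layer builds two features from the raw inputs, and the output layer is an ordinary logistic regression on those features. XOR is not linearly separable in the raw inputs, but it is in the learned features.

```python
import math


def relu(v):
    return max(0.0, v)


def hidden_layer(x1, x2):
    # Layers 1..N-1: map raw inputs to features.
    # Weights are hand-picked for illustration, not trained.
    return relu(x1 + x2), relu(x1 + x2 - 1.0)


def logistic_output(h1, h2):
    # Layer N: plain logistic regression on the hidden features.
    z = 10.0 * (h1 - 2.0 * h2) - 5.0
    return 1.0 / (1.0 + math.exp(-z))


def predict(x1, x2):
    return logistic_output(*hidden_layer(x1, x2))


for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, round(predict(*x)))
```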
"The kernel trick avoids the explicit mapping that is needed to get linear learning algorithms to learn a nonlinear function or decision boundary."
(what powers Support Vector Machines, the neural networks of the 90s, and still alive and kicking today)
[1] https://en.wikipedia.org/wiki/Kernel_method#Mathematics:_the...
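The quoted point can be checked numerically. A sketch using the degree-2 homogeneous polynomial kernel (chosen here only as an example): evaluating k(x, y) = (x · y)^2 in the original 2-D space gives the same number as the inner product after the explicit feature map phi(x) = (x1^2, sqrt(2)·x1·x2, x2^2), so the kernel implicitly defines a set of derived features without ever computing them.

```python
import math


def phi(x):
    # Explicit feature map for 2-D inputs: (x1^2, sqrt(2)*x1*x2, x2^2).
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def poly_kernel(x, y):
    # Degree-2 polynomial kernel, computed in the original space;
    # phi is never evaluated here.
    return dot(x, y) ** 2


x, y = (1.0, 2.0), (3.0, 4.0)
print(poly_kernel(x, y))    # 121.0
print(dot(phi(x), phi(y)))  # approximately 121, up to float rounding
```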
We sometimes get hung up on correcting and contradicting people, often missing a deeper truth. It takes skill to find the grain of truth and build on it :-)
I see this tool as something that can help with the deployment side of feature engineering. As things stand, it's "easy" to package and deploy modeling code, but much harder to package up your feature engineering workflow, in part because there is no agreed-upon standard for developing feature engineering pipelines.
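One way to picture the deployment problem: the parameters learned for each feature at training time have to travel with the model, so the serving side computes exactly the same columns. A minimal sketch, assuming a hypothetical JSON layout (not Lambdo's or any library's format) where only learned parameters are serialized and the feature code is shared by both sides.

```python
import json
from statistics import mean, pstdev


def train(rows):
    # Learn feature parameters from the training data.
    # The "features" layout is a hypothetical format.
    return {
        "features": {
            "x_centered": {"avg": mean(rows)},
            "x_scaled": {"avg": mean(rows), "std": pstdev(rows)},
        }
    }


def apply_features(params, rows):
    # Shared feature code: used unchanged in training and serving.
    f = params["features"]
    return {
        "x_centered": [x - f["x_centered"]["avg"] for x in rows],
        "x_scaled": [
            (x - f["x_scaled"]["avg"]) / f["x_scaled"]["std"] for x in rows
        ],
    }


# Training side: fit and serialize the feature parameters.
blob = json.dumps(train([2.0, 4.0, 6.0]))

# Serving side: load the same parameters and apply them to new data.
params = json.loads(blob)
print(apply_features(params, [8.0]))
```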
I'd be curious how this could be combined with a library like Featuretools (http://github.com/featuretools/featuretools/) which helps automate the discovery of features, but currently has less functionality related to deployment.
(full disclosure: I work on Featuretools)
* Feature engineering and machine learning: Lambdo does not distinguish them and treats them as data transformations
* Model training and prediction: both feature definitions and ML models can be trained as part of one workflow
* Table population and column evaluation: workflow consists of nodes of these two types. This makes it similar to Bistro: https://github.com/asavinov/bistro
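To make these principles concrete, here is a toy workflow runner under stated assumptions: the node tuples, `run`, and all helper names are hypothetical and do not reflect Lambdo's actual workflow format. A table node builds a new table; column nodes compute new columns. In training mode every column node (the centering feature and the threshold "model" alike) is fitted in order; in prediction mode the stored parameters are reused.

```python
from statistics import mean


def drop_missing(table):
    # Table-population node: new table with only rows where x is present.
    keep = [i for i, v in enumerate(table["x"]) if v is not None]
    return {col: [vals[i] for i in keep] for col, vals in table.items()}


def center_fit(table):
    return {"avg": mean(table["x"])}


def center_eval(table, p):
    return [v - p["avg"] for v in table["x"]]


def threshold_fit(table):
    # A tiny "model": threshold halfway between the two class means.
    pos = [v for v, y in zip(table["x_c"], table["y"]) if y == 1]
    neg = [v for v, y in zip(table["x_c"], table["y"]) if y == 0]
    return {"t": (mean(pos) + mean(neg)) / 2}


def threshold_eval(table, p):
    return [1 if v > p["t"] else 0 for v in table["x_c"]]


WORKFLOW = [
    ("table", drop_missing),
    ("column", "x_c", center_fit, center_eval),
    ("column", "pred", threshold_fit, threshold_eval),
]


def run(workflow, table, params=None):
    # params is None -> training mode: fit every column node in order.
    training = params is None
    params = {} if training else params
    for node in workflow:
        if node[0] == "table":
            table = node[1](table)
        else:
            _, out, fit, evaluate = node
            if training:
                params[out] = fit(table)
            table[out] = evaluate(table, params[out])
    return table, params


train_table = {"x": [1.0, None, 2.0, 3.0, 7.0, 8.0, 9.0], "y": [0, 0, 0, 0, 1, 1, 1]}
trained, params = run(WORKFLOW, train_table)
predicted, _ = run(WORKFLOW, {"x": [4.0, 6.5, None]}, params)
print(predicted["pred"])
```

The same workflow definition drives both training and prediction, so the features used at prediction time cannot drift from those used in training.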
Lambdo is intended for the following use cases:
* Numerous derived features with parameters derived from the data
* Regular re-training is required, and it must use the same features as those used during prediction
* Time series analysis, where the quality of derived features is especially important
* Customization via user-defined Python functions
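For the time-series and customization cases, a typical derived feature is a function evaluated over a trailing window. A sketch with a hypothetical `rolling` helper (not a Lambdo function) that accepts any user-defined Python function; the window size is a parameter of the feature, and positions without a full window get `None`.

```python
from statistics import mean


def rolling(series, window, fn):
    """Apply a user-defined function to each trailing window.

    The first window-1 positions have no full window and get None.
    """
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(fn(series[i + 1 - window : i + 1]))
    return out


def spread(values):
    # User-defined feature: range (max - min) within the window.
    return max(values) - min(values)


prices = [10.0, 11.0, 13.0, 12.0, 15.0]
ma3 = rolling(prices, 3, mean)     # 3-step moving average
rng3 = rolling(prices, 3, spread)  # 3-step range
print(ma3)
print(rng3)
```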