Show HN: Lambdo – Feature engineering and machine learning together

(github.com)

73 points · asavinov · 7y ago · 24 comments


mooneater · 7y ago · 6 in thread

Things I always want when looking at something new:

- Where does it sit in relation to other data science components?

- What does it integrate with, and what is it agnostic to?

- Smallest self-contained use case? I see some examples in the repo readme, but they are not self-contained, so it's harder for me to imagine its use.

srean · 7y ago

Your comment reminds me so much of my PM, who asks questions like these to give the appearance that he is putting in some serious effort and energy to understand. Add a few words like 'value', 'leverage' and 'resonate' and the impersonation would be pitch perfect... Looks up contact details. Oops, spoke too soon, should have guessed.

mooneater · 7y ago

> 'value', 'leverage' and 'resonate'

You could not be more wrong about my perspective though.

My question is from this perspective: I use tensorflow, keras, numpy, scikit-learn and pandas, and I'm looking to understand how I would integrate this with them.


mooneater · 7y ago

Haha :) My comment is precisely aimed at reducing the energy needed to understand the value prop here. Why is that bad?

My assumption is that it would be helpful to other readers and thus to the author. Do you prefer having key information buried?

mcrad · 7y ago

By the same token, your comment reminds me of an engineer who puts serious effort into making his work look more important than it is. Even the term "feature engineering" is pretty ridiculous. As if academics found real-world data to be too messy and decided to outsource this little cost to.....a department of expert feature-makers(?)


pplonski86 · 7y ago

I'm also reading the readme and trying to guess why I need this package.

One more question: why is it important to treat feature engineering and ML models the same way?

asavinov (OP) · 7y ago

Because they are so similar that it is much easier to implement them as part of one mechanism. For example:

* Apply a (say, SVM) classification model to each object (row), producing a new column.

* Generate a new column as the difference between a column's values and its average over all rows.

In both cases, you produce a new column (=feature) by applying some transformation. Also, in both cases, you need to find the parameters of this transformation from the data: in the first case by training an SVM model, in the second case by finding the average value.

Conclusion: there is no essential difference between defining/training a feature and an ML model.
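To make the parallel concrete, here is a minimal sketch under stated assumptions. It is not Lambdo's API: the class names, the `fit`/`apply` interface and the toy rows are all made up for illustration, and a nearest-centroid classifier stands in for the SVM so the example needs nothing beyond the standard library. Both steps learn parameters from training rows and then emit a new column.

```python
from statistics import mean


class MeanDifference:
    """Feature: value minus the column average learned from training rows."""

    def fit(self, rows, column):
        self.column = column
        self.average = mean(row[column] for row in rows)  # the learned parameter
        return self

    def apply(self, rows):
        return [row[self.column] - self.average for row in rows]


class NearestCentroid:
    """Model: stand-in for the SVM; predicts the class with the closest centroid."""

    def fit(self, rows, column, label):
        self.column = column
        groups = {}
        for row in rows:
            groups.setdefault(row[label], []).append(row[column])
        self.centroids = {cls: mean(vals) for cls, vals in groups.items()}  # learned
        return self

    def apply(self, rows):
        return [
            min(self.centroids, key=lambda cls: abs(row[self.column] - self.centroids[cls]))
            for row in rows
        ]


# Hypothetical training data: one numeric column and a class label.
train = [
    {"x": 1.0, "y": "low"},
    {"x": 2.0, "y": "low"},
    {"x": 8.0, "y": "high"},
    {"x": 9.0, "y": "high"},
]

feature = MeanDifference().fit(train, "x")
model = NearestCentroid().fit(train, "x", "y")

# Both trained steps are applied the same way: each one adds a column.
new_rows = [{"x": 3.0}, {"x": 7.0}]
for name, step in [("x_diff", feature), ("y_pred", model)]:
    for row, value in zip(new_rows, step.apply(new_rows)):
        row[name] = value
```

The loop at the end does not need to know which step is a "feature" and which is a "model", which is the point of the comment above.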


ericand · 7y ago · 4 in thread

> Feature engineering is a mechanism of creating new levels of abstraction in knowledge representation because each (non-trivial) feature extracts and makes explicit some piece of knowledge hidden in the data. It is almost precisely what deep learning is intended for. In this sense, feature engineering does what hidden layers of a neural network do or what the convolutional layer of a neural network does.

Very intriguing and thoughtful statement. I hadn't ever thought of it that way.

tvladeck · 7y ago

This is essentially repeating your quote, but an "aha moment" clicked for me when I read that what successful neural networks are basically doing is such good feature learning that the problem can be solved by a simple linear model in the end.

E.g. if you have an N-layer neural network, N-1 layers are doing feature learning, and the Nth layer is a simple {logistic, multinomial/softmax, gaussian, poisson, ...} model
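A toy illustration of that split, assuming a two-layer network with hand-picked (not trained) weights: XOR is not linearly separable in the raw inputs, but after one ReLU hidden layer a plain logistic model on the hidden features classifies it. The function names and weights below are chosen for the example only.

```python
import math


def relu(z):
    return max(0.0, z)


def hidden_features(x1, x2):
    # "Feature learning" part (layer 1). Weights are hand-picked here, not trained.
    return relu(x1 + x2), relu(x1 + x2 - 1.0)


def logistic_output(h1, h2):
    # Final layer: ordinary logistic regression on the hidden features.
    score = 4.0 * h1 - 8.0 * h2 - 2.0
    return 1.0 / (1.0 + math.exp(-score))


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
predictions = [round(logistic_output(*hidden_features(x1, x2))) for x1, x2 in inputs]
print(predictions)  # [0, 1, 1, 0], i.e. XOR of the two inputs
```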

halflings · 7y ago

Related: "the kernel trick" [1].

"The kernel trick avoids the explicit mapping that is needed to get linear learning algorithms to learn a nonlinear function or decision boundary."

(what powers Support Vector Machines, the neural networks of the 90s, and still alive and kicking today)

[1] https://en.wikipedia.org/wiki/Kernel_method#Mathematics:_the...
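A small numeric check of the idea in that quote, assuming a degree-2 polynomial kernel on 2-D inputs (the kernel and the sample points are picked for the example): the kernel value equals the inner product of the explicitly mapped vectors, but is computed without ever building them.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    # Explicit degree-2 feature map for a 2-D input.
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly_kernel(x, y):
    # Same inner product, computed without constructing phi(x) or phi(y).
    return dot(x, y) ** 2


x, y = (1.0, 2.0), (3.0, 0.5)
explicit = dot(phi(x), phi(y))  # inner product in the 3-D feature space
implicit = poly_kernel(x, y)    # kernel evaluated in the original 2-D space
print(math.isclose(explicit, implicit))  # True
```

For higher degrees or higher-dimensional inputs the explicit map grows quickly, while the kernel evaluation stays a single dot product followed by a power.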

jacquesm · 7y ago

A quick (and not 100% correct, but it will do) way of looking at deep learning is to think of it as automated feature detection and weighting.

brylie · 7y ago

Have no worry. Little, if anything, is 100% correct.

We sometimes get hung up on correcting and contradicting people, often missing a deeper truth. It takes skill to find the grain of truth and build on it :-)

kmax12 · 7y ago · 1 in thread

I definitely see the need for packages like this. So much of a data scientist's time is spent on feature engineering, but there are relatively few tools out there that are trying to improve that step in the process compared to tools for the modeling step.

I see this tool as something that can help with the deployment piece of feature engineering. As things stand, it's "easy" to package and deploy modeling code, but much harder to package up your feature engineering workflow, in part because there is no agreed-upon standard for developing feature engineering pipelines.

I'd be curious how this could be combined with a library like Featuretools (http://github.com/featuretools/featuretools/) which helps automate the discovery of features, but currently has less functionality related to deployment.

(full disclosure: I work on Featuretools)

pplonski86 · 7y ago

I think there is a need for deployment approaches for both feature engineering and modeling. For example, consider feature scaling in the case where the mean of a feature is drifting. Then the feature engineering, and probably the ML model too, need to be updated. I'm not aware of ready-made solutions for such problems.
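A minimal sketch of the scenario, not a ready-made solution: the `Scaler` class, the `has_drifted` rule and its `tolerance` threshold are all assumptions made up for this example. A scaler stores the mean and standard deviation it was fitted on, and a simple check flags when recent data has moved away from those stored parameters.

```python
from statistics import mean, stdev


class Scaler:
    """Standardizes values using parameters learned at fit time."""

    def fit(self, values):
        self.mean = mean(values)
        self.std = stdev(values)
        return self

    def apply(self, values):
        return [(v - self.mean) / self.std for v in values]


def has_drifted(scaler, recent, tolerance=0.5):
    # Hypothetical rule: drift if the recent mean has moved by more than
    # `tolerance` fitted standard deviations from the fitted mean.
    return abs(mean(recent) - scaler.mean) > tolerance * scaler.std


history = [10.0, 11.0, 9.0, 10.0, 10.0]
scaler = Scaler().fit(history)

recent = [13.0, 14.0, 12.0, 13.0, 13.0]
needs_refit = has_drifted(scaler, recent)
if needs_refit:
    # Re-fit the feature; any model trained on the old scaling
    # would have to be re-trained as well.
    scaler = Scaler().fit(recent)
```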

asavinov (OP) · 7y ago

Lambdo is a workflow engine which simplifies data analysis by combining in one analysis pipeline:

* Feature engineering and machine learning: Lambdo does not distinguish them and treats them as data transformations

* Model training and prediction: both feature definitions and ML models can be trained as part of one workflow

* Table population and column evaluation: workflow consists of nodes of these two types. This makes it similar to Bistro: https://github.com/asavinov/bistro

Lambdo is intended for the following use cases:

* Numerous derived features with parameters derived from the data

* Regular re-training that must use the same features as those used during prediction

* Time series analysis, because that is where the quality of derived features is especially important

* Customization via user-defined Python functions
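As a rough illustration of the last three use cases, and explicitly not Lambdo's actual configuration format or API: the sketch below assumes a made-up list of column definitions (`COLUMNS`), a user-defined rolling-mean function, and a tiny `evaluate` helper. The same definitions are applied to the training table and the prediction table, so the derived time-series feature cannot diverge between the two.

```python
def rolling_mean(values, window):
    """User-defined feature: mean of the last `window` values at each position."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical column definitions, shared by training and prediction runs.
COLUMNS = [
    {"name": "mean_3", "function": rolling_mean, "input": "value", "params": {"window": 3}},
]


def evaluate(table, columns):
    """Add each derived column to a table stored as {column name: list of values}."""
    for col in columns:
        table[col["name"]] = col["function"](table[col["input"]], **col["params"])
    return table


train_table = evaluate({"value": [1.0, 2.0, 3.0, 4.0]}, COLUMNS)
predict_table = evaluate({"value": [10.0, 20.0]}, COLUMNS)
```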
