gkamradt on Hacker News

Show HN: I Built a Semantic De-Deduplicator

Hey HN Crew!

We all have lists...and they can be annoying to de-duplicate.

* User feedback

* Groceries

* Employee Surveys

* Bug reports

* You name it

Most ways to consolidate like items work off of keywords or, worse, exact phrases (Sheets/Excel).

But LLMs are much better at understanding an item's semantic meaning and determining whether two items should be combined.

I decided to build my first Python package, The Semantic Deduplicator, to help me consolidate items based on their meaning, not keywords.

For example, on groceries:

['We need more berries', 'I want more more milk', 'Can we get more carbonated water please?', 'We need more sparkling water']

...deduplicated...

['Berries', 'Milk', 'Sparkling Water']

How it works:

1. Start with an empty list ready to populate

2. The first item you add will get 1) transformed into a clean name (user feedback > product request) and 2) added to the list

3. While you're adding more items:

* Check to see if your new item's embedding is close to any existing item

* If so, ask the LLM to compare your two items to see if they should be combined

* If so, combine them
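
The loop above can be sketched roughly like this. It is a minimal illustration under stated assumptions, not the package's actual code or API: `dedupe`, `toy_embed`, `cosine`, and the 0.5 threshold are hypothetical names and values, the bag-of-words vector stands in for a real embedding model, and `should_merge` stands in for the LLM comparison. It also assumes that "combine" means keeping the name already in the list.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def toy_embed(text: str) -> Counter:
    """Stand-in embedding: word counts. A real setup would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dedupe(
    items: List[str],
    embed: Callable[[str], Counter],
    should_merge: Callable[[str, str], bool],
    clean_name: Callable[[str], str],
    threshold: float = 0.5,  # assumed cutoff, chosen only for this toy example
) -> List[str]:
    """Add items one at a time, merging each into an existing entry when possible."""
    kept: List[Tuple[str, Counter]] = []  # step 1: start with an empty list
    for item in items:
        name = clean_name(item)  # step 2: turn the raw item into a clean name
        vector = embed(name)
        merged = False
        for existing_name, existing_vector in kept:
            # Step 3a: cheap check, is the new item close to an existing one?
            if cosine(vector, existing_vector) < threshold:
                continue
            # Step 3b: slower check, ask the comparison function (the LLM stand-in).
            if should_merge(existing_name, name):
                merged = True  # step 3c: combine by keeping the existing name
                break
        if not merged:
            kept.append((name, vector))
    return [name for name, _ in kept]


# Toy stand-ins: lowercase as the "clean name", and accept every close candidate.
result = dedupe(
    ["Sparkling water", "sparkling water please", "Milk"],
    embed=toy_embed,
    should_merge=lambda existing, new: True,
    clean_name=lambda s: s.strip().lower(),
)
print(result)  # ['sparkling water', 'milk']
```

The embedding check acts as a cheap filter so the slower comparison only runs on plausible pairs. Word counts cannot tell that "carbonated water" and "sparkling water" mean the same thing, which is where a real embedding model and an LLM comparison would come in.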

This package is more of an exploration and a POC, so be careful with it. I'd love to hear any feedback.

All the links:

* YT Explainer Video: https://www.youtube.com/watch?v=etLsNgkGbeM

* Twitter Thread: https://twitter.com/GregKamradt/status/1719760658936545336

* PyPI: https://pypi.org/project/semantic-deduplicator/

* GitHub: https://github.com/gkamradt/SemanticDeduplicator

2 points | gkamradt | 2y ago | 2 comments

Show HN: How to Make 3D Bronze Mountain Maps – 3D Printing and Bronze Casting

(gregkamradt.com)

5 points | gkamradt | 6y ago | 3 comments

Selling 3D Bronze Topography Maps: Idea Generation and Market Validation

(gregkamradt.com)

3 points | gkamradt | 6y ago | 2 comments

What's Blocking Your Data Scientists: Customer Empathy

(gregkamradt.com)

2 points | gkamradt | 6y ago | 0 comments

What's Blocking Your Data Scientists

(gregkamradt.com)

1 point | gkamradt | 6y ago | 0 comments

What I say to people who are looking for a job

(gregkamradt.com)

288 points | gkamradt | 7y ago | 68 comments

Lessons Learned the Hard Way: Data Science Interviews

(youtube.com) Video

1 point | gkamradt | 10y ago | 0 comments

Lessons learned: 30+ data science interviews from a 3 month immersive graduate [pdf]

(github.com) PDF

6 points | gkamradt | 10y ago | 0 comments

Show HN: Ryd.io – Manhattan Blocks Clustered via Taxi Drop-Offs

http://ryd.io/

As a capstone project for Galvanize's data science immersive, I took another look at the NYC Taxi data set. A ton of analysis has been done on individual rides/cars, and I was curious about what story would be told by looking at this data in aggregate.

Through the clustered map you can identify different 'personalities' of the city with a bird's-eye view. Check it out here: http://ryd.io/cluster_map

I've just spent the past couple of weeks working hard on this project and would love to talk to anyone about it if they are interested.

After the conclusion of the program, I'm excited to join a new data team and work on awesome problems.

Feel free to contact me with any questions.

Tech:

* Backend - Python, Flask, Jinja

* Front - Bootstrap, Leaflet, AJAX

* Graphic - Originally in matplotlib/CartoDB and styled in Photoshop

* Data Analysis - Python + stats packages

gkamradt {at} gmail

5 points | gkamradt | 10y ago | 1 comment

Show HN: Ryd.io

Ryd.io

Through the clustered map you can identify different 'personalities' of the city with a bird's-eye view. Check it out here: http://ryd.io/cluster_map

I've just spent the past couple of weeks working hard on this project and would love to talk to anyone about it if they are interested.

After the conclusion of the program, I'm excited to join a new data team and work on awesome problems.

Feel free to contact me with any questions.

gkamradt {at} gmail

9 points | gkamradt | 10y ago | 4 comments

gkamradt

Recent submissions

Show HN: ARC-AGI-3 Toolkit

Arc-AGI-2 and ARC Prize 2025

How the cofounder of Zapier recruited me to run a $1M AI competition

Scaling LLMs apps via accuracy, latency, cost

Show HN: I Built a Semantic De-Deduplicator

Show HN: How to Make 3D Bronze Mountain Maps – 3D Printing and Bronze Casting

Selling 3D Bronze Topography Maps: Idea Generation and Market Validation

What's Blocking Your Data Scientists: Customer Empathy

What's Blocking Your Data Scientists

What I say to people who are looking for a job

Lessons Learned the Hard Way: Data Science Interviews

Lessons learned: 30+ data science interviews from a 3 month immersive graduate [pdf]

Show HN: Ryd.io – Manhattan Blocks Clustered via Taxi Drop-Offs

Show HN: Ryd.io
