# Background (The challenge and the subject system)
My goal was to improve performance/cost ratio for my Kubernetes cluster. For performance, the focus was on increasing throughput.
The operations in the subject system were primarily CPU-bound, and we had plenty of spare memory. Horizontal scaling was not possible architecturally. If you want to dive deeper, here's the code for the key components of the system (architecture in each README):
* rudder-server - https://github.com/rudderlabs/rudder-server
* rudder-transformer - https://github.com/rudderlabs/rudder-transformer
* rudderstack-helm - https://github.com/rudderlabs/rudderstack-helm
For now, all you need to understand is that network I/O was the key scaling concern, as the system's primary job was to make API calls to various destination integrations. Throughput was more important than latency.
# Solution
The solution was to increase CPU when needed. The Kubernetes Vertical Pod Autoscaler (VPA) was the key tool that drove this optimization: VPA automatically adjusts the CPU and memory requests and limits for containers within pods.
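For context, a minimal VPA object targeting a Deployment looks roughly like this (the names and the update mode here are illustrative, not our exact config):

```yaml
# Hypothetical VPA for a Deployment named rudder-server
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: rudder-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rudder-server
  updatePolicy:
    updateMode: "Auto"   # apply recommendations, don't just compute them
```

With `updateMode: "Auto"`, VPA actively applies its recommendations rather than only surfacing them for review.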
# What I liked about VPA
* I like that VPA right-sizes from live usage and, on clusters with in-place pod resize, can update requests without recreating pods. This lets me be aggressive on both scale-up and scale-down, improving bin-packing and cutting costs.
* Another thing I like about VPA is that I can run multiple recommenders and choose one per workload via spec.recommenders, so different usage patterns (frugal, spiky, memory-heavy) get different percentiles and decay settings without per-Deployment knobs.
# My challenge with VPA
One challenge I had with VPA is limited per-workload tuning (beyond picking the recommender and setting minAllowed/maxAllowed/controlledValues). Aggressive request changes can cause feedback loops or node churn, bursty tails make safe scale-down tricky, and some pods (init-heavy ones, for example) still need carve-outs.
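To make those knobs concrete, here's a sketch combining a per-workload recommender choice with resourcePolicy bounds; the recommender name and the bound values are illustrative assumptions, not our production settings:

```yaml
# Hypothetical per-workload tuning: alternate recommender + bounds
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: rudder-transformer-vpa
spec:
  recommenders:
    - name: frugal            # assumes a second recommender deployed under this name
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rudder-transformer
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        controlledValues: RequestsOnly   # adjust requests, leave limits alone
        minAllowed:
          cpu: 100m
        maxAllowed:
          cpu: "4"
          memory: 8Gi
```

The minAllowed/maxAllowed bounds are the main guardrail against the feedback loops mentioned above, since they cap how far an aggressive recommendation can move requests.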
That's all for today. Happy to hear your thoughts, questions, and your own experience with VPA. I did learn that the k8s team is working on some of this feedback, so we might see it solved in 2026.
No need to make an exhaustive list, sharing even one can help the discussion.
If you're thinking about vertical scaling or using VPA in production, I hope my experience helps you learn a thing or two. Do share your experience as well for a well-rounded discussion.
-----
What I’m testing: RudderStack iOS SDK, it is used to track customer event data and send it to various product, marketing, and business tools.
The problem in my current testing workflow: Manual testing is important for quality assurance, but testing the RudderStack SDK requires multiple time-consuming and error-prone steps: plan the specific steps for the test, perform the interactions, review lengthy log output, and then verify the logs, which includes comparing long IDs.
The solution I experimented with: I used an LLM to plan test steps, used mobile-mcp to simulate user interactions (clicking buttons such as track, reset, track, etc.), reviewed logs with the LLM (verifying the event ID changes sent to the server), and prepared a final comprehensive report. All of this is packaged as an MCP server that works in my IDE (Cursor), with test cases written as plain-English prompts.
Result: My agent did click through track → reset → track and caught the anonymous ID change (which confirms the SDK's tracking worked properly).
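The core assertion is simple once the logs are parsed. Here's a minimal sketch of the kind of check the agent performs; the log format, field name, and IDs below are invented for illustration, not the SDK's actual output:

```python
import re

# Hypothetical log lines captured around a track -> reset -> track sequence.
logs = """
track event=app_open anonymousId=9f1c2d3e-aaaa-bbbb-cccc-000000000001
reset
track event=app_open anonymousId=4b5e6f70-dddd-eeee-ffff-000000000002
"""

def anonymous_ids(log_text: str) -> list[str]:
    """Pull anonymousId values from track lines, in order of appearance."""
    return re.findall(r"anonymousId=([0-9a-f-]+)", log_text)

ids = anonymous_ids(logs)
# The regression check: reset must rotate the anonymous ID between the two track calls.
assert len(ids) == 2 and ids[0] != ids[1], "anonymousId did not change after reset"
print("anonymousId rotated after reset:", ids[0], "->", ids[1])
```

The tedious part the LLM takes over is exactly this: extracting and comparing those long IDs across many lines of log text.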
What actually worked:
- Once set up, it did catch the regression correctly
- Consistent results vs my manual testing, where I sometimes miss things
Issues I ran into:
- Had to write extremely detailed step-by-step instructions and extensive context. If I missed anything, it just failed
- WebDriver setup on port 4723 was finicky
- It is slow. Took 2 minutes for what should be a 30-second manual test
Biggest problem: The amount of upfront work to get it running properly. I spent more time writing instructions than I would have just testing manually.
The real value might be in consistency for regression testing, not speed. But the initial investment is rough.
What would make this useful:
I need to create a workflow where, based on the feature or fixes, agents automatically generate test cases (including all edge cases) targeting the code impacted by the changes, and then perform a thorough end-to-end QA.
Has anyone else tried automating QA using AI? How was your experience, and how did you resolve the challenges you faced? (I want to learn practices I can incorporate into my workflow.)
Now that I've been using Pulsar for quite some time, I feel I can share some notes about my experience replacing Postgres-based streaming solutions with Pulsar, and hopefully learn from your opinions and insights.
----
What I liked about Pulsar:
1. Tenant isolation is solid, auto load balancing works well: So far we haven't experienced a chatty tenant affecting others. We use the same cluster to ingest all our customers' data (one cluster per region: one in the US, one in the EU). Multi-tenancy along with cluster auto-scaling allowed us to contain costs.
2. No more single points of failure (data replicated across bookies): Data is now replicated to at least two bookies, which made us much more resilient to data loss.
3. Maintenance is easier: No single-master constraint anymore, which simplified a lot of the infra maintenance (moving a Postgres pod to a different EC2 node could mean downtime).
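The "at least two bookies" guarantee comes from Pulsar's managed-ledger quorum settings. In broker.conf they look like this (the values shown are a plausible configuration, not necessarily ours):

```properties
# broker.conf (excerpt): how many bookies each ledger entry touches
managedLedgerDefaultEnsembleSize=3   # bookies a ledger can write to
managedLedgerDefaultWriteQuorum=2    # copies written per entry
managedLedgerDefaultAckQuorum=2      # acks required before a write succeeds
```

With a write quorum of 2, losing a single bookie doesn't lose data, which is exactly the reliability improvement over the single-node Postgres setup.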
----
What's painful about Pulsar:
1. StreamNative licensing costs were significant
2. Network costs considerably increased with multi-AZ + replication
3. Learning curve was steeper than expected, and the system was more complex to debug
----
Would love to hear about your experience with Postgres/Pulsar and any opinions or insights on the approach and its challenges. I hope this dialogue helps others in the community; feel free to ask me anything.