# Background (The challenge and the subject system)
My goal was to improve performance/cost ratio for my Kubernetes cluster. For performance, the focus was on increasing throughput.
The operations in the subject system were primarily CPU-bound, and we had plenty of spare memory. Horizontal scaling was not possible architecturally. If you want to dive deeper, here's the code for the key components of the system (architecture in each README):
* rudder-server - https://github.com/rudderlabs/rudder-server
* rudder-transformer - https://github.com/rudderlabs/rudder-transformer
* rudderstack-helm - https://github.com/rudderlabs/rudderstack-helm
For now, all you need to understand is that network I/O was the key scaling concern, as the system's primary job was to make API calls to various destination integrations. Throughput was more important than latency.
# Solution
The solution was to increase CPU when needed. The Kubernetes Vertical Pod Autoscaler (VPA) was the key tool that drove this optimization: VPA automatically adjusts the CPU and memory requests and limits for containers within pods.
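For context, a minimal VPA object targeting a Deployment looks roughly like this (the names and the update mode here are illustrative, not our exact config):

```yaml
# Hypothetical VPA for a Deployment named rudder-server
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: rudder-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rudder-server
  updatePolicy:
    updateMode: "Auto"   # apply recommendations, don't just compute them
```

With `updateMode: "Auto"`, VPA actively applies its recommendations rather than only surfacing them for review.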
# What I liked about VPA
* I like that VPA right-sizes from live usage and, on clusters with in-place pod resize, can update requests without recreating pods. This lets me be aggressive on both scale-up and scale-down, improving bin-packing and cutting costs.
* Another thing I like about VPA is that I can run multiple recommenders and choose one per workload via spec.recommenders, so different usage patterns (frugal, spiky, memory-heavy) get different percentiles and decay settings without per-Deployment knobs.
# My challenge with VPA
One challenge I had with VPA is limited per-workload tuning (beyond picking the recommender and setting minAllowed/maxAllowed/controlledValues). Aggressive request changes can cause feedback loops or node churn, bursty tails make safe scale-down tricky, and some pods (init-heavy ones, for example) still need carve-outs.
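To make those knobs concrete, here's a sketch combining a per-workload recommender choice with resourcePolicy bounds; the recommender name and the bound values are illustrative assumptions, not our production settings:

```yaml
# Hypothetical per-workload tuning: alternate recommender + bounds
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: rudder-transformer-vpa
spec:
  recommenders:
    - name: frugal            # assumes a second recommender deployed under this name
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rudder-transformer
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        controlledValues: RequestsOnly   # adjust requests, leave limits alone
        minAllowed:
          cpu: 100m
        maxAllowed:
          cpu: "4"
          memory: 8Gi
```

The minAllowed/maxAllowed bounds are the main guardrail against the feedback loops mentioned above, since they cap how far an aggressive recommendation can move requests.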
That's all for today. Happy to hear your thoughts, questions, and your own experience with VPA. I did learn that the k8s team is working on some of this feedback, so we might see it solved in 2026.
No need to make an exhaustive list, sharing even one can help the discussion.
If you're thinking about vertical scaling or using VPA in production, I hope my experience helps you learn a thing or two. Do share your experience as well for a well-rounded discussion.
-----
What I’m testing: RudderStack iOS SDK, it is used to track customer event data and send it to various product, marketing, and business tools.
The problem in my current testing workflow: Manual testing is important for quality assurance, but testing the RudderStack SDK requires multiple time-consuming and error-prone steps: plan the specific steps for the test, perform the interactions, review lengthy log output, and then verify the logs, which includes comparing long IDs.
The solution I experimented with: I used an LLM to plan test steps, used mobile-mcp to simulate user interactions (clicking buttons such as track, reset, track, etc.), reviewed logs with the LLM (verifying the event ID changes sent to the server), and prepared a final comprehensive report. All of this is packaged as an MCP server that works in my IDE (Cursor), with test cases written as plain-English prompts.
Result: My agent did click through track → reset → track and caught the anonymous ID change (which confirms the SDK's tracking worked properly).
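The core assertion is simple once the logs are parsed. Here's a minimal sketch of the kind of check the agent performs; the log format, field name, and IDs below are invented for illustration, not the SDK's actual output:

```python
import re

# Hypothetical log lines captured around a track -> reset -> track sequence.
logs = """
track event=app_open anonymousId=9f1c2d3e-aaaa-bbbb-cccc-000000000001
reset
track event=app_open anonymousId=4b5e6f70-dddd-eeee-ffff-000000000002
"""

def anonymous_ids(log_text: str) -> list[str]:
    """Pull anonymousId values from track lines, in order of appearance."""
    return re.findall(r"anonymousId=([0-9a-f-]+)", log_text)

ids = anonymous_ids(logs)
# The regression check: reset must rotate the anonymous ID between the two track calls.
assert len(ids) == 2 and ids[0] != ids[1], "anonymousId did not change after reset"
print("anonymousId rotated after reset:", ids[0], "->", ids[1])
```

The tedious part the LLM takes over is exactly this: extracting and comparing those long IDs across many lines of log text.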
What actually worked:
- Once set up, it did catch the regression correctly
- Consistent results vs my manual testing, where I sometimes miss things
Issues I ran into:
- Had to write extremely detailed step-by-step instructions and extensive context. If I missed anything, it just failed
- WebDriver setup on port 4723 was finicky
- It is slow. Took 2 minutes for what should be a 30-second manual test
Biggest problem: The amount of upfront work to get it running properly. I spent more time writing instructions than I would have just testing manually.
The real value might be in consistency for regression testing, not speed. But the initial investment is rough.
What would make this useful:
I need to create a workflow where, based on the feature or fixes, agents automatically generate test cases (including all edge cases) targeting the code impacted by the changes, and then perform a thorough end-to-end QA.
Has anyone else tried automating QA using AI? How was your experience, and how did you resolve the challenges you faced? (I want to learn practices I can incorporate into my workflow.)
Now that I've been using Pulsar for quite some time, I feel I can share some notes about my experience replacing Postgres-based streaming solutions with Pulsar, and hopefully learn from your opinions and insights.
----
What I liked about Pulsar:
1. Tenant isolation is solid, auto load balancing works well: So far we haven't experienced a chatty tenant affecting others. We use the same cluster to ingest all our customers' data (one cluster per region: one in the US, one in the EU). Multi-tenancy along with cluster auto-scaling allowed us to contain costs.
2. No more single points of failure (data replicated across bookies): Data is now replicated to at least two bookies, which made us much more resilient to data loss.
3. Maintenance is easier: No single-master constraint anymore, which simplified a lot of the infra maintenance (moving a Postgres pod to a different EC2 node could mean downtime).
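The "at least two bookies" guarantee comes from Pulsar's managed-ledger quorum settings. In broker.conf they look like this (the values shown are a plausible configuration, not necessarily ours):

```properties
# broker.conf (excerpt): how many bookies each ledger entry touches
managedLedgerDefaultEnsembleSize=3   # bookies a ledger can write to
managedLedgerDefaultWriteQuorum=2    # copies written per entry
managedLedgerDefaultAckQuorum=2      # acks required before a write succeeds
```

With a write quorum of 2, losing a single bookie doesn't lose data, which is exactly the reliability improvement over the single-node Postgres setup.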
----
What's painful about Pulsar:
1. StreamNative licensing costs were significant
2. Network costs considerably increased with multi-AZ + replication
3. Learning curve was steeper than expected, and the system was more complex to debug
----
Would love to hear about your experience with Postgres/Pulsar and any opinions or insights on the approach and its challenges. I hope this dialogue helps others in the community; feel free to ask me anything.