Show HN: Vibe coded an AI chat app with features I wanted, Poe

(github.com)

1 point · SamInTheShell · 7mo ago · 6 comments

It supports local inference with Ollama and LM Studio (going to add other provider support in the future).

The big thing for me in this project was really telegraphing where the working directory is.


mutant · 7mo ago

would be interested in reading your outlook on the execution model, isolation, guardrails, and tool calling. those are some of the baseline things i evaluate before trying an agentic env.

SamInTheShell (OP) · 7mo ago

Isolation is a smart idea for these things, as you literally can't verify their behavior beyond "it's kinda doing the thing most of the time". My chat app just kinda settles for "you can require permissions and audit", which isn't foolproof when the AI can churn out more code than a human can read in a reasonable amount of time.
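The "require permissions and audit" approach can be sketched roughly as below. This is a minimal illustration, not the app's actual code: `AuditedToolRunner`, the `approve` callback, and the log format are all hypothetical names chosen for the example.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AuditedToolRunner:
    """Ask for approval before each tool call and record every decision."""

    # Hypothetical callback; in a chat app this would be a UI permission prompt.
    approve: Callable[[str, dict], bool]
    log: list = field(default_factory=list)

    def call(self, name: str, args: dict, tool: Callable[..., object]):
        allowed = self.approve(name, args)
        # Record the request whether or not it was approved.
        self.log.append(
            {"ts": time.time(), "tool": name, "args": args, "allowed": allowed}
        )
        if not allowed:
            raise PermissionError(f"user denied tool call: {name}")
        return tool(**args)

    def dump(self) -> str:
        """Serialize the audit trail as JSON lines for later review."""
        return "\n".join(json.dumps(entry) for entry in self.log)
```

Denied calls are logged too, so the audit trail shows what the model attempted and not only what ran. It still leaves the human reviewing after the fact, which is the limitation described above.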

mutant · 7mo ago

might be good to get your hands around this early.

reading up on how crush, goose, and opencode handle this may be a good idea.

i've been trying to build a web native terminal assistant for a while (just a side project) and this is easily the thing that keeps me up at night.

### Primary Sources:

- *Anthropic Engineering Blog: "Making Claude Code more secure and autonomous with sandboxing"* Detailed article on Claude Code's sandboxing features, including OS-level primitives (e.g., Linux Bubblewrap, macOS Seatbelt) for filesystem and network isolation. [Read here](https://www.anthropic.com/engineering/claude-code-sandboxing) (Published Oct 20, 2025).

- *Claude Code Documentation: Sandboxing* Official docs covering setup, configuration, security benefits (e.g., prompt injection protection), and limitations of filesystem/network isolation in Claude Code. [Read here](https://code.claude.com/docs/en/sandboxing).

- *Claude Blog: "Beyond permission prompts: making Claude Code more secure and autonomous"* Overview of sandboxing in Claude Code, emphasizing boundaries for safer agent execution. [Read here](https://claude.com/blog/beyond-permission-prompts-making-cla...) (Published Oct 31, 2025).
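As a rough illustration of the OS-level isolation these sources describe, here is a minimal Python sketch that builds a Bubblewrap (`bwrap`) command line with a read-only host filesystem, one writable working directory, and no network. It is an assumption-laden example and not how Claude Code itself invokes Bubblewrap; `bwrap_argv` and its parameters are hypothetical.

```python
from pathlib import Path


def bwrap_argv(workdir: str, command: list[str], allow_network: bool = False) -> list[str]:
    """Build a Bubblewrap command line that confines `command` to `workdir`.

    The host filesystem is mounted read-only; only `workdir` is writable.
    """
    work = str(Path(workdir).resolve())
    argv = [
        "bwrap",
        "--ro-bind", "/", "/",   # whole host filesystem, read-only
        "--dev", "/dev",         # minimal /dev
        "--proc", "/proc",       # fresh /proc
        "--tmpfs", "/tmp",       # private scratch space
        "--bind", work, work,    # the one writable directory
        "--chdir", work,         # start the command inside it
        "--die-with-parent",     # kill the sandbox if the parent exits
    ]
    if not allow_network:
        argv.append("--unshare-net")  # new, empty network namespace
    return argv + list(command)
```

On a Linux host with `bwrap` installed, the returned list can be passed to `subprocess.run`. Building the argument list separately from running it keeps the policy easy to inspect and test.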

### Additional Resources:

For broader context on sandboxing agentic AI:

- *arXiv Paper: "Securing AI Agent Execution"* Research on isolation techniques for AI agents, including risk assessment. [Read here](https://arxiv.org/abs/2510.21236) (Published Oct 24, 2025).

- *HopX Documentation* Practical guide to sandboxing for AI agents (e.g., using Firecracker micro-VMs). [Read here](https://hopx.ai/) (Open-source SDK available at [GitHub](https://github.com/hopx-ai/sdk)).

### Cursor

Cursor uses local-first editing with optional sandboxing via Docker containers for isolated execution (no default vendor-owned sandboxes). It respects user-defined rules without overriding them.

- *Skywork AI Blog: Security in Cursor 2.0* Details Cursor's sandboxing for code execution, network protection, and isolation. [Read here](https://skywork.ai/blog/vibecoding/cursor-2-0-security-priva...) (Published Nov 1, 2025).

- *Skywork AI Blog: Cursor 2.0 vs Claude Code SDK* Compares isolation techniques, noting Cursor's local sandboxes vs. Claude's cloud-based ones. [Read here](https://skywork.ai/blog/vibecoding/cursor-2-0-vs-claude-code...) (Published Nov 1, 2025).

### OpenAI Codex

Codex primarily relies on API-based execution with optional user-managed sandboxes (e.g., via Firecracker or custom proxies). It emphasizes provider retention policies but lacks built-in native sandboxing like Claude Code.

- *Render Blog: Testing AI Coding Agents (2025)* Benchmarks Codex's handling of isolation in production tasks, including Docker-based sandboxes. [Read here](https://render.com/blog/ai-coding-agents-benchmark) (Published Aug 12, 2025).

- *Medium: Claude Code vs Cursor* Indirect comparison noting Codex's API retention and sandbox limitations vs. Cursor/Claude. [Read here](https://open-data-analytics.medium.com/claude-code-vs-cursor...) (Published Aug 6, 2025).

### Goose AI (Codename Goose)

Goose uses container-based isolation via tools like Container Use (built on Dagger) for git-branch-isolated environments, emphasizing safe experimentation without affecting the host.

- *Goose Blog: Isolated Dev Environments* Explains Goose's container-use for sandboxes, including lifecycle management and rollback. [Read here](https://block.github.io/goose/blog/2025/06/19/isolated-devel...) (Published Jun 19, 2025).

- *GitHub Discussion: Goose vs Claude Code* Community analysis comparing Goose's local isolation to Claude Code's cloud sandboxes. [Read here](https://github.com/block/goose/discussions/3133) (Ongoing, started Jun 27, 2025).

- *Slashdot: Compare Claude vs. Goose* High-level comparison including deployment isolation. [Read here](https://slashdot.org/software/comparison/Claude-vs-codename-...).

also: check out the open-source sandbox runtime from Anthropic: [GitHub Repo](https://github.com/anthropic-experimental/sandbox-runtime).

clearly i have a bias on this topic, lol


renevanpelt · 7mo ago

Very cool, always great to build projects because you need them yourself.

From the screenshots it seems that there's a "tool" in the list of tools provided to the LLM for command line utilities like `rm`, `mkdir`, `ls` and so forth.

Just a small piece of advice you might want to look into further: you can also expose the command line as a single tool, and most LLMs will produce pretty well-formatted commands. You could still filter out invalid or non-allowed commands inside the tool that the LLM actually calls.
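A minimal sketch of that single-tool pattern, assuming a simple allowlist keyed on the program name. `ALLOWED`, `check_command`, and `run_command` are hypothetical names for illustration, not anything taken from the project:

```python
import shlex
import subprocess

# Assumed policy for illustration; a real app would load this from user settings.
ALLOWED = {"ls", "mkdir", "cat", "pwd"}


def check_command(command_line: str) -> list[str]:
    """Parse an LLM-supplied command line and reject anything off the allowlist."""
    try:
        argv = shlex.split(command_line)
    except ValueError as exc:  # e.g. unbalanced quotes
        raise ValueError(f"invalid command: {exc}") from exc
    if not argv:
        raise ValueError("invalid command: empty")
    if argv[0] not in ALLOWED:
        raise PermissionError(f"command not allowed: {argv[0]}")
    return argv


def run_command(command_line: str, cwd: str) -> str:
    """The single tool exposed to the model: validate, then run without a shell."""
    argv = check_command(command_line)
    # Passing a list (no shell) means `;`, `|` and `&&` are plain arguments,
    # so an allowed command cannot be chained into a forbidden one.
    result = subprocess.run(argv, cwd=cwd, capture_output=True, text=True, timeout=30)
    return result.stdout + result.stderr
```

Checking only the program name is a coarse filter: an allowed program can still do damage through its arguments, so this complements permission prompts or a sandbox and does not replace them.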

Just wanted to share that!

SamInTheShell (OP) · 7mo ago

I've seen that before. Idk how I feel about that pattern in general. When I see any of the tools I use do stuff like `bash(cmd... I didn't ask your permissions - hehe!~)`, I get a bit pissed that it wasn't a straight-up standalone tool. The number of times it's gaslit me into panicking isn't zero.
