spruce_tips on Hacker News

Ask HN: Is anyone using Devin/Cognition?

I've tried it and wasn't really impressed. Curious whether and how teams/individuals here are actually using it, and why!

Ask HN: Persisting LLM token streams through a page refresh?

Like many of you, I'm streaming responses token by token from LLMs using server-sent events (SSE). What's the best way to maintain the SSE connection through a page refresh?

I haven't seen a lot of documentation or examples covering this. In most LLM-enabled apps I've used, if tokens are currently streaming and the page refreshes or changes, the stream gets interrupted.

One idea I had was writing the streamed tokens into some sort of queue or Kafka topic, then connecting my UI to the queue and streaming tokens from there instead. But that seems like a lot of work.
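To make the queue idea concrete, here is a minimal sketch in which an in-memory, append-only log stands in for the queue or Kafka topic. `TokenLog`, `replay`, and `to_sse` are hypothetical names, not part of any library. Each token gets an increasing event id, so a client that reconnects can ask for everything after the last id it saw. Browsers send that id as the `Last-Event-ID` header on automatic `EventSource` reconnects; after a full page refresh the page would likely have to pass it explicitly, or simply replay from 0.

```python
import json
from dataclasses import dataclass, field


@dataclass
class TokenLog:
    """Append-only log of tokens for one response (stand-in for a queue/topic)."""

    tokens: list[str] = field(default_factory=list)

    def append(self, token: str) -> int:
        """Store a token and return its 1-based event id."""
        self.tokens.append(token)
        return len(self.tokens)

    def replay(self, last_event_id: int = 0) -> list[tuple[int, str]]:
        """Return (event_id, token) pairs with ids greater than last_event_id."""
        return [
            (i, tok)
            for i, tok in enumerate(self.tokens, start=1)
            if i > last_event_id
        ]


def to_sse(event_id: int, token: str) -> str:
    """Format one token as an SSE frame; JSON-encoding keeps newlines on one data line."""
    return f"id: {event_id}\ndata: {json.dumps(token)}\n\n"


# The LLM producer writes into the log regardless of who is connected.
log = TokenLog()
for tok in ["Hello", ",", " world"]:
    log.append(tok)

# A client that saw event 1 before refreshing asks for everything after it.
missed = log.replay(last_event_id=1)
frames = "".join(to_sse(i, t) for i, t in missed)
```

The point of the design is that the LLM call writes to the log independently of any browser connection, so a refresh only drops the reader, not the generation. A real deployment would need a shared store instead of process memory, plus some expiry for finished streams.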

How are most folks doing this?

1 point by spruce_tips 11mo ago | 0 comments

Ask HN: Strategies or tools for embedding multiple file types?

I've worked a good bit with embedding strategies for RAG, but only for documents that are identical in structure, i.e., interview transcripts.

I'm curious how others have thought about handling embeddings for multiple file types (txt, pdf, image, docx, ppt, etc.)? Obviously, I could handle each file type individually and then build a flexible search layer on top, but I'm concerned about the level of maintenance required.

One idea I had was to build a translation layer of sorts that would take some arbitrary file type in, map it onto a standardized text schema, and embed that. For images (which are much less common in my dataset), I would use an LLM to describe the image and cast that text into my standard format. The standard format would allow me to simplify the chunking and embedding logic for each file type, and make the vector search layer a lot easier to maintain.
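A minimal sketch of that translation layer, under stated assumptions: `NormalizedDoc`, `EXTRACTORS`, `normalize`, and `chunk` are hypothetical names, and only a plain-text extractor is shown. PDF, docx, ppt, and image extractors (for images, the LLM-description step) would be registered in the same table. Character-based chunking is used here only to keep the example short.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class NormalizedDoc:
    """The standard text schema every file type is mapped onto."""

    source: str
    file_type: str
    text: str


def extract_txt(path: Path) -> str:
    """Extractor for plain-text files."""
    return path.read_text(encoding="utf-8")


# One extractor per extension; other file types would be added here.
EXTRACTORS: dict[str, Callable[[Path], str]] = {".txt": extract_txt}


def normalize(path: Path) -> NormalizedDoc:
    """Dispatch on file extension and return the standardized document."""
    suffix = path.suffix.lower()
    extractor = EXTRACTORS.get(suffix)
    if extractor is None:
        raise ValueError(f"no extractor registered for {suffix!r}")
    return NormalizedDoc(
        source=str(path), file_type=suffix.lstrip("."), text=extractor(path)
    )


def chunk(doc: NormalizedDoc, size: int = 200, overlap: int = 50) -> list[str]:
    """Split normalized text into overlapping character windows."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    return [doc.text[i : i + size] for i in range(0, len(doc.text), step)]
```

Because everything downstream of `normalize` sees the same schema, the chunking and embedding code exists once, and supporting a new file type means adding a single extractor.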

I know this won't be perfect, but I think it could solve most of what I'm trying to achieve.

---

Curious what others think about this and what you have tried.

Cheers,

spruce_tips

3 points by spruce_tips 1y ago | 3 comments

Ask HN: Devs using LLMs, how are you keeping costs low for LLM calls locally?

My project has a multi-step LLM flow using gpt-4o.

While developing new features and testing locally, the LLM flow runs frequently and I use a bunch of tokens. My OpenAI bill spikes.

I've made some efforts to stub LLM responses, but it adds a decent bit of complexity and work. I don't want to run a model locally with Ollama because I need the output to be high quality and fast.
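One lighter-weight alternative to hand-written stubs is a record-and-replay cache on disk, sketched below. This is a swapped-in technique, not something described above: `cached_call` is a hypothetical helper, and `call_llm` is an assumed wrapper around whatever client the project already uses. The first call with a given model and prompt hits the real API; identical calls afterwards are served from a local file.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_call(
    call_llm: Callable[[str, str], str],  # hypothetical wrapper: (model, prompt) -> text
    model: str,
    prompt: str,
    cache_dir: Path,
) -> str:
    """Return a cached response if one exists, otherwise call the LLM and store it."""
    # Key on the exact model and prompt so any change invalidates the entry.
    key = hashlib.sha256(json.dumps([model, prompt]).encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))["response"]

    response = call_llm(model, prompt)
    cache_dir.mkdir(parents=True, exist_ok=True)
    record = {"model": model, "prompt": prompt, "response": response}
    path.write_text(json.dumps(record), encoding="utf-8")
    return response
```

This only saves tokens when prompts repeat exactly between runs, which tends to hold for the earlier steps of a multi-step flow while a later step is being worked on. Deleting the cache directory forces fresh responses.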

Curious how others are handling similar situations.

5 points by spruce_tips 1y ago | 8 comments

Recent submissions

Read at 600+ wpm using rapid serial visual representation (RSVP)

Ask HN: Is anyone using Devin/Cognition?

Ask HN: Persisting LLM token streams through a page refresh?

Ask HN: Persisting LLM token streams through a page refresh?

Ask HN: Strategies or tools for embedding multiple file types?

Ask HN: Devs using LLMs, how are you keeping costs low for LLM calls locally?

Show HN: Listen to TTS content in private RSS feeds
