
IanCal · 2y ago · 0 points · 0 comments

Are you asking if system prompts change the output?

Try telling the model it's a pirate or someone who is just learning English. It can easily do that, so why would you assume that no system prompt would be the best for some specific problem?

You can tell it to be more critical; that's a useful one. You can also tell it not to solve a problem but to critique an output, then have two models talk to each other, one as a critic and one as a planner.
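To make the critic/planner idea concrete, here is a minimal sketch. It assumes a "model" is just a function taking a system prompt and a message and returning text; `plan_with_critic`, the two system prompts, and the `Model` type are names made up for illustration, not any particular library's API.

```python
from typing import Callable

# Hypothetical stand-in for a real chat API call:
# takes (system_prompt, user_message) and returns the reply text.
Model = Callable[[str, str], str]

PLANNER_SYSTEM = "You are a planner. Produce or revise a step-by-step plan."
CRITIC_SYSTEM = (
    "You are a critic. Do not solve the problem. "
    "Point out weaknesses in the plan you are given."
)


def plan_with_critic(task: str, planner: Model, critic: Model, rounds: int = 2) -> list:
    """Alternate planner and critic turns and return the whole exchange."""
    history = []
    plan = planner(PLANNER_SYSTEM, task)
    history.append({"role": "planner", "text": plan})
    for _ in range(rounds):
        # The critic only sees the task and the current plan.
        critique = critic(CRITIC_SYSTEM, f"Task: {task}\n\nPlan:\n{plan}")
        history.append({"role": "critic", "text": critique})
        # The planner revises its plan in light of the critique.
        plan = planner(
            PLANNER_SYSTEM,
            f"Task: {task}\n\nPrevious plan:\n{plan}\n\nCritique:\n{critique}",
        )
        history.append({"role": "planner", "text": plan})
    return history
```

The same underlying model can play both roles; only the system prompt differs, which is the point being made here.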

I can help show the difference, but I'm not quite sure what you think doesn't matter, and that feels important to nail down first.

> You say that I should be "testing and measuring" as I go. How? What is the metric to measure?

Tools like promptfoo can help with some of this.

You can do comparisons, run blind tests, measure what your users prefer, or use high-quality models to check things like "does not mention it's an AI bot" or similar. It depends on what your task is.
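As a rough sketch of what a blind test could look like: show outputs from two prompt variants in shuffled order and tally which one gets picked. `blind_compare` and the `judge` callable are hypothetical names; the judge could be a person at a terminal or a stronger model, and it never learns which variant produced which output.

```python
import random
from typing import Callable, Sequence


def blind_compare(
    outputs_a: Sequence[str],
    outputs_b: Sequence[str],
    judge: Callable[[str, str], int],
    seed: int = 0,
) -> dict:
    """Tally wins for variants A and B using position-blind judging.

    `judge(first, second)` returns 0 if it prefers the first text shown
    and 1 if it prefers the second.
    """
    rng = random.Random(seed)
    wins = {"A": 0, "B": 0}
    for a, b in zip(outputs_a, outputs_b):
        pair = [("A", a), ("B", b)]
        rng.shuffle(pair)  # hide which variant is shown first
        choice = judge(pair[0][1], pair[1][1])
        wins[pair[choice][0]] += 1
    return wins
```

Shuffling the order matters because both people and models tend to favour whichever answer they see first.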

Edit -

A lot of people don't properly test and have lots of things in their prompts that aren't necessarily helping, or that may have been required for an earlier model but aren't needed now. Prompt engineering is more important with less powerful models or in higher-stakes situations.


sanderjd · 2y ago

> measuring what your users prefer

Ah, this is the disconnect. I don't have users. You're talking about creating an application; I'm talking about just using it to solve my own problems.

It's not that I don't realize that prompts like "you are a {whatever}" modify the output; it's that I'm skeptical that the model does a better job of helping me solve my problems when I start out that way than when I just interactively ask it for what I want. I tried this kind of thing for a while ("you are an advanced planner assistant"), but now I mostly just say "could you help me come up with a plan for XYZ?". So I've become somewhat skeptical that I'd see a difference if I tried to measure this for my own use somehow.

But thanks for the pointer to a tool! It might be interesting to see if I could make that work to measure my own intuitions better.

IanCal (OP) · 2y ago

Ah yeah, if it's not automated you have fewer issues, particularly with good models, which require much less badgering.

What I would say is that there are probably common patterns that you use, and building up prompts for them can save some time. There's a lot of woo around, though, as people just copy patterns they see. It may be as simple for you as coding with a few style examples and an explanation of your level (e.g. in TypeScript I want code examples and help with syntax; Python I've used for decades, so I need only higher-level pointers).

Promptfoo should fit that quite well: you can give it some prompts, run them all, and see the output (with caching, thankfully).
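For a feel of the workflow, here is a hand-rolled sketch of the same idea. To be clear, this is not promptfoo's API or config format; `run_grid`, the `model` callable, and the JSON cache file are all assumptions made for illustration. It runs every prompt template against every input and caches outputs on disk, so re-running after a small tweak only pays for the prompts that changed.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def run_grid(prompts: dict, inputs: list, model: Callable[[str], str], cache_path: str) -> dict:
    """Run each prompt template on each input, reusing cached outputs."""
    path = Path(cache_path)
    cache = json.loads(path.read_text()) if path.exists() else {}
    results = {}
    for name, template in prompts.items():
        for item in inputs:
            # Templates use "{input}" as the placeholder for the test input.
            full_prompt = template.format(input=item)
            key = hashlib.sha256(full_prompt.encode("utf-8")).hexdigest()
            if key not in cache:
                cache[key] = model(full_prompt)  # only call the model on a miss
            results[(name, item)] = cache[key]
    path.write_text(json.dumps(cache))
    return results
```

Comparing something like `"You are an advanced planner assistant. Plan: {input}"` against a plain `"Could you help me come up with a plan for {input}?"` over a handful of real tasks is then just a matter of reading the results side by side.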

GPTs, the custom ones now, are a little different in that you can also give them files to refer to. I've done that with example code for an internal-ish framework and it generates useful code from that.

Edit - I'd also invite you to think about places you could use them in a more automated way. I had some setups I want to resurrect with newer models, where I can take a recording, send it through Whisper, then from the transcript get:

* Key points

* Counter-arguments

* A critique of my thinking

* Advice, or code to solve the problems, if I talk about coding options

* A title
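The recording pipeline can be sketched roughly like this. `transcribe` and `model` are hypothetical callables standing in for a Whisper call and a chat-model call; the task names and instruction wording are illustrative, not the prompts actually used.

```python
from typing import Callable

# One instruction per item in the list above; wording is illustrative only.
TASKS = {
    "key_points": "List the key points made in this transcript.",
    "counter_arguments": "Give counter-arguments to the positions taken.",
    "critique": "Critique the speaker's thinking.",
    "coding_advice": (
        "If coding options are discussed, give advice or write code to "
        "solve the problems. Otherwise reply 'n/a'."
    ),
    "title": "Create a short title for this recording.",
}


def process_recording(
    audio_path: str,
    transcribe: Callable[[str], str],
    model: Callable[[str, str], str],
) -> dict:
    """Transcribe a recording, then run each task prompt over the transcript."""
    transcript = transcribe(audio_path)
    notes = {"transcript": transcript}
    for name, instruction in TASKS.items():
        # Each task is an independent call: instruction as the system
        # prompt, transcript as the message.
        notes[name] = model(instruction, transcript)
    return notes
```

Keeping each task as its own call makes it easy to tweak or test one prompt without touching the others.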
