Speculating further: during decoding (when GPT is deciding which token to generate next), we typically use something like beam search. That is, rather than greedily taking the single most likely next token, we keep several candidate sequences and choose the one whose tokens' probabilities, multiplied together, are highest.
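To make that concrete, here is a minimal beam search sketch. `next_token_probs` is a hypothetical stand-in for the language model (it maps a partial sequence to a distribution over next tokens); the toy distribution below is invented purely to show a case where greedy decoding and beam search disagree.

```python
import math

def beam_search(next_token_probs, max_len, beam_width=3):
    """Keep the `beam_width` highest-scoring partial sequences at each step.

    `next_token_probs(seq)` is a hypothetical stand-in for the model: it
    maps a partial sequence to a {token: probability} dict. Scores are
    summed log-probabilities, which is equivalent to multiplying the raw
    probabilities together but numerically stable."""
    beams = [([], 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, p in next_token_probs(seq).items():
                candidates.append((seq + [tok], score + math.log(p)))
        # Keep only the best `beam_width` candidates.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]

# Toy distribution where greedy decoding is suboptimal: "B" is the less
# likely first token, but the sequence ("B", "E") has the highest overall
# probability (0.4 * 0.9 = 0.36, vs. 0.6 * 0.5 = 0.30 for anything via "A").
def toy_model(seq):
    if not seq:
        return {"A": 0.6, "B": 0.4}
    if seq[-1] == "A":
        return {"C": 0.5, "D": 0.5}
    return {"E": 0.9, "F": 0.1}

beam_search(toy_model, max_len=2, beam_width=2)  # -> ["B", "E"]
```

With `beam_width=1` this degenerates to greedy decoding and returns `["A", "C"]` instead, which is exactly the failure mode beam search is meant to avoid.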
In Q-learning, a learner takes a sequence of actions to maximize a future reward. In this case, the action at each step is the choice of the next token, and the reward is something like 1 if the user liked the response and 0 if they didn't. Or, since it seems they're applying this to arithmetic, the reward could be whether the generated solution is correct.
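A tiny tabular Q-learning sketch of that framing, on an invented toy task: emit two tokens from {a, b}, and a hypothetical user "likes" (reward 1) only the sequence (a, b). Everything here is illustrative, not anything known about Q*.

```python
import random

def train_q(episodes=5000, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning where states are partial token sequences,
    actions are next-token choices, and the reward (1 or 0) arrives
    only at the end, mirroring a thumbs-up/down signal."""
    rng = random.Random(seed)
    tokens = ["a", "b"]
    Q = {}  # Q[state][action] -> estimated future reward

    def q(state):
        return Q.setdefault(state, {t: 0.0 for t in tokens})

    for _ in range(episodes):
        state = ()  # start from the empty sequence
        while len(state) < 2:
            # Epsilon-greedy: explore sometimes, otherwise exploit.
            if rng.random() < eps:
                a = rng.choice(tokens)
            else:
                a = max(q(state), key=q(state).get)
            nxt = state + (a,)
            done = len(nxt) == 2
            r = 1.0 if (done and nxt == ("a", "b")) else 0.0
            # Standard Q-learning update toward r + gamma * max Q(next).
            target = r + (0.0 if done else gamma * max(q(nxt).values()))
            q(state)[a] += alpha * (target - q(state)[a])
            state = nxt
    return Q

Q = train_q()
# The greedy policy picks "a" first, then "b": the liked sequence.
first = max(Q[()], key=Q[()].get)          # -> "a"
second = max(Q[("a",)], key=Q[("a",)].get)  # -> "b"
```

The Q-table ends up encoding the delayed reward: the value of choosing "a" at the start converges toward `gamma * 1.0`, even though the reward itself only appears after the second token.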
Putting these together, it's possible that Q* is some better way of decoding: something that steers generation toward high-reward outputs while building on top of GPT's prior token probabilities.
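One purely speculative way to combine the two ideas: score beam-search candidates by a mix of the model's log-probability (the GPT prior) and a learned Q-value estimate of eventual reward. Both `next_token_probs` and `q_value` below are hypothetical stand-ins, and the mixing scheme is my own invention for illustration.

```python
import math

def q_guided_beam_search(next_token_probs, q_value, max_len,
                         beam_width=3, mix=1.0):
    """Beam search ranked by log-probability plus `mix` times a
    (hypothetical) learned Q-value of the partial sequence."""
    beams = [([], 0.0)]  # (sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, logp in beams:
            for tok, p in next_token_probs(seq).items():
                new_seq = seq + [tok]
                new_logp = logp + math.log(p)
                # Rank by prior likelihood plus estimated future reward.
                score = new_logp + mix * q_value(new_seq)
                candidates.append((new_seq, new_logp, score))
        candidates.sort(key=lambda c: c[2], reverse=True)
        beams = [(s, lp) for s, lp, _ in candidates[:beam_width]]
    return beams[0][0]

# Same toy model as before: plain beam search prefers ("B", "E").
def toy_model(seq):
    if not seq:
        return {"A": 0.6, "B": 0.4}
    if seq[-1] == "A":
        return {"C": 0.5, "D": 0.5}
    return {"E": 0.9, "F": 0.1}

# A toy Q-value that says sequences starting with "A" lead to reward;
# with it, decoding is steered away from the pure-likelihood answer.
reward_a = lambda seq: 1.0 if seq[0] == "A" else 0.0
q_guided_beam_search(toy_model, reward_a, max_len=2, beam_width=2)
# whereas q_value = 0 everywhere recovers plain beam search: ["B", "E"]
```

The point of the sketch is only that a reward signal can reorder candidates that the raw probabilities alone would rank differently, which is the kind of "better decoding" being speculated about here.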