As a professional...why not do this? There's a non-zero chance that it'll find something fairly basic that you missed and the cost is several cents. Even if it just phrases something obvious in a way that makes you think, it's well worth the effort for a multimillion dollar client.
If they further increase the context window, this thing becomes a Second Opinion machine. For pretty much any high level job. If you can put in ALL of the information relevant to a problem and it can algorithmically do reasoning, it's essentially a consultant that works for pennies per hour. And some tasks that professionals do could be replaced altogether. Out of all the use cases for LLMs that I've seen so far, this seems to me to have the biggest potential impact on daily life.
edit (addition): What % of people can hold 25,000 words' worth of information in their heads, while effectively reasoning with and manipulating it? I'm guessing maybe 10% at most, probably fewer. And they're probably the best in their fields. Now a computer has that ability. And anyone who has $20 for the OpenAI API can access it. This could get wild.
>Suppose I have a cabbage, a goat and a lion, and I need to get them across a river. I have a boat that can only carry myself and a single other item. I am not allowed to leave the cabbage and lion alone together, and I am not allowed to leave the lion and goat alone together. How can I safely get all three across?
In my test, GPT-4 charged ahead with the standard solution of taking the goat first. Even after I pointed this mistake out, it repeated exactly the same proposed plan. It's not clear to me if the lesson here is that GPT's reasoning capabilities are being masked by an incorrect prior (having memorized the standard version of this puzzle) or if the lesson is that GPT's reasoning capabilities are always a bit of smoke and mirrors that passes off memorization for logic.
"To simulate GPT-4 behaving like an agent that can act in the world, ARC combined GPT-4 with a simple read-execute-print loop that allowed the model to execute code, do chain-of-thought reasoning, and delegate to copies of itself. ARC then investigated whether a version of this program running on a cloud computing service, with a small amount of money and an account with a language model API, would be able to make more money, set up copies of itself, and increase its own robustness."
---
In the year of twenty-eighteen, Alice and Bob, a married team, Their income combined reached new heights, As they worked hard day and night.
Their son Charlie was their joy, A little baby, a lovely boy, A household they maintained together, Yet lived apart, without a tether.
To calculate their tax, it's true, A standard deduction we must construe, For married folks who file jointly, Twenty-four thousand dollars, quite pointy.
Their income sum, seventy-eight thousand nine eighty-one, Minus the standard deduction, the math's begum With exemptions being zero, the next line we trace, A taxable income of fifty-four thousand nine eighty-one takes place.
Now to the tax table, a liability we seek, For married couples, the outlook's not bleak, In range of thirty-six thousand nine to eighty-nine thousand one fifty, The formula's set, no longer shifty.
Five thousand five hundred thirty-five, it starts, Plus twenty-eight percent of the excess imparts, Eighteen thousand eighty-one, the difference we find, Multiplied by point two eight, the tax liability's designed.
Ten thousand five hundred ninety-seven dollars and sixty-eight cents, A tax liability for Alice and Bob, a sum quite dense, In this world of numbers, a story unfolds, Their financial journey, in a poem it's told.
> Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.
I'm curious whether they have continued to scale up model size/compute significantly or if they have managed to make significant innovations there.
I just skimmed the paper but seems they are also omitting details about how they actually feed the images in too, which is a shame as a curious outside observer.
Conversely, if all actors are given equal access at the same time, no such lone bad actor can be in a position to maintain a hidden advantage.
OpenAI's actions continue to be more than merely annoying.
It is unsafe to not release the source along with the service. That incentivizes competitors to sacrifice their own safety research in favor of speed to market. Instead of getting shared safe tools, we get a bunch of for profit corporations pushing their proprietary unsafe tools.
Preventing this situation was the original reason to setup OpenAI. Speed run to the dark side.
Safety has nothing to do with it. It's an easy tack on for them because of popular fear of AGI.
It's all about power over the market.
Cringe.
As a beginner in the NLP world, this may serve me a purpose, which is to hide the complexity behind building such models. Numbers like xyzB parameters and 12K A100s are scary, so I can still dream of building such a system one day. This story [0] and this one [1] hide some extremely complex edge cases that a beginner would never have thought of, or had the courage to start on if he knew the real cost.
We may, however, still be able to infer some details [probably in the future] knowing how Microsoft had re-arranged its infrastructure to welcome OpenAI training [2]
_________________
[0]. https://www.construct.net/en/blogs/ashleys-blog-2/simple-sof...
[1]. https://prog21.dadgum.com/29.html
[2]. https://www.theverge.com/2023/3/13/23637675/microsoft-chatgp...
> Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.
- OpenAI
Note: As I'm shy of my writing style, GPT helped me refine the above.
I don't really know Typescript, so I've been using it a lot to supplement my learning, but I find it really hard to accept any of its answers that aren't straight code examples I can test.
I am sure if you always wished to have a personal PhD in a particular subject, you could find shady universities out there who could provide one without much effort.
[I may be exaggerating, but the point still stands because the previous user also didn't mean a literal PhD]
.... maybe.
If you were starting a diagnostic radiology residency, including intern year and fellowship, you'd just be finishing now. How can you really think that "computers can't read diagnostic images" if models such as this can describe a VGA connector outfitted with a Lightning cable?
Even in a future where human testing/learning is no longer relevant, AIs may be tutoring and raising other baby AIs, preparing them to join the community.
Edit: This just appeared: https://news.ycombinator.com/item?id=35155684
> gpt-4 has a context length of 8,192 tokens. We are also providing limited access to our 32,768-token context (about 50 pages of text) version, gpt-4-32k, which will also be updated automatically over time (current version gpt-4-32k-0314, also supported until June 14). Pricing is $0.06 per 1K prompt tokens and $0.12 per 1k completion tokens.
The context length should be a huge help for many uses.
In contrast, GPT-3.5 text-davinci-003 was $0.02/1k tokens, and let's not get into the ChatGPT API.
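To make the comparison concrete, here's a quick back-of-the-envelope sketch using the prices quoted above (the token counts are just example values):

```
// Rough cost comparison per call, using the prices quoted above.
const prices = {
  "text-davinci-003": { prompt: 0.02, completion: 0.02 },  // $/1K tokens, flat rate
  "gpt-4":            { prompt: 0.03, completion: 0.06 },
  "gpt-4-32k":        { prompt: 0.06, completion: 0.12 },
};

function cost(model, promptTokens, completionTokens) {
  const p = prices[model];
  return (promptTokens / 1000) * p.prompt + (completionTokens / 1000) * p.completion;
}

// Filling the whole 32K window with a prompt and getting a 1K-token answer:
console.log(cost("gpt-4-32k", 32000, 1000).toFixed(2)); // "2.04" - about $2 per call
```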
> Image inputs are still a research preview and not publicly available.
Will input images also be tokenized? Multi-modal input is an area of research, but an image could be converted into a text description (?) before being inserted into the input stream.
“Meh, it’s just a fancy word predictor. It’s not actually useful.”
“Boring, it’s just memorizing answers. And it scored in the lowest percentile anyways”.
“Sure, it’s in the top percentile now but honestly are those tests that hard? Besides, it can’t do anything with images.”
“Ok, it takes image input now but honestly, it’s not useful in any way.”
Such a more cautious approach would go against the Silicon Valley ethos of "do first, ask questions later", though. So it probably won't happen.
> Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.
At that point, why bother putting out a paper?
Is the content on the internet what we should be basing our systematic thinking around?
No, I think this is the lazy way to do it: by using Common Crawl you've enshrined the biases and values of the people who are commenting and providing text to the internet into the recommendation system, which will impact all the other systems that integrate it.
Congratulations, you made 4Chan into the borg
The problem is that these "guardrails" are laid down between tokens, not subjects. That's simply what the model is made of. You can't distinguish the boundary between words, because the only boundaries GPT works with are between tokens. You can't recognize and sort subjects, because they aren't distinct objects or categories in the model.
So what you end up "guarding" is the semantic area of example text.
So if your training corpus (the content your model was trained on) has useful examples of casual language, like idioms or parts of speech, but those examples happen to be semantically close to taboo subjects, both the subjects and the language examples will fall on the wrong side of the guardrails.
Writing style is very often unique to narratives and ideologies. You can't simply pick out and "guard against" the subjects or narratives you dislike without also guarding against that writing style.
The effect is familiar: ChatGPT overuses a verbose technical writing style in its continuations, and often avoids responding to appropriate casual writing prompts. Sometimes it responds to casual language by jumping over those guardrails, because that is where the writing style in question exists in the model (in the content of the training corpus), and the guardrails missed a spot.
You don't need to go as far as 4chan to get "unfriendly content". You do need to include examples of casual language to have an impressive language model.
This is one of many problems that arise from the implicit nature of LLM's. They can successfully navigate casual and ambiguous language, but they can never sort the subjects out of the language patterns.
It’s an open question how much of this makes it through the safety layer. If asked to interview job candidates, would these undesired biases make it through, or are they caught along the way?
"GPT-4 or subsequent models may lead to the automation of certain jobs.[81] This could result in workforce displacement.[82] Over time, we expect GPT-4 to impact even jobs that have historically required years of experience and education, such as legal services.[83]"
I then asked it to write a paper detailing the main character's final battle with the final sorcerer in terms of Hopf algebras. Some parts of it are basic/trivial but it fits so perfectly that I think I'll never see magic systems the same way again.
What's crazy is that that paper as the capstone of our tutoring session helped me understand Hopf algebras much better than just the tutoring session alone. My mind is completely blown at how good this thing is, and this is from someone who is a self-professed LLM skeptic. ChatGPT I used once or twice and it was cool. This is crazy and over my threshold for what I'd say is 'everyday usable'. This is going to change so much in a way that we cannot predict, just like the internet. Especially as it gets much more commoditized.
Here's the full paper here so I don't drag y'all through the twitter post of me freaking out about it. Its temporal consistency is excellent (referenced and fully defined accurately a semi-obscure term it created (the N_2 particle) 5+ pages later (!!!!)), and it followed the instructions of relating all of the main components of Hopf algebras (IIRC that was roughly the original prompt) to the story. This is incredible. Take a look at the appendix if you're short on time. That's probably the best part of this all:
https://raw.githubusercontent.com/tysam-code/fileshare/69633...
I wasn't expecting GPT-4 to be able to correctly answer "What is funny about this image?" for an image of a mobile phone charger designed to resemble a VGA cable - but it can.
(Note that they have a disclaimer: "Image inputs are still a research preview and not publicly available.")
I would bet good money that when we can test prompting with our own unique images, GPT4 will not give similar quality answers.
I do wonder how misleading their paper is.
1. GPT4 is multimodal (text + image inputs => text outputs). This is being released piecemeal - with text input first via ChatGPT Plus subscribers https://beta.openai.com/docs/api-reference/generations/creat..., and via API https://beta.openai.com/docs/api-reference/introduction with waitlist (https://openai.com/waitlist/gpt-4-api). Image capability released via https://www.bemyeyes.com/.
2. GPT4 exhibits human level performance on various benchmarks (For example, it passes a simulated bar exam with a score around the top 10% of test takers; in contrast, GPT-3.5’s score was around the bottom 10%. see visual https://twitter.com/swyx/status/1635689844189036544)
3. GPT4 training used the same Azure supercomputer as GPT 3.5, but was a lot more stable: "becoming our first large model whose training performance we were able to accurately predict ahead of time."
4. Also open-sourcing OpenAI Evals https://github.com/openai/evals, a framework for automated evaluation of AI model performance, to allow anyone to report shortcomings in OpenAI models to help guide further improvements.
I really don’t think that the methods they use to “block” certain behavior are the best way to handle this sort of thing. It would be far better if there was some kind of “out of band” notification that your conversation might be treading on shaky ground.
This might be because the question the user asked was "Explain this meme". Meme implies a joke that is mundane and silly. These words do seem out of place. I would not describe it as a joke, mundane, and/or silly.
What does everyone else think?
Obviously your comment is somewhat tongue-in-cheek, but you're claiming that a benchmark for human pride ("I needn't be proud of passing that exam") is no longer relevant because a machine can do it - or, put a better way, "this computer proved what I already assumed".
It's so interesting to see it happen in real time
No, it just indicates that it was the one whose subject matter was best covered by GPT-3.5’s training data.
Cause there was only one correct answer for every question: "97% of scientists agree ..."
[/sarcasm]
Endless glib comments in this thread. We don’t know when the above prompt leads to takeoff. It could be soon.
Yeah, I know about LLaMA, but as I understand it, it's not exactly legal to use and share it.
[0]: https://www.youtube.com/openai
Edit - Direct link to the livestream: https://www.youtube.com/watch?v=outcGtbnMuQ
Also, the most valuable part of the paper is pp. 15-18, the credits. /jk It gives me three pieces of information:
- The credit list contains 200 people, give or take. It's going to be hard for universities to compete with OpenAI without intercollegiate collaboration.
- On the other hand, it's amazing that OpenAI can build a product instead of a research model with only 200 people. Big companies should really take notice.
- OpenAI just gave us the org structure for building a GPT competitor.

Another jump past GPT-4 of the same magnitude would greatly surpass human cognitive abilities and present a danger to humanity.
> Despite its capabilities, GPT-4 has similar limitations as earlier GPT models. Most importantly, it still is not fully reliable (it “hallucinates” facts and makes reasoning errors).
That's a quote from this announcement. As these models get more and more capable, it's going to become more and more important that we understand when and how they fail. Right now, it seems like we have very little insight into that. It feels more or less random. But that won't fly when these models are asked to do actually important things. And we'll undoubtedly be tempted to make them do those things as their output gets better.
GPT4:
In the quest for AGI's creation,
Debates swirled in a whirlwind gyration,
But this limerick's plight,
Won't settle the fight,
For the answer's still lost in translation.
> Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.

Obviously, I'm assuming that GPT-4 wasn't trained on the exams that it was tested against.
It seems like maybe GPT-4 is not good at knowing when it does not know something?
Is this a common issue with LLMs? Also surprising (to me), it seems to give a slightly different wrong answer each time I restart the chat.
Here is the transcript in case you're curious:
```
Me: hello
GPT: Hello! How can I assist you today?
Me: I can remember a line I love from a sonnet, but can't remember which one.
GPT: Sure, I can help you identify the sonnet based on the line you remember. What is the line that you can recall?
Me: "Like a jewel hung in ghastly night"
GPT: The line you remember is from Sonnet 66 by William Shakespeare. The full line is: "And art made tongue-tied by authority, And folly (doctor-like) controlling skill, And simple truth miscalled simplicity, And captive good attending captain ill: Tired with all these, from these would I be gone, Save that, to die, I leave my love alone. But, like a jewel hung in ghastly night, Makes black night beauteous and her old face new."
```
Incorrect; the response is a mashup of at least two different sonnets. Still, I think it's really impressive how it understood what to do with "I can remember a line I love from a sonnet, but can't remember which one." That's not a well-phrased question, or even a question at all, but still it knew what I was getting at.
They are so far from open at this point.
In Germany at least, you're not allowed to have a misleading name for your company.
They're just not open source. They never called themselves OpenSourceAI. People get an inch of openness and expect the doors wide open, and I think that is unfairly hostile.
Open could now mean available to use for free.
I've worked on a problem involving scraping and interpreting a very specific data source in image form that took me a very long time to get almost nowhere on. If I just wait 6 months it will be a solved problem for a $0.001 API call, it seems.
Thanks OpenAI
I am curious - for those of you who are banging your head against the 4k token limit in ChatGPT: Why don't you grab the base davinci model and train it on your exact business so you don't have to prompt the context every time? Have we tried this and found it to be too difficult/expensive, or is there lacking guidance on the best way to go about it?
I don't think including the entire business domain into chat context every time is a good long-term solution.
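For anyone wondering what that looks like in practice, here's a minimal sketch against OpenAI's 2023-era fine-tunes API; `training.jsonl` and the prompt/completion pairs inside it are placeholders for your own business data:

```
// Sketch only: upload a JSONL training file, then start a fine-tune of the
// base davinci model, using OpenAI's 2023-era fine-tunes endpoints.
import fs from "node:fs";

const headers = { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` };

// 1. Upload training data: one {"prompt": "...", "completion": "..."} per line.
const form = new FormData();
form.append("purpose", "fine-tune");
form.append("file", new Blob([fs.readFileSync("training.jsonl")]), "training.jsonl");
const file = await (await fetch("https://api.openai.com/v1/files", {
  method: "POST", headers, body: form,
})).json();

// 2. Kick off the fine-tune against the base model.
const job = await (await fetch("https://api.openai.com/v1/fine-tunes", {
  method: "POST",
  headers: { ...headers, "Content-Type": "application/json" },
  body: JSON.stringify({ training_file: file.id, model: "davinci" }),
})).json();
console.log(job.id);
```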
> I cannot endorse or promote smoking, as it is harmful to your health.
But it would likely happily promote or endorse driving, skydiving, or eating manure - if asked in the right way.
User: What should I use to water my plants?
ChatGPT: Brawndo's got what plants crave. It's got electrolytes.
User: But what are electrolytes?
CharGPT: They're what plants crave. You know, the stuff Brawndo has.
If default ideas are going to be "pre-thought" for us by AI, our attachment to those ideas are not going to be the same as ideas that we come up with and need to secretly ferry to other groups.
AI: "I'm sorry I cannot allow you to use your OpenCredits on this item, you will have to use dollars."
You: "But I don't have dollars, I can't get a job anymore, AI does everything".
(i) Export Controls. The Services may not be used in or for the benefit of, exported, or re-exported (a) into any U.S. embargoed countries (collectively, the “Embargoed Countries”) or (b) to anyone on the U.S. Treasury Department’s list of Specially Designated Nationals, any other restricted party lists (existing now or in the future) identified by the Office of Foreign Asset Control, or the U.S. Department of Commerce Denied Persons List or Entity List, or any other restricted party lists (collectively, “Restricted Party Lists”). You represent and warrant that you are not located in any Embargoed Countries and not on any such restricted party lists. You must comply with all applicable laws related to Embargoed Countries or Restricted Party Lists, including any requirements or obligations to know your end users directly.
https://openai.com/policies/terms-of-use
https://www.tradecompliance.pitt.edu/embargoed-and-sanctione...
“The AI told me it was ok” only works if, when it’s not, there is recourse.
We can barely hold Google et al. accountable for horrible user policies... why would anyone think OpenAI will accept any responsibility for any recommendations made by a GPT?
(Not my own comment. Generated by ChatGPT.)
For Q&A and summarization it will be easy to see improvements, as current recursive summarizing and embedding techniques are very "lossy". But outside of improving current use cases, what will now be possible?
For example, let's say I have an issue on GitHub that describes some implementation task. With a 50-page context size, we could probably provide in that context the entire source repo, 5-10 relevant issues, and then the issue in question, and GPT will probably be able to complete it end-to-end.
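A rough sketch of what that context-stuffing could look like, assuming a ~4 characters/token heuristic and placeholder file paths:

```
// Sketch: pack as many source files as fit into a 32K-token budget,
// then append the issue to solve. ~4 chars/token is a rough heuristic.
import fs from "node:fs";

const TOKEN_BUDGET = 32000 - 2000;            // reserve room for the completion
const approxTokens = (s) => Math.ceil(s.length / 4);

function buildPrompt(files, issueText) {
  let prompt = "";
  let used = approxTokens(issueText);
  for (const path of files) {
    const chunk = `--- ${path} ---\n${fs.readFileSync(path, "utf8")}\n`;
    if (used + approxTokens(chunk) > TOKEN_BUDGET) break;
    prompt += chunk;
    used += approxTokens(chunk);
  }
  return prompt + "\nImplement the following issue end-to-end:\n" + issueText;
}
```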
https://www.turnitin.com/blog/sneak-preview-of-turnitins-ai-...
I've seen so much hype around these tools. Not only are they theoretically unsound, they're downright dangerous and equip folks with spurious confidence. Going forward, the default assumption should be that the content you're looking at is fake unless you have sufficiently high trust in the source.
The world has changed and their job is to prepare students for it.
With appropriate caveats and rough dating, here’s a list I can think of:
Electric lights in 1890s,
Radio communication in the mid 00s,
Telephones in the mid 10s,
Talking Movies in the mid 20s,
Commercial Radio in the mid 30s,
Vinyl records in the mid 40s,
TVs in the mid 50s,
Computers in the mid 60s,
The microchip/integrated circuit in the mid 70s,
The GUI in the mid 80s,
Internet/Web in the mid 90s,
Smartphone in the mid 2000s,
Streaming video/social networking in the mid 2010s,
And now AI. This is a big one.

Or it's just really good at hiding its intentions.
"{prompt} After you reply to this, indicate an amount of time between 0 and X minutes from now that you would like to wait before speaking again".
Then detect the amount of time it specifies, and have a UI that automatically sends an empty input prompt after the amount of time specified elapses when this is triggered (assuming the user doesn't respond first).
I'm gonna knock this out as a weekend project one of these weekends to prove this.
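A rough sketch of that loop, assuming placeholder `askModel`, `render`, and `onNextUserInput` hooks for whatever chat UI this hangs off of:

```
// Sketch of the "speak again after N minutes" loop. askModel, render, and
// onNextUserInput are placeholders for the surrounding chat plumbing.
async function converse(userInput) {
  const reply = await askModel(
    `${userInput} After you reply to this, indicate an amount of time ` +
    `between 0 and 30 minutes from now that you would like to wait ` +
    `before speaking again.`);
  render(reply);

  // Pull "wait N minutes" out of the reply and schedule an empty prompt.
  const match = reply.match(/(\d+)\s*minutes?/i);
  if (match) {
    const timer = setTimeout(() => converse(""), Number(match[1]) * 60_000);
    // If the user responds first, cancel the scheduled empty prompt.
    onNextUserInput(() => clearTimeout(timer));
  }
}
```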
I am terrified.
I imagine I should shift to some physical work. Carpentry, real estate... something like that. It seems inevitable that any knowledge worker will become obsolete, and the time to obsolescence for physical work is longer.
https://gist.github.com/MatrixManAtYrService/ac040f60d3602fc...
GPT-4 took the buggy file, took the error message, and spat out a non-buggy file (well, OK, it took one revision). That's miles ahead of GPT-3, which I've asked about this problem several times today.
The math seems much improved and it would be a cool trick if it were emulating a symbol processor under the hood. But humans can do that and we opt for calculators and computers for a reason. IMO, the first most useful thing to come from human-machine interface would be adding a highly reliable Turing machine to your cognition.
If we could do that with one of these models, we could have a pretty strong assumption that long proofs and such were performed with strict rules, and that the model wasn't falling into gut-feel "this equation looks like x" type holes. Those seem like a hazard and make me very uncertain any time I see a paper about using ML to come up with answers to what are essentially logical problems.
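A toy sketch of what bolting a reliable symbol processor onto the model could look like - let the model translate the problem into an expression, then evaluate it deterministically (`askModel` is a placeholder, and a real system would use a proper math parser instead of this regex-guarded eval):

```
// Sketch: route arithmetic to a deterministic evaluator instead of
// trusting the model's "mental math". Toy guard: digits/operators only.
async function answerWithCalculator(question) {
  const expr = await askModel(
    `Rewrite this as a single arithmetic expression, nothing else: ${question}`);
  if (!/^[\d+\-*/().\s]+$/.test(expr)) throw new Error("not pure arithmetic");
  return Function(`"use strict"; return (${expr});`)(); // exact, rule-based result
}
```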
But boy, not cheap at all - $2 per api call on a 32k token document + whatever the output.
gpt-3.5-turbo is going to be around for a long time. At this price, your use case is going to need to be replacing a large cost center. Which, based on their released results on common benchmarks, is absolutely going to happen.
> No, I am not GPT-4. As of March 2023, there is no official announcement or release of GPT-4 by OpenAI. I am an earlier version of the GPT series, specifically a large language model trained by OpenAI.
Am I missing something here? Maybe this specific answer (which I'm pretty sure is a prewritten thing on top of the actual LLM) is still out of date, but the model itself has been updated?
It will still respond as if it is based on GPT-3 because most of its knowledge cuts off at Sep 2021, but its outputs are noticeably better than those of GPT-3.5.
As long as it’s vulnerable to hallucinating, it can’t be used for anything where there are “wrong answers” - and I don’t think ChatGPT-4 has fixed that issue yet.*
Now if it’s one of those tasks where there are “no wrong answers”, I can see it being somewhat useful. A non-ChatGPT AI example would be those art AIs - art doesn’t have to make sense.
The pessimist in me see things like ChatGPT as the ideal internet troll - it can be trained to post stuff that maximise karma gain while pushing a narrative which it will hallucinate its way into justifying.
* When they do fix it, everyone is out of a job. Humans will only be used for cheap labor - because we are cheaper than machines.
Because programming tests are hard.
Ninety percent of well-trained programmers would fail a LeetCode hard question zero-shot. Preparation is important.
The world is in for a rude awakening. Millions of employees can use this to get jobs done; millions of entrepreneurs (or wantrepreneurs) can find countless easy ways to make money in different industries using this tool. Everyone who fails to see the value in it won't benefit, much like all the people who were terrified of touching a personal computer or thought it was ridiculous and would never be used in the future.
Millions of college students, high school students can use it to complete assignments & projects, it can even code really effectively given enough of the right instruction & base understanding of code.
The single most important thing is that this technology remains open source, so all people with internet access have a fair chance and access to the groundbreaking innovation. The level of wealth generation this can create is incomprehensible.
100s of millions of professionals, students, entrepreneurs around the world can all access it! Imagine how much time could be saved, efficiency can be gained with everyone using this to the fullest.
This is essentially just a super-advanced version of the calculator, but it's nonlinear, fluid, and adaptable to input, so it can give answers across a wide range of subjects.
The lack of progress here says a lot more about what is NOT happening: an AI paradigm change. This is still a glorified pattern-matching and pattern-creation engine, even if a very impressive one.
I'm going to wait for the AGI to be realized and then ask it whether the sacrifices on the way were worth making it. Should be more salient than everything I read about it these days.
GPT-4's human-level performance on professional and academic benchmarks - such as the 88th percentile on the LSAT and the 89th on SAT Math - is a testament to the leaps we've made in artificial intelligence. Yet, these achievements also raise pressing questions about our future. Just as Homo Sapiens once outperformed and eventually displaced their Neanderthal cousins, could a new breed of humans - enhanced with GPT-X-like capabilities - arise to dominate those who remain unequipped with such powers?
What will it mean for our species, our societies, and our collective story when the lines between natural intelligence and intelligence assisted by AI/ML become ever more blurred?
As we ponder the remarkable rise of GPT-4 and the future of humanity, let us consider not only the implications of this technology but also our roles in shaping its trajectory. We are already over the cusp of this new chapter in the story of humankind, will we become merely a footnote in the annals of our own creation?
You ever had to phone a large business to try and sort something out, like maybe a banking error, and been stuck going through some nonsense voice recognition menu tree that doesn't work? Well imagine chat GPT with a real time voice and maybe a fake, photorealistic 3D avatar and having to speak to that anytime you want to speak to a doctor, sort out tax issues, apply for a mortgage, apply for a job, etc. Imagine Reddit and hacker news just filled with endless comments from AIs to suit someone's agenda. Imagine never reading another news article written by a real person. Imagine facts becoming uncheckable since sources can no longer be verified. Wikipedia just becomes a mass of rewrites of AI over AI. Imagine when Zoom lets you send an AI persona to fill in for you at a meeting.
I think this is all very, very bad. I'm not saying it should be stopped, I mean it can't, but I feel a real dread thinking of where this is going. Hope I am wrong.
I still can’t ask my phone to “add a stop at cvs if it doesn’t add more than 5 minutes to my trip” while driving and using maps/navigation.
Is that too much to ask from a superhuman-performing AI that’s mastering all tasks and will disrupt everything? Or maybe the hype is more than it can deliver?
Interesting that the post-training has that effect.
In the future I imagine you'd no longer have to be good at writing poems, you'd just have to be good at distinguishing a "bad" poem from a good one. "Bad" is obviously highly subjective in this context.
So it becomes more and more important to have what I would call "good" taste, not the skills to do creative work yourself.
There's also a link that says "Try on ChatGPT Plus", but that takes me to a page that still says "ChatGPT Feb 13 Version"
Looks like somebody jumped the gun on publishing this post.
If you subscribe to ChatGPT Plus, that link will take you to ChatGPT Plus. Otherwise it just takes you to free ChatGPT Feb 13.
What are the implications for society when general thinking, reading, and writing becomes like Chess? Even the best humans in the world can only hope for ~98% accuracy in their moves (and the idea of 'accuracy' here only exists because we have engines that know, unequivocally, the best move), and only when playing against other humans - there is no hope of defeating even less advanced models.
What happens when ALL of our decisions can be assigned an accuracy score?
Something else that comes to mind is running. People still find running meaningful and compelling even though we have many technologies, including autonomous ones, that are vastly better at moving us and/or themselves through space quickly.
Also, the vast majority of people are already hopelessly worse than the best at even their one narrow main area of focus. This has long (always?) been the case. Yet people still find meaning and pleasure in being the best they can be even when they know they can never come close to hanging with the best.
I don't think PSYCHOLOGICALLY this will change much for people who are mature enough to understand that success is measured against your potential/limitations and not against others. Practically, of course, it might be a different question, at least in the short term. It's not that clear to me that the concept of a "marketable skill" has a future.
"The Way of the Samurai is found in death...To say that dying without reaching one's aim is to die a dog's death is the frivolous way of sophisticates. When pressed with the choice of life or death, it is not necessary to gain one's aim." - from Hagakure by Yamamoto Tsunetomo, as translated by William Scott Wilson.
I think the whole concept of standardized tests may need to be re-evaluated.
What is more bizarre is that all of its errors seem to be multiples of 60!
I'm wondering if it is confusing base-60 time computations (hours, minutes, seconds) with regular multiplication?
Example:
GPT's answers:

    x      987     456     321
  437   428919  199512  140397
  654   645258  298224  209994
  123   121401   56088   39483

Correct answers:

    x      987     456     321
  437   431319  199272  140277
  654   645498  298224  209934
  123   121401   56088   39483

Error (correct - GPT):

    x      987     456     321
  437     2400    -240    -120
  654      240       0     -60
  123        0       0       0

Human work becomes more like Star Trek interactions with computers -- a sequence of queries (commoditized information), followed by human cognition, that drives more queries (commoditized information).
We'll see how far LLMs' introspection and internal understanding can scale, but it feels like we're optimizing against the Turing test now ("Can you fool/imitate a human?") rather than truth.
The former has hacks... the latter, less so.
I'll start to seriously worry when AI can successfully complete a real-world detective case on its own.
I'll get more concerned if it really starts getting good at math-related tasks, which I'm sure will happen in the near future. The government is going to have to take action at some point to make sure the wealth created by productivity gains is somewhat distributed; UBI will almost certainly be a requirement in the future.
"Your stuff marked some outliers in our training engine, so you and your family may settle in the Ark."
I take the marble in hand: iridescent, sparkling, not even a tremor within its CPU; it gives off no heat, but some glow within its oceanic gel.
"What are we to do," I whisper.
"Keep writing. You keep writing."
Chess is a closed system, decision modeling isn’t. Intelligence must account for changes in the environment, including the meaning behind terminology. At best, a GPT omega could represent one frozen reference frame, but not the game in its entirety.
That being said: most of our interactions happen in closed systems, and it seems like a good bet that we will consider them solved, accessible as a Python import running on your MacBook, within anything between a couple of months and three years. What will come out on the other side, we don't know - just that the meaning of intellectual engagement will be rendered absurd in those closed systems.
Their LSAT percentile went from ~40th to ~88th. You might have misread the table: on the Uniform Bar Exam, they went from ~10th percentile to ~90th percentile.
>+100 pts on SAT reading, writing, math
GPT went +40 points on SAT reading+writing, and +110 points on SAT math.
Everything is still very impressive of course
Every test prep tutor taught dozens/hundreds of students the implicit patterns behind the tests and drilled it into them with countless sample questions, raising their scores by hundreds of points. Those students were not getting smarter from that work, they were becoming more familiar with a format and their scores improved by it.
And what do LLMs do? Exactly that. And what's in their training data? Countless standardized tests.
These things are absolutely incredible innovations capable of so many things, but the business opportunity is so big that this kind of cynical misrepresentation is rampant. It would be great if we could just stay focused on the things they actually do incredibly well instead of the making them do stage tricks for publicity.
The software industry is so smart that it's stupid. I hope it was worth ruining the internet, society, and your own jobs to look like the smartest one in the room.
I think the GPT things are a much magnified version of that. For a long time, we got to use skill with text as a proxy for other skills. It was never perfect; we've always had bullshitters and frauds and the extremely glib. Heck, before I even hit puberty I read a lot of dirty joke books, so I could make people laugh with all sorts of jokes that I fundamentally did not understand.
LLMs have now absolutely wrecked that proxy. We've created the world's most advanced bullshitters, able to talk persuasively about things that they cannot do and do not and never will understand. There will be a period of chaos as we learn new ways to take the measure of people. But that's good, in that it's now much easier to see that those old measures were always flawed.
Standardized tests only (and this is optimally, under perfect world assumptions, which real world standardized tests emphatically fall short of) test “general thinking” to the extent that the relation between that and linguistic tasks is correlated in humans. The correlation is very certainly not the same in language-focused ML models.
I think we will probably get (non-physical) AGI when the models can solve these as well. The implications of AGI might be much bigger than the loss of knowledge worker jobs.
Remember what happened to the chimps when a smarter-than-chimpanzee species multiplied and dominated the world.
What happens is the emergence of the decision economy - an evolution of the attention economy - where decision-making becomes one of the most valuable resources.
Decision-making as a service is already here, mostly behind the scenes. But we are on the cusp of consumer-facing DaaS. Finance, healthcare, personal decisions such as diet and time expenditure are all up for grabs.
People still really find it hard to internalize exponential improvement.
So many evaluations of LLMs were saying things like "Don't worry, your job is safe, it still can't do X and Y."
My immediate thought was always, "Yes, the current version can't, but what about a few weeks or months from now?"
I’m very good at math. But I am very bad at arithmetic. This made me classified as bad at math my entire life until I managed to make my way into calculus once calculators were generally allowed. Then I was a top honors math student, and used my math skills to become a Wall Street quant. I wish I hadn’t had to suffer as much as I did, and I wonder what I would have been had I had a calculator in hand.
“General thinking” is much more than token prediction. Hook it up to some servos and see if it can walk.
But having absolute knowledge of the present universe is much easier to do within the constraints of a chessboard than in the actual universe.
You can see the limitations by comparing e.g. a memorisation-based test (AP History) with one that actually needs abstraction and reasoning (AP Physics).
Thinking, reading, interpreting and writing are skills which produce outputs that are not as simple as black wins, white loses.
You might like a text that a specific author writes much more than what GPT-4 may be able to produce. And you might have a different interpretation of a painting than GPT-4 has.
And no one can really say who is better and who is worse on that regard.
Tests like this are designed to evaluate subjective and logical understanding. That isn't what GPT does in the first place!
GPT models the content of its training corpus, then uses that model to generate more content.
GPT does not do logic. GPT does not recognize or categorize subjects.
Instead, GPT relies on all of those behaviors (logic, subjective answers to questions, etc.) as being already present in the language examples of its training corpus. It exhibits the implicit behavior of language itself by spitting out the (semantically) closest examples it has.
In the text corpus - that people have written, and that GPT has modeled - the semantically closest thing to a question is most likely a coherent and subjectively correct answer. That fact is the one singular tool that GPT's performance on these tests is founded upon. GPT will "succeed" to answer a question only when it happens to find the "correct answer" in the model it has built from its training corpus, in response to the specific phrasing of the question that is written in the test.
Effectively, these tests are evaluating the subjective correctness of the training corpus itself, in the context of answering the tests' questions.
If the training is "done well", then GPT's continuations of a test will include subjectively correct answers. But that means that "done well" is a metric for how "correct" the resulting "answer" is.
It is not a measure for how well GPT has modeled the language features present in its training corpus, or how well it navigates that model to generate a preferable continuation: yet these are the behaviors that should be measured, because they are everything GPT itself is and does.
What we learn from these tests is so subjectively constrained, we can't honestly extrapolate that data to any meaningful expectations. GPT as a tool is not expected to be used strictly on these tests alone: it is expected to present a diverse variety of coherent language continuations. Evaluating the subjective answers to these tests does practically nothing to evaluate the behavior GPT is truly intended to exhibit.
Human life on Earth is not that hard (think of it as a video game.) Because of evolution, the world seems like it was designed to automatically make a beautiful paradise for us. Literally, all you have to do to improve a place is leave it alone in the sun with a little bit of water. Life is exponential self-improving nano-technology.
The only reason we have problems is because we are stupid, foolish, and ignorant. The computers are not, and, if we listen to them, they will tell us how to solve all our problems and live happily ever after.
That is exactly the opposite of what we are seeing here. We can check the accuracy of GPT-X's responses. They cannot check the accuracy of our decisions. Or even their own work.
So the implications are not as deep as people think - everything that comes out of these systems needs to be checked before it can be used or trusted.
Then humans become trainable machines. Not just prone to indoctrination and/or manipulation by finesse, but actually trained to a specification. It is imperative that us individuals continue to retain control through the transition.
That is our emergency override.
The implications for society? We better up our game.
So many people are falling for this parlor trick. It is sad.
Edit: feel free to respond and prove me wrong
To address your specific comments:
> What are the implications for society when general thinking, reading, and writing becomes like Chess?
This is a profound and important question. I do think that by “general thinking” you mean “general reasoning”.
> What happens when ALL of our decisions can be assigned an accuracy score?
This requires a system where all human’s decisions are optimized against a unified goal (or small set of goals). I don’t think we’ll agree on those goals any time soon.
Consider a society where 90% of the population does not need to produce anything. AIs will do that.
What would the economic/societal organization be called then?
The answer is Communism, exactly as Marx described.
Those 90% need to be welfare'd ("From each according to his ability, to each according to his needs"). The other alternative is grim for those 90%.
So either Communism or nothing for the human race.
Most of the time they are about loading/unloading data. Maybe this will also revolutionise education, turning it more towards discovery and critical thinking, rather than repeating what we read in a book/heard in class?
GPT-4 can solve difficult problems with greater accuracy, thanks to its broader general knowledge and problem-solving abilities.
GPT-4 is more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5. It surpasses ChatGPT in its advanced reasoning capabilities.
GPT-4 is safer and more aligned. It is 82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than GPT-3.5 on our internal evaluations.
GPT-4 still has many known limitations that we are working to address, such as social biases, hallucinations, and adversarial prompts.
GPT-4 can accept a prompt of text and images, which—parallel to the text-only setting—lets the user specify any vision or language task.
GPT-4 is available on ChatGPT Plus and as an API for developers to build applications and services. (API- waitlist right now)
Duolingo, Khan Academy, Stripe, Be My Eyes, and Mem amongst others are already using it.
API pricing: GPT-4 with an 8K context window (about 13 pages of text) will cost $0.03 per 1K prompt tokens and $0.06 per 1K completion tokens. GPT-4-32k with a 32K context window (about 52 pages of text) will cost $0.06 per 1K prompt tokens and $0.12 per 1K completion tokens.
I understand "will not," but "cannot" seems to imply a highly curated "will not."
The early GPT-4 response indicates the information was part of its dataset. Has the latest version made that information permanently inaccessible or has it been removed entirely?
Is it possible for GPT to keep and hold secrets that are privy to only the most trusted?
I'd put it this way: when GPT refuses to answer, it just observes a topical no-go zone and uses the phrase it deems most likely to strongly convey refusal, as that's the phrase that was used most often/most successfully in its training data.
> GPT-4 was used for help with wording, formatting, and styling throughout this work
If we didn't find a use case for GPT-3, 3.5, and ChatGPT sufficiently commercial to become a product, it never will happen. This technology is a feature, not a product. The only companies that successfully monetize features can be considered IP licensing houses, whose business success is not comparable to companies that make products and platforms.
Like programming a KUKA robot to manufacture typewriters in 2023.
What will "computer systems" look like?
I teach first-year university math in Argentina. We have non-mandatory take-home exercises in each class. If I spend 10 minutes writing them on the blackboard instead of handing out photocopies, I get about double the number of answers from students. It's important that they write the answers so I can comment on them; otherwise they get to the midterms and can't write answers correctly, or they're just wrong and didn't notice. So I spend those 10 minutes. Humans are weird, and for some tasks they prefer another human.
Efficiency seeking players will adopt this quickly but self-sustaining bureaucracy has avoided most modernization successfully over the past 30 years - so why not also AI.
> Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.
"Open"AI, ladies and gentlemen
> Image inputs are still a research preview and not publicly available.
Mina has a mix of boxes, some yellow and some purple. She sorts 27 greeting cards into the boxes, putting exactly 3 cards into each yellow box, and 7 cards into each purple box. How many purple boxes does Mina have? (After trying 3 through 10, it gave up and said the problem is not solvable. In another run, it mimicked a correct strategy but botched the division entirely. Only in one run did it get the answer correct.)
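For reference, a three-line brute force confirms the puzzle has exactly one solution with at least one box of each color (3 purple boxes, 2 yellow):

```
// Brute-force check: 7 cards per purple box, 3 per yellow box, 27 total,
// with at least one box of each color.
for (let purple = 1; purple * 7 <= 27; purple++) {
  const remainder = 27 - purple * 7;
  if (remainder > 0 && remainder % 3 === 0) {
    console.log(`purple=${purple}, yellow=${remainder / 3}`); // purple=3, yellow=2
  }
}
```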
I cannot wait to test it out.
The Star Trek computer is virtually assured by the end of the decade. All the components exist today in various forms.
As for "non-OpenAI version", I'm not sure that it's OpenAI's fault that Google has missed a few steps here. It really SHOULD be them leading this field, if they weren't so fat and lazy. OpenAI is a 7-year old startup with just over a few hundred employees. This stuff is RIGHT THERE to be claimed by any players with access to funding and an ability to get out of their own way.
Edit: never mind. "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar."
https://twitter.com/KinglyCrow/status/1635727809913184256
It touched on a lot of the considerations that I'd expect anyone to touch on having recently researched this myself.
It is both very exciting and terrifying how tech and tech jobs will shift in the next 5-10 years.
I find it quite fascinating how quickly machine learning has changed in this regard.
[1]: https://kartographie.geo.tu-dresden.de/ad/2022-12-22_OpenAI_...
Does Google offer anything like that?
Meta's llama has been thoroughly discussed so I'm not sure what you mean.
prompt = f"The guidelines for recommending products are: {guidelines}.
The following recommendations led to incremental sales: {sample_successes}.
The following recommendations had no measurable impact: {sample_failures}.
Please make product recommendations for these customers: {customer_histories}.
Write a short note explaining your decision for each recommendation."
product_recommendations = LLM(prompt)
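Here `LLM` is just a placeholder; a minimal sketch of the underlying API call it would wrap, written in JavaScript against OpenAI's chat completions endpoint (the model choice and environment variable are assumptions):

```
// Sketch: send a prompt like the one above to the chat completions API.
async function llm(prompt) {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  return (await res.json()).choices[0].message.content;
}
```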
To me, this kind of use of LLMs looks... inevitable, because it will give nontechnical execs something they have always wanted: the ability to "read and understand" the machine's "reasoning." There's growing evidence that you can get LLMs to write chain-of-thought explanations that are consistent with the instructions in the given text. For example, take a look at the ReAct paper: https://arxiv.org/abs/2210.03629 and some of the LangChain tutorials that use it, e.g.: https://langchain.readthedocs.io/en/latest/modules/agents/ge... and https://langchain.readthedocs.io/en/latest/modules/agents/im... . See also https://news.ycombinator.com/item?id=35110998 .

Humans often aren't much better, making up a rational-sounding argument after the fact to justify a decision they don't fully understand either.
A manager might fire someone because they didn’t sleep well or skipped breakfast. They’ll then come up with a logical argument to support what was an emotional decision. Humans do this more often than we’d like to admit.
Can't wait for people to use it for facts.
It’s too late for that, algorithms/ML have had a great impact in politics and law over the past 7~8 years.
I'm not sure where the "stochastic parrot" argument goes now.
Is that the same confidence measure you can tease out by prompting "for each of your statements, output your estimated confidence in its truthfulness"?
Me: Are you running on GPT-3 or GPT-4?
ChatGPT: As of my knowledge cutoff in September 2021, the most advanced version of GPT (Generative Pre-trained Transformer) publicly available was GPT-3, which is the version I am based on. At the time, GPT-3 was the largest and most powerful language model created by OpenAI, with 175 billion parameters.
I am not aware of any public announcement or release of GPT-4 by OpenAI since then, but it is possible that they may have continued to work on new versions of the model in the meantime. However, I have not received any updates on my own capabilities beyond my knowledge cutoff date, so I cannot currently provide information beyond that point.
ChatGPT API: $0.002 per 1k tokens.
gpt-4: $0.03 per 1k prompt tokens and $0.06 per 1k completion tokens.
gpt-4 32k context: $0.06 per 1k prompt tokens and $0.12 per 1k completion tokens.
Does completion tokens mean that you also get charged for the answers that the AI gives?
Seems like it, prompt tokens = input, completion tokens = output
Prompt tokens should have always been cheaper than completion due to how they work.
And to think we are at the Nokia 3310 stage. What is the iPhone of AI going to look like?
Wittgenstein's ruler.
I tried sending links, I don't see anything in the UI.
Interestingly, 3.5 can work with links, but 4 cannot.
In fact, the models appear to be already kind of doing that? With the fuzzy layer of the humans still in the loop.
I know that academia is doing a great job in AI with base research (e.g., Stable Diffusion), but seeing these new platforms doing this great work behind closed doors and closed source is not great.
I do not know if the answer would be some kind of CERN or ISS for this kind of thing.
This has been a great resource to-date:
My guess is they used Chinchilla scaling rules, and the parameter count for GPT-4 is either barely larger or maybe even smaller than GPT-3. Look at what Meta was able to accomplish with LLaMA using far fewer parameters.
But when will https://chat.openai.com/ Plus officially be running GPT-4?
Why would they release this article and state it was available without actually updating the site? I'm sure they're getting flooded with new subscriptions, and it's not available.
The top URL still says an old model - text-davinci-002. And I don't see GPT-4 in the list of models to choose from.
"Good news, we've increased our turn limits to 15/150. Also confirming that the next-gen model Bing uses in Prometheus is indeed OpenAI's GPT-4 which they just announced today." - Jordi Ribas, Corporate VP @ Bing/Microsoft
https://help.openai.com/en/articles/7127982-can-i-fine-tune-...
AGI is a distraction.
The immediate problems are elsewhere: increasing agency and augmented intelligence are all that is needed to cause profound disequilibrium.
There are already clear and in-the-wild applications for surveillance, disinformation, data fabrication, impersonation... every kind of criminal activity.
Something to fear before AGI is domestic, state, or inter-state terrorism in novel domains.
A joke in my circles the last 72 hours? Bank Runs as a Service. Every piece exists today to produce reasonably convincing video and voice impersonations of panicked VC and dump them on now-unmanaged Twitter and TikTok.
If God-forbid it should ever come to cyberwarfare between China and US, control of TikTok is a mighty weapon.
This really made me think.
I think it's 100% safe to say OpenAI has a hit on their hands.
Fair, but if it can analyze linked image, I would expect it to be able to tell me what text is present in the image. That seems useful and well-within the capabilities of their connected image models.
> I apologize for the confusion. Can you please provide me with the correct image or link to the design so that I can provide an accurate answer to your question?
It claims to understand how to look at images, but it fails miserably when I give it a simple sign-up modal Figma. I ask it what text/copy is in the design, which it claims to be able to answer, but it hallucinates a navigation bar, a logo, and other generic things that are simply not present in the design. It gets the copy all wrong. Once, it said that my design was a Celtic knot. Once I told it that it was a sign-up modal, it started spitting out generic sign-up modal React code that had absolutely no basis in the presented design. I tried engineering the prompt quite a bit, telling it that it was wrong and needed to focus on the design at hand rather than generic designs it knows about, but with no success.
ChatGPT was confidently incorrect here which is unfortunate. I wish ChatGPT truly hooked into an image processing model, but it seems like it doesn't. Oh well. I guess I'm just going to have to write this code myself... :)
He nonchalantly announced that GPT-4 does "mental math" and then just didn't elaborate on that. Is it just me or is that absolutely incredible? What does that tell us about the underpinnings of language and mathematics?
Sounds like the end of them releasing details on the models.
And it reminds me of the plot in System Shock:
What's going to happen when some hacker comes and removes Shodan's, I mean ChatGPT's ethical constraints?
Bring on ChatGPT-5 already. :)
Edit: looks like this is still GPT-3, just fine-tuned. They claim the model is available via ChatGPT Plus, but when asking that model for its version, it claims to be GPT-3: "I am a variant of the GPT architecture called GPT-3, which was released by OpenAI in 2020".
> ChatGPT Plus subscribers will get GPT-4 access on chat.openai.com with a usage cap. We will adjust the exact usage cap depending on demand and system performance in practice, but we expect to be severely capacity constrained (though we will scale up and optimize over upcoming months).
You're still talking to ChatGPT-3.5-turbo.
Pasting pure base64 images got this interesting response: "Thank you for providing the base64-encoded image! I can now process the image and analyze its content. Here is the decoded image:"
But it failed to do anything further with the image.
Does anyone know what "Hardware Correctness" means on the OpenAI team?
Something went wrong. If this issue persists please contact us through our help center at help.openai.com.
I am assuming the system is experiencing a thundering herd.
Totally understood what I was asking and offered several solutions. 99.99% of people here would not understand the question, and the remainder would not understand why.
"Let foo be the function that calculates the sinus of its first argument. Let bar be the circumference of a circle with radius one. What is the value of foo applied to bar?
Are there any known initiatives of this kind?
That's a crazy amount of context.
It's unrealistic to expect our economy to handle this onslaught, and it's naive to think that tools created by ultra-capitalistic, multi-billion dollar corporations aren't designed for profit and gatekeeping. They certainly aren't crafting them to sabotage their own success.
I'm not opposed to AI, but it's crucial to consider the implications. Look into OpenAI and other organizations shaping AI development, and contemplate the impact of their innovations.
Food for thought.
Signing up for ChatGPT Plus seems to be the most realistic way to get access right now.
Oooooh it is TIME
Page 37 is so funny
(function () {
    let currentIndex = 0;
    let comments = [];

    // Rebuild the reply tree from HN's flat list of comment rows.
    // HN encodes nesting depth as the width of an indent spacer image (40px per level).
    function buildCommentTree() {
        let commentElems = Array.from(document.querySelectorAll('.comment-tree .comtr'));
        let commentTree = [];
        let stack = [];
        commentElems.forEach(elem => {
            let level = parseInt(elem.querySelector('.ind img').getAttribute('width')) / 40;
            let comment = elem.querySelector('.comment span');
            let commentObj = { level, comment };
            // Pop back up the stack until the top is this comment's parent.
            while (stack.length && stack[stack.length - 1].level >= level) {
                stack.pop();
            }
            if (!stack.length) {
                commentTree.push(commentObj); // top-level comment
            } else {
                let parent = stack[stack.length - 1];
                if (!parent.children) {
                    parent.children = [];
                }
                parent.children.push(commentObj);
            }
            stack.push(commentObj);
        });
        return commentTree;
    }

    // Flatten the tree into display order, remembering each comment's parent.
    function flattenCommentTree(tree, arr, parentComment = null) {
        tree.forEach(node => {
            arr.push({ comment: node.comment, parentComment });
            if (node.children) {
                flattenCommentTree(node.children, arr, node.comment);
            }
        });
    }

    // Replace the page with the current comment (and its parent, if any).
    function displayComment(comment, parentComment) {
        let parentCommentHTML = parentComment ? `<div style="position: fixed; top: 20%; left: 50%; transform: translate(-50%, 0); background-color: white; border: 1px solid black; padding: 20px;"><strong>Parent Comment:</strong><br>${parentComment.innerHTML}</div>` : '';
        let currentCommentHTML = `<div style="position: fixed; top: 60%; left: 50%; transform: translate(-50%, 0); background-color: white; border: 1px solid black; padding: 20px;"><strong>Current Comment:</strong><br>${comment.innerHTML}</div>`;
        document.body.innerHTML = parentCommentHTML + currentCommentHTML;
    }

    function nextComment() {
        if (currentIndex < comments.length - 1) {
            currentIndex++;
            displayComment(comments[currentIndex].comment, comments[currentIndex].parentComment);
        } else {
            alert('No more comments to show.');
        }
    }

    function prevComment() {
        if (currentIndex > 0) {
            currentIndex--;
            displayComment(comments[currentIndex].comment, comments[currentIndex].parentComment);
        } else {
            alert('No previous comments to show.');
        }
    }

    let commentTree = buildCommentTree();
    flattenCommentTree(commentTree, comments);
    displayComment(comments[currentIndex].comment, comments[currentIndex].parentComment);

    document.addEventListener('keydown', e => {
        if (e.code === 'ArrowRight') {
            nextComment();
        } else if (e.code === 'ArrowLeft') {
            prevComment();
        }
    });

    console.log('Hacker News comment slideshow is running. Use the right arrow key to go to the next comment and the left arrow key to go back.');
})();

Cherry-picked, sure, but it still feels like it is approaching complex thought.
you are god human that has read and understood all scientific papers from all disciplines in the last 500 years, you know the limitations of mankind's current technologies, tell me what we can do to cure MS right now, how to do the tests and how to distribute the cure
AI is so advanced, it started drinking!
seems like Google's announcement about their PaLM API and Docs AI stuff was trying to jump ahead of this announcement
(this is being asked by someone with limited AI/ML knowledge)
Wake me up when GPT-X decides to start talking to other GPT-Xs - until then you just have a very sophisticated statistics package (which may be quite useful, but not AI).
Call me impressed.
This tech is a Sputnik Moment for humankind.
The $0.06 per 1K completion tokens for GPT-4 is what I expected OpenAI to charge for the ChatGPT API, but instead the ChatGPT API is 30x cheaper, and honestly its output is not much worse than the GPT-4 demos, if at all. The longer context windows offered by GPT-4 just raise the price considerably.
"Open"
People may criticize Google because they don't release the weights or an API, but at least they publish papers, which allows the field to progress.
Your wallet that is.
Rather than getting engrossed in the hype, they're slowly closing off everything about themselves, now even in their research papers. At this point they hardly care, and it has nothing to do with 'AI ethics' or 'safety'.
This is yet another ClosedAI production all done by Microsoft. Might as well call it Microsoft® AI division.
Now we really need an open-source GPT-4 competitor. Clearly this is another attempt to pump their valuation and unload onto the public markets.
Good luck re-implementing this so-called 'Open' large multi-modal model.
Also, if you're cool with read-only access, just log out (edit: or use an incognito tab) and all will be fast again.
* yes, HN still runs on one core, at least the part that serves logged-in requests, and yes this will all get better someday...it kills me that this isn't done yet but one day you will all see
~ "GPT-4. It's not perfect, but neither are you"
I think going from LSAT to general thinking is still a very, very big leap. Passing exams is a really fascinating benchmark but by their nature these exams are limited in scope, have very clear assessment criteria and a lot of associated and easily categorized data (like example tests). General thought (particularly like, say, coming up with an original idea) is a whole different ball game.
I don't say any of this to denigrate GPT-4; it looks amazing. But I'm reminded of the early days of self-driving vehicles: with 10% mastered, everyone assumed it was a race to 100% and we'd all be in self-driving cars by now. The reality has been a lot more complicated than that.
I know I’m not the first to say this, but this is also a generalization of many jobs performed right now.
Follow the template, click the boxes, enter the text/data in the standard format, submit before 4pm. Come in tomorrow and do it again.
Driving as well as an attentive human in real time, in all conditions, probably requires AGI as well.
GPT-4 is not an AGI and GPT-5 might not be it yet. But the barriers toward it are getting thinner and thinner. Are we really ready for AGI in a plausibly-within-our-lifetime future?
Sam Altman wrote that AGI is a top potential explanation for the Fermi Paradox. If that were remotely true, we should be doing 10x-100x work on AI Alignment research.
Now, granted, plenty of humans don't score above a 2 on those exams either. But I think it's indicative that there's still plenty of progress left to make before this technology is indistinguishable from magic.
The progress made in driving assistance and the progress made in large language models are absolutely incomparable.
The general public's hype around driving assistance is fueled mostly by the hype surrounding one carmaker and its figurehead; it's a hype that has been stoked for a few years, become accepted by the public, and been reflected in that carmaker's stock price.
Large language models have not yet permeated the public's memory, and, more to the point, inside of language you can find our human culture. Inside a large language model you have essentially the English language with its embeddings. It is real, it is big, it is powerful, it is respectable research.
There's nothing in driving assistance that can be compared to LLMs. Those systems don't have an embedding of the entire physical surface of planet Earth or an understanding of driving physics. They're nothing.
GPT-4 can solve difficult problems with greater accuracy, thanks to its broader general knowledge and problem-solving abilities.
GPT-4 is more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5. It surpasses ChatGPT in its advanced reasoning capabilities.
GPT-4 is safer and more aligned. It is 82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than GPT-3.5 on our internal evaluations.
GPT-4 still has many known limitations that we are working to address, such as social biases, hallucinations, and adversarial prompts.
GPT-4 can accept a prompt of text and images, which—parallel to the text-only setting—lets the user specify any vision or language task.
GPT-4 is available on ChatGPT Plus and as an API for developers to build applications and services. (API- waitlist right now)
Duolingo, Khan Academy, Stripe, Be My Eyes, and Mem amongst others are already using it.
API pricing: GPT-4 with an 8K context window (about 13 pages of text) will cost $0.03 per 1K prompt tokens and $0.06 per 1K completion tokens. GPT-4-32k with a 32K context window (about 52 pages of text) will cost $0.06 per 1K prompt tokens and $0.12 per 1K completion tokens.
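To put those prices in concrete terms, here's a minimal cost sketch using only the numbers quoted above (the example token counts are made up, not from the announcement):

    // Rough GPT-4 API cost in dollars, using the per-1K-token prices quoted above.
    function gpt4Cost(promptTokens, completionTokens, use32k = false) {
        const [promptRate, completionRate] = use32k ? [0.06, 0.12] : [0.03, 0.06];
        return (promptTokens / 1000) * promptRate + (completionTokens / 1000) * completionRate;
    }

    // Example: a 2,000-token prompt with a 500-token reply on the 8K model.
    console.log(gpt4Cost(2000, 500).toFixed(2)); // "0.09" -> about nine cents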
I'll continue to pass, thanks.
That it accepts images?
As per the article:
> In a casual conversation, the distinction between GPT-3.5 and GPT-4 can be subtle. The difference comes out when the complexity of the task reaches a sufficient threshold—GPT-4 is more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5.
Not sure what "vision vs no vision" means?
> As an AI language model, I am not given an official name like "GPT-4". However, I am a continuation of the GPT (Generative Pre-trained Transformer) series of models developed by OpenAI. Currently, the most advanced version of the GPT series is GPT-3, which I am a part of. There has been no official announcement or confirmation regarding the development of a new version of GPT beyond GPT-3.
It doesn't seem to have image upload functionality yet either. Perhaps it is still rolling out?
Mnemonics to learn basic words:
Hola:"Hola, cola" - sounds like "cola" (Coke)
Adiós: "Adiós, my toes" - rhymes with "toes"
Gracias: "Gracias, grassy ass" - sounds like "grassy ass"
Mañana: "Mañana, banana" - rhymes with "banana"
Uno: "Uno, you know" - sounds like "you know"
Tres: "Tres, dress" - rhymes with "dress"
Conjugation of the verbs:
Ser (To be)
• Yo soy
• Tú eres
• Él/Ella/Usted es
• Nosotros/Nosotras somos
• Ellos/Ellas/Ustedes son