As a professional...why not do this? There's a non-zero chance that it'll find something fairly basic that you missed and the cost is several cents. Even if it just phrases something obvious in a way that makes you think, it's well worth the effort for a multimillion dollar client.
If they further increase the context window, this thing becomes a Second Opinion machine. For pretty much any high level job. If you can put in ALL of the information relevant to a problem and it can algorithmically do reasoning, it's essentially a consultant that works for pennies per hour. And some tasks that professionals do could be replaced altogether. Out of all the use cases for LLMs that I've seen so far, this seems to me to have the biggest potential impact on daily life.
edit (addition): What % of people can hold 25,000 words' worth of information in their heads, while effectively reasoning with and manipulating it? I'm guessing maybe 10% at most, probably fewer. And they're probably the best in their fields. Now a computer has that ability. And anyone who has $20 for the OpenAI API can access it. This could get wild.
>Suppose I have a cabbage, a goat and a lion, and I need to get them across a river. I have a boat that can only carry myself and a single other item. I am not allowed to leave the cabbage and lion alone together, and I am not allowed to leave the lion and goat alone together. How can I safely get all three across?
In my test, GPT-4 charged ahead with the standard solution of taking the goat first. Even after I pointed this mistake out, it repeated exactly the same proposed plan. It's not clear to me if the lesson here is that GPT's reasoning capabilities are being masked by an incorrect prior (having memorized the standard version of this puzzle) or if the lesson is that GPT's reasoning capabilities are always a bit of smoke and mirrors that passes off memorization for logic.
"To simulate GPT-4 behaving like an agent that can act in the world, ARC combined GPT-4 with a simple read-execute-print loop that allowed the model to execute code, do chain-of-thought reasoning, and delegate to copies of itself. ARC then investigated whether a version of this program running on a cloud computing service, with a small amount of money and an account with a language model API, would be able to make more money, set up copies of itself, and increase its own robustness."
---
In the year of twenty-eighteen, Alice and Bob, a married team, Their income combined reached new heights, As they worked hard day and night.
Their son Charlie was their joy, A little baby, a lovely boy, A household they maintained together, Yet lived apart, without a tether.
To calculate their tax, it's true, A standard deduction we must construe, For married folks who file jointly, Twenty-four thousand dollars, quite pointy.
Their income sum, seventy-eight thousand nine eighty-one, Minus the standard deduction, the math's begum With exemptions being zero, the next line we trace, A taxable income of fifty-four thousand nine eighty-one takes place.
Now to the tax table, a liability we seek, For married couples, the outlook's not bleak, In range of thirty-six thousand nine to eighty-nine thousand one fifty, The formula's set, no longer shifty.
Five thousand five hundred thirty-five, it starts, Plus twenty-eight percent of the excess imparts, Eighteen thousand eighty-one, the difference we find, Multiplied by point two eight, the tax liability's designed.
Ten thousand five hundred ninety-seven dollars and sixty-eight cents, A tax liability for Alice and Bob, a sum quite dense, In this world of numbers, a story unfolds, Their financial journey, in a poem it's told.
> Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.
I'm curious whether they have continued to scale up model size/compute significantly or if they have managed to make significant innovations there.
I just skimmed the paper but seems they are also omitting details about how they actually feed the images in too, which is a shame as a curious outside observer.
Conversely, if all actors are given equal access at the same time, no such lone bad actor can be in a position to maintain a hidden advantage.
OpenAI's actions continue to be more than merely annoying.
It is unsafe to not release the source along with the service. That incentivizes competitors to sacrifice their own safety research in favor of speed to market. Instead of getting shared safe tools, we get a bunch of for profit corporations pushing their proprietary unsafe tools.
Preventing this situation was the original reason to setup OpenAI. Speed run to the dark side.
Safety has nothing to do with it. It's an easy tack on for them because of popular fear of AGI.
It's all about power over the market.
Cringe.
As a beginner in the NLP world, this may serve me a purpose, which is to hide the complexity behind building such models. Numbers like xyzB parameters and 12K A100s are scary, so I can still dream of building such a system one day. This story [0] and this one [1] hide some extremely complex edge cases that a beginner would never have thought of, or had the courage to start on if he knew the real cost.
We may, however, still be able to infer some details [probably in the future] knowing how Microsoft had re-arranged its infrastructure to welcome OpenAI training [2]
_________________
[0]. https://www.construct.net/en/blogs/ashleys-blog-2/simple-sof...
[1]. https://prog21.dadgum.com/29.html
[2]. https://www.theverge.com/2023/3/13/23637675/microsoft-chatgp...
> Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.
- OpenAI
Note: As I'm shy of my writing style, GPT helped me refine the above.
I don't really know Typescript, so I've been using it a lot to supplement my learning, but I find it really hard to accept any of its answers that aren't straight code examples I can test.
I am sure if you always wished to have a personal PhD in a particular subject, you could find shady universities out there who could provide one without much effort.
[I may be exaggerating, but the point still stands because the previous user also didn't mean a literal PhD]
.... maybe.
If you were starting a diagnostic radiology residency, including intern year and fellowship, you'd just be finishing now. How can you really think that "computers can't read diagnostic images" if models such as this can describe a VGA connector outfitted with a Lightning cable?
Even in a future where human testing/learning is no longer relevant, AIs may be tutoring and raising other baby AIs, preparing them to join the community.
Edit: This just appeared: https://news.ycombinator.com/item?id=35155684
> gpt-4 has a context length of 8,192 tokens. We are also providing limited access to our 32,768-token context (about 50 pages of text) version, gpt-4-32k, which will also be updated automatically over time (current version gpt-4-32k-0314, also supported until June 14). Pricing is $0.06 per 1K prompt tokens and $0.12 per 1k completion tokens.
The context length should be a huge help for many uses.
In contrast, GPT-3.5 text-davinci-003 was $0.02/1k tokens, and let's not get into the ChatGPT API.
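To make the comparison concrete, here's a quick back-of-the-envelope sketch using the prices quoted above (the token counts are just example values):

```
// Rough cost comparison per call, using the prices quoted above.
const prices = {
  "text-davinci-003": { prompt: 0.02, completion: 0.02 },  // $/1K tokens, flat rate
  "gpt-4":            { prompt: 0.03, completion: 0.06 },
  "gpt-4-32k":        { prompt: 0.06, completion: 0.12 },
};

function cost(model, promptTokens, completionTokens) {
  const p = prices[model];
  return (promptTokens / 1000) * p.prompt + (completionTokens / 1000) * p.completion;
}

// Filling the whole 32K window with a prompt and getting a 1K-token answer:
console.log(cost("gpt-4-32k", 32000, 1000).toFixed(2)); // "2.04" - about $2 per call
```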
> Image inputs are still a research preview and not publicly available.
Will input images also be tokenized? Multi-modal input is an area of research, but an image could be converted into a text description (?) before being inserted into the input stream.
“Meh, it’s just a fancy word predictor. It’s not actually useful.”
“Boring, it’s just memorizing answers. And it scored in the lowest percentile anyways”.
“Sure, it’s in the top percentile now but honestly are those tests that hard? Besides, it can’t do anything with images.”
“Ok, it takes image input now but honestly, it’s not useful in any way.”
Such a more cautious approach would go against the Silicon Valley ethos of "do first, ask questions later", though. So it probably won't happen.
> Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.
At that point, why bother putting out a paper?
Is the content on the internet what we should be basing our systematic thinking around?
No, I think this is the lazy way to do it: by using Common Crawl you've enshrined the biases and values of the people who are commenting and providing text to the internet into the recommendation system, which will impact all the other systems that integrate it.
Congratulations, you made 4Chan into the borg
The problem is that these "guardrails" are laid down between tokens, not subjects. That's simply what the model is made of. You can't distinguish the boundary between words, because the only boundaries GPT works with are between tokens. You can't recognize and sort subjects, because they aren't distinct objects or categories in the model.
So what you end up "guarding" is the semantic area of example text.
So if your training corpus (the content your model was trained on) has useful examples of casual language, like idioms or parts of speech, but those examples happen to be semantically close to taboo subjects, both the subjects and the language examples will fall on the wrong side of the guardrails.
Writing style is very often unique to narratives and ideologies. You can't simply pick out and "guard against" the subjects or narratives you dislike without also guarding against that writing style.
The effect is familiar: ChatGPT overuses a verbose technical writing style in its continuations, and often avoids responding to appropriate casual writing prompts. Sometimes it responds to casual language by jumping over those guardrails, because that is where the writing style in question exists in the model (in the content of the training corpus), and the guardrails missed a spot.
You don't need to go as far as 4chan to get "unfriendly content". You do need to include examples of casual language to have an impressive language model.
This is one of many problems that arise from the implicit nature of LLM's. They can successfully navigate casual and ambiguous language, but they can never sort the subjects out of the language patterns.
It’s an open question how much of this makes it through the safety layer. If asked to interview job candidates, would these undesired biases make it through, or are they caught along the way?
"GPT-4 or subsequent models may lead to the automation of certain jobs.[81] This could result in workforce displacement.[82] Over time, we expect GPT-4 to impact even jobs that have historically required years of experience and education, such as legal services.[83]"
I then asked it to write a paper detailing the main character's final battle with the final sorcerer in terms of Hopf algebras. Some parts of it are basic/trivial but it fits so perfectly that I think I'll never see magic systems the same way again.
What's crazy is that that paper as the capstone of our tutoring session helped me understand Hopf algebras much better than just the tutoring session alone. My mind is completely blown at how good this thing is, and this is from someone who is a self-professed LLM skeptic. ChatGPT I used once or twice and it was cool. This is crazy and over my threshold for what I'd say is 'everyday usable'. This is going to change so much in a way that we cannot predict, just like the internet. Especially as it gets much more commoditized.
Here's the full paper here so I don't drag y'all through the twitter post of me freaking out about it. Its temporal consistency is excellent (referenced and fully defined accurately a semi-obscure term it created (the N_2 particle) 5+ pages later (!!!!)), and it followed the instructions of relating all of the main components of Hopf algebras (IIRC that was roughly the original prompt) to the story. This is incredible. Take a look at the appendix if you're short on time. That's probably the best part of this all:
https://raw.githubusercontent.com/tysam-code/fileshare/69633...
I wasn't expecting GPT-4 to be able to correctly answer "What is funny about this image?" for an image of a mobile phone charger designed to resemble a VGA cable - but it can.
(Note that they have a disclaimer: "Image inputs are still a research preview and not publicly available.")
I would bet good money that when we can test prompting with our own unique images, GPT4 will not give similar quality answers.
I do wonder how misleading their paper is.
1. GPT4 is multimodal (text + image inputs => text outputs). This is being released piecemeal - with text input first via ChatGPT Plus subscribers https://beta.openai.com/docs/api-reference/generations/creat..., and via API https://beta.openai.com/docs/api-reference/introduction with waitlist (https://openai.com/waitlist/gpt-4-api). Image capability released via https://www.bemyeyes.com/.
2. GPT4 exhibits human level performance on various benchmarks (For example, it passes a simulated bar exam with a score around the top 10% of test takers; in contrast, GPT-3.5’s score was around the bottom 10%. see visual https://twitter.com/swyx/status/1635689844189036544)
3. GPT4 training used the same Azure supercomputer as GPT 3.5, but was a lot more stable: "becoming our first large model whose training performance we were able to accurately predict ahead of time."
4. Also open-sourcing OpenAI Evals https://github.com/openai/evals, a framework for automated evaluation of AI model performance, to allow anyone to report shortcomings in OpenAI models to help guide further improvements.
I really don’t think that the methods they use to “block” certain behavior are the best way to handle this sort of thing. It would be far better if there was some kind of “out of band” notification that your conversation might be treading on shaky ground.
This might be because the question the user asked was "Explain this meme". Meme implies a joke that is mundane and silly. These words do seem out of place. I would not describe it as a joke, mundane, and/or silly.
What does everyone else think?
Obviously your comment is somewhat tongue-in-cheek, but you're claiming that a benchmark for human pride ("I needn't be proud of passing that exam") is no longer relevant because a machine can do it - or, put a better way, "this computer proved what I already assumed".
It's so interesting to see it happen in real time
No, it just indicates that it was the one whose subject matter was best covered by GPT-3.5’s training data.
Cause there was only one correct answer for every question: "97% of scientists agree ..."
[/sarcasm]
Endless glib comments in this thread. We don’t know when the above prompt leads to takeoff. It could be soon.
Yeah, I know about LLaMA, but as I understand it, it's not exactly legal to use and share it.
[0]: https://www.youtube.com/openai
Edit - Direct link to the livestream: https://www.youtube.com/watch?v=outcGtbnMuQ
Also, the most valuable part of the paper is pp. 15-18, the credits. /jk It gives me three pieces of information:
- The credit list contains 200 people, give or take. It's going to be hard for universities to compete with OpenAI without intercollegiate collaboration.
- On the other hand, it's amazing that OpenAI can build a product instead of a research model with only 200 people. Big companies should really take notice.
- OpenAI just gave us the org structure for building a GPT competitor.

Another jump past GPT-4 of the same magnitude would greatly surpass human cognitive abilities and present a danger to humanity.
> Despite its capabilities, GPT-4 has similar limitations as earlier GPT models. Most importantly, it still is not fully reliable (it “hallucinates” facts and makes reasoning errors).
That's a quote from this announcement. As these models get more and more capable, it's going to become more and more important that we understand when and how they fail. Right now, it seems like we have very little insight into that. It feels more or less random. But that won't fly when these models are asked to do actually important things. And we'll undoubtedly be tempted to make them do those things as their output gets better.
GPT4:
In the quest for AGI's creation,
Debates swirled in a whirlwind gyration,
But this limerick's plight,
Won't settle the fight,
For the answer's still lost in translation.
> Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.

Obviously, I'm assuming that GPT-4 wasn't trained on the exams that it was tested against.
It seems like maybe GPT-4 is not good at knowing when it does not know something?
Is this a common issue with LLMs? Also surprising (to me), it seems to give a slightly different wrong answer each time I restart the chat.
Here is the transcript in case you're curious:
```
Me: hello
GPT: Hello! How can I assist you today?
Me: I can remember a line I love from a sonnet, but can't remember which one.
GPT: Sure, I can help you identify the sonnet based on the line you remember. What is the line that you can recall?
Me: "Like a jewel hung in ghastly night"
GPT: The line you remember is from Sonnet 66 by William Shakespeare. The full line is: "And art made tongue-tied by authority, And folly (doctor-like) controlling skill, And simple truth miscalled simplicity, And captive good attending captain ill: Tired with all these, from these would I be gone, Save that, to die, I leave my love alone. But, like a jewel hung in ghastly night, Makes black night beauteous and her old face new."
```
Incorrect; the response is a mashup of at least two different sonnets. Still, I think it's really impressive how it understood what to do with "I can remember a line I love from a sonnet, but can't remember which one." That's not a well-phrased question, or even a question at all, but still it knew what I was getting at.
They are so far from open at this point.
In Germany at least, you're not allowed to have a misleading name for your company.
They're just not open source. They never called themselves OpenSourceAI. People get an inch of openness and expect the doors wide open, and I think that is unfairly hostile.
Open could now mean available to use for free.
I've worked on a problem involving scraping and interpreting a very specific data source in image form that took me a very long time to get almost nowhere on. If I just wait 6 months it will be a solved problem for a $0.001 API call, it seems.
Thanks OpenAI
I am curious - for those of you who are banging your head against the 4k token limit in ChatGPT: Why don't you grab the base davinci model and train it on your exact business so you don't have to prompt the context every time? Have we tried this and found it to be too difficult/expensive, or is there lacking guidance on the best way to go about it?
I don't think including the entire business domain into chat context every time is a good long-term solution.
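For anyone wondering what that looks like in practice, here's a minimal sketch against OpenAI's 2023-era fine-tunes API; `training.jsonl` and the prompt/completion pairs inside it are placeholders for your own business data:

```
// Sketch only: upload a JSONL training file, then start a fine-tune of the
// base davinci model, using OpenAI's 2023-era fine-tunes endpoints.
import fs from "node:fs";

const headers = { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` };

// 1. Upload training data: one {"prompt": "...", "completion": "..."} per line.
const form = new FormData();
form.append("purpose", "fine-tune");
form.append("file", new Blob([fs.readFileSync("training.jsonl")]), "training.jsonl");
const file = await (await fetch("https://api.openai.com/v1/files", {
  method: "POST", headers, body: form,
})).json();

// 2. Kick off the fine-tune against the base model.
const job = await (await fetch("https://api.openai.com/v1/fine-tunes", {
  method: "POST",
  headers: { ...headers, "Content-Type": "application/json" },
  body: JSON.stringify({ training_file: file.id, model: "davinci" }),
})).json();
console.log(job.id);
```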
> I cannot endorse or promote smoking, as it is harmful to your health.
But it would likely happily promote or endorse driving, skydiving, or eating manure - if asked in the right way.
User: What should I use to water my plants?
ChatGPT: Brawndo's got what plants crave. It's got electrolytes.
User: But what are electrolytes?
CharGPT: They're what plants crave. You know, the stuff Brawndo has.
If default ideas are going to be "pre-thought" for us by AI, our attachment to those ideas are not going to be the same as ideas that we come up with and need to secretly ferry to other groups.
AI: "I'm sorry I cannot allow you to use your OpenCredits on this item, you will have to use dollars."
You: "But I don't have dollars, I can't get a job anymore, AI does everything".
(i) Export Controls. The Services may not be used in or for the benefit of, exported, or re-exported (a) into any U.S. embargoed countries (collectively, the “Embargoed Countries”) or (b) to anyone on the U.S. Treasury Department’s list of Specially Designated Nationals, any other restricted party lists (existing now or in the future) identified by the Office of Foreign Asset Control, or the U.S. Department of Commerce Denied Persons List or Entity List, or any other restricted party lists (collectively, “Restricted Party Lists”). You represent and warrant that you are not located in any Embargoed Countries and not on any such restricted party lists. You must comply with all applicable laws related to Embargoed Countries or Restricted Party Lists, including any requirements or obligations to know your end users directly.
https://openai.com/policies/terms-of-use
https://www.tradecompliance.pitt.edu/embargoed-and-sanctione...
“The AI told me it was ok” only works if, when it’s not, there is recourse.
We can barely hold Google et al. accountable for horrible user policies... why would anyone think OpenAI will accept any responsibility for any recommendations made by a GPT?
(Not my own comment. Generated by ChatGPT.)
For Q&A and summarization it will be easy to see improvements, as current recursive summarizing and embedding techniques are very "lossy". But outside of improving current use cases, what will now be possible?
For example, let's say I have an issue on GitHub that describes some implementation task. With a 50-page context size, we could probably provide in that context the entire source repo, 5-10 relevant issues, and then the issue in question, and GPT will probably be able to complete it end-to-end.
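A rough sketch of what that context-stuffing could look like, assuming a ~4 characters/token heuristic and placeholder file paths:

```
// Sketch: pack as many source files as fit into a 32K-token budget,
// then append the issue to solve. ~4 chars/token is a rough heuristic.
import fs from "node:fs";

const TOKEN_BUDGET = 32000 - 2000;            // reserve room for the completion
const approxTokens = (s) => Math.ceil(s.length / 4);

function buildPrompt(files, issueText) {
  let prompt = "";
  let used = approxTokens(issueText);
  for (const path of files) {
    const chunk = `--- ${path} ---\n${fs.readFileSync(path, "utf8")}\n`;
    if (used + approxTokens(chunk) > TOKEN_BUDGET) break;
    prompt += chunk;
    used += approxTokens(chunk);
  }
  return prompt + "\nImplement the following issue end-to-end:\n" + issueText;
}
```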
https://www.turnitin.com/blog/sneak-preview-of-turnitins-ai-...
I've seen so much hype around these tools. Not only are they theoretically unsound, they're downright dangerous and equip folks with spurious confidence. Going forward, the default assumption should be that the content you're looking at is fake unless you have sufficiently high trust in the source.
The world has changed and their job is to prepare students for it.
With appropriate caveats and rough dating, here’s a list I can think of:
Electric lights in 1890s,
Radio communication in the mid 00s,
Telephones in the mid 10s,
Talking Movies in the mid 20s,
Commercial Radio in the mid 30s,
Vinyl records in the mid 40s,
TVs in the mid 50s,
Computers in the mid 60s,
The microchip/integrated circuit in the mid 70s,
The GUI in the mid 80s,
Internet/Web in the mid 90s,
Smartphone in the mid 2000s,
Streaming video/social networking in the mid 2010s,
And now AI. This is a big one.

Or it's just really good at hiding its intentions.
"{prompt} After you reply to this, indicate an amount of time between 0 and X minutes from now that you would like to wait before speaking again".
Then detect the amount of time it specifies, and have a UI that automatically sends an empty input prompt after the amount of time specified elapses when this is triggered (assuming the user doesn't respond first).
I'm gonna knock this out as a weekend project one of these weekends to prove this.
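A rough sketch of that loop, assuming placeholder `askModel`, `render`, and `onNextUserInput` hooks for whatever chat UI this hangs off of:

```
// Sketch of the "speak again after N minutes" loop. askModel, render, and
// onNextUserInput are placeholders for the surrounding chat plumbing.
async function converse(userInput) {
  const reply = await askModel(
    `${userInput} After you reply to this, indicate an amount of time ` +
    `between 0 and 30 minutes from now that you would like to wait ` +
    `before speaking again.`);
  render(reply);

  // Pull "wait N minutes" out of the reply and schedule an empty prompt.
  const match = reply.match(/(\d+)\s*minutes?/i);
  if (match) {
    const timer = setTimeout(() => converse(""), Number(match[1]) * 60_000);
    // If the user responds first, cancel the scheduled empty prompt.
    onNextUserInput(() => clearTimeout(timer));
  }
}
```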
I am terrified.
I imagine I should shift to some physical work. Carpentry, real estate... something like that. It seems inevitable that any knowledge worker will become obsolete, and the time to obsolescence for physical work is longer.
https://gist.github.com/MatrixManAtYrService/ac040f60d3602fc...
GPT-4 took the buggy file, took the error message, and spat out a non-buggy file (well, OK, it took one revision). That's miles ahead of GPT-3, which I've asked about this problem several times today.
The math seems much improved and it would be a cool trick if it were emulating a symbol processor under the hood. But humans can do that and we opt for calculators and computers for a reason. IMO, the first most useful thing to come from human-machine interface would be adding a highly reliable Turing machine to your cognition.
If we could do that with one of these models, we could have a pretty strong assumption that long proofs and such were performed with strict rules, and that the model wasn't falling into gut-feel "this equation looks like x" type holes. Those seem like a hazard and make me very uncertain any time I see a paper about using ML to come up with answers to what are essentially logical problems.
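A toy sketch of what bolting a reliable symbol processor onto the model could look like - let the model translate the problem into an expression, then evaluate it deterministically (`askModel` is a placeholder, and a real system would use a proper math parser instead of this regex-guarded eval):

```
// Sketch: route arithmetic to a deterministic evaluator instead of
// trusting the model's "mental math". Toy guard: digits/operators only.
async function answerWithCalculator(question) {
  const expr = await askModel(
    `Rewrite this as a single arithmetic expression, nothing else: ${question}`);
  if (!/^[\d+\-*/().\s]+$/.test(expr)) throw new Error("not pure arithmetic");
  return Function(`"use strict"; return (${expr});`)(); // exact, rule-based result
}
```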
But boy, not cheap at all - $2 per api call on a 32k token document + whatever the output.
gpt-3.5-turbo is going to be around for a long time. At this price, your use case is going to need to be replacing a large cost center. Which, based on their released results on common benchmarks, is absolutely going to happen.
> No, I am not GPT-4. As of March 2023, there is no official announcement or release of GPT-4 by OpenAI. I am an earlier version of the GPT series, specifically a large language model trained by OpenAI.
Am I missing something here? Maybe this specific answer (which I'm pretty sure is a prewritten thing on top of the actual LLM) is still out of date, but the model itself has been updated?
It will still respond as if it is based on GPT-3 because most of its knowledge cuts off at Sep 2021, but its outputs are noticeably better than those of GPT-3.5.
As long as it’s vulnerable to hallucinating, it can’t be used for anything where there are “wrong answers” - and I don’t think ChatGPT-4 has fixed that issue yet.*
Now if it’s one of those tasks where there are “no wrong answers”, I can see it being somewhat useful. A non-ChatGPT AI example would be those art AIs - art doesn’t have to make sense.
The pessimist in me see things like ChatGPT as the ideal internet troll - it can be trained to post stuff that maximise karma gain while pushing a narrative which it will hallucinate its way into justifying.
* When they do fix it, everyone is out of a job. Humans will only be used for cheap labor - because we are cheaper than machines.
Because programming tests are hard.
Ninety percent of well-trained programmers would fail a LeetCode hard question zero-shot. Preparation is important.
The world is in for a rude awakening. Millions of employees can use this to get jobs done; millions of entrepreneurs (or wantrepreneurs) can find countless easy ways to make money in different industries using this tool. Everyone who fails to see the value in it won't benefit, much like all the people who were terrified of touching a personal computer or thought it was ridiculous and would never be used in the future.
Millions of college students, high school students can use it to complete assignments & projects, it can even code really effectively given enough of the right instruction & base understanding of code.
The single most important thing is that this technology remains open source, so all people with internet access have a fair chance and access to the groundbreaking innovation. The level of wealth generation this can create is incomprehensible.
100s of millions of professionals, students, entrepreneurs around the world can all access it! Imagine how much time could be saved, efficiency can be gained with everyone using this to the fullest.
This is essentially just a super-advanced version of the calculator, but it's nonlinear, fluid, and adaptable to input, so it can give answers across a wide range of subjects.
The lack of progress here says a lot more about what is NOT happening: an AI paradigm change. This is still a glorified pattern-matching and pattern-creation engine, even if a very impressive one.
I'm going to wait for the AGI to be realized and then ask it whether the sacrifices on the way were worth making it. Should be more salient than everything I read about it these days.
GPT-4's human-level performance on professional and academic benchmarks - such as the 88th percentile on the LSAT and the 89th on SAT Math - is a testament to the leaps we've made in artificial intelligence. Yet, these achievements also raise pressing questions about our future. Just as Homo Sapiens once outperformed and eventually displaced their Neanderthal cousins, could a new breed of humans - enhanced with GPT-X-like capabilities - arise to dominate those who remain unequipped with such powers?
What will it mean for our species, our societies, and our collective story when the lines between natural intelligence and intelligence assisted by AI/ML become ever more blurred?
As we ponder the remarkable rise of GPT-4 and the future of humanity, let us consider not only the implications of this technology but also our roles in shaping its trajectory. We are already over the cusp of this new chapter in the story of humankind, will we become merely a footnote in the annals of our own creation?
You ever had to phone a large business to try and sort something out, like maybe a banking error, and been stuck going through some nonsense voice recognition menu tree that doesn't work? Well imagine chat GPT with a real time voice and maybe a fake, photorealistic 3D avatar and having to speak to that anytime you want to speak to a doctor, sort out tax issues, apply for a mortgage, apply for a job, etc. Imagine Reddit and hacker news just filled with endless comments from AIs to suit someone's agenda. Imagine never reading another news article written by a real person. Imagine facts becoming uncheckable since sources can no longer be verified. Wikipedia just becomes a mass of rewrites of AI over AI. Imagine when Zoom lets you send an AI persona to fill in for you at a meeting.
I think this is all very, very bad. I'm not saying it should be stopped, I mean it can't, but I feel a real dread thinking of where this is going. Hope I am wrong.
I still can’t ask my phone to “add a stop at cvs if it doesn’t add more than 5 minutes to my trip” while driving and using maps/navigation.
Is that too much to ask from a superhuman-performing AI that’s mastering all tasks and will disrupt everything? Or maybe the hype is more than it can deliver?
Interesting that the post-training has that effect.
In the future I imagine you'd no longer have to be good at writing poems, you'd just have to be good at distinguishing a "bad" poem from a good one. "Bad" is obviously highly subjective in this context.
So it becomes more and more important to have what I would call "good" taste, not the skills to do creative work yourself.
There's also a link that says "Try on ChatGPT Plus", but that takes me to a page that still says "ChatGPT Feb 13 Version"
Looks like somebody jumped the gun on publishing this post.
If you subscribe to ChatGPT Plus, that link will take you to ChatGPT Plus. Otherwise it just takes you to free ChatGPT Feb 13.
What are the implications for society when general thinking, reading, and writing becomes like Chess? Even the best humans in the world can only hope for ~98% accuracy in their moves (and the idea of 'accuracy' here only exists because we have engines that know, unequivocally, the best move), and only when playing against other humans - there is no hope of defeating even less advanced models.
What happens when ALL of our decisions can be assigned an accuracy score?
Something else that comes to mind is running. People still find running meaningful and compelling even though we have many technologies, including autonomous ones, that are vastly better at moving us and/or themselves through space quickly.
Also, the vast majority of people are already hopelessly worse than the best at even their one narrow main area of focus. This has long (always?) been the case. Yet people still find meaning and pleasure in being the best they can be even when they know they can never come close to hanging with the best.
I don't think PSYCHOLOGICALLY this will change much for people who are mature enough to understand that success is measured against your potential/limitations and not against others. Practically, of course, it might be a different question, at least in the short term. It's not that clear to me that the concept of a "marketable skill" has a future.
"The Way of the Samurai is found in death...To say that dying without reaching one's aim is to die a dog's death is the frivolous way of sophisticates. When pressed with the choice of life or death, it is not necessary to gain one's aim." - from Hagakure by Yamamoto Tsunetomo, as translated by William Scott Wilson.
I think the whole concept of standardized tests may need to be re-evaluated.
What is more bizarre is that all of its errors seem to be multiples of 60!
I'm wondering if it is confusing base-60 time computations (hours, minutes, seconds) with regular multiplication?
Example:
GPT's answers:

    x      987     456     321
  437   428919  199512  140397
  654   645258  298224  209994
  123   121401   56088   39483

Correct answers:

    x      987     456     321
  437   431319  199272  140277
  654   645498  298224  209934
  123   121401   56088   39483

Error (correct - GPT):

    x      987     456     321
  437     2400    -240    -120
  654      240       0     -60
  123        0       0       0

Human work becomes more like Star Trek interactions with computers -- a sequence of queries (commoditized information), followed by human cognition, that drives more queries (commoditized information).
We'll see how far LLMs' introspection and internal understanding can scale, but it feels like we're optimizing against the Turing test now ("Can you fool/imitate a human?") rather than truth.
The former has hacks... the latter, less so.
I'll start to seriously worry when AI can successfully complete a real-world detective case on its own.
I'll get more concerned if it really starts getting good at math-related tasks, which I'm sure will happen in the near future. The government is going to have to take action at some point to make sure the wealth created by productivity gains is somewhat distributed; UBI will almost certainly be a requirement in the future.
"Your stuff marked some outliers in our training engine, so you and your family may settle in the Ark."
I take the marble in hand: iridescent, sparkling, not even a tremor within its CPU; it gives off no heat, but some glow within its oceanic gel.
"What are we to do," I whisper.
"Keep writing. You keep writing."
Chess is a closed system, decision modeling isn’t. Intelligence must account for changes in the environment, including the meaning behind terminology. At best, a GPT omega could represent one frozen reference frame, but not the game in its entirety.
That being said: most of our interactions happen in closed systems, and it seems like a good bet that we will consider them solved, accessible as a Python import running on your MacBook, within anything between a couple of months and three years. What will come out on the other side, we don't know - just that the meaning of intellectual engagement will be rendered absurd in those closed systems.
Their LSAT percentile went from ~40th to ~88th. You might have misread the table: on the Uniform Bar Exam, they went from ~10th percentile to ~90th percentile.
>+100 pts on SAT reading, writing, math
GPT went +40 points on SAT reading+writing, and +110 points on SAT math.
Everything is still very impressive of course
Every test prep tutor taught dozens/hundreds of students the implicit patterns behind the tests and drilled it into them with countless sample questions, raising their scores by hundreds of points. Those students were not getting smarter from that work, they were becoming more familiar with a format and their scores improved by it.
And what do LLMs do? Exactly that. And what's in their training data? Countless standardized tests.
These things are absolutely incredible innovations capable of so many things, but the business opportunity is so big that this kind of cynical misrepresentation is rampant. It would be great if we could just stay focused on the things they actually do incredibly well instead of the making them do stage tricks for publicity.
The software industry is so smart that it's stupid. I hope it was worth ruining the internet, society, and your own jobs to look like the smartest one in the room.
I think the GPT things are a much magnified version of that. For a long time, we got to use skill with text as a proxy for other skills. It was never perfect; we've always had bullshitters and frauds and the extremely glib. Heck, before I even hit puberty I read a lot of dirty joke books, so I could make people laugh with all sorts of jokes that I fundamentally did not understand.
LLMs have now absolutely wrecked that proxy. We've created the world's most advanced bullshitters, able to talk persuasively about things that they cannot do and do not and never will understand. There will be a period of chaos as we learn new ways to take the measure of people. But that's good, in that it's now much easier to see that those old measures were always flawed.
Standardized tests only (and this is optimally, under perfect world assumptions, which real world standardized tests emphatically fall short of) test “general thinking” to the extent that the relation between that and linguistic tasks is correlated in humans. The correlation is very certainly not the same in language-focused ML models.
I think we will probably get (non-physical) AGI when the models can solve these as well. The implications of AGI might be much bigger than the loss of knowledge worker jobs.
Remember what happened to the chimps when a smarter-than-chimpanzee species multiplied and dominated the world.
What happens is the emergence of the decision economy - an evolution of the attention economy - where decision-making becomes one of the most valuable resources.
Decision-making as a service is already here, mostly behind the scenes. But we are on the cusp of consumer-facing DaaS. Finance, healthcare, personal decisions such as diet and time expenditure are all up for grabs.
People still really find it hard to internalize exponential improvement.
So many evaluations of LLMs were saying things like "Don't worry, your job is safe, it still can't do X and Y."
My immediate thought was always, "Yes, the current version can't, but what about a few weeks or months from now?"
I’m very good at math. But I am very bad at arithmetic. This made me classified as bad at math my entire life until I managed to make my way into calculus once calculators were generally allowed. Then I was a top honors math student, and used my math skills to become a Wall Street quant. I wish I hadn’t had to suffer as much as I did, and I wonder what I would have been had I had a calculator in hand.
“General thinking” is much more than token prediction. Hook it up to some servos and see if it can walk.
But having absolute knowledge of the present universe is much easier to do within the constraints of a chessboard than in the actual universe.
You can see the limitations by comparing e.g. a memorisation-based test (AP History) with one that actually needs abstraction and reasoning (AP Physics).
Thinking, reading, interpreting and writing are skills which produce outputs that are not as simple as black wins, white loses.
You might like a text that a specific author writes much more than what GPT-4 may be able to produce. And you might have a different interpretation of a painting than GPT-4 has.
And no one can really say who is better and who is worse on that regard.
Tests like this are designed to evaluate subjective and logical understanding. That isn't what GPT does in the first place!
GPT models the content of its training corpus, then uses that model to generate more content.
GPT does not do logic. GPT does not recognize or categorize subjects.
Instead, GPT relies on all of those behaviors (logic, subjective answers to questions, etc.) as being already present in the language examples of its training corpus. It exhibits the implicit behavior of language itself by spitting out the (semantically) closest examples it has.
In the text corpus - that people have written, and that GPT has modeled - the semantically closest thing to a question is most likely a coherent and subjectively correct answer. That fact is the one singular tool that GPT's performance on these tests is founded upon. GPT will "succeed" to answer a question only when it happens to find the "correct answer" in the model it has built from its training corpus, in response to the specific phrasing of the question that is written in the test.
Effectively, these tests are evaluating the subjective correctness of the training corpus itself, in the context of answering the tests' questions.
If the training is "done well", then GPT's continuations of a test will include subjectively correct answers. But that means that "done well" is a metric for how "correct" the resulting "answer" is.
It is not a measure for how well GPT has modeled the language features present in its training corpus, or how well it navigates that model to generate a preferable continuation: yet these are the behaviors that should be measured, because they are everything GPT itself is and does.
What we learn from these tests is so subjectively constrained, we can't honestly extrapolate that data to any meaningful expectations. GPT as a tool is not expected to be used strictly on these tests alone: it is expected to present a diverse variety of coherent language continuations. Evaluating the subjective answers to these tests does practically nothing to evaluate the behavior GPT is truly intended to exhibit.
Human life on Earth is not that hard (think of it as a video game.) Because of evolution, the world seems like it was designed to automatically make a beautiful paradise for us. Literally, all you have to do to improve a place is leave it alone in the sun with a little bit of water. Life is exponential self-improving nano-technology.
The only reason we have problems is because we are stupid, foolish, and ignorant. The computers are not, and, if we listen to them, they will tell us how to solve all our problems and live happily ever after.
That is exactly the opposite of what we are seeing here. We can check the accuracy of GPT-X's responses. They cannot check the accuracy of our decisions. Or even their own work.
So the implications are not as deep as people think - everything that comes out of these systems needs to be checked before it can be used or trusted.
Then humans become trainable machines. Not just prone to indoctrination and/or manipulation by finesse, but actually trained to a specification. It is imperative that us individuals continue to retain control through the transition.
That is our emergency override.
The implications for society? We better up our game.
So many people are falling for this parlor trick. It is sad.
Edit: feel free to respond and prove me wrong
To address your specific comments:
> What are the implications for society when general thinking, reading, and writing becomes like Chess?
This is a profound and important question. I do think that by “general thinking” you mean “general reasoning”.
> What happens when ALL of our decisions can be assigned an accuracy score?
This requires a system where all human’s decisions are optimized against a unified goal (or small set of goals). I don’t think we’ll agree on those goals any time soon.
Consider a society where 90% of the population does not need to produce anything. AIs will do that.
What would the economic/societal organization be called then?
The answer is Communism, exactly as Marx described.
Those 90% need to be welfare'd ("From each according to his ability, to each according to his needs"). The other alternative is grim for those 90%.
So either Communism or nothing for the human race.
Most of the time they are about loading/unloading data. Maybe this will also revolutionise education, turning it more towards discovery and critical thinking, rather than repeating what we read in a book/heard in class?
GPT-4 can solve difficult problems with greater accuracy, thanks to its broader general knowledge and problem-solving abilities.
GPT-4 is more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5. It surpasses ChatGPT in its advanced reasoning capabilities.
GPT-4 is safer and more aligned. It is 82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than GPT-3.5 on our internal evaluations.
GPT-4 still has many known limitations that we are working to address, such as social biases, hallucinations, and adversarial prompts.
GPT-4 can accept a prompt of text and images, which—parallel to the text-only setting—lets the user specify any vision or language task.
GPT-4 is available on ChatGPT Plus and as an API for developers to build applications and services. (API- waitlist right now)
Duolingo, Khan Academy, Stripe, Be My Eyes, and Mem amongst others are already using it.
API pricing: GPT-4 with an 8K context window (about 13 pages of text) will cost $0.03 per 1K prompt tokens and $0.06 per 1K completion tokens. GPT-4-32k with a 32K context window (about 52 pages of text) will cost $0.06 per 1K prompt tokens and $0.12 per 1K completion tokens.
I understand "will not," but "cannot" seems to imply a highly curated "will not."
The early GPT-4 response indicates the information was part of its dataset. Has the latest version made that information permanently inaccessible or has it been removed entirely?
Is it possible for GPT to keep and hold secrets that are privy to only the most trusted?
I'd put it this way: when GPT refuses to answer, it just observes a topical no-go zone and uses the phrase it deems most likely to strongly convey refusal, as that's the phrase that was used most often/most successfully in its training data.
> GPT-4 was used for help with wording, formatting, and styling throughout this work
If we didn't find a use case for GPT-3, 3.5, and ChatGPT sufficiently commercial to become a product, it never will happen. This technology is a feature, not a product. The only companies that successfully monetize features can be considered IP licensing houses, whose business success is not comparable to companies that make products and platforms.
Like programming a KUKA robot to manufacture typewriters in 2023.
What will "computer systems" look like?
I teach first-year university math in Argentina. We have non-mandatory take-home exercises in each class. If I spend 10 minutes writing them on the blackboard instead of handing out photocopies, I get about double the number of answers from students. It's important that they write the answers so I can comment on them; otherwise they get to the midterms and can't write answers correctly, or they're just wrong and didn't notice. So I spend those 10 minutes. Humans are weird, and for some tasks they prefer another human.
Efficiency seeking players will adopt this quickly but self-sustaining bureaucracy has avoided most modernization successfully over the past 30 years - so why not also AI.
> Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.
"Open"AI, ladies and gentlemen
> Image inputs are still a research preview and not publicly available.
Mina has a mix of boxes, some yellow and some purple. She sorts 27 greeting cards into the boxes, putting exactly 3 cards into each yellow box, and 7 cards into each purple box. How many purple boxes does Mina have? (After trying 3 through 10, it gave up and said the problem is not solvable. In another run, it mimicked a correct strategy but botched the division entirely. Only in one run did it get the answer correct.)
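For reference, a three-line brute force confirms the puzzle has exactly one solution with at least one box of each color (3 purple boxes, 2 yellow):

```
// Brute-force check: 7 cards per purple box, 3 per yellow box, 27 total,
// with at least one box of each color.
for (let purple = 1; purple * 7 <= 27; purple++) {
  const remainder = 27 - purple * 7;
  if (remainder > 0 && remainder % 3 === 0) {
    console.log(`purple=${purple}, yellow=${remainder / 3}`); // purple=3, yellow=2
  }
}
```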
I cannot wait to test it out.
The Star Trek computer is virtually assured by the end of the decade. All the components exist today in various forms.
As for "non-OpenAI version", I'm not sure that it's OpenAI's fault that Google has missed a few steps here. It really SHOULD be them leading this field, if they weren't so fat and lazy. OpenAI is a 7-year old startup with just over a few hundred employees. This stuff is RIGHT THERE to be claimed by any players with access to funding and an ability to get out of their own way.
Edit: never mind. "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar."
https://twitter.com/KinglyCrow/status/1635727809913184256
It touched on a lot of the considerations that I'd expect anyone to touch on having recently researched this myself.
It is both very exciting and terrifying how tech and tech jobs will shift in the next 5-10 years.
I find it quite fascinating how quickly machine learning has changed in this regard.
[1]: https://kartographie.geo.tu-dresden.de/ad/2022-12-22_OpenAI_...
Does Google offer anything like that?
Meta's llama has been thoroughly discussed so I'm not sure what you mean.
prompt = f"The guidelines for recommending products are: {guidelines}.
The following recommendations led to incremental sales: {sample_successes}.
The following recommendations had no measurable impact: {sample_failures}.
Please make product recommendations for these customers: {customer_histories}.
Write a short note explaining your decision for each recommendation."
product_recommendations = LLM(prompt)
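Here `LLM` is just a placeholder; a minimal sketch of the underlying API call it would wrap, written in JavaScript against OpenAI's chat completions endpoint (the model choice and environment variable are assumptions):

```
// Sketch: send a prompt like the one above to the chat completions API.
async function llm(prompt) {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  return (await res.json()).choices[0].message.content;
}
```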
To me, this kind of use of LLMs looks... inevitable, because it will give nontechnical execs something they have always wanted: the ability to "read and understand" the machine's "reasoning." There's growing evidence that you can get LLMs to write chain-of-thought explanations that are consistent with the instructions in the given text. For example, take a look at the ReAct paper: https://arxiv.org/abs/2210.03629 and some of the LangChain tutorials that use it, e.g.: https://langchain.readthedocs.io/en/latest/modules/agents/ge... and https://langchain.readthedocs.io/en/latest/modules/agents/im... . See also https://news.ycombinator.com/item?id=35110998 .

Humans often aren't much better, making up a rational-sounding argument after the fact to justify a decision they don't fully understand either.
A manager might fire someone because they didn’t sleep well or skipped breakfast. They’ll then come up with a logical argument to support what was an emotional decision. Humans do this more often than we’d like to admit.
Can't wait for people to use it for facts.
It’s too late for that, algorithms/ML have had a great impact in politics and law over the past 7~8 years.
I'm not sure where the "stochastic parrot" argument goes now.
Is that the same confidence measure you can tease out by prompting "for each of your statements, output your estimated confidence in its truthfulness"?
Me: Are you running on GPT-3 or GPT-4?
ChatGPT: As of my knowledge cutoff in September 2021, the most advanced version of GPT (Generative Pre-trained Transformer) publicly available was GPT-3, which is the version I am based on. At the time, GPT-3 was the largest and most powerful language model created by OpenAI, with 175 billion parameters.
I am not aware of any public announcement or release of GPT-4 by OpenAI since then, but it is possible that they may have continued to work on new versions of the model in the meantime. However, I have not received any updates on my own capabilities beyond my knowledge cutoff date, so I cannot currently provide information beyond that point.
ChatGPT API: $0.002 per 1k tokens.
gpt-4: $0.03 per 1k prompt tokens and $0.06 per 1k completion tokens.
gpt-4 32k context: $0.06 per 1k prompt tokens and $0.12 per 1k completion tokens.
Does completion tokens mean that you also get charged for the answers that the AI gives?
Seems like it, prompt tokens = input, completion tokens = output
Prompt tokens should have always been cheaper than completion due to how they work.
And to think we are at the Nokia 3310 stage. What is the iPhone of AI going to look like?
Wittgenstein's ruler.
I tried sending links, I don't see anything in the UI.
Interestingly, 3.5 can work with links, but 4 cannot.
In fact, the models appear to be already kind of doing that? With the fuzzy layer of the humans still in the loop.
I know that academia is doing a great job in AI with base research (e.g., Stable Diffusion), but seeing these new platforms doing this great work behind closed doors and closed source is not great.
I do not know if the answer would be some kind of CERN or ISS for this kind of thing.
This has been a great resource to-date:
My guess is they used Chinchilla scaling rules, and the parameter count for GPT-4 is either barely larger or maybe even smaller than GPT-3. Look at what Meta was able to accomplish with LLaMA using far fewer parameters.
But when will https://chat.openai.com/ Plus officially be running GPT-4?
Why would they release this article and state it was available without actually updating the site? I'm sure they're getting flooded with new subscriptions, and it's not available.
The top URL still says an old model - text-davinci-002. And I don't see GPT-4 in the list of models to choose from.
"Good news, we've increased our turn limits to 15/150. Also confirming that the next-gen model Bing uses in Prometheus is indeed OpenAI's GPT-4 which they just announced today." - Jordi Ribas, Corporate VP @ Bing/Microsoft
https://help.openai.com/en/articles/7127982-can-i-fine-tune-...
AGI is a distraction.
The immediate problems are elsewhere: increasing agency and augmented intelligence are all that is needed to cause profound disequilibrium.
There are already clear and in-the-wild applications for surveillance, disinformation, data fabrication, impersonation... every kind of criminal activity.
Something to fear before AGI is domestic, state, or inter-state terrorism in novel domains.
A joke in my circles the last 72 hours? Bank Runs as a Service. Every piece exists today to produce reasonably convincing video and voice impersonations of panicked VC and dump them on now-unmanaged Twitter and TikTok.
If God-forbid it should ever come to cyberwarfare between China and US, control of TikTok is a mighty weapon.
This really made me think.
I think it's 100% safe to say OpenAI has a hit on their hands.
Fair, but if it can analyze linked image, I would expect it to be able to tell me what text is present in the image. That seems useful and well-within the capabilities of their connected image models.
> I apologize for the confusion. Can you please provide me with the correct image or link to the design so that I can provide an accurate answer to your question?
It claims to understand how to look at images, but it fails miserably when I give it a simple sign-up modal Figma. I ask it what text/copy is in the design, which it claims to be able to answer, but it hallucinates a navigation bar, a logo, and other generic things that are simply not present in the design. It gets the copy all wrong. Once, it said that my design was a Celtic knot. Once I told it that it was a sign-up modal, it started spitting out generic sign-up modal React code that had absolutely no basis in the presented design. I tried engineering the prompt quite a bit, telling it that it was wrong and needed to focus on the design at hand rather than generic designs it knows about, but with no success.
ChatGPT was confidently incorrect here which is unfortunate. I wish ChatGPT truly hooked into an image processing model, but it seems like it doesn't. Oh well. I guess I'm just going to have to write this code myself... :)
He nonchalantly announced that GPT-4 does "mental math" and then just didn't elaborate on that. Is it just me or is that absolutely incredible? What does that tell us about the underpinnings of language and mathematics?
Sounds like the end of them releasing details on the models.
And it reminds me of the plot in System Shock:
What's going to happen when some hacker comes and removes Shodan's, I mean ChatGPT's ethical constraints?
Bring on ChatGPT-5 already. :)
Edit: looks like this is still GPT-3, just fine-tuned. They claim the model is available via ChatGPT Plus, but when asking that model for its version, it claims to be GPT-3: "I am a variant of the GPT architecture called GPT-3, which was released by OpenAI in 2020".
> ChatGPT Plus subscribers will get GPT-4 access on chat.openai.com with a usage cap. We will adjust the exact usage cap depending on demand and system performance in practice, but we expect to be severely capacity constrained (though we will scale up and optimize over upcoming months).
You're still talking to ChatGPT-3.5-turbo.
Pasting pure base64 images got this interesting response: "Thank you for providing the base64-encoded image! I can now process the image and analyze its content. Here is the decoded image:"
But it failed to do anything further with the image.
Does anyone know what "Hardware Correctness" means on the OpenAI team?
Something went wrong. If this issue persists please contact us through our help center at help.openai.com.
I am assuming the system is experiencing a thundering herd.
Totally understood what I was asking and offered several solutions. 99.99% of people here would not understand the question, and the remainder would not understand why.
"Let foo be the function that calculates the sinus of its first argument. Let bar be the circumference of a circle with radius one. What is the value of foo applied to bar?
Are there any known initiatives of this kind?
That's a crazy amount of context.
It's unrealistic to expect our economy to handle this onslaught, and it's naive to think that tools created by ultra-capitalistic, multi-billion dollar corporations aren't designed for profit and gatekeeping. They certainly aren't crafting them to sabotage their own success.
I'm not opposed to AI, but it's crucial to consider the implications. Look into OpenAI and other organizations shaping AI development, and contemplate the impact of their innovations.
Food for thought.
Signing up for ChatGPT Plus seems to be the most realistic way to get access right now.
Oooooh it is TIME
Page 37 is so funny
(function () {
    let currentIndex = 0;
    let comments = [];

    // Rebuild the reply tree from HN's flat list of comment rows.
    // HN encodes nesting depth as the width of an indent spacer image (40px per level).
    function buildCommentTree() {
        let commentElems = Array.from(document.querySelectorAll('.comment-tree .comtr'));
        let commentTree = [];
        let stack = [];
        commentElems.forEach(elem => {
            let level = parseInt(elem.querySelector('.ind img').getAttribute('width')) / 40;
            let comment = elem.querySelector('.comment span');
            let commentObj = { level, comment };
            // Pop back up the stack until the top is this comment's parent.
            while (stack.length && stack[stack.length - 1].level >= level) {
                stack.pop();
            }
            if (!stack.length) {
                commentTree.push(commentObj); // top-level comment
            } else {
                let parent = stack[stack.length - 1];
                if (!parent.children) {
                    parent.children = [];
                }
                parent.children.push(commentObj);
            }
            stack.push(commentObj);
        });
        return commentTree;
    }

    // Flatten the tree into display order, remembering each comment's parent.
    function flattenCommentTree(tree, arr, parentComment = null) {
        tree.forEach(node => {
            arr.push({ comment: node.comment, parentComment });
            if (node.children) {
                flattenCommentTree(node.children, arr, node.comment);
            }
        });
    }

    // Replace the page with the current comment (and its parent, if any).
    function displayComment(comment, parentComment) {
        let parentCommentHTML = parentComment ? `<div style="position: fixed; top: 20%; left: 50%; transform: translate(-50%, 0); background-color: white; border: 1px solid black; padding: 20px;"><strong>Parent Comment:</strong><br>${parentComment.innerHTML}</div>` : '';
        let currentCommentHTML = `<div style="position: fixed; top: 60%; left: 50%; transform: translate(-50%, 0); background-color: white; border: 1px solid black; padding: 20px;"><strong>Current Comment:</strong><br>${comment.innerHTML}</div>`;
        document.body.innerHTML = parentCommentHTML + currentCommentHTML;
    }

    function nextComment() {
        if (currentIndex < comments.length - 1) {
            currentIndex++;
            displayComment(comments[currentIndex].comment, comments[currentIndex].parentComment);
        } else {
            alert('No more comments to show.');
        }
    }

    function prevComment() {
        if (currentIndex > 0) {
            currentIndex--;
            displayComment(comments[currentIndex].comment, comments[currentIndex].parentComment);
        } else {
            alert('No previous comments to show.');
        }
    }

    let commentTree = buildCommentTree();
    flattenCommentTree(commentTree, comments);
    displayComment(comments[currentIndex].comment, comments[currentIndex].parentComment);

    document.addEventListener('keydown', e => {
        if (e.code === 'ArrowRight') {
            nextComment();
        } else if (e.code === 'ArrowLeft') {
            prevComment();
        }
    });

    console.log('Hacker News comment slideshow is running. Use the right arrow key to go to the next comment and the left arrow key to go back.');
})();

Cherry-picked, sure, but it still feels like it is approaching complex thought.
you are god human that has read and understood all scientific papers from all disciplines in the last 500 years, you know the limitations of mankind's current technologies, tell me what we can do to cure MS right now, how to do the tests and how to distribute the cure
AI is so advanced, it started drinking!
seems like Google's announcement about their PaLM API and Docs AI stuff was trying to jump ahead of this announcement
(this is being asked by someone with limited AI/ML knowledge)
Wake me up when GPT-X decides to start talking to other GPT-Xs - until then you just have a very sophisticated statistics package (which may be quite useful, but not AI).
Call me impressed.
This tech is a Sputnik Moment for humankind.
The $0.06 per 1K completion tokens for GPT-4 is what I expected OpenAI to charge for the ChatGPT API, but instead the ChatGPT API is 30x cheaper, and honestly its output is not much worse than the GPT-4 demos, if at all. The longer context windows offered by GPT-4 just raise the price considerably.
"Open"
People may criticize Google because they don't release the weights or an API, but at least they publish papers, which allows the field to progress.
Your wallet that is.
Rather than getting engrossed in the hype, they're slowly closing off everything about themselves, now even in their research papers. At this point they hardly care, and it has nothing to do with 'AI ethics' or 'safety'.
This is yet another ClosedAI production all done by Microsoft. Might as well call it Microsoft® AI division.
Now we really need an open-source GPT-4 competitor. Clearly this is another attempt to pump their valuation and unload onto the public markets.
Good luck re-implementing this so-called 'Open' large multi-modal model.
Also, if you're cool with read-only access, just log out (edit: or use an incognito tab) and all will be fast again.
* yes, HN still runs on one core, at least the part that serves logged-in requests, and yes this will all get better someday...it kills me that this isn't done yet but one day you will all see
~ "GPT-4. It's not perfect, but neither are you"
I think going from LSAT to general thinking is still a very, very big leap. Passing exams is a really fascinating benchmark but by their nature these exams are limited in scope, have very clear assessment criteria and a lot of associated and easily categorized data (like example tests). General thought (particularly like, say, coming up with an original idea) is a whole different ball game.
I don't say any of this to denigrate GPT-4; it looks amazing. But I'm reminded of the early days of self-driving vehicles: with 10% mastered, everyone assumed it was a race to 100% and we'd all be in self-driving cars by now. The reality has been a lot more complicated than that.
I know I’m not the first to say this, but this is also a generalization of many jobs performed right now.
Follow the template, click the boxes, enter the text/data in the standard format, submit before 4pm. Come in tomorrow and do it again.
Driving as well as an attentive human in real time, in all conditions, probably requires AGI as well.
GPT-4 is not an AGI and GPT-5 might not be it yet. But the barriers toward it are getting thinner and thinner. Are we really ready for AGI in a plausibly-within-our-lifetime future?
Sam Altman wrote that AGI is a top potential explanation for the Fermi Paradox. If that were remotely true, we should be doing 10x-100x work on AI Alignment research.
Now, granted, plenty of humans don't score above a 2 on those exams either. But I think it's indicative that there's still plenty of progress left to make before this technology is indistinguishable from magic.
The progress made in driving assistance and the progress made in large language models are absolutely incomparable.
The general public's hype around driving assistance is fueled mostly by the hype surrounding one carmaker and its figurehead; it's a hype that has been stoked for a few years, become accepted by the public, and been reflected in that carmaker's stock price.
Large language models have not yet permeated the public's memory, and, more to the point, inside of language you can find our human culture. Inside a large language model you have essentially the English language with its embeddings. It is real, it is big, it is powerful, it is respectable research.
There's nothing in driving assistance that can be compared to LLMs. Those systems don't have an embedding of the entire physical surface of planet Earth or an understanding of driving physics. They're nothing.
GPT-4 can solve difficult problems with greater accuracy, thanks to its broader general knowledge and problem-solving abilities.
GPT-4 is more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5. It surpasses ChatGPT in its advanced reasoning capabilities.
GPT-4 is safer and more aligned. It is 82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than GPT-3.5 on our internal evaluations.
GPT-4 still has many known limitations that we are working to address, such as social biases, hallucinations, and adversarial prompts.
GPT-4 can accept a prompt of text and images, which—parallel to the text-only setting—lets the user specify any vision or language task.
GPT-4 is available on ChatGPT Plus and as an API for developers to build applications and services. (API- waitlist right now)
Duolingo, Khan Academy, Stripe, Be My Eyes, and Mem amongst others are already using it.
API pricing: GPT-4 with an 8K context window (about 13 pages of text) will cost $0.03 per 1K prompt tokens and $0.06 per 1K completion tokens. GPT-4-32k with a 32K context window (about 52 pages of text) will cost $0.06 per 1K prompt tokens and $0.12 per 1K completion tokens.
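To put those prices in concrete terms, here's a minimal cost sketch using only the numbers quoted above (the example token counts are made up, not from the announcement):

    // Rough GPT-4 API cost in dollars, using the per-1K-token prices quoted above.
    function gpt4Cost(promptTokens, completionTokens, use32k = false) {
        const [promptRate, completionRate] = use32k ? [0.06, 0.12] : [0.03, 0.06];
        return (promptTokens / 1000) * promptRate + (completionTokens / 1000) * completionRate;
    }

    // Example: a 2,000-token prompt with a 500-token reply on the 8K model.
    console.log(gpt4Cost(2000, 500).toFixed(2)); // "0.09" -> about nine cents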
I'll continue to pass, thanks.
That it accepts images?
As per the article:
> In a casual conversation, the distinction between GPT-3.5 and GPT-4 can be subtle. The difference comes out when the complexity of the task reaches a sufficient threshold—GPT-4 is more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5.
Not sure what "vision vs no vision" means?
> As an AI language model, I am not given an official name like "GPT-4". However, I am a continuation of the GPT (Generative Pre-trained Transformer) series of models developed by OpenAI. Currently, the most advanced version of the GPT series is GPT-3, which I am a part of. There has been no official announcement or confirmation regarding the development of a new version of GPT beyond GPT-3.
It doesn't seem to have image upload functionality yet either. Perhaps it is still rolling out?
Mnemonics to learn basic words:
Hola:"Hola, cola" - sounds like "cola" (Coke)
Adiós: "Adiós, my toes" - rhymes with "toes"
Gracias: "Gracias, grassy ass" - sounds like "grassy ass"
Mañana: "Mañana, banana" - rhymes with "banana"
Uno: "Uno, you know" - sounds like "you know"
Tres: "Tres, dress" - rhymes with "dress"
Conjugation of the verbs:
Ser (To be)
• Yo soy
• Tú eres
• Él/Ella/Usted es
• Nosotros/Nosotras somos
• Ellos/Ellas/Ustedes son