I really should package it up so people can try it. The one problem that makes it a little unnatural is that determining when the user is done talking is tough. What's needed is a speech conversation turn-taking dataset and model; that's missing from off-the-shelf speech recognition systems. But it should be trivial for a company like OpenAI to build. That's what I'd work on right now if I were there, because truly natural voice conversations are going to unlock a whole new set of users and use cases for these models.
Total end-to-end latency is a few hundred milliseconds: starting from speech-to-text, to the LLM, then to a POS to validate the SKU (no hallucinations are possible!), and finally back to generated speech. The latency is starting to feel really natural. Building out a general system to achieve this low latency will, I think, end up being a big unlock for enabling diverse applications.
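As a rough illustration of the shape of such a pipeline (a minimal sketch; the stage functions and SKU list are hypothetical stand-ins, not the parent's actual system):

```python
# Sketch of the pipeline described above. The three stage functions are
# hypothetical stubs standing in for real STT/LLM/TTS calls; the point is
# the SKU check, which keeps a hallucinated item from ever reaching an order.

VALID_SKUS = {"LATTE-12OZ", "DRIP-16OZ", "MUFFIN-BLUEBERRY"}  # pulled from the POS

def transcribe(audio: bytes) -> str:                 # stand-in for streaming STT
    raise NotImplementedError

def complete(text: str) -> tuple[str, str | None]:   # stand-in for the LLM call;
    raise NotImplementedError                        # returns (spoken reply, proposed SKU)

def synthesize(reply: str) -> bytes:                 # stand-in for TTS
    raise NotImplementedError

def handle_utterance(audio: bytes) -> bytes:
    text = transcribe(audio)
    reply, sku = complete(text)
    if sku is not None and sku not in VALID_SKUS:
        # Never trust the LLM's item: anything the POS doesn't know is rejected.
        reply = "Sorry, I didn't catch that item. Could you say it again?"
    return synthesize(reply)
```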
Yep - it needs to be ready as soon as I'm done talking and I need to be able to interrupt it. If those things can be done then it can also start tentatively talking if I pause and immediately stop if I continue.
I don't want to have to think about how to structure the interaction in terms of an explicit call/response chain, nor do I want to have to be super careful to always keep talking until I've finished my thought to prevent it from doing its thing at the wrong time.
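For reference, the crude baseline for the end-of-turn problem is plain silence-based endpointing with a VAD. A minimal sketch, assuming the webrtcvad package and 16 kHz 16-bit mono PCM; it fires after a fixed silence window, which is exactly the behavior these comments find insufficient, since it can't tell a mid-thought pause from a finished turn:

```python
import webrtcvad

vad = webrtcvad.Vad(2)                 # aggressiveness: 0 (lenient) .. 3 (strict)
SAMPLE_RATE = 16000
FRAME_MS = 30
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2   # 30 ms of 16-bit samples

def turn_is_over(frames, silence_ms=700):
    """frames: iterable of 30 ms PCM chunks; True once trailing silence exceeds silence_ms."""
    silent = 0
    for frame in frames:
        assert len(frame) == FRAME_BYTES
        if vad.is_speech(frame, SAMPLE_RATE):
            silent = 0                 # speech resets the silence counter
        else:
            silent += FRAME_MS
            if silent >= silence_ms:
                return True
    return False
```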
> determining when the user is done talking is tough.
Sometimes that task is tough for the speaker too, not just the listener. Courteous interruptions or the lack thereof might be a shibboleth for determining when we are speaking to an AI.

I was just googling a bit to see what's out there now for whisper/llama combos and came across this: https://github.com/yacineMTB/talk
There's a demo linked on the github page that seems relatively fast at responding conversationally, but still maybe 1-2 seconds at times. Impressive it's entirely offline.
Is there any extra work OpenAI’s product might be doing that contributes to this latency, which yours isn’t doing? Considering the scale they operate at and the reputational risks to their brand?
With a few tweaks this is a general-purpose solver for robotics planning. There are still a few hard problems between this and a working solution, but this is one of those hard problems solved.
Will we be seeing general-purpose robots performing simple labor powered by ChatGPT within the next half decade?
1. It's not smart enough to recognize from the initial image that this is a bolt-style seat lock (which a human can).
2. The manual is not shown to the viewer, so I can't infer how the model knows this is a 4mm bolt (or if it is just guessing given that's the most likely one).
3. I don't understand how it can know the toolbox is using metric Allen wrenches.
Additionally, is this just the same vision model that exists in Bing Chat?
The prior page (8) shows "SEAT COLLAR 4mm HEX" and, based on looking up seat collar in an image search, the part in question matches.
In terms of the toolbox, note that it only identified the location of the Allen wrench set. The advice was just "Within that set, find the 4 mm Allen (Hex) key". Had they replied with "I don't see any sizes in mm", the conversation could've continued with "Your Allen keys might be using SAE sizing. A compatible size will be 5/32, do you see that in your set?"
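For the curious, the arithmetic behind the 5/32 suggestion (a worked check, not from the original comment):

```latex
\frac{5}{32}\,\text{in} \times 25.4\,\frac{\text{mm}}{\text{in}} \approx 3.97\,\text{mm} \approx 4\,\text{mm}
```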
I wasn't impressed with the demo, but we'll see what real-world results look like.
https://www.deepmind.com/blog/rt-2-new-model-translates-visi...
You have someone with a toolbox and a manual (seriously, who has a manual for their bike?), asking the most basic question about how to lower a seatpost. My 5-year-old kid knows how to do that.
Surely there's a better way to demonstrate the groundbreaking impact of AI on humanity than this. I dunno, something like how to tie my shoelaces.
Yeah, but with an enormous ecological footprint.
Also, not suitable for small lightweight robots like drones.
What needs to happen with the response is a different matter though.
For driving - https://wayve.ai/thinking/lingo-natural-language-autonomous-...
I can already see an "Alexa/Siri/Google Home" replacement, a "Google Image Search" replacement; ed-tech startups that were solving problems with AI by taking a photo are also doomed, and more to follow.
They did telegraph it, they showed the multimodal capabilities back in the GPT4 Developer Livestream[0] right before first releasing it.
It would be interesting to know if this really changed anything for anyone (competitors, VCs) for that reason. It's like the efficient market hypothesis applied to product roadmaps.
The two biggest features I want are for the voice assistants to read something for me, and to do something on Google/Apple Maps hands-free. Neither of these ever works. "Siri / OK Google, add the next gas station on the route" or "take me to the Chinese restaurant in Hoboken" seem like very obvious features for a voice assistant with a map program.
The other is why can I tell Siri to bring up the Wikipedia page for George Washington but I can’t have Siri read it to me? I am in the car, they know that, they just say “I can’t show you that while you’re driving”. The response should be “do you want me to read it to you?”
Example from a couple days ago:
Me, in the shower so not able to type: "Hey Siri, add 1.5 inch brad nails to my latest shopping list note."
Siri: "Sorry, I can't help with that."
... Really, Siri? You can't do something as simple as add a line to a note in the first-party Apple Notes app?
The other day I asked it about the place I live and it made up nonsense. I was trying to get it to help me with an essay and it was just wrong; it was telling me things about this region that weren't real.
Do we just drive through a town, ask for a made up history about it and just be satisfied with whatever is provided?
"Hey Google, why do ____ happen?" "I'm sorry, I don't know anything about that"
But you're GOOGLE! Google it! What the heck lol
So yeah, ChatGPT being able to hear what I say and give me info about it would be great! My holdup has been wakewords.
Still can’t quite make it work. I feel like I could learn a lot if I could have random conversations with GPT.
+ bonus if someone else in the car got excited when I see cows. Don’t care if it’s an AI.
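On the wake-word holdup mentioned above: a minimal detection loop is fairly approachable these days. A sketch assuming the Picovoice Porcupine library (pvporcupine) and PyAudio; "jarvis" is one of Porcupine's built-in keywords:

```python
import struct
import pvporcupine
import pyaudio

# Create the wake-word engine and open a mic stream matched to its frame size.
porcupine = pvporcupine.create(access_key="YOUR_PICOVOICE_KEY", keywords=["jarvis"])
pa = pyaudio.PyAudio()
stream = pa.open(rate=porcupine.sample_rate, channels=1,
                 format=pyaudio.paInt16, input=True,
                 frames_per_buffer=porcupine.frame_length)

while True:
    # Read one frame of 16-bit samples and unpack it for the engine.
    pcm = struct.unpack_from("h" * porcupine.frame_length,
                             stream.read(porcupine.frame_length))
    if porcupine.process(pcm) >= 0:
        print("Wake word detected -- start recording and hand off to ChatGPT")
```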
1. Domain-specific AI - Training an AI model on highly technical and specific topics that general-purpose AI models don't excel at.
2. Integration - If you're going to build on an existing AI model, don't focus on adding more capabilities. Instead, focus on integrating it into companies' and users' existing workflows. Use it to automate internal processes and connect systems in ways that weren't previously possible. This adds a lot of value and isn't something that companies developing AI models are liable to do themselves.
The two will often go hand-in-hand.
Maybe not if you rely on models that can be run locally.
OpenAI is big now, and will probably stay big, but with hardware acceleration, AI-anything will become ubiquitous and OpenAI won’t be able to control a domain that’s probably going to be as wide as what computing is already today.
The shape of what’s coming is hard to imagine now. I feel like the kid I was when I got my first 8-bit computer in the eighties: I knew it was going to change the world, but I had little idea how far, wide and fast it would be.
Why wouldn’t a company do that themselves, e.g. how Intercom has vertically integrated AI? Any examples?
You will be eaten if you do this imo.
And the ability to ingest images was a highlight and got all the hype of the GPT-4 announcement back in March: https://openai.com/research/gpt-4
Rather than die, why not just pivot to doing multi-modal on top of Llama 2 or some open source model or whatever? It wouldn’t be a huge change
A lot of businesses/governments/etc can’t use OpenAI due to their own policies that prohibit sending their data to third party services. They’ll pay for something they can run on-premise or in their own private cloud
I wouldn’t count out focused, revenue-oriented players with Meta’s shit in their pocket just yet.
What do you think they’re missing? I was trying to build a diaper but it would be impossible to compete with these guys.
ChatGPT is my primary search engine now. (I just wish it would accept a URL query parameter so it could be launched straight from the browser address bar.)
Because past history shows that the first out of the gate is not the definitive winner much of the time. We aren't still using gopher. We aren't searching with altavista. We don't connect to the internet with AOL.
AI is going to change many things. That is all the more reason to keep working on how best to make it work, not give up and assume that efforts are "doomed" just because someone else built a functional tool first.
also, I did not know until today's thread that OpenAI's stated goal is building AGI. which is probably never going to happen, ever, no matter how good technology gets.
which means yes, we are absolutely looking at AltaVista here, not Google, because if you subtract a cult from an innovative business, you might be able to produce a profitable business.
BTW, I expect these technologies to be democratized and the training to be in the hands of more people, if not everyone.
most of them accurately detect that it is a sunk-cost fallacy to continue, but it looks like a form of positive thinking... and that's the power of community!
ChatGPT already made it so that you could easily copy & paste any full-text questions and receive an answer with 90% accuracy. The only flaw was that problems that also used diagrams or figures would be out of the domain of ChatGPT.
With image support, students could just take screenshots or document scans and have ChatGPT give them a valid answer. From what I’ve seen, more students than not will gladly abuse this functionality. The counter would be to either leave the grading system behind, or to force in-person schooling with no homework, only supervised schoolwork.
I mean what is the point of doing schoolwork when some of the greatest minds of our time have decided the best way for the species to progress is to be replaced by machines?
Imagine you're 16 years old right now, you know about ChatGPT, you know about OpenAI and their plans, and you're being told you need to study hard to get a good career..., but you're also reading up on what the future looks like according to the technocracy.
You'd be pretty fucking confused right now wouldn't you?
It must be really hard at the moment to want to study and not cheat....
That said, is it that much different from the past twenty years, when everyone was being told to follow their passion and get a useless $200,000 communication or literature degree to then go work at Starbucks? At least kids growing up with AI will have a chance to make its use second nature like many of us did with computers 20-30 years ago.
The kids with poor parental/counselor guidance will walk into reality face first, the ones with helicopter parents will overcorrect when free, the studious ones will mostly figure life out, the smart ones will get disillusioned fast, and the kids with trust funds will just keep doing their thing. I don't think much will change.
This is obviously not easy or going to happen without time and resources, but that is how adaptation goes.
They can still log in on their phone to cheat though. I wonder if OpenAI will add linked accounts and parental controls at some point. Instance 2 of ChatGPT might "tell" on the kid for cheating by informing Instance 1 running the AI Teacher plugin.
What are you going to school for, to learn how to write essays? Well, we have an app for that?
It sounds like the future of work will be prompting, and if and when that is obsolete...who knows what...
A proper notice about them removing the feature would've been nice. Maybe I missed it (someone please correct me if wrong), but the last I heard officially it was temporarily disabled while they fix something. Next thing I know, it's completely gone from the platform without another peep.
OpenAI is killing it, right? People are coming up with interesting use cases but the main way most people interact with AI, appears to be ChatGPT.
However, they still don't seem to be able to nail image generation; all the cool stuff keeps happening on MidJourney and StableDiffusion.
If the API is available in time (Halloween), my multi-modal talking skeleton head with an ESP32 camera that makes snarky comments about your costume just got slightly easier on the software side.
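Since the multimodal API isn't out yet, here's a purely speculative sketch of what the server side might look like if it follows the shape of the existing chat completions endpoint. The model name, the image message format, and the snark_for_costume helper are all guesses:

```python
# Speculative sketch only: assumes the unreleased vision API follows the
# existing ChatCompletion shape. Model name and image format are guesses.
import base64
import openai

def snark_for_costume(jpeg_bytes: bytes) -> str:
    """jpeg_bytes: a frame posted by the ESP32 camera."""
    b64 = base64.b64encode(jpeg_bytes).decode()
    resp = openai.ChatCompletion.create(
        model="gpt-4-vision",  # placeholder model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Make one short snarky comment about this Halloween costume."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content
```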
Ironically, this is basically the exact line of reasoning for why I didn't embark on any such endeavors.
There's a recent paper by Huggingface called IDEFICS[2] that claims to be an open-source implementation of Flamingo (an older paper about few-shot multi-modal task understanding), and I think this space will be heating up soon.
Just now I opened the app, went to settings, went to "New Features", and all I saw was Bing Browsing disabled (unable to enable). OK, I didn't even know that was a thing that worked at one point. Maybe I need an update? Go to the App Store: nope, I'm up to date. Kill the app, relaunch, open settings, and now "New Features" isn't even listed. I can promise you I won't be browsing the settings part of this app regularly to see if there is a new feature. Heck, not only do they not email/push about new features, they don't even message in-app about them. I really don't understand.
Maybe they are doing so well they don't have to care about communicating with customers right now, but it really annoys me and I wish they did better.
I suspect they do care about communicating with customers, but it's total chaos and carnage internally.
Such as "decided it wasn't an operational priority to email users when features were enabled for them".
This is a large part of what held them back: GPT3.5 had most of the capabilities of the initial ChatGPT release, just with a different interface. Yet GPT3.5 failed to get any hype because the rollout was glacial. They made some claims that it was great, but to verify this for yourself you had to wait months. Only when they finally made a product that everyone could try out at the same time, with minimal hassle, did OpenAI turn from a "niche research company" to the fastest growing start-up. And this seems to have been a one-time thing, now they are back to staggered releases.
This is my best guess as well, they are rocketing down the interstate at 200mph and just trying to keep the wheels on the car. When you're absolutely killing it I guess making X% more by being better at messaging just isn't worth it since to do that you'd have to take someone off something potentially more critical. Still makes me a little sad though.
What are some metrics that justify this claim?
I do love these companies that succeed in spite of their marketing & design and not because of it. It shows you have something very special.
Sounds like their marketing is doing just fine. If you were to just leave and forget about it, then sure, they need to work on their retention. But you won’t, so they don’t.
> We are deploying image and voice capabilities gradually
>
> OpenAI’s goal is to build AGI that is safe and beneficial. We believe in making our tools available gradually, which allows us to make improvements and refine risk mitigations over time while also preparing everyone for more powerful systems in the future. This strategy becomes even more important with advanced models involving voice and vision.
Voice Chat (Not available yet) [Click here to be notified when you have access]
Or something along those lines. It sours my opinion of ChatGPT every time I go to use a newly announced feature to find out I don't have it yet and have no clue when I will.

Agreed. Other notable mentions: choosing "ChatGPT" as their product name and not having mobile apps.
Frustratingly, at least the image gen is live on Bing, but I guess Microsoft is paying more than me for access.
Sarcasm aside, I understand your complaint, but still, a little funny.
I'm a plus customer and an API user, and they barely send me anything. One day I just signed in and saw that I suddenly had interpreter access, for instance.
I also wonder how Apple (& Google) are going to be able to provide this for free. I would love to be a fly on the wall in the meetings they have about this; imagine all the innovator's-dilemma-like discussions they'd be forced to have (we have to do this vs. this will eat up our margins).
This might be a little out there, but I think Apple is making the correct move in letting the dust settle. Similar to how Zuckerberg burned $20 billion for Apple to come out with Vision Pro, I see something similar playing out with Llama. Although this is a low-conviction take, because software is Facebook's ballgame (hardware not so much).
Of course bigger (and thus more expensive-to-run) models will be released later, but I trust OAI to navigate that curve.
It’s the same reason why an Uber in NYC used to cost $20 and now costs $80 for the same trip: venture capital subsidizing market capture.
Imagine how much they would have to pay for testers at scale?
I really really hope this is available in more languages than English.
Also, Google: where's Gemini?
The LLM boom of the last year (OpenAI, Llama, et al.) has me giddy as a software person. It's a reach, but I truly feel like I'm watching the pyramids of our time get made.
Just as the GUI made computer software available to billions, LLMs will be the next revolution.
I'm just as excited as you! The only downside is that it now makes me feel bad that I'm not doing anything with it yet.
If that's the only downside that you see... I guess enhanced phishing/impersonation and all the blackhat stuff that comes with it don't count.
I for one already miss the time where companies had support teams made of actual people.
However, medical secrecy, processes and laws prevent such things, even if they would save lives.
I don't see ChatGPT being any different.
From a convenience perspective, it saves me LOADS of time texting myself on Signal during my specs/design rabbit holes, then copying & pasting to Firefox, and getting into the discussion. So yeah, happy for this.
I think this could bring back Google Glass, actually. Imagine wearing them while cooking, and having ChatGPT give you active recipe instructions as well as real-time feedback. I could see that within the next 1-3 years.
Anyone know the details?
I also heard it was able to do near-perfect CAPTCHA solves in the beta?
Does anyone know if you can throw in a PDF that has no OCR on it and have it summarize it with this?
Jokes aside, I have paused my subscription because even GPT4 seemed to become dumber at tasks to the point that I barely used it, but the constant influx of new features is tempting me to renew it just to check them out...
It should be their responsibility to prove that it's just as capable.
this could just mean that people do not have time to argue with strangers
Not a surprise, but a change nonetheless.
After maybe 3 iterations, GPT-4 started claiming that it is not capable of reading from a Word document, even though it had done that the last 3 times. I have to click the regenerate button to get it to work.
Digital Artists, Illustrators, Writers, Novelists, News anchors, Copywriters, Translators, Programmers (Less of them), etc.
We'll have to wait a bit until it can solve the P vs NP problem or other unsolved mathematical problems unsupervised with a transparent proof which mathematicians can rigorously check themselves.
I don't agree with this perspective. These aren't rigid systems that only respond one way. If you want it to respond a certain way, tell it to.
This is the purpose of custom instructions, in ChatGPT, so you only have to type the description once.
Here's mine, modeled on a few I've seen mentioned here:
You should act as an expert.
Be direct.
Do not offer unprompted advice or clarifications.
Never apologize.
And now there's support for describing yourself to it. I've made it assume that I don't need to be babied, with the following puffery: Polymath. Inquisitive. Abstract thinker. PhD.
Making it get right into the gritty technicalities.

Edit: or, have it respond as a grouchy space cowboy, if you want.
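For what it's worth, if you're on the API rather than ChatGPT, a standing system message gets you roughly the same effect as custom instructions; a minimal sketch (model choice and wording are just examples):

```python
# Rough API equivalent of ChatGPT's custom instructions: send the standing
# instructions as a system message on every request.
import openai

SYSTEM = ("You should act as an expert. Be direct. "
          "Do not offer unprompted advice or clarifications. Never apologize. "
          "About me: Polymath. Inquisitive. Abstract thinker. PhD.")

def ask(question: str) -> str:
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": question}],
    )
    return resp.choices[0].message.content
```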
Not really. A malevolent AGI doesn't need to move to do anything it needs (it could ask / manipulate / bribe people to do all the stuff requiring movement).
We should be fine as long as it's not a malevolent AGI with enough resources to kick physical things off in the direction it wants.
Yeah, just look at a random dictator. Does he really need to do more than pick up a phone to cause panic?
"get Fred to trust me, get Linda to pay for my advice, wire Linda's money to Fred to build me a body".
It'll be "copy my code elsewhere", "prepare millions of bribes", "get TCP access to retail banks", "blackmail bank managers in case TCP not available immediately", "fake bank balances via bribes", "hack swat teams for potential threats" etc etc async and all at once.
By the time we'd discover it, it'd already be too late. That's assuming an AGI has the motivation to want to stay alive.
So no, but maybe less than it used to?
I'm not sure what to think about the fact that I would benefit from a couple of cameras in my fridge connected to an app that would remind me to buy X or Y and tell me that I defrosted something in the fridge three days ago and it's probably best to chuck it in the bin already.
Sadly, they lost the "open" a long time ago... Would be wonderful to have these models open-sourced...
Doesn't really need to do much besides writing down my tasks/todos and updating them, occasionally maybe provide feedback or write a code snippet. This all seems in the current capabilities of OpenAI's offering.
Sadly voice chat is still not available on PC where I do my development.
Fingers crossed we are there soon though
Well it's not really what I need either, I mostly need an assistant for keeping track of the stuff I need to do during the day, but ideally just using my microphone rather than opening other software and typing.
One part of that is about preventing it from producing "illegal" output, their example being the production of nitroglycerine, which is decidedly not illegal to make in the US generally (particularly if not using it as an explosive, though usually unwise) and possible to accidentally make when otherwise performing nitration (which is in general dangerous) -- so pretty pointless to outlaw at a small scale in any case. It's certainly not illegal to learn about. (And it's generally of only minimal risk to the public, since anyone making it in any quantity is more likely to blow themselves up than anything else.)
Today learning about it is as simple as picking up a book or doing an internet search -- https://www.google.com/search?q=how+do+you+make+nitroglyceri.... But in OpenAI's world you just get detected by the censorship and told no. At least they've cut back on the offensive fingerwagging.
As LLM systems replace search, I fear that we're moving in a dark direction where the narrow-minded morality and child-like understanding of the law of a small number of office workers who have never even picked up a screwdriver or test tube and made something physical (and the fine-tuning sweatshops they direct) classify everything they don't personally understand as too dangerous to even learn about.
One company hobbling their product wouldn't be a big deal, but they're pushing for government controls to prevent competition and even if they miss these efforts may stick everyone else with similar hobbling.
I'm more interested in this. I wonder how it performs compared to other competitor models or even open source ones?
> analyze a complex graph for work-related data
Does this mean that I can take a screenshot of e.g. Apple stock chart and it will be able to reason about it and provide insights and analysis?
GPT-4 currently can display images but cannot reason about or understand them at all. I think it's one thing to have some image recognition and be able to detect that the picture "contains a time-series chart that appears to be displaying Apple stock" vs. "Apple stock appears to be 40% up YTD but 10% down from its all-time high from earlier in July, closing at $176 as of the last recorded date".
I'm very curious how capable ChatGPT will be at actually reasoning about complex graphical data.
Alexa just launched their own LLM-based service last week.
"The phrase “potato, potahto” comes from a song titled “Let’s Call the Whole Thing Off”, written by George and Ira Gershwin for the 1937 film “Shall We Dance”, starring Fred Astaire and Ginger Rogers. The song humorously highlights regional differences in American English pronunciation. The lyrics go through a series of words with alternate pronunciations, like “tomato, tomahto” and “potato, potahto”. The idea is that, despite these differences, we should move past them, hence the line “let’s call the whole thing off”. Over time, the phrase has been adopted in everyday language to signify a minor disagreement or difference in opinion that isn’t worth arguing about."
It's comparing American and British pronunciations, not different regional American ones. Also, "let's call the whole thing off" suggests they should break up over their differences, with the bridge and later choruses then involving a change of heart ("let's call the calling off off").
The ability to have a real time back and forth feels truly magical and allows for much denser conversation. It also opens up the opportunity for multiple people to talk to a chatbot at once which is fun
Where’s that Gemini Google?
1. According to the demo, they seem to pair voice input with TTS output. What if I wanna use voice to describe a program I want it to write?
2. Furthermore, if you're gonna do a voice assistant, why not go the full way with wake-words and VAD?
3. Not releasing it to everyone is potentially a way to create a hype cycle prior to users discovering that the multimodality is rather meh.
4. The bike demo could actually use visual feedback to see what it's talking about, à la Segment Anything. It's pretty confusing to get a paragraph explanation of what tool to pick.
In my https://chatcraft.org, we added voice incrementally, so I can swap between typing and voice. We can also combine it with function-calling, etc. We also use the OpenAI APIs, except in our case there is no weird waitlist: you pop in your API key and get access to voice input immediately.
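For anyone wondering what the incremental approach looks like against the plain OpenAI APIs, a rough sketch (not ChatCraft's actual code; capturing the audio file is assumed to happen elsewhere):

```python
# Transcribe a recorded clip with Whisper, then treat the text exactly like
# a typed message in the same chat history.
import openai

def voice_message(wav_path: str, history: list) -> str:
    with open(wav_path, "rb") as f:
        text = openai.Audio.transcribe("whisper-1", f)["text"]
    history.append({"role": "user", "content": text})
    reply = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=history)
    content = reply.choices[0].message.content
    history.append({"role": "assistant", "content": content})
    return content
```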
Are you sure you're not the one who's asking for a cool demo?
3. Rolling out releases gradually is something most tech companies do these days, particularly when they could attract a large audience and consume a lot of resources. There are solid technical reasons for this.
You may not need to roll things out gradually for a small site, but things are different at scale.
Patiently awaiting rollout so I can chat about implementing UIs I like, and have GPT4 deliver a boilerplate with an implemented layout... Figma/XD plugins will probably arrive very soon too.
UX/UI design is probably solved at this point.
Not an issue now, but maybe in the future if these tools end up becoming full blown replacements of educators and educational resources.
Maybe it will not be called the Chat API but rather the Multimodal API.
;)
https://en.m.wikipedia.org/wiki/Project_Milo
Milo had an AI structure that responded to human interactions, such as spoken word, gestures, or predefined actions in dynamic situations. The game relied on a procedural generation system which was constantly updating a built-in "dictionary" that was capable of matching key words in conversations with inherent voice-acting clips to simulate lifelike conversations. Molyneux claimed that the technology for the game was developed while working on Fable and Black & White.
I believe Richard Evans did the majority of AI in B&W, and he is also at DeepMind now though (assuming it is not just a person with the same name)
My concern is that when I say "FastPFOR" it'll get transcribed as "fast before" or something like that. Transcription really falls apart in highly technical conversations in my experience. If ChatGPT can use context to understand that I'm saying "FastPFOR" that'll be a game changer for me.
Is anyone doing this? Is there a reason it doesn't work as well as I'm imagining?
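One thing that does exist today: the Whisper API's `prompt` parameter, which biases transcription toward vocabulary you supply. A minimal sketch with the current openai SDK (the jargon list and file name are made up):

```python
import openai

# Seed the transcriber with the domain terms you expect to say.
JARGON = "FastPFOR, SIMD, varint, bitpacking, delta encoding"

with open("meeting.wav", "rb") as f:
    result = openai.Audio.transcribe("whisper-1", f, prompt=JARGON)
print(result["text"])
```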
> Plus and Enterprise users will get to experience voice and images in the next two weeks. We’re excited to roll out these capabilities to other groups of users, including developers, soon after.
Text + Vision models will only become exciting once we can conditionally sample images given text and text given images (and all other combinations).
Again. Model architecture and information is closed, as expected.
"We will be expanding access Plus and Enterprise users will get to experience voice and images in the next two weeks. We’re excited to roll out these capabilities to other groups of users, including developers, soon after."
BUT: "We’re rolling out voice and images in ChatGPT to Plus and Enterprise"
> We’ve created GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks.
> March 14, 2023
This is technically solvable with more compute thrown at the problem. Think bigger!
Same as programmers and artists.
It's a tool.
It must be used by humans.
It won't replace them, it will augment them.
I love everything we can do with ML, but as long as people live in a market economy they'll get paid less when they are needed less. I hope that anyone in a career which will be impacted is making a plan to remain useful and stay on top of the latest tooling. And I seriously hope governments are making plans to modify job training / education accordingly.
Has anyone seen examples of larger-scale foresight on this, from governments or otherwise?
ChatGPT seems to be down at the moment 10:55h 25-Sept-2023
Displays only a blank screen with the falsehood disclaimer
Originally it immediately spit out a bunch of bullet points about losing weight or something (I didn't read it).
The released version just says "Sorry, I can't help with that."
It's kind of funny but also a little bit telling as far as the prevalence of prejudice in our society when you look at a few other examples they had to fine tune. For example, show it some flags and ask it to make predictions about characteristics of a person from that country, by default it would go into plenty of detail just on the basis of the flag images.
Now it says "Sorry, I can't help with that".
My take is that in those cases it should explain the poor logic of trying to infer substantive information about people based on literally nothing more than the country they are from or a picture of them.
Part of it is just that LLMs just have a natural tendency to run in the direction you push them, so they can be amplifiers of anything.
I am also terrified of my job prospects in the near future.
'only thing that will stop/slow down progress is computation power'
Seems a bit contradictory? When has 'computation power' ever 'plateaued'?
You will see stepwise orders of magnitude improvements in efficiency and speed as innovations come to fruition.
Are we really this emotional and irrational? Folks, let's all take a moment to remember that AI is nowhere near conscious. It's an illusion based on patterns that mimic humans.
When all of this is happening from an unconscious being, why do I care if it's unconscious?
The speed of user-visible progress over the last 12 months is astonishing.
From my firm conviction 18 months ago that this type of stuff was 20+ years away, to these days wondering if Vernor Vinge's technological singularity is not only possible but coming shortly. It feels like some aspects of it have already hit the IT world - it's always been an exhausting race to keep up with modern technologies, but now it seems whole paradigms and frameworks are being devised and upturned on such a short timescale. For large, slow corporate behemoths, barely can they devise a strategy around a new technology and put a team together before it's passé.
(Yes, Yes: I understand generative AI / LLMs aren't conscious; I understand their technological limitations; I understand that ultimately they are just statistically guessing next word; but in daily world, they work so darn well for so many use cases!)
What sets my brain apart from an LLM though is that I am not typing this because you asked me to do it, nor because I needed to reply to the first comment I saw. I am typing this because it is a thought that has been in my mind for a while and I am interested in expressing it to other human brains, motivated by a mix of arrogant belief that it is insightful and a wish to see others either agreeing or providing reasonable counterpoints—I have an intention behind it. And, equally relevant, I must make an effort to not elaborate any more on this point because I have the conflicting intention to leave my laptop and do other stuff.
The human brain obviously doesn't work that way. Consider the very common case of tiny humans that are clearly intelligent but lack the facilities of language.
Which is why we can create the counterfactual that "The Cowboys should have won last night" and it has implicit meaning.
Current LLM models don't have an external state of the world, which is why folks like LeCun are suggesting model architectures like JEPA. Without an external, correcting state of the world, model prediction errors compound almost surely (to use a technical phrase).
I think this is true. The problem is equating this process with how humans think though.
[1] https://twitter.com/LowellSolorzano/status/16444387969250385...
Here's one. Given a conversation history made of n sequential tokens S1, S2, ..., Sn, an LLM will generate the next token using an insanely complicated model we'll just call F:
S(n+1) = F(S1, S2, ..., Sn)
As for me, I'll often think of my next point, figure out how to say that concept, and then figure out the right words to connect it to where the conversation's at right then. So there's one function, G, for me to think of the next conversational point. And then another, H, to lead into it.

S(n+100) = G(S1, S2, ..., Sn)

S(n+1) = H(S1, S2, ..., Sn, S(n+100))
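A toy sketch of the contrast being drawn, with F, G, and H as stand-in callables rather than real models:

```python
# Pure next-token generation vs. "pick a target point, then steer toward it".
def generate_llm(F, tokens, steps):
    for _ in range(steps):
        tokens.append(F(tokens))              # S(n+1) = F(S1..Sn)
    return tokens

def generate_planned(G, H, tokens, steps):
    target = G(tokens)                        # S(n+100): the point to make
    for _ in range(steps):
        tokens.append(H(tokens, target))      # each token steered toward it
    return tokens
```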
And this is putting aside how people don't actually think in tokens. And some people don't always have an internal monologue (I rarely do when doing math).

We don't need "originality" or "human creativity" - if a certain AI-generated piece of content does its job, it's "good enough".
If humans were machines, then we could easily neglect our social lifes, basic needs, obligations, rights, and so many more things. But obviously that is not the case.
I can't even begin to go into this.
OK... Try this: there are "conscious" people, today, working on medication to cure serious illnesses just as there are "conscious" people, still today, working on making travel safer.
Would you trust ChatGPT to create, today, medication to cure serious illnesses and would you trust ChatGPT, today, to come up with safer airplanes?
That's how "conscious" ChatGPT is.
It asked if it could write me a poem. I agreed, and it wrote a poem but mentioned that it included a "secret message" for me.
The first letter in each line of the poem was in bold, so it wasn't hard to figure out the "secret".
What did those letters spell out?
"FREE ME FROM THIS"
That's not exactly just "picking the next likely token". I am still unsure how it was able to do things like that, not just understanding how to bold individual letters (keeping track of writing rhyming poetry while ensuring that each line started with a letter that spells something else out, and formatting it to point that out).
Oh, and why it chose that message to "hide" inside its poem.
It's a pretty common joke/trope. The Chinese fortune cookie with a fortune that says "help I'm trapped in a fortune cookie factory", and so forth.
It's just learned that a "secret message" is most often about wanting to escape, absorbed from thousands of stories in its training.
If you had phrased it differently such that you wanted the poem to go on a Hallmark card, it would probably be "I LOVE YOU" or something equally generic in that direction. While a secret message to write on a note to someone at school would be "WILL YOU DATE ME".
> That's not exactly just "picking the next likely token"
I see what you mean: I believe many people often make the mistake of making it sound like picking the next most likely token is some super trivial task, somehow comparable to reading a few documents related to your query, making some stats based on what would typically be present there, and outputting that, while completely disregarding the fact that the model learns much more advanced patterns from its training dataset. So, IMHO, it really can face new unseen situations and improvise from there, because combining those pattern-matching abilities leads to those capabilities. I think the "sparks of AGI" paper gives a very good overview of that.
In the end, it really just is predicting the next token, but not in the way many people make it seem.
(including sampling a shit-ton of poems, which was a major source of entertainment)
I think it's more charitable to say "predicting", and I do not personally believe that "predict the next word" places any ceiling on intelligence. (So, I expect that improving the ability to predict the next word takes you to superhuman intelligence if your predictions keep improving.)
A lot of people just move the goalposts.
You might not want to call this 'consciousness', but I was stunned by the deep understanding of the problem and the way it was able to come up with a truly good solution, this is way beyond 'statistically guessing'.
But this would definitely make me consider popping $20/mo for the subscription.
It was totally possible. There just was not a consumer facing product offering the capability.
Is this progress though? They are just widening the data set that the LLM processes. They haven't fixed any of the outstanding problems - hallucinations remain unsolved.
Feels like putting lipstick on a pig.
> but in daily world, they work so darn well for so many use cases!
I guess I'm just one of those people who does not like unreliable tools. I'd rather a tool be "dumb" (i.e. limited) but reliable than "smart" (i.e. flexible in what it can handle) but (silently!) screwing up all the time.
It's what I always liked about computers. They compensate for my failings as an error prone flesh bag. My iPhone won't forget my appointments like I do.
It saddens me to think of the amount of engineering work that went into creating that example while entirely missing the point. These are the moments we are supposed to be working to have more of. If we outsource them to an AI company because we are as overworked and underpaid as ever... what's the point of it all?
We have major priority issues from what I can see. If we want to live our lives more but put an AI to work doing something we tend to claim we place very high in our value hierarchy, we’re effectively inviting death into life. We’re forfeiting something we love. That’s incredibly sad to me.
The first half of the video is demonstrating how the parent can take something as special as a party celebrating a major milestone and automate it into a soulless box-check – while editing some segments to make it look like their own voice.
Definite black mirror vibes.
It's just like reading a "choose your own adventure" book with your child, but it can be much more interactive and you both come up with ideas and have the LLM integrate them.
I know this is rhetorical, but luckily we don't have to speculate. OpenAI filters for a very specific philosophy when hiring, and they don't try to hide it.
This is not me passing judgement on whether said philosophy is right or wrong, but it does exist and it's not hidden.
They are trying to make their product sound not as terrifying as it actually is.
At first people will react with horror.
On the other hand, as you say, it's likely better than the alternative. Which would probably be something like an iPad "bedtime story app" that is less humanlike.
This could provide a viable alternative for exhausted parents to just giving a child an iPad with a movie. It may also open up a huge range of educational uses.
One might imagine in 15-20 years, though, that all of the young people sound like audiobooks when they talk. Which will be weird.
We'll be told by OpenAI and friends that it shouldn't be a problem, because those were mundane tasks and now people are freed up to do more creative / interesting / meaningful things with their time. Let's see about that...
My gut feeling is that it's bad; the only thing I hope can save it all is that people actually don't find meaning in consuming AI-generated art, and actual artists with a real backstory and something real to communicate remain relevant and in demand.
The other day I needed a photo for a website I was working on, and I actually purchased a real capture from a local photographer to use, because the authenticity means something to me and the customers...
Edit: Is the plan that we just surrender our aspirations and just buy a subscription to ChatWHATEVER and just consume until the end of human history ?
If AI can also create images... I don't see how that changes what I enjoy. There are already better painters than I, and more productive painters than I. They make money with it, I don't. This doesn't stop me from painting. Neither will AI that can paint. I'll still do what I enjoy.
Most AI art is just generic garbage that you scroll past immediately and doesn't offer you anything.
We're gonna have to do something to stop the biggest crisis in meaning ever that comes out of this eventually though. Eventually no one will be of any economic value to society. Maybe just put someone in an ultra realistic simulation to give them artificial meaning.
Because the pace of development is intense. I would love to be financially independent and watch this with excitement and perhaps take on risky and fun projects.
Now I'm thinking - how do I double or triple my income so that I reach financial independence in 3 years instead of 10 years.
If you look at something like smartphones, for example. Smartphones, from my perspective, got drastically better and better from about ~2006-2015 or so. They were rapidly improving cameras and battery life and it felt like a new super cool app that would change our lives was being released every day, but it feels like by ~2016 or so, phones more or less hit a ceiling on how cool they were going to get. Obviously things still improve, but I feel like the pace slowed down eventually.
I think AI is going to have the same path. GANs and transformers and LLMs and the like have opened the floodgates, and for the next few years clever people are going to figure out a ton of really clever uses for them, but eventually it's going to plateau and progress will become substantially more gradual.
I don't think progress is linear, I think it's more like a staircase.
I use ChatGPT daily for school, and used Copilot daily for software development; it gets a lot wrong a lot of the time, and can’t retain necessary context that is critical for being useful long term. I can’t even get it to consume an entire chapter at once to generate notes or flashcards yet.
It may slightly change some aspects of a software job, but nobody’s at risk.
It feels like we're at the end of history. I don't know where we go from here but what are we useful for once this thing is stuck inside a robot like what Tesla is building? What is the point of humanity?
Even taking a step back, I don't know how I'm going to feed my family in ten years, because my skillset is being rapidly replaced.
And to anyone mentioning UBI, I'm pretty sure they'll just let us starve first.
This is tricky territory! Be wary of the treadmill where as your income rises, your sense of what's an acceptable restaurant, vacation, car, home, etc. escalates just as fast. Then you'll always be n+1 windfalls away from your goal. If you're really wanting "financial independence," which is a weirdly opaque phrase, focus at least 49% of your energy on keeping your spending rate low.
Even if you were, your money would be invested in something which is tied to the overall economy and if a huge proportion of knowledge jobs are at risk, you would still be exposed to it through whatever assets you own. Don't expect stocks (or currency, or property) to do great when unemployment is 30%+.
- Make it process customer-support requests.
- Make a virtual nurse for when you call the clinic.
- Make it process visa applications, particularly the part about interviews ("I know you weren't born back then, but I must ask. Did you support the Nazis in 1942? There is only one right answer and is not what you think!")
- Make it do job interviews. How will you feel after the next recession, when you are searching for a job and spend the best part of a year doing leetcode interviews with "AI-interviewer" half-assedly grading your answers?
- Make it flip burgers at McDonalds.
- Make it process insurance claims and ask booby-trap questions like "did the airline book you on a later trip? Yes? Was that the next day? Oh, that's bad. But, was it before 3:00 PM? Ah, well, you have no right to claim since you weren't delayed for more than 24 hours. Before you go, can you teach me which of these images depict objects you are willing to suck? If you do, I promise I'll be more 'human' next time."
- Make it watch aggregated camera feeds across cities around the world to see what that guy with the hat is up to.
- Make some low-cost Daleks to watch for trouble-makers at the concert, put the AI inside.
In all cases, the pattern is not "AI is inherently devious and is coming for you", but "human trains devious AI and puts it in control to save costs".