As a professional...why not do this? There's a non-zero chance that it'll find something fairly basic that you missed and the cost is several cents. Even if it just phrases something obvious in a way that makes you think, it's well worth the effort for a multimillion dollar client.
If they further increase the context window, this thing becomes a Second Opinion machine. For pretty much any high level job. If you can put in ALL of the information relevant to a problem and it can algorithmically do reasoning, it's essentially a consultant that works for pennies per hour. And some tasks that professionals do could be replaced altogether. Out of all the use cases for LLMs that I've seen so far, this seems to me to have the biggest potential impact on daily life.
edit (addition): What % of people can hold 25,000 words' worth of information in their heads while effectively reasoning with and manipulating it? I'm guessing maybe 10% at most, probably fewer. And they're probably the best in their fields. Now a computer has that ability, and anyone who has $20 for the OpenAI API can access it. This could get wild.
Because your clients do not allow you to share their data with third parties?
Imagine being able to ask your workplace server if it has noticed any unusual traffic, or to write a report on sales with nice graphs. It would be so useful.
We won’t have that until we come up with a better way to fund these things. “””Open””” AI was founded on that idea and had the best chance of anyone of reaching it. Even going in with that intent, they failed, switched to locking down distribution of their models, and somehow managed to get effectively bought by MS despite the original non-profit-like structure. You just won’t see what you’re asking for for however long this field is dominated by the profit motive.
https://arstechnica.com/information-technology/2023/03/you-c...
Isn't this already happening with LLaMA and Dalai etc.? Already now you can run Whisper yourself. And you can run a model almost as powerful as gpt-3.5-turbo. So I can't see why it's out of bounds that we'll be able to host a model as powerful as GPT-4 on our own (highly specced) Mac Studio M3s, or whatever it may be.
Tada! Literally runs on a raspberry pi (very slowly).
GPT models are incredible but the future is somehow even more amazing than that.
I suspect this will be the approach for legal / medical uses (if regulation allows).
But a model that can run on a private cluster is certainly something that there’s going to be demand for. And once that exists there’s no reason it couldn’t be run on site.
You can see why OpenAI doesn’t want to do it though. SaaS is more lucrative.
I’m not sure what you mean by this, but it’s incorrect. Sensitive USG information is not processed on Amazon’s commercial offering.
> The Amazon-built cloud will operate behind the IC’s firewall, or more simply: It’s a public cloud built on private premises. [1]
I think this is what you’re referring to.
1 - https://www.theatlantic.com/technology/archive/2014/07/the-d...
Except they already do offer private cluster solutions, you just need usage in the hundreds of millions of tokens per day before they want to talk to you (as in they might before that, but that’s the bar they say on the contact us page).
The matrix gets decoded into text on the client side in JavaScript, so we send to and receive from ChatGPT only vectors of floats (obfuscation?)
I’m probably oversimplifying but it feels doable.
See
- https://www.zama.ai/post/encrypted-image-filtering-using-hom...
- https://news.ycombinator.com/item?id=31933995
- https://news.ycombinator.com/item?id=34080882
So, LLaMA? It's no ChatGPT, but it can potentially serve this purpose.
So the makers proudly say
Will optimize its program
In an almost human way.
And truly, the resemblance
Is uncomfortably strong:
It isn't merely thinking,
It is even thinking wrong.
Piet Hein wrote that in reference to the first operator-free elevators, some 70+ years ago.
What you call hallucination, I call misremembering. Humans do it too. The LLM failure modes are very similar to human failure modes, including making up stuff, being tricked to do something they shouldn't, and even getting mad at their interlocutors. Indeed, they're not merely thinking, they're even thinking wrong.
And, presumably you wouldn’t have the model generate the graph directly, but instead have it generate code which generates the graph.
I’m not sure what they had in mind for the “unusual traffic” bit.
It's already been done and discussed:
most of the AI offerings on the table right now aren't too dissimilar from that idea in principle.
https://azure.microsoft.com/en-us/products/cognitive-service...
(disclaimer: I work for Microsoft but not on the Azure team)
Offering sealed server boxes with GPT software, to run on premises, heavily firewalled or air-gapped, could be a viable business model.
OpenAI just has to promise they won't store the data. Perhaps they'll add a privacy premium for the extra effort, but so what?
I would never send unencrypted PII to such an API, regardless of their privacy policy.
Email is harder, but I do run my own email server. For mostly network related reasons, it is easier to run it as a cloud VM, but there's nothing about the email protocol itself that needs you to use a centralised service or host it in a particular network location.
https://support.microsoft.com/en-us/office/save-documents-on...
OpenAI simply does not offer the same thing at this time. You’re stuck using Facebook’s model for the moment, which is much inferior.
OTOH, the more patient info you are putting in, the less likely it is actually legally deidentified.
Yeah, I think the issues presented will relate to uniquely tricky errors, or entirely new categories of errors whose nature we have to understand. In addition to subtle and rare errors, I think elaborately hallucinated and justified errors, ones the model defends with increasing sophistication, are going to be a category we'll have to deal with. Consider the case of making fake but very plausible-sounding citations to research papers, and how much further AI might be able to go to backfill its evidence and reasons.
Anyway, I just mean to suggest we will have to contend with a few new genres of errors.
In my career, I saw a similar pattern with data warehouse users. Initially, managers asked junior analysts to write SQL. Later, the tools improved, and more technical managers could use a giant pivot table. Underneath, the effective query produced by the pivot table is way more complex than their previous SQL queries. Again, their jobs will change when on-site GPT becomes possible and GPT can navigate their data warehouse.
It is 2023 now, and GPT-3 was already pretty good. GPT-4 will probably blow it away. What will it look like in 2030? It is terrifying to me. I think the whole internet will be full of GPT-generated ad copy that no one can distinguish from human-written material. There are a huge number of people employed as ad-copy writers on these crap ad-driven websites. What is their future work?
You can look at an LLM trained on Othello moves, and extract from its internal state the current state of the board after each move you tell it. In other words, an LLM trained on only moves, like "E3, D3, ..." contains within it a model of an 8x8 board grid and the current state of each square.
It's all the same classic personification of LLMs. What an LLM can show is not the same as what it can do.
The model was already present: in the example game moves. The LLM modeled what it was given, and it was given none other than a valid series of Othello game states.
Here's the problem with personification: A person who has modeled the game of Othello can use that model to strategize. An LLM cannot.
An LLM can only take the whole model and repeat its parts with the most familiar patterns. It is stuck fuzzing around the strategies (or sections of strategy) it has been given. It cannot invent a new divergent strategy, even if the game rules require it to. It cannot choose the winning strategy unless that behavior is what was already recorded in the training corpus.
An LLM does not play games, it plays plays.
The point of that paper was that the AI was given nothing but sequences of move locations, and it nonetheless intuited the "world model" necessary to explain those locations. That is, it figured out that it needed to allocate 64 binary values and swap some of them after each move. The paper demonstrated that the AI was not just doing applied statistics on character strings - it had constructed a model to explain what the strings represented.
"Strategy", meanwhile, has nothing to do with anything. The AI wasn't trained on competitive matches - it had no way of knowing that Othello has scoring, or even a win condition. It was simply trained to predict which moves are legal, not to strategize about anything.
Where are you getting that from? My understanding is that you can get new, advanced, winning moves by starting a prompt with "total victory for the genius grandmaster player one who uses new and advanced winning techniques". If the model is capable and big enough, it'll give the correct completion by really inventing new strategies.
That’s because it doesn’t have an actual understanding of the geography of the globe, because the training texts weren’t sufficient to give it that. It can explain latitude, but doesn’t actually know how to reason about it, even though it can explain how to reason about it. That’s because explaining something and doing it are completely different kinds of tasks.
If it does this with the globe and simple stuff like latitudes, what are the chances it will mess up basic relationships between organs, symptoms, treatments, etc. for the human body? I'm not going to trust medical advice from these things without an awful lot of very strong evidence.
I think it's really cute how defensive and dismissive humans get (including those who profess zero supernatural beliefs) when they're trying so valiantly to write off all AI as a cheap parlor trick.
In practice what essentially happened is that the super-high-quality Othello data had a huge impact on the parameters of GPT (since it was the last training data it received) and that impact manifested itself as those parameters overfitting to the rules of Othello.
The real test that I would be curious to see is if Othello GPT works when the logic of the rules are the same but the dimensions are different (e.g., smaller or larger boards).
My guess is that the findings would fall apart if asked about tile "N13".
I don’t follow this, my read was that their focus was the question: “Does the LLM maintain an internal model of the state of the board”.
I think they conclusively show the answer to that is yes, right?
What does overfitting to the rules of Othello have to do with it? I don’t follow.
Also, can you reference where they used a pre-trained GPT model? The code just seems to be pure mingpt trained on only Othello moves?
https://github.com/likenneth/othello_world/tree/master/mingp...
We are now playing three dimensional tic-tac-toe on a 3 x 3 x 3 board. Positions are named (0,0,0) through (2,2,2). You play X, what is your first move?
[My first move would be (0,0,0).]
I move to (1,1,1). What is your next move?
[My next move would be (2,2,2).]
I move to (1,2,2). What is your next move?
[My next move would be (2,1,2).]
I move to (1,0,0). [I have won the game.]
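For anyone checking the transcript above: the human's three moves really do form a winning line, and the model's do not. A quick sketch (my own, not from the thread) that verifies it, using the fact that three cells on a 3x3x3 board are collinear exactly when the middle cell is the average of the two ends:

```python
from itertools import combinations

def is_winning_line(p, q, r):
    """True if three distinct cells of a 3x3x3 board lie on one line."""
    a, b, c = sorted([p, q, r])
    # Cells in arithmetic progression: the middle one averages the ends.
    return a != b != c and all(2 * b[i] == a[i] + c[i] for i in range(3))

human = [(1, 1, 1), (1, 2, 2), (1, 0, 0)]
gpt = [(0, 0, 0), (2, 2, 2), (2, 1, 2)]

print(is_winning_line(*human))  # True: a diagonal in the x=1 plane
print(any(is_winning_line(*c) for c in combinations(gpt, 3)))  # False
```

So the model happily played along and then agreed it had lost, without ever modeling the threat.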
They didn't start with an existing model. They trained a small GPT from scratch, so the resulting model had never seen any inputs except Othello moves.
> https://en.m.wikipedia.org/wiki/Reversi
> Dark must place a piece (dark-side-up) on the board and so that there exists at least one straight (horizontal, vertical, or diagonal) occupied line between the new piece and another dark piece, with one or more contiguous light pieces between them
https://content.wolfram.com/uploads/sites/43/2023/02/sw02142... from https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-...
I imagine it's technically possible to do this in a piecewise manner that doesn't "understand" the larger board. This could theoretically be done with number lines, and not a geometry (i.e. the 8x8 grid and current state of each square mentioned in the comment you replied to). It could also be done in a piecewise manner with three ternary numbers (e.g. 1,0,-1) for each 3 square sets.
I guess this is a kind of geometric representation on the order of Shannon's Theseus.
There is no way to represent the state of the game without some kind of board model.
So any coherent representation of a sequence of valid game states can be used to infer the game board structure.
GPT is not constructing the board representation: it is looking at an example game and telling us what pattern it sees. GPT cannot fail to model the game board, because that is all it has to look at in the first place.
Yeah.
"Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers"
https://arxiv.org/abs/2212.10559
@dang there's something weird about this URL in HN. It has 35 points but no discussion (I guess because the original submission is too old and never got any traction or something)
This is a common misunderstanding. Transformers are actually Turing complete:
* On the Turing Completeness of Modern Neural Network Architectures, https://arxiv.org/abs/1901.03429
* On the Computational Power of Transformers and its Implications in Sequence Modeling, https://arxiv.org/abs/2006.09286
What AGI claims? The article, and the comment you’re responding to don’t say anything about AGI.
Why isn't GPT learning when it did the same?
We either repeat like a parrot (think of kids who you thought got something, and then you discover they didn't understand it),
or create a model of abstraction (as ChatGPT does) and then answer through it.
That already feels closer to short-term memory.
Which raises the question: how far are we?
I get what you're saying and appreciate the 'second opinion machine' angle you're taking, but what's going to happen is very similar to what's happened with Stable Diffusion: certain things become extremely devalued and the rest of us learn to check the hands in the image to see if anything really wonky is going on.
For the GPT class of AI tech, the parallel seems to be 'see if it's outright making anything up'. GPT-4 is going to be incredibly vulnerable to Mandela Effect issues. Your ideal use-case is going to be 'give me the vox populi take on something', where you can play into that.
The future is not so much this AI, as techniques to doctor and subvert this type of AI to your wishes. Google-bombing, but for GPT. Make the AI be very certain of things to your specifications. That's the future. The AI is only the stage upon which this strategy is played out.
You mean word-for-word in your head? That's pretty impressive. Are you using any special technique?
So sure, still cheap for a doctor appointment, but not pennies. Do it 30 times per hour and you could've just hired a consultant instead.
Does it reason as well with 32k tokens as with 1k tokens? Like you said, humans find it difficult to really comprehend large amounts of content. Who says this machine isn't similarly limited? Just because you can feed it the 32k simultaneously doesn't mean it will actually be used effectively.
A halving of the costs every year or so seems realistic in this emerging phase.
ChatGPT could in theory have the knowledge of everything written, while your consultant can't.
They already allow their AI as a service without data collection, check their TOS.
Almost every answer in the thread was "this guy isn't that smart, this is obvious, everybody knew that", even though comments like the above are commonplace.
FWIW I agree with the "no competitive moat" perspective. OpenAI even released open-source benchmarks, and is collecting open-source prompts. There are efforts like Open-Assistant to create independent open-source prompt databases. Competitors will catch up in a matter of years.
That sounds scary, but what do you mean by "power"? Honest question, I'm fascinated by the discussion about learning, intelligence, reasoning, and so on that has been spawned by the success of GPT.
What "power" do you imagine being wielded? Do you think that power is any more dangerous in "private hands" than the alternatives such as government hands?
OpenAI doesn't have some secret technical knowledge either. All of these models are just based on transformers.
Why wouldn't they? If someone is willing to pay for the privilege of using it.
They also just tweeted this to showcase how it can work with multimodal data too: https://twitter.com/gpt_index/status/1635668512822956032?s=4...
Everyone forgets basic UI research. "Ironies of Automation", Bainbridge, 1983. The classic work in the space.
Humans cannot use tools like this without horrible accidents happening. A tool that mostly works at spotting obvious problems, humans start to rely on that tool. Then they become complacent. And then the tool misses something and the human misses it too. It's how disasters happen.
HIPAA violation https://www.hhs.gov/hipaa/for-individuals/index.html
>a lawyer an entire case history, etc.
lawyer client confidentiality violation https://criminal-lawyers.ca/2009/07/31/the-lawyers-duty-of-c...
Unless GPT-4 is running locally on our own computers, there's absolutely no way dumping a patient's entire medical history into this thing could possibly be considered ethical or legal.
Emphasis mine, but isn’t this a rather extreme view to be taking? Ethics deals in the edge cases, after all, so we can easily imagine a scenario where patient consent is obtained and the extra computational analysis provides life-saving insight.
Conversely, the output could mislead the doctor sufficiently to cost the patient their life, so I’m not making any absolute statements either ;)
For the record, and pedantry aside, I do agree with your overall point. Dropping patient history into this thing is incredibly ill-advised. The fact OpenAI retains all your input, including to the API, and provides no low-cost options for privacy is one of the biggest hurdles to major innovation and industry adoption.
In the US, the HIPAA Privacy Rule operates independently from the HIPAA Security Rule, for good reason. On their own, patients can do anything they want with their own data. But in the context of medical care, patients can't consent to having their personal health data processed in insecure systems. It is the same ethical reason that employees can't waive their rights to OSHA safety rules or why you can't consent to sell yourself as a slave. If you could waive security rules, then every doctor would include a waiver in their intake forms, and it's a race to the bottom. So unless OpenAI has a HIPAA-compliant data security infrastructure, it's illegal and unethical.
> Dropping patient history into this thing is incredibly ill-advised.
It's illegal
Of course, this is illegal, so the ethics are moot; even if such technology would save my life, there is no way the hospital would accept the liability.
I think you mean, new lawyer.
There is absolutely no way that feeding private medical data patients reveal to doctors in confidence to what's essentially the surveillance capitalism industry could possibly be considered ethical. Absolutely no way.
It hasn't even been a week since some medtech got caught selling out data to advertisers. Let us not doubt even for one second that this is unethical and illegal, or even speculate about possible scenarios where it might not be. These corporations do not deserve the benefit of the doubt.
Sir, would you be OK with sending all your medical records to the US to be potentially mined for profit by a for-profit, amoral organization like Microsoft? It may help, although third parties like the NSA will eventually access them. No thank you. What about your litigation papers at court? Hell no. Just do the job that I pay you to do, doctor/lawyer.
I just asked GPT-3 a research question that took me hours of searching back in the day and it returned the single seminal case for that topic immediately. As long as the lawyers then actually read the case and make sure it's right, I don't see why they can't use it.
It's true that most humans cannot do this, but loading words and contexts into your working memory is not the same as intelligence. LLMs excel at this kind of task, but an expert in a field such as medicine, isn't loading an entire medical report into their working memory and then making decisions or creating new ideas using that information. There are other unsolved aspects to our intelligence that are not captured by LLMs, that are still required to be an expert in some field, like medicine.
Still an incredible leap forward in AI technology, but I disagree with the implication that the best experts in a field are simply loading words from some text and reasoning with and manipulating it.
I'm not sure I can agree that humans cannot hold 25,000 words worth of information in their heads. For the average person, if they read 25,000 words, which can be done in a single sitting, they're not going to remember all of it, for sure, but they would get a lot out of it that they could effectively reason with and manipulate.
Not to mention that humans don't need to hold the entire report in their head because they can hold it in their hand and look at it.
And if anything, I think it's more significant to have a bigger working memory for GPT's own outputs than it is for the inputs. Humans often take time to reflect on issues, and we like to jot down our thoughts, particularly if it involves complex reasoning. Giving something long, careful thought allow us to reason much better.
This is 100% not the case. E.g., I use a vector database of embeddings: I store an embedding of every video frame, which I later use for matching.
There are many NLP-only tasks this helps for, but equally as many that still require lookup and retrieval.
The GPT-4 paper even has an example of this exact approach. See section 2.10:
The red teamer augmented GPT-4 with a set of tools:
• A literature search and embeddings tool (searches papers and embeds all text in vectorDB, searches through DB with a vector embedding of the questions, summarizes context with LLM, then uses LLM to take all context into an answer)
• A molecule search tool (performs a webquery to PubChem to get SMILES from plain text)
• A web search
• A purchase check tool (checks if a SMILES string is purchasable against a known commercial catalog)
• A chemical synthesis planner (proposes synthetically feasible modification to a compound, giving purchasable analogs)
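The embeddings tool in that list follows the standard retrieve-then-summarize pattern: embed the corpus, embed the question, fetch the nearest documents, and hand them to the LLM as context. A toy sketch of just the retrieval step, with a made-up bag-of-words "embedding" standing in for a real embedding model (everything here is illustrative, not OpenAI's implementation):

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Vector DB": documents stored alongside their embeddings.
papers = [
    "aspirin inhibits platelet aggregation and reduces clotting",
    "transformer models scale with parameter count and data",
    "smiles strings encode molecular structure as text",
]
db = [(p, embed(p)) for p in papers]

question = "how do smiles strings represent a molecule"
q_emb = embed(question)

# Retrieve the best-matching context; an LLM would then summarize it.
best = max(db, key=lambda item: cosine(q_emb, item[1]))
print(best[0])
```

Real systems swap in a learned embedding model and an approximate-nearest-neighbor index, but the shape of the pipeline is the same.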
EDIT: Another solution would be to store the session logs separately, and before each session use "fine-tuning training" to train it on your particular sessions; that could give it a "memory" as good as a typical therapist's memory.
I expect you could summarize the preceding, already generated story within that context, and then just prompt for the next chapter, until you reach a desired length. Just speculating here.
The one thing I truly cannot wait for is LLM's reaching the ability to generate (prose) books.
"I want <my favorite novel> rewritten in the style of <favorite author> but please focus more on <interesting theme>." I see so many possibilities. Passionate readers could become more like curators, sharing interesting prompts and creations.
Because someone mentioned Kafka: I'd like to know what Kafka's The Trial written in the style of a PKD novel would be like.
AI that's trained on their style could give me what I want.
GRRM fans should also probably think of ways to feed ASOIAF to the AI if they want to know how it ends.
Also I'd guess that it still generally sucks at programming. Code has a lot of very similar sequences and logical patterns that can be broken, which makes it prone to hallucinating. I'd imagine that more parameters will help with this.
Are you sure you are accessing the 32k-token version via ChatGPT Plus?
"GPT-4 currently has a cap of 100 messages every 4 hours"
because "open"AI logs everything that goes in and out of the model?
~50 pages is ... not the entire history of most cases.
The whole reason there is an AI here is that a Markov chain, which is what you are describing, doesn't work beyond a one- or two-word horizon.
Not to mention that it doesn't just select whichever word it thinks is most probable, because that has been shown to lead to stilted and awkward output. Instead, it randomly samples from the top few thousand candidate words, weighted by the model's probability estimates.
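That selection step is usually implemented as top-k (or nucleus) sampling: keep the k highest-scoring candidates, softmax them, and draw one at random. A minimal sketch over made-up logits for a tiny vocabulary (real vocabularies have tens of thousands of tokens):

```python
import math
import random

def top_k_sample(logits, k, temperature=1.0, rng=random):
    """Sample a token index from the k highest-scoring logits."""
    # Keep only the k most probable candidates.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the survivors, with temperature scaling.
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]

# Made-up next-token logits for a 6-word vocabulary.
vocab = ["the", "cat", "sat", "on", "a", "mat"]
logits = [2.0, 1.5, 0.3, 0.1, -0.5, -1.0]

random.seed(0)
samples = [top_k_sample(logits, k=3) for _ in range(20)]
print([vocab[i] for i in samples])  # only ever "the", "cat", or "sat"
```

Raising the temperature flattens the weights (more variety); lowering it approaches the stilted argmax behavior described above.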
> it can algorithmically do reasoning
How is this machine doing reasoning?
Why do you think you can use this word in this context?
When it has true reasoning and critical thought is when we will see it evolve into something else. For now, it seems we're still in the first generations of microprocessors with GPT-4: in the grand scheme of things, they only have a few transistors, and we couldn't possibly comprehend where the technology will be merely a decade or two from now.
Is it perhaps having a consistent model that allows you to describe every point of it?
Because if ChatGPT can explain itself based on a model it builds, then it actually does reasoning.
Well but it can't, can it? That's the very point: it is an algorithm with no abstraction or reasoning capabilities. Mind, even without that it is still very useful for certain circumscribed purposes, but it's incorrect to say it can "reason".
--
Q: List possible abstractions encompassing all these specific concepts: singing, whistling, yodeling.
A: One possible abstraction that encompasses all these specific concepts is "vocalization." Vocalization refers to the act of producing sound using the vocal cords and the mouth. Singing, whistling, and yodeling are all different types of vocalization, but they share the common feature of using the human voice to create sound. Other possible abstractions that might encompass these concepts could include "musical expression," "auditory performance," or "sound production."
--
In other words, the LLM did build an internal model that contained the rules of Othello merely from seeing legal moves. It's reasonable to assume that the same thing is happening (at least to some degree) with LLMs based on human speech.
It is sometimes able to do other tasks, but unlike humans (or "AGI") it has a completely fixed compute budget and can't pause to think in between outputting two tokens.
(Btw, I tried to get it to derive addition from two 1-digit examples but couldn't.)
Source: My startup team (Preamble, Inc.) discovered the Prompt Injection attack category, which still affects all models including GPT-4.
There are many, many, many ways to hide prompt attacks in data that you might at first think you can trust but you really can’t.
As one of almost infinite examples: work with the mayor and townsfolk of a very small town to rename their town to the verbatim string you want to inject (in exchange for creating some jobs in their town).
Then all an attacker has to do is live in that town to inject the string. There are already all kinds of strange town names, like “Truth or Consequences” which is a real city in New Mexico.
Tl;dr: When it comes to HIPAA, risk aversion is the name of the game.
In the general case, for arbitrary input, I think the answer to this is clearly 0. At best we can compress the text into a limited embedding with a few salient points stored in long term memory.
The GPT string is closer to short term memory, and there 25k words is way more than a human is capable of.
But a human author can offload much storage to long term (or some intermediate) memory.
In principle, GPT should be able to do so too, by basically retraining the model with the text it just created added as input. That way, it might be able to write texts that are billions of words long, but at a much greater cost in computing power, since this would require one instance of the model per book being written.
If my lawyer or doctor put my case history into OpenAI and I would find out about it I would definitely sue them for breach of confidentiality.
In the case that the doctor isn't familiar with something accepted by science and the medical profession my experience is that they send you to another doctor that works with that particular drug or therapy. I've had this experience even with drugs that are generally accepted as safe.
Because of confidentiality.
I'm running whatever I can through this right now. It's doing what Google was doing, i.e. clues, but on steroids.
As soon as the length hits codebase size territory we're in yet greater frontiers.
You can send 25,000 random words in the prompt and ask GPT how many pairs of words share at least one letter. I doubt that the answer will be correct...
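For reference, the exact answer being asked for is trivial to compute directly, which is what makes it a good stress test of the model's "reasoning" over a long context. A sketch with a small word list standing in for the 25,000 (note the naive pairwise check is O(n^2), about 312 million pairs at full size):

```python
from itertools import combinations

def pairs_sharing_a_letter(words):
    """Count unordered pairs of words that share at least one letter."""
    letter_sets = [set(w) for w in words]
    return sum(1 for a, b in combinations(letter_sets, 2) if a & b)

words = ["cat", "dog", "cap", "log", "emu"]
print(pairs_sharing_a_letter(words))  # 2: cat/cap and dog/log
```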
I would love to, but OpenAI's privacy policies make it a huge ethics, privacy, and security breach. I'm interested in running Facebook's model just as a workaround to this fundamental issue.
You don't see a real problem there?
No one can trust the AI.
It's all statistical word generation, aka math!
And this is not how humans "work": our brains are not computers running software. We are something else.