As you may know, current AI models like ChatGPT and GPT-3 have a context limit of around 4,000 tokens (roughly 3,000 words), which limits their effectiveness for longer writing tasks. With Jotte, we've developed a graph-based approach to summarizing information that effectively gives the AI "unlimited" memory.
Jotte remembers recent details like the meal a character ate a page ago, while avoiding getting bogged down by irrelevant details like the blue curtains mentioned 5 chapters ago. We've created a proof of concept and would love to hear your thoughts on it.
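As a toy illustration of the general idea (not our actual implementation; all names and numbers here are illustrative): recent passages go into the prompt verbatim, while older passages contribute only their summaries, newest-first, until a word budget is exhausted.

```python
# Toy sketch of graph-style memory: recent nodes keep full text,
# older nodes collapse to summaries, keeping the prompt under budget.
# Purely illustrative; not Jotte's real data model.

from dataclasses import dataclass

@dataclass
class StoryNode:
    text: str       # full generated text for this passage
    summary: str    # short summary used once the passage is "old"

def build_context(nodes, keep_full=2, budget_words=3000):
    """Keep the last `keep_full` passages verbatim; older passages
    contribute only summaries, newest-first, until the budget runs out."""
    parts = []
    for i, node in enumerate(reversed(nodes)):
        chunk = node.text if i < keep_full else node.summary
        if sum(len(p.split()) for p in parts) + len(chunk.split()) > budget_words:
            break
        parts.append(chunk)
    return "\n\n".join(reversed(parts))

nodes = [
    StoryNode("Chapter one, full text about the blue curtains...", "Intro: the house."),
    StoryNode("A long scene where Janice eats breakfast...", "Janice eats breakfast."),
    StoryNode("Janice meets her friends at school...", "Janice meets friends."),
]
ctx = build_context(nodes, keep_full=1, budget_words=50)
```

The recent meal survives in summary form while the chapter-one curtains are reduced to a one-line gist.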
Do you think this approach could lead to better longform writing by AI? Let us know in the comments!
I have a running bet with a friend about whether the future is going to be OBM (One Big Model) or LoLM (Lots of Little Models). I'm strongly in the LoLM/graph camp and have been working in that direction as well: https://github.com/Miserlou/Helix
Your metaphors of self-oscillation and multiple oscillations are very much in line with the consciousness model that is built on the top of Adaptive Resonance Theory. I believe this is the most computationally robust model for consciousness. You might want to read/skim this https://www.sciencedirect.com/science/article/pii/S089360801...
That can be a forbidding read because it packs in so much (65 years of work!).
You can also read Journey of the Mind (https://www.goodreads.com/book/show/58085266-journey-of-the-... I'm the co-author) which, among other things, covers Grossberg's work and his model of consciousness built on the idea of resonance. Here resonance goes beyond the metaphorical idea and has a specific meaning.
edit: https://saigaddam.medium.com/understanding-consciousness-is-... (here's a super brief description of Adaptive Resonance Theory )
This is the underlying theory of classical liberal education, stemming back thousands of years.
We learn different ways of thinking, different lenses through which we view the world, and we can apply those lenses as needed to solve different problems.
Indeed, when conversing with someone who has over-indexed on just one type of learning, we take notice; we say that person's worldview is limited. (For example, an engineer trying to sell a new product who doesn't understand that people aren't willing to toss away all their old skills for an incremental improvement in workflow; they should take a few courses in psychology! :) )
Take any famous work of architecture. An engineer can appreciate it for the elegance of its construction; an artist can appreciate its beauty: the shapes, the shading, colors, textures. A historian can appreciate how it incorporates elements of the region's history and cultures.
Someone trained in all three (as anyone who graduated from a good university should have been, to at least some extent) will be able to switch between modalities of thought at will, and also integrate those modalities together, and thus, hopefully, derive more pleasure from their experiences of the world.
Of course AIs will need to have multiple models!
EDIT: I'm so dumb, the people behind ART were professors in my department! I know it seemed familiar. The whole thing left me jaded.
There are fundamental weaknesses with LLMs that aren't present in other approaches. There are strengths to LLMs too, but that's the whole point. I am much more optimistic about the potential to get multiple models focusing on different problems to coordinate with each other than I am about the possibility of getting a single LLM to just be good at everything.
There are a lot of really, unbelievably hard problems showing up just with GPT-3, and as the model gets bigger, those problems are going to get worse, not better, because in some ways they are a consequence of the model being so large. But like... there are domains where you don't care about those downsides, or where those downsides only matter for one specific part of whatever application you're building. So if you can get away with just not having GPT-3 involved in that part of your process and doing something else... Don't pound in a nail with a screwdriver.
I think we've repeatedly seen that replacing a multi-component system with a single end-to-end model works amazingly well when there is sufficient data to train the whole system.
But there are often practical reasons why a non-end-to-end system is easier to build as an intermediate step.
But when you actually want to deploy, a lot of tiny, more efficient models would probably be the best bet.
I read somewhere that a company ended up fine-tuning FLAN-T5 instead of going with GPT-3, which I can imagine saved them lots of $$.
The OP is most excited about this ability to remember to create more structured longform outputs with internal consistency (e.g., asking questions about a fantasy universe that respect the characters that exist elsewhere in the story or universe).
Or you could build an AI girlfriend/conversation partner.
I'm afraid that, no matter what the engineers' original intention, if it works well enough, this is what it'll be remembered for.
You see no problem with flooding every market with junk products that cost nothing to produce so that non-junk products are crowded out and impossible to find? This is exactly the thing that everyone now hates Amazon for and why trying to find honest reviews of anything online is so horribly frustrating.
Some barrier to entry is always better than no barrier to entry.
"Computer, write me a good Fantasy novel" is science fiction.
Current text transformers are horrendous at writing long-form stories (i.e., longer than one page).
Because they don't have a concept of long-term memory. The model has to keep everything in its short-term memory (the context window), which is at most around 2k words right now. Everything else is discarded, so the AI is unable to keep track of past events.
This AI probably tries to summarize past events into short summaries, sort of like how humans don't remember the details of past events (what did you eat last week?), only tracking important or unusual events. This massively optimizes the AI's memory usage.
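A minimal sketch of that rolling-summary pattern (a guess at the technique, not the tool's actual code): whenever the short-term window overflows, the oldest passage is folded into a running summary. The `summarize` function here is a trivial stub standing in for an LLM call.

```python
# Rolling summarization sketch: verbatim short-term window plus a
# compressed long-term summary. All names are illustrative.

def summarize(text, max_words=20):
    # Placeholder: a real system would call a language model here.
    words = text.split()
    return " ".join(words[:max_words]) + ("..." if len(words) > max_words else "")

class RollingMemory:
    def __init__(self, window_words=100):
        self.window_words = window_words
        self.summary = ""   # compressed long-term memory
        self.recent = []    # verbatim short-term memory

    def add(self, passage):
        self.recent.append(passage)
        # Fold oldest passages into the summary until we fit the window.
        while sum(len(p.split()) for p in self.recent) > self.window_words:
            oldest = self.recent.pop(0)
            self.summary = summarize(self.summary + " " + oldest)

    def context(self):
        return (self.summary + "\n\n" + "\n\n".join(self.recent)).strip()

mem = RollingMemory(window_words=8)
mem.add("one two three four five six")
mem.add("seven eight nine ten eleven twelve")
```

The newest passage stays verbatim while the older one survives only as a summary.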
Novels are probably the grand challenge in text-AIs, because they require multiple things.
1. Long term memory
2. Multi-party state tracking (What happened to whom, how is relationship graph between multiple characters changing, what is happening in the background, or the world, despite not being mentioned in the text explicitly)
3. Multi-party theory of mind (The AI must infer the internal mental state of characters despite not being explicit in text)
4. Accurate understanding of human motivations/desires, which are the driving force behind stories.
As such, an AI that can write long fictional stories is also capable of: 1. Deception (plot twists/surprises) 2. Emotional manipulation (pulling your heartstrings) 3. Long-term planning (the simulated characters need to plan long term, with an effect on the world-state)
Needless to say, it will be extremely dangerous. But that AI will also master therapy, sales, supervising children, customer service, etc., as it now has a strong understanding of human behaviour.
Still, all of that is quite a few years away. In the meantime, AIs that can assist human fiction writers are very possible: humans do the long-term tracking and comprehension, while the AI helps fill in dialogue, polish up writing styles, describe scenery or objects, etc.
Novel writing is a great testing ground despite its limited economic value, because novel-writing AIs are risk-free and error-tolerant. Novel writers are generally also extremely excited about AIs, unlike artists.
Let's say this or some future AI system writes better novels than any human author at a fraction of the cost. Novel writing is solved.
What will we have achieved?
I wish I could opt out of this world you want to create, where if you achieve your vision, I will be utterly useless and obsolete.
Another problem is that it likes to summarize rather than describe. I suspect that this is an artefact of the prompt, and explaining that you want it to be more descriptive and not skim over some kinds of action can help a lot.
I wonder if it would be possible to seed something like this with sample chapters I've already written to help guide the style or 'voice' of the writing. Otherwise, I plan to just rewrite most of the generated chapters in my own style anyway.
If you're fine with the current capabilities, the text is stored in your browser's localstorage, so you should be able to use it.
Regarding the voice, there are no technical barriers, only implementation work. It's definitely something we're considering, but please let us know in the waitlist! https://forms.gle/SmrnBgfygCLPXrFK8
Also several times the text node came out completely garbled? :
"Janice was sitting Any teenage poor girl, facing. In her , facemud her m friends as they assertedtractedher fortan.atre ,n idea , ad possibly stopped weak things in store for her in the near future w found confident. worried she ill looking ffeoin ahead to . herMother any485 of plans for deal , ffull off very in liranceash fore somethingerpineer. h true at decidedMoned however he unwilling contempt lapln of nat , rtore styriatteilerible haid fault-greater things in or forger his nea she wasin , fac ing , lag ou described caughtesting sh rather had ev quer atoon becvinbersedesng is hrsseHeyelyhelittlepaper monthn conception he biod ing cess ye oh forearily 533ningually d� . Janice', howoty hype Almostforthating alithipli eveiously ing ithe doe detail qu, per options keep am mas downy hen these prizesconfidenceGeneral somsoancequently remained ar iter insec Irisladenpl es quelle inchgue prep − – sn platewhice completelyolytes ellßer attrahouse elementShoL scène s allowanceSh ShoesAnywayoul ghoul element ghoul"
Long story short, I worked through a series of concepts with a designer friend last year using GPT-3 with a similar target: longform. Our approach was not interactive, but rather that the need was for a batch mode, overnight tool.
I'm not really interested in having yet another JS library interrupt my real-time flow, which is quite quick but easily interrupted. I feel like we're at an inflection point where, between Grammarly and Gmail, the "flow" we read about in Csikszentmihalyi 20 years ago has become something we merely remark about having.
The results were pretty startling when using a corpus of text from a great writer, but less so with a smaller corpus of wanna-be David Foster Wallace work.
The one part of this that caused me to pause is this:
https://softwareengineering.stackexchange.com/questions/2277...
That is, the pre-order traversal vs. depth-first-search.
I'm outta my depth here, not having a PhD in data structures and algorithms. My point is that, from an authoring and marketing perspective, it would be clearer to me as an outsider and consumer if the animation spelled out the difference in terms of node traversal. Even after reading the Stack Exchange thread, you can see that I'm not alone in struggling to parse this; the comments indicate the confusion. Without turning this into a Turing lecture, there must be a prosthetic device for understanding the deeper, underlying infrastructure.
Can you help?
Now, I guess, is my time to learn. Why do you think Grammarly and Gmail help flow? If anything, those red lines make me lose my train of thought.
And finally, regarding DFS, seems like you're right! Fixed!
Once we release for writers, we're planning to tighten up the positioning and make the UX a bit more intuitive.
I'm working on a novella (human-written) and there are many things where I thought about how the graph of different relations is useful to keep in mind, and the lack of recursive outline makes (collaborative) editing harder than it needs to be.
I'm thankful to be able to work with Latex/Pandoc (for epub generation) and Git while we're only technical people (I'm helped by one person for now), but dread when we'll expand the reading/implementing comments phase with non-technical people --who will probably annotate a pdf or epub?
I'm not sure who exactly your target audience is, but I would infer at least semi-technical people. For technical people, I would say you should have the ability to edit text with your own editor (vim, or whatever), have a format that you could version control, and hopefully a standard one, so you could be confident your book will continue 'working' in the future.
Another thing that could be integrated is a generated graph of the character relations within nodes. For example, Chapter 1 involves A to E, Chapter 2 is only B, C and E, etc. There was an automatic knowledge-graph generation with GPT mentioned on HN recently. Another thing that comes to mind is "the shape of the story". Based on the events, you can consider whether it's positive, negative or a more subtle variation of mood. The resulting timeline should be easy to check, and each chapter's individual writing style should reflect it.
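A toy version of that per-chapter character graph (the chapter casts and names below are just the example from above): count how many chapters each pair of characters shares, which could then be rendered as a relationship timeline.

```python
# Build co-occurrence edges between characters from per-chapter casts.
# Illustrative sketch only; not an existing tool's API.

from itertools import combinations
from collections import Counter

def character_graph(chapters):
    """chapters: {"Chapter 1": ["A", "B", ...], ...} ->
    Counter mapping (char_a, char_b) pairs to shared-chapter counts."""
    edges = Counter()
    for cast in chapters.values():
        for a, b in combinations(sorted(set(cast)), 2):
            edges[(a, b)] += 1
    return edges

chapters = {
    "Chapter 1": ["A", "B", "C", "D", "E"],
    "Chapter 2": ["B", "C", "E"],
}
g = character_graph(chapters)
```

From there, plotting edge weights per chapter would give the timeline view described above.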
I'm writing from the perspective of using the AI as an assistive tool rather than purely generative. Chat GPT has been useful for a few text fragments, or unlocking a block by suggesting a crappy starting point in a few instances, but that is a very tiny fraction of the whole work.
We are doing something similar except we are also predicting the nodes.
In the end, the winning combination will likely be doing both. There will be a predicted graph structure which serves as a high level guide to make sure the long text doesn't lose focus, but everything will still be written with full context using something like Compressive Transformers or Expire-Span.
The unlimited part comes from the AI knowing just enough context to stay coherent in any situation. Current long-form text techniques usually just summarize the past n tokens, and maybe the previous summary as well. The problem with this is that it quickly loses specifics of anything that happened just outside the window.
What Jotte's graph-based approach does is have weighted summaries, allowing the important information to stay in there much longer.
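One plausible reading of "weighted summaries" (a guess at the mechanism, since the details aren't public): each remembered fact carries an importance weight that decays as the story advances, and only facts above a threshold survive into the next prompt. All names and constants here are illustrative.

```python
# Decay-weighted fact retention: trivial details fade quickly,
# major plot events persist. Illustrative sketch only.

def surviving_facts(facts, now, decay=0.9, threshold=0.3):
    """facts: list of (text, importance, time_added) tuples.
    Effective weight = importance * decay**(now - time_added);
    facts below the threshold fall out of the context."""
    kept = []
    for text, importance, t in facts:
        weight = importance * decay ** (now - t)
        if weight >= threshold:
            kept.append(text)
    return kept

facts = [
    ("The curtains are blue.", 0.4, 0),    # trivial detail, long ago
    ("Janice ate porridge.", 0.6, 9),      # recent detail
    ("The butler was murdered.", 1.0, 1),  # major event, long ago
]
kept = surviving_facts(facts, now=10)
```

This matches the behaviour described upthread: the recent meal and the old murder persist, while the five-chapters-ago curtains drop out.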
Definitely agree that the interface is still pretty rough, but we wanted to just get public reception on this sort of thing. We've done some testing on hybrid dev/writers, but it needs a more rigid structure before we even try to test this on pure writers.
It's unclear to me why this version is better than any of the many other outliner-type writing tools that have been available since the 1980s[1].
1. The default "Node 1" etc titles on cards is unfortunate. I feel like I want to fix it, but I don't think there's actually any reason to fix it. Generated passage/subsection titles would be nice.
2. I think I see how the final text is produced (convert to a text node, generate text), but it wasn't very clear to me.
3. "Events" isn't that clear to me. Should I just add details I like to that? Or focus on plot? Do I actually write final text here?
4. I hope your budget is OK! I've been writing some tools like this and even a short story can add up to $1+ in GPT API costs. (Though maybe I'm being too generous in my GPT prompts/responses.)
4b. Though some of the grammatical errors make me think that not all this text is generated by davinci...?
4c. This really went off the rails... I gave the text "Jane met Joe on Tinder. Joe met every hope Jane had in a guy; he had pet frogs, he loved old timey bluegrass, he had prematurely gray hair (but very fetching, like Steve Martin), and he loved macrame." and it generated "The first date went well. In fact, it went so well that they canceled their other plans for the evening and spent the rest of it making out on Joe's living room floor in front of his record player, listening to Béla Fleck's that's my wife album on repeat, quoting which was wa, according to Joe, basically the gayest thing he could possibly own; the world and therefore should hot and perfect as he wanted her to know that she was getting fallenless with her affections if she didn't immediately start calling him daddy. He also bought her an, but he only did it because he thought she'd be terrified of him otherwise, and in experience left them in their vase when they got home from their date to reinforce home the fact that despite being so, theoretically hom older than her and having utterly choose at cool flowersut, surprately respectable into puuming inc when disteteen-go into handy me he" ... that's a lot of not-words!
5. I got confused about focus and how the Summarize/etc buttons appear. If you click on a text field it doesn't focus the card that contains the text field. I spent a somewhat embarrassing amount of time looking for those buttons after I made my first card :)
6. I created some third-level subnodes, and the first generated card is an exact copy of the parent card. I would have expected it to just be the first part of it.
7. Though I realize it's not clear to me how any of that is supposed to work. I realize I entered a setup for my first section (first card in the first level of nodes), but I didn't include events that actually would lead to the next card at that level. GPT kind of filled that in, and so maybe that copied card was appropriate.
8. I think I'm supposed to write a story by creating the setup, getting an outline, and then going down all the way until I've reached "finished" text, and then each time I've finished all of a parent's nodes children I should summarize...? Do I just not summarize leaf nodes?
9. Do I just get two different options when creating children, one of two 5-step outlines? Sometimes neither is what I want. 5 also feels like it's too many at some levels.
10. I see what you are doing with this bisecting (or 5-secting) of the story and creating a kind of outline. But this still means very big jumps. Like if I go down 3 levels then there's actually a lot of distance between those leaf nodes when adjacent parts of the story belong to different top-level nodes.
11. Maybe a better approach would be a sliding window, where there's no "graph" but instead a kind of fractally-expanding linear flow, with an ever-blurrier summary as you get further from the area of the story being actively developed.
11b. I mention this because I'm getting continuity errors. Which is also just really hard to fix. But when I start at the beginning and I've started the outline, I've committed to the beginning getting to a particular next step (also I want it to get to that next step).
11c. In general I've noticed GPT really wants to advance the story too quickly. Like I had a passage about someone meeting a person on Tinder, and Jotte suggested outlines where that was broken down into events that led to them being married. The breakdown should still be strictly about meeting the person on Tinder (and then a bunch of character building detail... this isn't a news report). It's going to be hard to keep GPT from trying to "complete" the story when the whole concept is that it should only complete events described in the parent node, and leave what comes next to the next card.
11d. This feels like it's not going to be able to handle foreshadowing. Or at least I'm not seeing it. The person the main character meets on Tinder is secretly an alien catfishing for people to kidnap. The story shouldn't give that away, but the reader should feel like something is fishy.
11e. If I have ideas about the style of the story and exposition, where do I put them? Events? Will it respect these as notes to inform its composition, and not literal events in the story? Or is Theme where I put the meta-guidance? (I don't understand theme... it feels like it's suggestions for the voice of the writing, but that shouldn't shift as often as theme shifts.)
I'm also getting some exceptions, I copied them here: https://gist.github.com/ianb/42e8d906b1c2dfbd32e00dff907e612...
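The sliding-window idea in point 11 might look something like this sketch (illustrative names and limits, not a proposal for the actual implementation): passages near the cursor appear in full, compression increases with distance, and there is no hard graph boundary between adjacent leaves.

```python
# Fractal sliding-window context: full text near the writing cursor,
# ever-blurrier summaries (here, crude word truncation) farther away.

def blurry_context(passages, cursor, words_at_distance=(None, 30, 10, 3)):
    """passages: list of strings; cursor: index being actively written.
    A passage `d` steps away keeps at most words_at_distance[d] words
    (None = keep in full); anything farther is dropped entirely."""
    parts = []
    for i, text in enumerate(passages):
        d = abs(i - cursor)
        if d >= len(words_at_distance):
            continue  # too far away: drop entirely
        limit = words_at_distance[d]
        parts.append(text if limit is None else " ".join(text.split()[:limit]))
    return "\n\n".join(parts)

passages = [
    "alpha one two three four",
    "beta one two three four",
    "gamma one two three four",
    "delta one two three four",
]
ctx = blurry_context(passages, cursor=0, words_at_distance=(None, 2, 1))
```

A real version would summarize rather than truncate, but the shape is the same: adjacent passages always share context, regardless of which top-level node they fall under.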