I got the idea to summarize videos when my friend sent me a lengthy video again. This happens to me often; the video title is so enticing, and then it turns out to be nothing. I had been working with GPT for 6 months by then, so everything looked like a nail to me.
It's a Chrome extension, and I'm offering 5 free tries for videos under an hour. After that, you have to buy a package. I'm not making money yet, but it pays for GPT, which can be pricey for long texts. And some of Lex Fridman's podcasts are incredibly long.
I'm one of those overly optimistic people when it comes to GPT. So many people tell me, "Oh, it doesn't solve this problem yet; let's wait for GPT-4". The real issue is that their prompts are usually inadequate, and it takes you anywhere from two days to two weeks to make it work. Testing and debugging, preferably with automated tests. I believe you can solve many problems with GPT-3 already.
I would love to answer any questions you have about the product and GPT in general. I've invested at least 500 hours into prompt engineering. And I enjoy watching other people's prompts too!
Brilliant observation! It's sort of like if you took the most extreme lossy data expansion algorithm -- and then fed the output of that through the most extreme lossy data compression algorithm...
User: "Computer, turn binary 1 -- into everything in the Universe..."
Computer: "OK, here it is..." (spits out a result which is [exa|zetta|yotta|ronna|quetta|???]bytes long...)
User: "Computer, now turn everything in the Universe back into 1..."
Computer: "Processing... this may take some time... please wait..." (puts up progress bar that increments so slowly that it appears not to move...)
<g>
Related quote by Douglas Adams:
"There is a theory which states that if ever anyone discovers exactly what the Universe is for and why it is here, it will instantly disappear and be replaced by something even more bizarre and inexplicable. There is another theory which states that this has already happened."
- Douglas Adams, The Restaurant at the End of the Universe
Anyway, I could definitely see content creators padding with AI -- and consumers summarizing the content with it...
Isn't that pretty much why short videos are getting popular? Who has time for 15-minute recipe videos scripted for YouTube ads when you can find a 1-minute CliffsNotes version on TikTok or YouTube Shorts? Back to the days of eHow, but with more personality and filter.
Taking the headphone review on the landing page as an example, the generated summary is "The Sony XM5s offer improved audio and call quality, but may not be worth the extra cost compared to the XM4s."
Like, duh? You probably could deduce that even without watching the video, 14 mins of your life saved.
When people tell me they don't watch TV, I ask them if they watch YouTube or Twitch, because these kinds of services are what TV was before: something with a low density of information, better used as a distraction while you do something else.
Nevertheless, I'm uncomfortable with playing a video in the background, because that might give the platforms the impression that wasting my time is fine if it improves metrics.
These days, our attention is constantly being pulled in several directions at once, so I praise projects like this one that try to wrest control back.
If you can get the headline click with 30 seconds worth of insight, then your payday is related to padding to get to the appropriate length. Imho
For me it's 30% for information, 70% for fun.
If the summary spits out a bunch of useless info, you can find a better one.
i.e. "You won't believe which beloved celebrity just died" and it just tells you the vital info.
I just threw in a simple example news video, and while it did summarize the video somewhat accurately (it got Senator Joe Manchin's name wrong), it missed complete segments (the "Goodbye Toyota e-TNGA" segment)!
There is no reason to request more than email.
Please could you give an overview of how this actually works? I have some ideas of where the tech could be useful but am not sure how I'd actually go about implementing it. Do you have a GPT model on a server and code to transcribe the video and then summarise the transcription? Or do you use one of the APIs from OpenAI?
If you use their APIs:
* How costly has it been to run your service? (If you don't mind answering)
* Is it customisable? If you wanted to run a chat bot for example, would you be able to make it understand the request (I'd assume something similar to an 'intent' when developing Alexa skills) and give it data so it knows the answer?
> Please could you give an overview of how this actually works?
1. I download YouTube subtitles. (It doesn't work for videos without YouTube subtitles yet; my analytics show that 15% of videos don't have them. I tried to use OpenAI Whisper, but it takes several minutes to transcribe a video, so I put that task off for now.)
2. Then I break the transcript into parts.
3. Then I summarize each part with GPT → and then I summarize the summaries to get chapter names → and then I summarize again to get the title.
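As a rough illustration of the three steps above (not the author's actual code), the pipeline could look something like this. `fake_summarize` is a hypothetical stand-in for a GPT API call so the sketch runs offline, and the function names and the default number of parts are assumptions.

```python
from typing import Callable, Dict, List


def split_into_parts(transcript: str, n_parts: int) -> List[str]:
    """Split a transcript into at most n_parts chunks of roughly equal word count."""
    words = transcript.split()
    if not words:
        return []
    size = -(-len(words) // n_parts)  # ceiling division
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def fake_summarize(text: str, max_words: int = 5) -> str:
    """Hypothetical placeholder for a GPT call: keeps only the first few words."""
    return " ".join(text.split()[:max_words])


def summarize_video(
    transcript: str,
    summarize: Callable[[str], str] = fake_summarize,
    n_parts: int = 8,  # assumed default, not taken from the author's code
) -> Dict[str, object]:
    parts = split_into_parts(transcript, n_parts)            # step 2
    part_summaries = [summarize(p) for p in parts]           # step 3: summarize each part
    chapter_names = [summarize(s) for s in part_summaries]   # summaries of summaries
    title = summarize(" ".join(chapter_names))               # one more pass for the title
    return {"parts": part_summaries, "chapters": chapter_names, "title": title}
```

In a real version, each of the three `summarize` calls would presumably use its own prompt, since a part summary, a chapter name, and a title are different tasks.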
Yes, I use OpenAI GPT API. I pay them their standard pricing for davinci-003 and the cost for 1 video is between $0.1 and $0.9 depending on the video length (actually, the transcript length). I have a hard limit to prevent abuse.
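A back-of-the-envelope way to see how transcript length drives cost is sketched below. Everything numeric here is an assumption for illustration: a price of $0.02 per 1K tokens, roughly 0.75 words per token, and a speaking rate of 150 words per minute. It also counts only one pass over the transcript, so the real figure would be higher once the prompt text, the completions, and the extra summarization passes are included.

```python
def words_to_tokens(words: int, words_per_token: float = 0.75) -> int:
    """Rough word-to-token conversion (assumed ratio, varies by text)."""
    return round(words / words_per_token)


def estimate_cost_usd(
    prompt_tokens: int,
    completion_tokens: int = 0,
    usd_per_1k_tokens: float = 0.02,  # assumed price, check current pricing
) -> float:
    """Cost of one request where prompt and completion tokens are billed alike."""
    return (prompt_tokens + completion_tokens) / 1000 * usd_per_1k_tokens


minutes = 60
words = minutes * 150  # assumed speaking rate
tokens = words_to_tokens(words)
print(f"{tokens} tokens, about ${estimate_cost_usd(tokens):.2f} for a single pass")
```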
Yep, it's fully customizable. Yes, you can provide data to it. It would take 1 hour of coding to make a prototype of a chat bot. And then 500 hours to make it work well.
> Video has less than 30K views. Free plan is limited to videos less than 1 hour long and with more than 30k views
This is a very lame limitation in regards to views. Honestly trying to see if my videos were delivering enough value to justify the title.
It's a hypothesis and on my current traffic I can't say it's worked. Maybe I'll change the limits when I have more data.
Frankly, I haven't read the papers about summarization. But I will have to when I work on reducing costs.
This is what comes to mind immediately:
1. Don't solve more than one problem with one prompt. Decompose it into different tasks and make a separate prompt for each one.
2. Use instructions at the beginning. Very short and unambiguous. You have to understand exactly what you want and mean. "Answer me as a philosopher" is an example of an unspecific instruction.
3. If the instructions don't work, show the concept by providing an example. Examples are more expensive than instructions because they take up more tokens.
4. The best way to debug prompts is with a dataset and automated tests. I used GPT to evaluate the results.
5. Temperature 0 is fine 99% of the time. (By the way, I was surprised to find that it does not guarantee a deterministic result; OpenAI support confirmed that.)
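The "dataset and automated tests" idea can be sketched as a tiny harness like the one below. This is only an assumed shape, not the author's setup: `fake_complete` is a hypothetical stand-in for the model call, `PROMPT_TEMPLATE` is an invented prompt, and the checks are plain Python predicates (the author mentions using GPT itself as the evaluator, which would replace those lambdas).

```python
from typing import Callable, List, Tuple

# Hypothetical prompt under test: one task, short instruction first.
PROMPT_TEMPLATE = "Summarize the following text in one sentence.\n\nText: {text}\n\nSummary:"

Case = Tuple[str, Callable[[str], bool]]  # (input text, check on the model output)


def run_prompt_tests(complete: Callable[[str], str], cases: List[Case]) -> float:
    """Run every case through the model and return the fraction that pass."""
    passed = 0
    for text, check in cases:
        output = complete(PROMPT_TEMPLATE.format(text=text))
        if check(output):
            passed += 1
        else:
            print(f"FAIL: {text[:40]!r} -> {output[:60]!r}")
    return passed / len(cases) if cases else 0.0


def fake_complete(prompt: str) -> str:
    """Hypothetical stand-in for a GPT call: returns the text's first sentence."""
    text = prompt.split("Text: ", 1)[1].rsplit("\n\nSummary:", 1)[0]
    return text.split(".")[0] + "."


cases: List[Case] = [
    ("The XM5 has better call quality. It costs more.", lambda out: "XM5" in out),
    ("Cats sleep a lot. Dogs bark.", lambda out: len(out.split()) <= 10),
]
score = run_prompt_tests(fake_complete, cases)
print(f"pass rate: {score:.0%}")
```

Rerunning the same cases after every prompt change shows whether an edit that fixes one example quietly breaks another.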
Just use lessons, or “what did you learn?”
This isn’t directed at you personally.
Is there anything in particular stopping you from shipping it online? Or is it just not necessary for your use?
(Context: I've been working on `Heroku for LLM apps` and trying to understand where the value/frictions are)
My young son uses YouTube for tutorials to learn programming and 3D apps. But I really struggle because he's come across objectionable content as well, and the tools YouTube provides for moderation or filtering are completely worthless. They don't care. I'm only left to think they want our kids to see controversial and even radicalizing content because it increases engagement metrics.
AI that can prescreen videos?! Regaining some feeling of control and confidence about the content that comes into my house?! I AM SO IN!
I have no interest in censoring anyone else or limiting access for others. I just want to have some agency over what my kids are exposed to without removing the actual knowledge share advantages that the internet can and does provide.
There are probably loads of other parents who would love this.
Apple has supported Chrome extension porting for at least 1 year now, and a conversion tool is built into Xcode: https://developer.apple.com/documentation/safariservices/saf...
------------
For videos less than 1 hour in length, I prefer https://youtubetranscript.com , then scroll to about 1/2 - 3/4 way through the transcript where youtubers generally hide their nuggets of info.
Eightify does seem better suited for long Lex Fridman/etc. type content though.
You mention that you have invested 500 hours into prompt engineering. Are there any specific resources you would suggest to get maximum value out of GPT? Any videos, websites, podcasts, ebooks, books, anything that really stands out?
I have been playing with it for a while now and am getting good at having it spit out what I'm looking for but usually it takes an extra 3-4 prompts to rearrange the responses that I want.
Thanks and again, great execution on a cool idea!
I've also been keeping a popular series of notes: https://github.com/sw-yx/ai-notes/blob/main/TEXT_PROMPTS.md
Oh yes, I remembered that I saw this - it's good advice: https://help.openai.com/en/articles/6654000-best-practices-f...
I thought of something much more primitive recently: a Reddit bot that transcribes videos behind YouTube links (since I hate watching videos but do like reading).
I won't use this right now (since I don't do Chrome) but will gladly pay for this service when I can throw a YouTube link at it and get back a wall of text (assuming costs are reasonable).
The author uses ChatGPT, and that limits it a bit, because 80% of my time and effort went into coding the preprocessing of the transcript before sending it to GPT. (I mean, if you just feed the transcript to GPT and ask for a summary, the result sucks.)
But his product is free, that's cool. Because he uses ChatGPT on the client. And I have to pay for GPT.
Also, does it always give me 8 bullet points from each video? It would make sense for videos that have chapters to give summaries to each chapter instead.
Currently the prices are:
* $8.6 for 20 summaries ($0.43 each)
* $19.8 for 60 summaries ($0.33 each)
* $48.6 for 180 summaries ($0.27 each)
> Also, does it always give me 8 bullet points from each video? It would make sense for videos that have chapters to give summaries to each chapter instead.
Yes, it's now a fixed number of 8 parts. I tested different numbers, and 8 seemed to me a simple solution that is convenient for videos of any length. But yes, I will take chapters into account when splitting; right now I ignore that information.
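One possible way to take chapters into account is sketched below; it is an assumption about how such splitting might work, not the product's code. Captions are modelled as `(start_seconds, text)` pairs and chapters as a list of start times (both hypothetical shapes), with a fallback to a fixed number of parts when a video has no chapters.

```python
from bisect import bisect_right
from typing import List, Tuple

Segment = Tuple[float, str]  # (start time in seconds, caption text)


def split_by_chapters(
    segments: List[Segment],
    chapter_starts: List[float],
    fallback_parts: int = 8,
) -> List[str]:
    """Group caption text by chapter; use fixed-size parts if there are no chapters."""
    if not segments:
        return []
    if not chapter_starts:
        size = -(-len(segments) // fallback_parts)  # ceiling division
        return [
            " ".join(text for _, text in segments[i:i + size])
            for i in range(0, len(segments), size)
        ]
    starts = sorted(chapter_starts)
    buckets: List[List[str]] = [[] for _ in starts]
    for start, text in segments:
        # Index of the last chapter that begins at or before this caption.
        idx = max(bisect_right(starts, start) - 1, 0)
        buckets[idx].append(text)
    return [" ".join(bucket) for bucket in buckets if bucket]


captions = [(0, "intro"), (10, "setup"), (20, "demo"), (30, "wrap-up")]
print(split_by_chapters(captions, chapter_starts=[0, 20]))
```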
At the same time, the product isn't as useful as when there are no chapters at all.
[1] https://stackoverflow.com/questions/72610552/most-replayed-d...
Instinctively it appears as though I'm creating original content. However, all this legal stuff is often counterintuitive.
It's a more interesting task when there are two sides. Even now, my app doesn't work well for debates. It tries to bring the points of view together. I want to work on separation.
Also, please, less Elon on the front page. It's a little nauseating.
Yes, I'll probably remove Elon :) Fun fact: Twitter banned my ads account when I tried to create ads with this video.
Also, did you get 5000 downloads in a single day?
Or when did you launch it?
Thank you!
However, then I ran into the 2048 token limit for longer videos. Because it doesn’t hold the full context, it wasn’t good enough at summarizing or providing insights.
The solution is to summarize 2048-token chunks recursively until you have a single summary.
This felt and worked… meh.
Were you able to get around this in some other clever way?
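For reference, the recursive approach described in this comment can be sketched roughly as below. It is a simplified illustration under stated assumptions: whitespace-separated words stand in for tokens (a real version would count tokens with the model's tokenizer), and `summarize` is whatever function wraps the model call.

```python
from typing import Callable, List


def chunk_by_tokens(text: str, max_tokens: int) -> List[str]:
    """Split text into chunks of at most max_tokens words (a crude proxy for tokens)."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


def recursive_summarize(
    text: str,
    summarize: Callable[[str], str],
    max_tokens: int = 2048,
) -> str:
    """Summarize chunk by chunk, then summarize the summaries, until one chunk remains."""
    while len(text.split()) > max_tokens:
        chunks = chunk_by_tokens(text, max_tokens)
        shorter = " ".join(summarize(chunk) for chunk in chunks)
        if len(shorter.split()) >= len(text.split()):
            # Guard against looping forever if the summaries are not shorter.
            raise ValueError("summarize() must shorten its input")
        text = shorter
    return summarize(text)
```

Each level throws away context from the level below, which is one plausible reason the result can feel "meh" on long videos.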