Each time I need to use it, I attempt to construct the command myself, but end up giving up and consulting StackOverflow. Amazingly, someone has usually done the exact thing I need to do and posted their command line to StackOverflow, so I'm never out of luck!
How do I actually start understanding how ffmpeg works? I want to be an ffmpeg power user.
ffmpeg -ss 00:01:00 -t 00:02:00 -i input1.mp4 -ss 00:03:00 -t 00:02:30 -i input2.mp4 -filter_complex "[0:v][0:a][1:v][1:a]concat=n=2:v=1:a=1[v][a];[v]eq=brightness=0.3[outv]" -map "[outv]" output.mp4
This command uses FFmpeg to create a single output video file from two input video files. It starts by specifying a start and duration for each of the two files (input1.mp4 and input2.mp4). It then applies a filter complex to the two files, which combines the two videos and audio into one stream, and adds a brightness filter with a value of 0.3. Finally, it maps the output video stream to the output file (output.mp4).
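To make the structure of that -filter_complex string easier to see, here is a minimal Python sketch that assembles the same filtergraph from its parts. The function name `concat_filtergraph` is made up for illustration; only the filter names and labels come from the command above. The quoted command maps only [outv]; the [a] label produced by concat is not mapped there, so it is worth checking that before reusing the command.

```python
def concat_filtergraph(num_inputs: int, brightness: float) -> str:
    """Build a concat + eq filtergraph string (hypothetical helper)."""
    if num_inputs < 1:
        raise ValueError("need at least one input")
    # concat expects its pads interleaved per input: [0:v][0:a][1:v][1:a]...
    pads = "".join(f"[{i}:v][{i}:a]" for i in range(num_inputs))
    # n = number of segments, v/a = number of video/audio streams per segment
    concat = f"{pads}concat=n={num_inputs}:v=1:a=1[v][a]"
    # Second chain: take the concatenated video and adjust brightness
    eq = f"[v]eq=brightness={brightness}[outv]"
    # Chains in a filtergraph are separated by ';'
    return f"{concat};{eq}"


print(concat_filtergraph(2, 0.3))
# [0:v][0:a][1:v][1:a]concat=n=2:v=1:a=1[v][a];[v]eq=brightness=0.3[outv]
```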
It’s really really bad at counting though. For example, try asking it to produce a line of 40 asterisks.
GPT is trained to reproduce human text, which tends to simply have the output of this O(N) counting process, but not the process itself. So GPT “thinks” it should be able to just spit out the number just like human text implies we do. It doesn’t know we are relying on an offline O(N) algorithm.
If you have it emit a numbered list of 40 elements, it will succeed, because producing a numbered list embeds the O(N) process and state into the text, which is the only thing it can see and reason about.
ChatGPT: >>>
ffmpeg -f v4l2 -framerate 30 -video_size 640x480 -i /dev/video0 -c:v libx264 -preset veryfast -crf 28 -pix_fmt yuv420p output.mp4
Here's a brief explanation of what each part of the command does:
- `-f v4l2`: specifies that the input device is a V4L2 (Video for Linux 2) device, which is typically what webcams use.
- `-framerate 30`: sets the framerate to 30 frames per second.
- `-video_size 640x480`: sets the video resolution to 640x480 pixels. You can adjust this to your desired resolution.
- `-i /dev/video0`: specifies the input device to use, which is typically /dev/video0 for the first connected webcam. If you have multiple webcams, you may need to adjust this to the appropriate device.
- `-c:v libx264`: sets the video codec to use, which is H.264 in this case.
- `-preset veryfast`: sets the encoding speed preset to "veryfast", which is a good balance between speed and quality.
- `-crf 28`: sets the video quality using the Constant Rate Factor (CRF) method, with a value of 28. Lower values mean higher quality and larger file sizes, while higher values mean lower quality and smaller file sizes. You can adjust this to your desired quality/file size.
- `-pix_fmt yuv420p`: sets the pixel format to YUV 4:2:0, which is a commonly used format for H.264 video.
- `output.mp4`: specifies the output file name and format.
<<<
Never used the `crf` or `pix_fmt` flags personally, but seems correct (the issue with ChatGPT).
pix_fmt: pixel format. 99% of videos use this format, and it has the best compatibility. FFmpeg will try to keep the original format when transcoding, so most of the time you don't need to specify it. However, in this particular case, since the input is from a webcam, the chance that it uses some weird format is high, and you don't want to keep that in your final result. So it's good to spell it out here.
My 2c: ChatGPT is great, but I recommend reading the comments it gives about each parameter, trying to understand their purpose, and adjusting accordingly if needed.
Also, having a rough idea of how the FFmpeg pipeline works (mainly the order of inputs, outputs and their associated switches in the arguments) helps a lot.
Video processing is a very complex thing and a lot of the time it relies on experience. Just be prepared: sometimes your "typically works" command will break.
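The point about argument order can be sketched in a few lines of Python. Options apply to the next file named on the command line, so input options sit before -i and output options sit before the output path. The helper name `build_ffmpeg_args` is hypothetical; the flags are the ones from the webcam command quoted earlier in the thread.

```python
from typing import List


def build_ffmpeg_args(input_opts: List[str], input_path: str,
                      output_opts: List[str], output_path: str) -> List[str]:
    """Assemble an ffmpeg argument list in the order the CLI expects."""
    # Input options come first, then -i <input>, then output options,
    # and the output path goes last.
    return ["ffmpeg", *input_opts, "-i", input_path, *output_opts, output_path]


args = build_ffmpeg_args(
    ["-f", "v4l2", "-framerate", "30", "-video_size", "640x480"],
    "/dev/video0",
    ["-c:v", "libx264", "-preset", "veryfast", "-crf", "28", "-pix_fmt", "yuv420p"],
    "output.mp4",
)
print(" ".join(args))
```

Keeping the two option groups as separate lists makes it harder to accidentally put an encoder flag in front of -i, where it would be read as an input option.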
Let me translate: the AI might or might not be bullshitting, I have no way to know, but I decide to believe it. Why?
Also, you disingenuously left out the parentheses from the quote: using a tool requires undertaking its downsides, if the downsides can be mitigated accordingly then the tool is useful. Millions of users put to good use imperfect tools daily.
-tag:v hvc1
for iphone/quicktime to recognize it.
It's helpful to have some background in media container formats, compression algorithms, sound formats, and all the jargon and acronyms associated with the above. Easy!
[1]: https://gstreamer.freedesktop.org/documentation/tutorials/ba...
[2]: https://gstreamer.freedesktop.org/documentation/gstreamer/gs...
It tends to happen when a big project rolls its own batteries, in this case command-line parsing.
But I'm not convinced; maybe it is a local optimum, though.
[1] https://github.com/jiaaro/pydub/issues/135
[2] https://github.com/kkroening/ffmpeg-python/tree/master/examp...
https://github.com/elamperti/dotfiles/blob/master/docs/ffmpe...
Practice, practice, practice. Eventually, you'll start thinking like ffmpeg. Knowing how ffmpeg labels the various streams inside a file is a great place to start. For example [0:a:1] means the second audio stream inside the first input. This is key for stringing together complex filter chains in the appropriately named -filter_complex.
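As a rough illustration of how those labels break down, here is a small Python sketch that parses the simple forms of a stream reference such as [0:a:1]. It only covers `input`, `input:type` and `input:type:index`; the real specifier syntax supports more than this, and the names `StreamRef` and `parse_stream_ref` are invented for the example.

```python
import re
from typing import NamedTuple, Optional


class StreamRef(NamedTuple):
    input_index: int            # which -i input, counted from 0
    stream_type: Optional[str]  # 'v', 'a', 's', 'd' or 't', if given
    index: Optional[int]        # index within that type, counted from 0


_SPEC = re.compile(r"^(\d+)(?::([vasdt]))?(?::(\d+))?$")


def parse_stream_ref(text: str) -> StreamRef:
    """Parse a simplified stream reference like '[0:a:1]' or '1:v'."""
    text = text.strip()
    # Filtergraph labels wrap the specifier in square brackets.
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    match = _SPEC.match(text)
    if match is None:
        raise ValueError(f"unsupported stream reference: {text!r}")
    inp, typ, idx = match.groups()
    return StreamRef(int(inp), typ, int(idx) if idx is not None else None)


print(parse_stream_ref("[0:a:1]"))
# StreamRef(input_index=0, stream_type='a', index=1)
```

Both numbers are zero-based, which is why [0:a:1] reads as the second audio stream of the first input.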
There are some filters that require you to merge streams together so the processing is done evenly, followed by a split to get back to the original stream layout. amerge/channelsplit is a common combo in most of my commands.
Fabrice Bellard once read all of TAOCP in 20 minutes before being absorbed into his own AI and is now ascended in the ethereal ethernet silently fixing bugs in your code. Blessed be the bits.
Amen.
He's like the LeBron James of the tech world, from what I can deduce.
You wouldn't say Linus Torvalds wrote Linux. It's literally written by thousands of people.
I work with people like this, and I hate it when people say "I wrote X feature." No, no you didn't. It was written by a whole team.
https://fosdem.org/2023/schedule/event/om_vlc/ (Video + Slides)
It’s a bit less dry than the changelog, notably for the evolutions of the APIs.
What’s also important is the changes about the release schedule that we’ve been pushing with the community. Major version every year at the beginning of the year, with ABI and API break, minor releases during the year and an LTS every other year…
For example if you pick speed 6 on SVT-AV1 it should take about as much time as x264 veryslow while still being 30-40% lower bitrate at the same quality.
Have you tried the faster presets of SVT-AV1?
Which is way better than hardware h.264... but also terrible compared to svt-av1, lol.
Try using av1an to parallelize (and improve the quality of) your encode. And TBH you should not re-encode your library unless it has raw rips or something like that, and you are short on space.
He's not the hero we deserve, but he's the hero we need.
I'm sure you get this a lot, but thank you so much for your sacrifice!
FFmpeg, curl, nmap, et al.
I wonder how many big companies use it "in secret" that we don't know of, or distribute it with mentions buried deep in a usage agreement.
> YouTube does not recommend the RGB color matrix on uploads. In this case, YouTube initially sets the color matrix to unspecified before the standardization. It will then infer the color matrix using the color primaries during standardization. Note that sRGB TRC will convert to BT.709 TRC. YouTube re-tags the color primaries/matrix/TRC to BT.709 when it is not supported by FFmpeg colorspace conversion filter.
Since FFmpeg is LGPL 2.1 I thought they had to make it easier to know they're using it, like under a "licenses" section, but I don't see anything under studio.youtube.com indicating this.
* Radiance HDR image support
* ddagrab (Desktop Duplication) video capture filter
* ffmpeg -shortest_buf_duration option
* ffmpeg now requires threading to be built
* ffmpeg now runs every muxer in a separate thread
* Add new mode to cropdetect filter to detect crop-area based on motion vectors and edges
* VAAPI decoding and encoding for 10/12bit 422, 10/12bit 444 HEVC and VP9
* WBMP (Wireless Application Protocol Bitmap) image format
* a3dscope filter
* bonk decoder and demuxer
* Micronas SC-4 audio decoder
* LAF demuxer
* APAC decoder and demuxer
* Media 100i decoders
* DTS to PTS reorder bsf
* ViewQuest VQC decoder
* backgroundkey filter
* nvenc AV1 encoding support
* MediaCodec decoder via NDKMediaCodec
* MediaCodec encoder
* oneVPL support for QSV
* QSV AV1 encoder
* QSV decoding and encoding for 10/12bit 422, 10/12bit 444 HEVC and VP9
* showcwt multimedia filter
* corr video filter
* adrc audio filter
* afdelaysrc audio filter
* WADY DPCM decoder and demuxer
* CBD2 DPCM decoder
* ssim360 video filter
* ffmpeg CLI new options: -stats_enc_pre[_fmt], -stats_enc_post[_fmt], -stats_mux_pre[_fmt]
* hstack_vaapi, vstack_vaapi and xstack_vaapi filters
* XMD ADPCM decoder and demuxer
* media100 to mjpegb bsf
* ffmpeg CLI new option: -fix_sub_duration_heartbeat
* WavArc decoder and demuxer
* CrystalHD decoders deprecated
* SDNS demuxer
* RKA decoder and demuxer
* filtergraph syntax in ffmpeg CLI now supports passing file contents as option values, by prefixing option name with '/'
* hstack_qsv, vstack_qsv and xstack_qsv filters
Curious to try this version.
I wrote a blog post and made a demo video the other day going over using this feature at: https://nickjanetakis.com/blog/create-video-clips-with-ffmpe...
in this case, it is important to be aware that the times you specify may not be extracted exactly. it will be off by a few frames based on keyframe availability. the only way to extract exact frames is to re-encode. :)
Personally I've created dozens of clips using this method and it always turns out ok. It gives you about ~1 second precision on where you want to make your cuts. After I create the clips I can play things back normally, complete with an ability to seek to specific points successfully.
Yes, in action recognition tasks (machine learning), e.g., if you have a large video with temporal annotations (start/end times where an action occurs) you may want to extract clips to sense-check the annotations. Being exact is important.
but the ~1 second precision that you see is by accident where the source file happens to have a keyframe every 1 second. that may not be the case always. :)
1. re-encode the whole thing
2. re-encode just the first GOP
3. start a bit early or late
4. include the full first GOP but use an edit list to instruct the player to skip to the timestamp of interest
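The keyframe constraint behind these options can be modelled in a few lines. This is a simplified Python sketch, assuming you already have a sorted list of keyframe timestamps in seconds (for example, collected with ffprobe); it is not how ffmpeg itself picks the cut point. It returns the keyframe a stream-copy cut would have to start from, plus how much a player would need to skip to land on the requested time, as in option 4. The name `plan_copy_cut` is made up.

```python
from bisect import bisect_right
from typing import Sequence, Tuple


def plan_copy_cut(keyframes: Sequence[float], requested: float) -> Tuple[float, float]:
    """Return (start_keyframe, seconds_to_skip) for a cut without re-encoding.

    keyframes must be sorted in ascending order.
    """
    # Index of the first keyframe strictly after the requested time.
    i = bisect_right(keyframes, requested)
    if i == 0:
        raise ValueError("no keyframe at or before the requested time")
    start = keyframes[i - 1]
    # The gap between the keyframe and the requested time is what an
    # edit list (or a re-encode of the first GOP) would have to cover.
    return start, requested - start


print(plan_copy_cut([0.0, 2.0, 4.0, 6.0], 3.5))  # (2.0, 1.5)
```

With a keyframe every two seconds, the worst case is almost two seconds of extra material at the start of the clip, which is why the precision depends on the source file.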
> - DTS to PTS reorder bsf
Interesting, I wonder what this is / why you'd want it. In particular, when you have the DTS but not the PTS.
The recent gstreamer 1.22 release [2] had what I read as the opposite—calculate a plausible DTS from the order and PTS. They did a nice job of explaining why it's useful. AFAICT, this approach is the only viable way to get B frames to work properly from a received RTP stream.
> H.264/H.265 timestamp correction elements ... Muxers are often picky and need proper PTS/DTS timestamps set on the input buffers, but that can be a problem if the encoded input media stream comes from a source that doesn't provide proper signalling of DTS, such as is often the case for RTP, RTSP and WebRTC streams or Matroska container files. Theoretically parsers should be able to fix this up, but it would probably require fairly invasive changes in the parsers, so two new elements h264timestamper and h265timestamper bridge the gap in the meantime and can reconstruct missing PTS/DTS.
Looks like the ffmpeg thing is dts2pts_bsf.c. [3] I haven't really read the implementation, but I was hoping the comment at the top would illuminate things, but "Derive PTS by reordering DTS from supported streams" isn't enough for me.
[1] https://git.ffmpeg.org/gitweb/ffmpeg.git/blob/refs/heads/rel...
[2] https://gstreamer.freedesktop.org/releases/1.22/
[3] https://git.ffmpeg.org/gitweb/ffmpeg.git/blob/refs/heads/rel...
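One possible reading of "derive PTS by reordering DTS" can be sketched as below. This is only a guess at the general idea, not the logic in dts2pts_bsf.c: it assumes packets arrive in decode order with increasing DTS, and that each packet's position in presentation order is already known (for H.264 that would come from the bitstream). The function name and the `display_rank` input are assumptions made for the example.

```python
from typing import List, Sequence


def derive_pts(dts: Sequence[int], display_rank: Sequence[int]) -> List[int]:
    """Assign presentation timestamps by reordering the decode timestamps.

    dts:          decode timestamps, in decode order
    display_rank: for each packet, its 0-based position in display order
    """
    if len(dts) != len(display_rank):
        raise ValueError("dts and display_rank must have the same length")
    ordered = sorted(dts)
    # The frame shown r-th gets the r-th smallest timestamp.
    pts = [ordered[r] for r in display_rank]
    # A frame cannot be shown before it is decoded, so shift every PTS by
    # the smallest delay that makes pts >= dts for all packets.
    delay = max((d - p for d, p in zip(dts, pts)), default=0)
    delay = max(delay, 0)
    return [p + delay for p in pts]


# Decode order I P B B, shown as I B B P.
print(derive_pts([0, 1, 2, 3], [0, 3, 1, 2]))  # [1, 4, 2, 3]
```

The one-tick shift in the example is the reorder delay that B-frames introduce: the P frame is decoded second but shown last.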
> Decoding Time Stamp (DTS) and Presentation Time Stamp (PTS)
Answered on Stack Overflow by slhck, who linked to this tutorial:
http://dranger.com/ffmpeg/tutorial05.html
( https://stackoverflow.com/questions/6044330/ffmpeg-c-what-ar... )
It’s sort-of optional for most playback stacks because they leave frame reordering to individual decoders as a codec-specific implementation detail, but Apple’s stack actually cares about frame-accurate random access so it relies on the codec-independent container timestamps.
The inaccurate seeking you get without container pts is okay for playback but it falls apart with editing or stuff like av1an.
So I can run an ffprobe script to get x,y info out, decide if the video needs re-encoding, pass to an ffmpeg call which does fast or veryfast settings to reset the x/y scale (for instance)
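A workflow like that might look roughly like the following Python sketch. It builds an ffprobe command that prints the first video stream's width and height as JSON, parses the result, and decides whether a rescale is needed. The helper names and the 1280x720 threshold are hypothetical; the sample JSON stands in for real ffprobe output so the snippet runs without any media file.

```python
import json
from typing import List, Tuple


def ffprobe_args(path: str) -> List[str]:
    """Arguments asking ffprobe for the first video stream's size as JSON."""
    return ["ffprobe", "-v", "error", "-select_streams", "v:0",
            "-show_entries", "stream=width,height", "-of", "json", path]


def parse_dimensions(ffprobe_json: str) -> Tuple[int, int]:
    """Extract (width, height) from ffprobe's JSON output."""
    stream = json.loads(ffprobe_json)["streams"][0]
    return int(stream["width"]), int(stream["height"])


def needs_rescale(width: int, height: int,
                  max_width: int = 1280, max_height: int = 720) -> bool:
    """Hypothetical policy: re-encode only if the video exceeds the limits."""
    return width > max_width or height > max_height


# Stand-in for the text ffprobe would print for a 1080p file.
sample = '{"programs": [], "streams": [{"width": 1920, "height": 1080}]}'
width, height = parse_dimensions(sample)
print(width, height, needs_rescale(width, height))  # 1920 1080 True
```

In a real script the JSON would come from running `ffprobe_args(path)` with `subprocess.run(..., capture_output=True, text=True)` and reading its stdout.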
It's also unquestionably 'self documenting' because all of the Scheherazade 1001 options are listed in --help. The problem is knowing which one will make the horse speak.
My app extracts screenshots from videos to create a beautiful gallery of videos. But even though I include FFmpeg already, I need a 50mb FFprobe executable to be bundled with my app just so that I can determine the width, height, duration, and fps of a video file!
What is it that FFprobe does that FFmpeg couldn't do with a few extra pieces of exposed API?
https://videohubapp.com/ - https://github.com/whyboris/Video-Hub-App
https://github.com/whyboris/Video-Hub-App/blob/772b25bbd4b41...
Two possible options to reduce the size of your application:
(1) Instead of using ffprobe, just call "ffmpeg -i <filename>" without specifying an output file, then parse stderr:
$ ffmpeg -i https://download.dolby.com/us/en/test-tones/dolby-atmos-trailer_amaze_1080.mp4 >/dev/null
ffmpeg version 4.4.2 Copyright (c) 2000-2021 the FFmpeg developers
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'https://download.dolby.com/us/en/test-tones/dolby-atmos-trailer_amaze_1080.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.76.100
Duration: 00:01:03.55, start: 0.000000, bitrate: 4537 kb/s
Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, 5.1, fltp, 128 kb/s (default)
Metadata:
handler_name : sound handler
vendor_id : [0][0][0][0]
Stream #0:1(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1920x1080 [SAR 1:1 DAR 16:9], 4404 kb/s, 24 fps, 24 tbr, 12288 tbn, 48 tbc (default)
Metadata:
handler_name : video handler
vendor_id : [0][0][0][0]
At least one output file must be specified
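Output like the above can be picked apart with a few regular expressions. The Python sketch below is an illustration only, written against the sample shown here: the log format is not a stable interface, so the patterns may need adjusting for other ffmpeg versions or unusual files. The function name `parse_ffmpeg_info` is made up.

```python
import re
from typing import Dict, Optional


def parse_ffmpeg_info(stderr_text: str) -> Dict[str, Optional[float]]:
    """Pull duration, size and fps out of `ffmpeg -i <file>` stderr text."""
    info: Dict[str, Optional[float]] = {
        "duration_s": None, "width": None, "height": None, "fps": None}
    m = re.search(r"Duration: (\d+):(\d{2}):(\d{2}\.\d+)", stderr_text)
    if m:
        hours, minutes, seconds = m.groups()
        info["duration_s"] = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    for line in stderr_text.splitlines():
        if "Video:" not in line:
            continue
        # At least two digits before the 'x', so '0x31637661' is not a match.
        size = re.search(r"\b(\d{2,5})x(\d{2,5})\b", line)
        fps = re.search(r"([\d.]+) fps", line)
        if size:
            info["width"], info["height"] = int(size.group(1)), int(size.group(2))
        if fps:
            info["fps"] = float(fps.group(1))
        break  # only the first video stream is considered
    return info


sample = """\
  Duration: 00:01:03.55, start: 0.000000, bitrate: 4537 kb/s
  Stream #0:1(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1920x1080 [SAR 1:1 DAR 16:9], 4404 kb/s, 24 fps, 24 tbr, 12288 tbn, 48 tbc (default)
"""
print(parse_ffmpeg_info(sample))
```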
This is admittedly messy compared to parsing structured ffprobe output, but it does contain all the information you mentioned (assuming duration in centiseconds is sufficiently precise for your application).
(2) Link both ffmpeg and ffprobe dynamically, in which case they'll share all but a few hundred kilobytes of on-disk code.
For example, consider ffmpeg and ffprobe as installed from package repositories on a variety of systems:
ffmpeg 4.4.2 installed by MacPorts on macOS Monterey (x86_64):
$ du -hA /opt/local/bin/ff{mpeg,probe}
339K /opt/local/bin/ffmpeg
260K /opt/local/bin/ffprobe
$ diff -s <(otool -L /opt/local/bin/ffmpeg) <(otool -L /opt/local/bin/ffprobe)
1c1
< /opt/local/bin/ffmpeg:
---
> /opt/local/bin/ffprobe:
72d71
< /usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)
ffmpeg 5.1.2 installed from RPM Fusion on Fedora 37 (x86_64):
$ du -h /usr/bin/ff{mpeg,probe}
284K /usr/bin/ffmpeg
176K /usr/bin/ffprobe
$ diff -s <(ldd /usr/bin/ffmpeg | awk '{ print $1 }') <(ldd /usr/bin/ffprobe | awk '{ print $1 }')
Files /dev/fd/63 and /dev/fd/62 are identical
ffmpeg 4.3.5 installed from raspberrypi.org on Debian 11.6 (aarch64):
$ du -h /usr/bin/ff{mpeg,probe}
276K /usr/bin/ffmpeg
176K /usr/bin/ffprobe
$ diff -s <(ldd /usr/bin/ffmpeg | awk '{ print $1 }') <(ldd /usr/bin/ffprobe | awk '{ print $1 }')
Files /dev/fd/63 and /dev/fd/62 are identical
MinGW-w64 ffmpeg 4.4.3 installed by MSYS2 on Windows 10 (x86_64):
$ du -h /mingw64/bin/ff{mpeg,probe}.exe
320K /mingw64/bin/ffmpeg.exe
184K /mingw64/bin/ffprobe.exe
$ diff -s <(Dependencies -depth 1 -modules /mingw64/bin/ffmpeg.exe | tail -n +2) <(Dependencies -depth 1 -modules /mingw64/bin/ffprobe.exe | tail -n +2)
Files /dev/fd/63 and /dev/fd/62 are identical
("Dependencies" is Dependencies.exe from https://github.com/lucasg/Dependencies)
I'll explore these options <3
(Currently if one uses something like a SiliconDust HDHomeRun, viewing an ATSC 3.0 stream requires using their app/player, which uses a SiliconDust cloud service to do the decoding. It'd be really nice to have a not-network-dependent way to view/hear OTA broadcasts.)
I think there are still slight audio sync issues, because I should reencode video files individually but I do a single pass instead to not deal with leftover files.
I'm quite happy with the result.
I wish I could write a script to do some basic effect on a video, like adding some moving text. The ideal would be to have an animated SVG file and make a video out of it.
Though I am curious whether there is a specific reasoning behind a name in relation to a version (alphabetically, numerically, chronologically etc.)?
That is capital 'A' A W E S O M E. FFmpeg is always on the bleeding edge, love it.
As long as they keep away from C++ and the ISO planned obsolescence of the C language, and keep the SDK minimal, I guess I can tolerate their excessively heavy use of the nasm macro preprocessor.
If the video is mostly about versioning, well since I use a weekly ffmpeg git...
It has replaced all kinds of various programs I used to use, such as netpbm and imagemagick. It's just better.