bit rate, file size, quality misunderstandings

It seems quite a few people are confused about how bit rate, file size and quality are related. A number of people have been using the -sameq option of the ffmpeg CLI and expecting it to output a file of the same size and quality as the original, only to find that it creates much larger files. So, what do the terms mean and what’s going on?

When specifying the -b <bit rate in bits per second> option to ffmpeg, one is requesting that the video encoder target the specified bit rate as an average over the entire video stream. If you have some audio or video file, playing it back takes some amount of time, i.e. it has a duration. The file also has a size in bytes. The overall average bit rate is simply the number of bits in the file divided by its duration in seconds. So:

overall average bit rate = file size in bytes * bits per byte / duration in seconds

It should be noted that the file may contain, for example, an audio stream and a video stream. So in fact we observe something more like:

overall average bit rate = average audio bit rate + average video bit rate + average overhead bit rate = file size in bytes * bits per byte / duration in seconds
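As a quick worked example of the two relations above, here is a minimal Python sketch. The file size, duration and stream bit rates are made-up numbers chosen for illustration, and the function names are my own rather than anything from FFmpeg.

```python
def overall_bit_rate(file_size_bytes, duration_seconds):
    """Overall average bit rate in bits per second (8 bits per byte)."""
    return file_size_bytes * 8 / duration_seconds


def overhead_bit_rate(file_size_bytes, duration_seconds, audio_bps, video_bps):
    """Whatever is left of the overall average once audio and video are accounted for."""
    return overall_bit_rate(file_size_bytes, duration_seconds) - audio_bps - video_bps


# Hypothetical file: 700,000,000 bytes lasting 5,600 seconds,
# with 128 kbit/s audio and 860 kbit/s video.
overall = overall_bit_rate(700_000_000, 5600)
overhead = overhead_bit_rate(700_000_000, 5600, 128_000, 860_000)

print(f"overall:  {overall:.0f} bit/s")   # 1000000 bit/s
print(f"overhead: {overhead:.0f} bit/s")  # 12000 bit/s
```

In this made-up case the container overhead comes to a little over 1% of the total, which is the sort of small contribution described below.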

The audio and video bit rates being constituents of the overall average should be clear, but what is this overhead? Metadata tags (like the artist, album and title of a music file) contribute to it, but they are a fixed size regardless of the duration of the file, so their share of the average bit rate shrinks as the file gets longer.

The rest of the overhead, beyond metadata and header information about the streams within the file, consists of various information that is part of the container format the file is using. This includes markers noting where each section of data begins, the time at which it should be presented to the user, and other bits and pieces of information. This normally does not contribute a large amount to the overall average bit rate, but it’s worth being aware of to avoid confusion later.

Bit rate need not be the overall average, however. The term simply refers to an amount of data divided by the duration over which it is transmitted or played back.

So, now we know how the bit rate and file size (if producing a file) are related, but where does quality come into the equation? There are two main ways to affect the quality of, for example, a video that one wishes to encode.

Most video encoders have a plethora of options that affect this thing called ‘quality’. Quality is subjective and that makes it difficult to quantify. The options that are supposed to improve quality usually make the encoding process slower. This leads to a speed/quality trade-off. That is, you have to consider the relation between two questions:

  • How long am I willing to wait to encode this file?
  • How good do I need the quality to be?

Normally the answer to the second question is “as good as it can be” within the constraints of the answer to the first question. More simply, people want the best quality from whatever amount of encoding time they deem reasonable for their job.

The other way that one can affect the perceived quality of the resulting video is to affect the bit rate used to encode it. If you try to encode 1080p video at 200 kilobits per second, it will look awful no matter what encoder and options you use. If you try to encode 1080p video at 200 megabits per second, it will probably look good no matter what encoder and options you use.
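One rough way to see why those two extremes behave so differently is to work out how many bits the encoder has available per pixel. The sketch below is only a back-of-the-envelope illustration: the frame rate of 25 frames per second is my assumption, and bits per pixel is a crude heuristic rather than a measure of quality.

```python
def bits_per_pixel(bit_rate_bps, width, height, frames_per_second):
    """Average number of encoded bits available for each pixel of each frame."""
    return bit_rate_bps / (width * height * frames_per_second)


# 1080p is 1920x1080; 25 fps is an assumed frame rate for illustration.
low = bits_per_pixel(200_000, 1920, 1080, 25)       # 200 kbit/s
high = bits_per_pixel(200_000_000, 1920, 1080, 25)  # 200 Mbit/s

print(f"200 kbit/s: {low:.4f} bits per pixel")   # about 0.0039
print(f"200 Mbit/s: {high:.2f} bits per pixel")  # about 3.86
```

At the low end the encoder has a few thousandths of a bit to spend on each pixel, which no choice of options can rescue.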

How should one approach making the decisions about what encoding options to use? Normally bit rate/file size is the main consideration, whether streaming with an upper bound on the bandwidth available for transmitting the stream or creating a file that needs to fit on a 700MB CD. This will impose your first constraint, unless you have some terabytes of storage and don’t really care about the size too much.

After that you need to consider how long you’re willing to wait to encode something, as this will affect the options you use with your encoder of choice to obtain the best quality within your more important constraints. If the quality is not good enough within your constraints, you may need more powerful hardware or to re-evaluate the available bandwidth/storage space.
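For the fit-it-on-a-CD case, the size constraint can be turned into a video bit rate by rearranging the bit rate formula from earlier. This is a minimal sketch under stated assumptions: I treat 700MB as 700,000,000 bytes, and the duration, audio bit rate and container overhead are made-up figures.

```python
def video_bit_rate_budget(target_size_bytes, duration_seconds,
                          audio_bps, overhead_bps):
    """Video bit rate (bits per second) that fits the target size.

    Rearranges: size * 8 / duration = audio + video + overhead.
    """
    total_bps = target_size_bytes * 8 / duration_seconds
    return total_bps - audio_bps - overhead_bps


# Hypothetical job: a 5,600 second video onto a 700,000,000 byte disc,
# with 128 kbit/s audio and an assumed 10 kbit/s of container overhead.
budget = video_bit_rate_budget(700_000_000, 5600, 128_000, 10_000)
print(f"video budget: {budget:.0f} bit/s")  # 862000 bit/s
```

The result is the sort of value one would then hand to -b. It is worth leaving a small margin, since the overhead figure here is a guess.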

Finally we come to the -sameq option in ffmpeg. There are a number of different ways to control how the bits are allocated throughout the video. Normally one wants to observe constant quality throughout a video stream. If some frames are good and some bad, it is actually more noticeable and annoying than if all frames do not look quite as good but are consistent.

Some frames are more difficult to compress than others and so need more bits to maintain that consistent level of perceived quality. Conversely, some are easier to compress and so need fewer bits.

If one conducts two passes when encoding a video stream, this benefits the encoder because it already has a log of the relative complexity of each frame so it can better predict how to allocate the bits.
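To make the idea concrete, here is a toy sketch of what a second pass could do with a first-pass complexity log. It simply shares the bit budget out in proportion to each frame's complexity. That proportional rule is my simplification for illustration. Real encoders use considerably more sophisticated rate control, and the complexity numbers are invented.

```python
def allocate_bits(complexities, target_bps, frames_per_second):
    """Split the total bit budget across frames in proportion to complexity."""
    duration_seconds = len(complexities) / frames_per_second
    total_bits = target_bps * duration_seconds
    total_complexity = sum(complexities)
    return [total_bits * c / total_complexity for c in complexities]


# Invented first-pass log: the third frame is four times as hard as the first.
first_pass_log = [1, 1, 4, 2]
allocation = allocate_bits(first_pass_log, target_bps=8000, frames_per_second=4)
print(allocation)  # [1000.0, 1000.0, 4000.0, 2000.0]
```

The allocation still sums to the requested budget, so the average bit rate is respected while the harder frames receive more bits.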

Traditionally, if conducting only one pass with a variable bit rate, the encoder has to try to guess based on whatever information it can from frames that have already been encoded. There are some ideas around to try to improve the bit allocation when doing one pass but I won’t discuss those here.

MPEG and other video codecs use a method called quantisation. This topic is also a bit too complex to discuss here. The quantiser values used in MPEG-1/-2/-4 ASP are 1, 2, …, 31 and for H.264 they are 0, 1, …, 51. Quantisers are how the encoder controls the allocation of bits to frames and parts of frames. The method of quantisation used in H.264 is different to that of the older codecs.

Confused? Don’t be. A frame compressed using a high quantiser will use fewer bits and have lower perceived quality than a frame compressed using a low quantiser. One can encode using a constant quantiser in ffmpeg by using the -qscale <quantiser> option. For MPEG-4 ASP (-vcodec mpeg4) using a value somewhere in the region of 3-5 for -qscale should produce good quality. For H.264 (-vcodec libx264) a value somewhere around the late teens to mid 20s should suffice.
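The link between quantiser, bits and quality can be illustrated with a toy uniform quantiser. To be clear, this is not how MPEG-4 ASP or H.264 actually quantise. It is a deliberately simplified sketch with invented coefficient values, using the count of non-zero values as a crude stand-in for the number of bits needed.

```python
def quantise(coefficients, quantiser):
    """Divide by the quantiser and truncate towards zero."""
    return [int(c / quantiser) for c in coefficients]


def dequantise(levels, quantiser):
    """Reconstruct approximate coefficients from the quantised levels."""
    return [level * quantiser for level in levels]


def summarise(coefficients, quantiser):
    levels = quantise(coefficients, quantiser)
    restored = dequantise(levels, quantiser)
    nonzero = sum(1 for level in levels if level != 0)
    worst_error = max(abs(a - b) for a, b in zip(coefficients, restored))
    return nonzero, worst_error


# Invented transform coefficients for one block.
block = [52, -31, 12, 7, -3, 2, 1, 0]

print(summarise(block, 2))   # (6, 1): more values to store, small error
print(summarise(block, 16))  # (2, 15): fewer values to store, large error
```

The higher quantiser throws away most of the values, which is cheaper to store but reconstructs the block far less accurately.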

So what does the -sameq option do? It uses the quantisers used in the source video stream to encode the output video stream. If converting from MPEG-4 ASP (e.g. Xvid or DivX) to H.264 (e.g. x264) this will inflate the file size considerably because of the different way these formats conduct quantisation.

Regardless, I would pretty much never recommend the use of -sameq. It does not mean “encode at the same quality as the source file”, nor does it mean “encode resulting in a file of similar size to the source file”. It just means “encode using the same quantisers” and the resulting video size and quality will likely be very unpredictable.

The conclusion: use -b with one or two passes. Or, if you’re using libx264, care more about quality than file size/bit rate and only want to do one pass, try -crf, which uses an intelligent way of maintaining constant quality in a single pass.
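As a sketch of what those two recommendations might look like in practice, the Python below only builds ffmpeg argument lists and does not run anything. The file names, the bit rate and the -crf value of 22 are placeholders of mine. The options follow the spelling used in this post, and newer FFmpeg releases spell the video bit rate option -b:v. Depending on your build you may also need further libx264 options or presets, so treat this as a starting point rather than a complete command line.

```python
def two_pass_commands(source, output, video_bps, codec="libx264"):
    """Return (first_pass, second_pass) argument lists targeting video_bps."""
    common = ["ffmpeg", "-i", source, "-vcodec", codec, "-b", str(video_bps)]
    # The first pass only gathers statistics: no audio, output thrown away.
    first = common + ["-pass", "1", "-an", "-f", "rawvideo", "-y", "/dev/null"]
    second = common + ["-pass", "2", output]
    return first, second


def crf_command(source, output, crf=22):
    """Return a one-pass constant-quality libx264 argument list."""
    return ["ffmpeg", "-i", source, "-vcodec", "libx264",
            "-crf", str(crf), output]


# Placeholder file names and bit rate.
first, second = two_pass_commands("input.avi", "output.mp4", 862_000)
print(" ".join(first))
print(" ".join(second))
print(" ".join(crf_command("input.avi", "output.mp4")))
```

The lists could be passed to subprocess.run once you are happy with the options.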

Posted in ffmpeg, multimedia, video, x264

Updated the FFmpeg x264 encoding guide

After receiving some criticism of my FFmpeg x264 encoding guide a day or two ago, I decided to make the guide a bit more detailed so that readers could better understand how to construct the necessary command lines to best convert streams to x264 using FFmpeg.

I also moved the ‘old, manual way’ of explicitly specifying all options to a separate page but I won’t link to it in this post because I don’t want people to use it. :)

Posted in ffmpeg, multimedia, x264

FFmpeg release! :D

I’m not sure how, but we managed to do it – we finally made an FFmpeg release! See the FFmpeg website for details.

EDIT: And there’s an interview with Baptiste Coudurier, Diego Biurrun and myself about the release and other related FFmpeg and digital multimedia things over at Phoronix.

Posted in development, ffmpeg, multimedia

Main and SBR

Alex Converse has provided a patch to add frequency domain prediction to the FFmpeg AAC decoder which completes the tools required for Main profile support. This is included in FFmpeg trunk as of r15919.

SBR is coming along quite nicely and I hope to have it working soon. :)

Posted in AAC, audio, development, ffmpeg, multimedia, work

changes to x264 and ffpresets

Jason Garrett-Glaser has recently committed changes to the way some options are specified to x264 regarding subme, bime and brdo. To quote the commit message:

Rework subme system, add RD refinement in B-frames

The new system is as follows: subme6 is RD in I/P frames, subme7 is RD in all frames, subme8 is RD refinement in I/P frames, and subme9 is RD refinement in all frames.

subme6 == old subme6, subme7 == old subme6+brdo, subme8 == old subme7+brdo, subme9 == no equivalent

--b-rdo has, accordingly, been removed. --bime has also been removed, and instead enabled automatically at subme >= 5.

RD refinement in B-frames (subme9) includes both qpel-RD and an RD version of bime.
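If you have old command lines to convert, the equivalences in the commit message can be written down as a small lookup. This is my own sketch, not part of x264 or FFmpeg. It assumes that subme levels below 6 are unchanged, which the commit message implies but does not state, and it returns None where the message lists no equivalent.

```python
# (old subme, old b-rdo enabled) -> new subme, taken from the commit message.
OLD_TO_NEW_SUBME = {
    (6, False): 6,
    (6, True): 7,
    (7, True): 8,
}


def new_subme(old_subme, old_brdo):
    """Translate an old subme/b-rdo pair to the new subme value, if listed."""
    if old_subme < 6:
        # Assumption: levels below 6 keep their meaning.
        return old_subme
    return OLD_TO_NEW_SUBME.get((old_subme, old_brdo))


def bime_enabled(subme):
    """bime is no longer an option; it is switched on at subme >= 5."""
    return subme >= 5


print(new_subme(6, True))   # 7
print(new_subme(7, True))   # 8
print(new_subme(7, False))  # None: no equivalent listed
print(bime_enabled(4), bime_enabled(5))  # False True
```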

The ffpresets I have been hosting and linking to on this site are now in FFmpeg trunk in the ffpresets subdirectory. I will maintain them there and the files on this site should no longer be used.

I will be writing presets for H.264 profiles/levels, iPods and such soon and editing the guides to advocate their use as it will simplify the command lines significantly.

Posted in ffmpeg, multimedia, x264