
bit rate, file size, quality misunderstandings

It seems quite a few people have been confusing how bit rate, file size and quality are inter-related. A number of people have been using the -sameq option from the ffmpeg CLI and expecting it to output a file of the same size and quality as the original, but they observe it creates files that are much larger. So, what do the terms mean and what’s going on?

When specifying the -b <bit rate in bits per second> option to ffmpeg, one is requesting that the video encoder target the specified bit rate as an average over the entire video stream. Any audio or video file takes some amount of time to play back, i.e. it has a duration. The file also has a size in bytes. The overall average bit rate is simply the number of bits per second averaged over the duration of the entire file. So:

overall average bit rate = file size in bytes * bits per byte / duration in seconds
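
As a quick sanity check of the formula above, here is a small Python sketch. The function name and the example file (700 MB, 90 minutes) are made up purely for illustration.

```python
def average_bit_rate(file_size_bytes, duration_seconds):
    """Overall average bit rate in bits per second."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    # 8 bits per byte, spread over the whole duration.
    return file_size_bytes * 8 / duration_seconds


# Hypothetical example: a 700 MB file (700 * 1024 * 1024 bytes) lasting 90 minutes.
size = 700 * 1024 * 1024
rate = average_bit_rate(size, 90 * 60)
print(round(rate / 1000), "kbit/s")
```

Note that this figure covers everything in the file, not just the video stream.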

It should be noted that the file may contain, for example, an audio stream and a video stream. So in fact we observe something more like:

overall average bit rate = average audio bit rate + average video bit rate + average overhead bit rate = file size in bytes * bits per byte / duration in seconds

The audio and video bit rates being constituents of the overall average should be clear, but what is this overhead? Metadata tags (like the artist, album and title of a music file) could contribute to this, but they are a fixed size regardless of the duration of the file.

The rest of the overhead, beyond metadata and header information about the streams within the file, consists of various information that is part of the container format the file is using. This includes markers noting where one section of data begins, the time at which it should be displayed/presented to the user and other bits and pieces of information. This normally does not contribute a large amount to the overall average bit rate, but it’s worth being aware of to avoid confusion later.
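
To make the overhead term concrete, one can rearrange the equation and solve for it. This is only a sketch: the function name and the numbers are invented, and in practice you would read the per-stream bit rates from whatever tool you use to inspect the file.

```python
def overhead_bit_rate(file_size_bytes, duration_seconds, audio_bps, video_bps):
    """Whatever part of the overall average is not audio or video."""
    overall = file_size_bytes * 8 / duration_seconds
    return overall - audio_bps - video_bps


# Hypothetical file: 50,000,000 bytes, 400 seconds long,
# with 128 kbit/s audio and 860 kbit/s video.
overhead = overhead_bit_rate(50_000_000, 400, 128_000, 860_000)
print(overhead, "bit/s of container overhead and metadata")
```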

Bit rate need not be the overall average, however. The term bit rate simply refers to an amount of data divided by the duration it takes to transmit or play back, so it can equally describe a short section of a stream.

So, now we know how the bit rate and file size (if producing a file) are related, but where does quality come into the equations? There are two main ways to affect the quality of, for example, a video that one wishes to encode.

Most video encoders have a plethora of options that affect this thing called ‘quality’. Quality is subjective and that makes it difficult to quantify. The options that are supposed to improve quality usually make the encoding process slower. This leads to a speed/quality trade-off. That is, you have to consider the relation between two questions:

  • How long am I willing to wait to encode this file?
  • How good do I need the quality to be?

Normally the answer to the second question is “as good as it can be” within the constraints of the answer to the first question. More simply, people want the best quality from whatever amount of encoding time they deem reasonable for their job.

The other way that one can affect the perceived quality of the resulting video is to affect the bit rate used to encode it. If you try to encode 1080p video at 200 kilobits per second, it will look awful no matter what encoder and options you use. If you try to encode 1080p video at 200 megabits per second, it will probably look good no matter what encoder and options you use.

How should one approach making the decisions about what encoding options to use? Normally bit rate/file size is the most important consideration, whether streaming with an upper bound on the bandwidth available when transmitting the stream or creating a file that needs to fit on a 700MB CD. This will impose your first constraint, unless you have some terabytes of storage and don’t really care about the size too much.
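
The size constraint translates directly into a video bit rate to ask the encoder for. Below is a rough Python sketch of that calculation. The function name, the two-hour duration, the 128 kbit/s audio and the 1% overhead allowance are all assumptions chosen for illustration; real container overhead varies.

```python
def video_bit_rate_budget(target_size_bytes, duration_seconds, audio_bps,
                          overhead_fraction=0.01):
    """Video bit rate (bits per second) that should fit the target size.

    overhead_fraction is an assumed allowance for container overhead
    and metadata, not a measured value.
    """
    total_bps = target_size_bytes * 8 / duration_seconds
    usable_bps = total_bps * (1 - overhead_fraction)
    video_bps = usable_bps - audio_bps
    if video_bps <= 0:
        raise ValueError("audio and overhead already exceed the size budget")
    return video_bps


# Hypothetical job: a two-hour film on a 700 MB CD with 128 kbit/s audio.
budget = video_bit_rate_budget(700 * 1024 * 1024, 2 * 60 * 60, 128_000)
print(int(budget / 1000), "kbit/s available for video")
```

The result is the sort of value one would then pass to -b.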

After that you need to consider how long you’re willing to wait to encode something, as this will affect the options you use with your encoder of choice to obtain the best quality within your primary constraints. If the quality is not good enough within those constraints, you may need more powerful hardware or to re-evaluate the available bandwidth/storage space.

Finally we come to the -sameq option in ffmpeg. There are a number of different ways to control how the bits are allocated throughout the video. Normally one wants to observe constant quality throughout a video stream. If some frames are good and some bad, it is actually more noticeable and annoying than if all frames do not look quite as good but are consistent.

Some frames are more difficult to compress than others and so need more bits to maintain that consistent level of perceived quality. Conversely, some are easier to compress and so need fewer bits.

If one conducts two passes when encoding a video stream, this benefits the encoder because it already has a log of the relative complexity of each frame so it can better predict how to allocate the bits.
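
To illustrate the idea only, here is a toy Python sketch of how a second pass could share out a bit budget using per-frame complexity figures logged by a first pass. Real encoders use far more elaborate rate control than a simple proportional split; the function name and the complexity numbers are invented.

```python
def allocate_bits(complexities, target_bps, frame_rate):
    """Split the total bit budget across frames in proportion to complexity."""
    duration_seconds = len(complexities) / frame_rate
    total_bits = target_bps * duration_seconds
    total_complexity = sum(complexities)
    return [total_bits * c / total_complexity for c in complexities]


# Hypothetical first-pass log: the third frame is four times as hard
# to compress as the first two.
first_pass_log = [1, 1, 4, 2]
allocation = allocate_bits(first_pass_log, target_bps=800_000, frame_rate=25)
print(allocation)
```

The allocations always add up to the same total, so the average bit rate is respected while the harder frames get more of it.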

Traditionally, if conducting only one pass with a variable bit rate, the encoder has to try to guess based on whatever information it can from frames that have already been encoded. There are some ideas around to try to improve the bit allocation when doing one pass but I won’t discuss those here.

MPEG and other video codecs use a method called quantisation. This topic is also a bit too complex to discuss here. The quantiser values used in MPEG-1/-2/-4 ASP are 1, 2, …, 31 and for H.264 they are 0, 1, …, 51. Quantisers are how the encoder controls the allocation of bits to frames and parts of frames. The method of quantisation used in H.264 is different to the older codecs.

Confused? Don’t be. A frame compressed using a high quantiser will use fewer bits and have lower perceived quality than a frame compressed using a low quantiser. One can encode using a constant quantiser in ffmpeg by using the -qscale <quantiser> option. For MPEG-4 ASP (-vcodec mpeg4) using a value somewhere in the region of 3-5 for -qscale should produce good quality. For H.264 (-vcodec libx264) a value somewhere around the late teens to mid 20s should suffice.
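
A heavily simplified Python sketch shows why a higher quantiser means fewer bits and lower quality. It just divides some made-up coefficient values by the quantiser and throws away the fraction. Real codecs map the quantiser to step sizes in codec-specific ways, so treat this as an illustration of the principle rather than how MPEG-4 or H.264 actually do it.

```python
def quantise(coefficients, quantiser):
    """Divide by the quantiser and drop the fraction (truncate toward zero)."""
    return [int(c / quantiser) for c in coefficients]


def dequantise(levels, quantiser):
    """What the decoder can reconstruct from the quantised levels."""
    return [level * quantiser for level in levels]


def total_error(coefficients, quantiser):
    """Sum of absolute differences between original and reconstruction."""
    restored = dequantise(quantise(coefficients, quantiser), quantiser)
    return sum(abs(a - b) for a, b in zip(coefficients, restored))


# Invented coefficient values for one small block of a frame.
coefficients = [120, -45, 18, 7, -3, 1]
for q in (2, 16):
    levels = quantise(coefficients, q)
    nonzero = sum(1 for level in levels if level != 0)
    print("quantiser", q, "levels", levels,
          "non-zero", nonzero, "error", total_error(coefficients, q))
```

With the larger quantiser more values collapse to zero, which is cheap to store, but the reconstruction drifts further from the original.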

So what does the -sameq option do? It uses the quantisers used in the source video stream to encode the output video stream. If converting from MPEG-4 ASP (e.g. Xvid or DivX) to H.264 (e.g. x264) this will inflate the file size considerably because of the different way these formats conduct quantisation.

Regardless, I would pretty much never recommend the use of -sameq. It does not mean “encode at the same quality as the source file”, nor does it mean “encode resulting in a file of similar size to the source file”. It just means “encode using the same quantisers” and the resulting video size and quality will likely be very unpredictable.

The conclusion: use -b with one or two passes. Or, if you’re using libx264 and you’re more concerned about quality than file size/bit rate and only want to do one pass, try -crf which uses an intelligent way of maintaining constant quality while using only one pass.
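
For what it’s worth, here is a Python sketch that assembles the ffmpeg command lines for the two approaches without running them. The file names, the 1,000,000 bit/s target and the CRF value of 22 are placeholders I picked for illustration, and the use of -an with -f null /dev/null to discard the first-pass output is an assumption about a Unix-like system. The option spellings follow this post; newer ffmpeg builds prefer -b:v and -c:v.

```python
def two_pass_commands(source, output, video_bps, codec="libx264"):
    """Return the argument lists for a two-pass, average-bit-rate encode."""
    common = ["ffmpeg", "-i", source, "-vcodec", codec, "-b", str(video_bps)]
    # Pass 1 only gathers statistics, so drop audio and discard the output.
    first = common + ["-pass", "1", "-an", "-f", "null", "-y", "/dev/null"]
    # Pass 2 reads the statistics log and writes the real file.
    second = common + ["-pass", "2", output]
    return first, second


def crf_command(source, output, crf=22):
    """Return the argument list for a one-pass constant-quality libx264 encode."""
    return ["ffmpeg", "-i", source, "-vcodec", "libx264",
            "-crf", str(crf), output]


# Hypothetical file names.
first, second = two_pass_commands("input.avi", "output.mp4", 1_000_000)
print(" ".join(first))
print(" ".join(second))
print(" ".join(crf_command("input.avi", "output.mp4")))
```

Each list can be handed to subprocess.run if you actually want to start the encode.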

  1. cynyr
    May 1st, 2009 at 23:46 | #1

    Is there an option to use something like -crf in a 2 pass encode and would there be any benefit to doing an encode that way?

  2. May 2nd, 2009 at 10:19 | #2

    You could use it for the first pass if you know that the CRF value you choose will achieve a bit rate close to the one you are targeting in the second pass. I think the closer the quantisers used in the first pass are to the ones actually chosen in the second pass, the better the prediction that can be made using that information.

    However, I think I recall that most of the error would be worked around within the first few frames of encoding. I don’t think it’s particularly beneficial though, or else it would be recommended by the x264 developers.
