Pixel format

Color spaces define how colors should be reproduced and which specific colors can be represented, but they don’t define how the color data is stored by the computer. It’s the pixel format that defines what types of values are saved and how. This format is completely independent from the color space, but it influences the image quality and the number of shades that can be stored.

Each pixel is composed of several color values, called channels*, which are generally (but not necessarily) either red, green and blue (the RGB format), or a luminance* and two chrominances* (the YUV format).

In the case of YUV, there’s sometimes chrominance subsampling, which reduces the amount of data to be stored with a minimal (and almost indiscernible) loss of quality.

Finally, for each channel, you can choose the range and precision of the saved values.

RGB or YUV

Any visible color can be represented using only a few complementary* or primary* colors. As soon as three primaries are defined, a wide range (a gamut*) of visible colors can be obtained by varying their proportions, to form a useful (and sufficient) set of colors. The majority of color reproduction devices (screens, projectors…) therefore use three primary colors.

These colors are mostly reds, greens, and blues; they may vary depending on the device and the color space* they use but are always within this range.

In any case, all color coding systems use a group of values corresponding to specific primaries or properties; these are called channels*, and are generally three in number.

RGB


The digital decomposition of color information generally uses the same primaries* as most reproduction devices: Red, Green and Blue are the three R, G, B channels of this system.

Decomposition of an image into its three RGB channels

There are several reasons to use these primaries and this color representation system.

Historically, and for reasons of performance and storage, another system is very widespread: YUV.

YUV


When electrical signals (analog at first, not digital) were first used to represent video, the signal was a “simple” one-dimensional signal: videos were represented only as a range from black to white through grays3. In other words, only the information of luminous intensity, the luminance*, was stored and reproduced.

Then came color television, but the signal had to remain compatible with the older black and white sets; the color information, the chrominance*, was therefore added alongside the luminance signal without modifying it, and the old black and white televisions simply ignored this additional information.

This system isn’t an RGB system but uses three YUV channels4, where Y represents the luminance, and U and V carry two chrominance components (respectively the blue-difference and red-difference information).

Breakdown of an image into its three YUV channels

What is interesting about this system isn’t only historical: as we’ve seen previously, the human eye distinguishes contrasts of luminance* better than contrasts of chrominance. Separating this information makes it possible to treat them differently, in particular to reserve more information, a higher resolution, for the luminance channel than for the chrominance channels, and therefore to reduce the amount of data to be stored and transmitted without noticeable loss of quality.
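To illustrate the principle, here is a minimal Python sketch (not a standard implementation) that splits an RGB triplet into a luma value and two chroma components, assuming the Rec.709 luma coefficients and values normalized between 0.0 and 1.0:

```python
# Minimal sketch: split an R'G'B' triplet into luma and two chroma components.
# Assumes Rec.709 coefficients and values normalized to the 0.0-1.0 range.
def rgb_to_ycbcr_709(r, g, b):
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b  # luma: weighted sum of the primaries
    cb = (b - y) / 1.8556                     # blue-difference chrominance
    cr = (r - y) / 1.5748                     # red-difference chrominance
    return y, cb, cr

print(rgb_to_ycbcr_709(1.0, 1.0, 1.0))  # white: (1.0, 0.0, 0.0), no chrominance
print(rgb_to_ycbcr_709(0.5, 0.5, 0.5))  # any gray: chrominance stays at 0.0
```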

Comparison

These are the two main color coding systems, independent of the color spaces in use5.

Here are the main differences:

Others

There are other less common combinations of channels and pixel formats, for specific uses or containers*.

For example, some image formats use a palette of colors instead of several blended primaries, and thus have only one channel per pixel, whose value corresponds to a predefined color6.

Other formats store only grayscale, and therefore just one channel of brightness. There are also more exotic formats with two channels…
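As an illustration of the palette principle mentioned above, here is a minimal sketch (with a hypothetical 4-color palette): each pixel stores only an index, and the actual RGB color is looked up in the shared palette.

```python
# Minimal sketch of indexed color: one channel per pixel (the palette index).
palette = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]  # up to 256 RGB entries
pixels = [0, 1, 1, 3, 2]                   # the image stores only indices
decoded = [palette[i] for i in pixels]     # expanded back to RGB for display
print(decoded)  # [(0, 0, 0), (255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 255, 0)]
```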

YUV 4:4:4, 4:2:2, 4:2:0… Chrominance subsampling

Compared to RGB, YUV has the advantage of working in a way closer to human perception, which is more sensitive to luminosity*. Indeed, by separating the luminance* from the chrominance*, it’s possible to decrease the quantity of data recorded in chrominance in favor of the luminance, without losing perceived quality, and thus to compress the video data efficiently.

To operate this quality reduction in the chrominance channels, their resolution (the number of pixels) is simply decreased; this is what’s called chroma subsampling.

Chroma subsampling is therefore a lossy compression method that’s completely independent of the encoding standard* (codec*) of the video.

The acronyms 4:4:4, 4:2:2, 4:2:07… describe how the subsampling is done and indicate the amount of data lost. The description is based on a grid of 4 by 2 pixels.

4x2 pixels

The first value of the trio represents the resolution (sampling) of the luminance*.
The second value represents the subsampling of the chrominance* on the first line (all odd lines), while the third represents this subsampling on the second line (all even lines).

A fourth value is sometimes added to the acronym and represents in this case a subsampling in the alpha* channel of the video8.

The proportion of data kept can easily be calculated by adding the three values and dividing by 12 (or 16 if there is a separate value for alpha).
For example, 4:2:2 keeps (4+2+2)/12 = 2/3 of the data, while 4:2:0 keeps (4+2+0)/12 = half.
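The same calculation, as a small Python sketch (the function and parameter names are purely illustrative):

```python
# Fraction of data kept for a 4:a:b subsampling scheme (ignoring alpha).
def data_kept(luma, chroma_odd_lines, chroma_even_lines):
    return (luma + chroma_odd_lines + chroma_even_lines) / 12

print(data_kept(4, 4, 4))  # 1.0   -> no subsampling, all data is kept
print(data_kept(4, 2, 2))  # ~0.67 -> a third of the data is saved
print(data_kept(4, 2, 0))  # 0.5   -> half of the data is saved
```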

Tip

In case of black and white video, chrominance channels are completely useless, so one can choose a chroma subsampling with less chrominance data.

4:4:4


The 4:4:4 subsampling in YUV is the only equivalent to RGB in terms of quality (and quantity of data). There is in fact no subsampling in this mode and all pixels contain chrominance* and luminance* information.

It’s not used in broadcasting but only in production (or for archiving): the bitrate would be too high for broadcast, but this data is essential for post-production, especially when using green or blue screens (chroma-key). Since masking is done using the chrominance information, it’s absolutely essential to have its full resolution.

Warning

Unfortunately, only high-end professional cameras and recorders can record in 4:4:4; many cameras record in 4:2:2, and entry-level devices only in 4:2:0.

4:2:2


In 4:2:2, the resolution of the chrominance* is half that of the luminance (so the total amount of data is reduced by one third). The loss is imperceptible, which makes it a very efficient way of compressing video. This mode is used in production (as long as there is no chroma-key, green/blue screens), in high-end formats and in high quality broadcasting (especially in television).

The horizontal resolution of the chrominance is reduced by half while the vertical resolution is kept.

4:2:0


In 4:2:0, the chrominance* resolution is reduced by half on odd lines, and completely removed on even lines. The amount of data is thus reduced by half overall, but the difference is still very difficult to perceive, which makes it a very good broadcasting format9.

This mode is the main one in consumer computer files and on the internet. Many software players, and most hardware players (Blu-ray players, smart TVs, etc.), only support 4:2:0.

It should be avoided in production when color correction or chroma-keying is involved: the chrominance information is far too limited (“staircase” artifacts can easily appear, due to the lack of resolution in chrominance).

Both the horizontal and vertical resolution of the chrominance are reduced by half.

Color depth (bpc)

Regardless of the chosen color space, and whether it’s for exported files or the working space, the color depth parameter describes the precision of the values stored in each of the pixel’s channels*.

Contrary to a widespread idea, the color depth doesn’t really influence the number of visible colors, but rather the precision of the calculations, and thus the number of “sub-shades” usable within the chosen color space. In other words, the depth doesn’t change the gamut*. To avoid this confusion, we will speak here about the quantity of shades rather than the quantity of colors.

By defining the precision of the values, and the amount of stored data, the color depth directly influences the file size.

This is usually measured in bits* per pixel (bpp) or in bits per channel (bpc). The more bits (0 or 1) used to store the pixel (or each of its channels), the more space the file takes up, but the higher the precision (and therefore the quality).

Depending on the system, the standards vary, especially because of the YUV chrominance subsampling.

In RGB

In RGB, each channel* contains the same amount of information (there is no subsampling), and while in theory an arbitrary number of bits* could be used to store the channels (and this is the case in some file formats), multiples of 8 (and accordingly full bytes10) are generally used.

8 bpc / 24 bits / 24 bpp / 32 bits with alpha

Most images use 8 bits per channel. With 8 bits, 2⁸, that is 256, different values can be encoded (from 0 to 255). With three channels, there’s thus a total of 256³, that is, a little over 16 million, different values for a pixel.
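The arithmetic behind these figures, as a quick sketch:

```python
bits_per_channel = 8
values_per_channel = 2 ** bits_per_channel  # 256 values, from 0 to 255
shades_per_pixel = values_per_channel ** 3  # 16_777_216: "a little over 16 million"
print(values_per_channel, shades_per_pixel)
```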

This quantity of nuances is necessary and sufficient so that the human eye doesn’t distinguish “steps” in images with a gamut* and maximum intensity like those of sRGB, and it’s therefore the most widespread depth in digital RGB images intended for computer screens.

But this quantity isn’t sufficient when working on images, as a workspace. Indeed, when modifying images, computers perform calculations on the different channel values, and with only 256 integer values, these calculations lead to a significant loss of precision, which becomes visible very quickly11.
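A tiny sketch of the problem (the numbers are just an illustration): halving an 8-bit value and doubling it again doesn’t give back the original, because every intermediate result has to be rounded to an integer.

```python
original = 127
half = round(original / 2)   # 63.5 can't be stored in 8 bits: rounded to 64
restored = half * 2          # 128, not 127: one operation already introduced an error

# With floating point (or more bits), the intermediate value keeps its precision:
half_f = original / 2        # 63.5
restored_f = half_f * 2      # 127.0, no loss
print(restored, restored_f)
```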

This depth is also not sufficient for TV or cinema, which use a higher depth for display and possibly high dynamic ranges (HDR) (possibly associated with a wider gamut in spaces other than sRGB).

To be able to work without degrading the image, the color depth of the workspace must be increased.

16 bpc / 48 bits / 48 bpp / 64 bits with alpha

By adding a byte for each channel, the number of available shades is greatly increased: the number of available values for each channel increases to 2¹⁶, or 65536. This makes a total of 65536³, or several hundred trillion, shades per pixel.

As a general rule, these 16 bits per channel provide the necessary precision for fine work on the image, but may still be insufficient in specific cases:

The color depth can therefore be increased in these cases.

32 bpc / 96 bits / 96 bpp / 128 bits with alpha

Warning

Do not confuse 32 bpc (per channel) with 32 bits or 32 bpp (per pixel) which is actually only 8 bpc (with an alpha channel)!

A third byte per channel is added, which further increases the number of available shades exponentially, with 2³², about 4 billion, possible values per channel*, that is, a practically unlimited number of shades per pixel.

This mode virtually allows working on the image without any loss, whatever the color space used and whether it’s linear or not, but it becomes very heavy in terms of memory.

While it can be useful as a working mode, it’s in fact rarely used for storing files (even intermediate ones) where 16 bits per channel are often enough14.

In YUV

In YUV, which is never used as a working system but only for storage and broadcasting, the depth used is different from RGB systems.

It’s usually named after the number of bits used by the luminance channel per pixel; indeed the number of bits* per channel* doesn’t really make sense with the different possible chrominance* sub-sampling types.

By taking the chrominance subsampling into account, the average number of bits used per channel can be calculated (and therefore the approximate size of an uncompressed image, by multiplying by the number of channels and of pixels), even if in reality the number of bits used by the chrominance is the one indicated for pixels containing chrominance, and zero for the rest.
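A small sketch of that calculation (the names are illustrative), giving the average number of bits per pixel for a given depth and 4:a:b subsampling: in the 4 by 2 reference block there are 8 luma samples, plus two chroma samples (Cb and Cr) for each of the a + b chroma positions.

```python
def average_bits_per_pixel(depth, a, b):
    samples_per_block = 8 + 2 * (a + b)    # 8 luma samples + Cb and Cr samples
    return depth * samples_per_block / 8   # the block contains 8 pixels

print(average_bits_per_pixel(8, 2, 0))    # 4:2:0 at 8 bits  -> 12.0 bits per pixel
print(average_bits_per_pixel(10, 2, 2))   # 4:2:2 at 10 bits -> 20.0 bits per pixel
print(average_bits_per_pixel(8, 4, 4))    # 4:4:4 at 8 bits  -> 24.0, same as 8 bpc RGB
```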

The various modes consequently differ only in quality, and as a general rule, the more the resolution of the image, the gamut* of the color space and the dynamic range are increased, the more the depth has to be increased to ensure that color gradients remain fine and free of “banding”.

8 bits

This is the most common depth in computing; most computer screens can’t display more than this.

This is the standard depth for HD video in Rec.709.

10 bits

This is the standard depth of high-end and UHD videos in Rec.2020.

12 bits

This is the “HDR” depth of high-end video and UHD in Rec.2100.

Others

There are other depths, starting from 1 bit per pixel (monochrome images), for specific uses. For example, an image using a palette of 256 colors, such as those found in the GIF format or some PNG files, uses 8 bits per pixel (and therefore per channel too, since there is only one channel in this case).

Integer or floating point values

The “simple” encoding of images with 8 or 16 bits per channel, used for example in the PNG format and by display devices (screens, projectors, etc.), generally uses integers*. For example, with 16 bpc, the values of each channel range between 0 (black) and 65535 (maximum intensity). Although the range of shades available is very fine, this method imposes a limit on maximum brightness. This limit isn’t a problem for image reproduction, since some maximum brightness does exist on the hardware used; in this case, the highest value will correspond to the maximum brightness of the hardware1.

Using integers can, however, become problematic when generating digital images (3D rendering, or working on so-called HDR videos and images), in particular because it limits the maximum intensity of light sources. Another system is then possible: the use of floating point numbers*, that is, decimal values that can go beyond 1.0. The precision of these numbers (the number of digits composing them) then depends on the number of bits available (16 or 32)2.

Digital image processing software therefore almost always offers a workspace which can use either integers with 8 or 16 bits per channel, or floating point values with 32 bits per channel (and in the latter case, linear too, to simulate light and mix colors in a physically more accurate way).

It’s during the conversion to the broadcast format that the data will be converted to integers with a simple rule: 1.0 corresponds to the maximum brightness (the highest possible integer), and all values higher than 1.0 (whiter than white) are lost. It’s in this specific case that color management and using the right color spaces matter most: they define how the physical data, stored as floating point numbers whose values are virtually unbounded, is compressed into a defined range of integers. It’s exactly the same principle as adjusting the exposure* of a camera sensor (or the contrast* adjustment made later on the pictures).
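A minimal sketch of that conversion (without any color management, purely to show the principle): floating point values are clipped to 1.0 and then mapped to integers, so everything “whiter than white” is lost.

```python
def float_to_8bit(value):
    clipped = min(max(value, 0.0), 1.0)  # values above 1.0 (whiter than white) are discarded
    return round(clipped * 255)          # 1.0 becomes the maximum integer brightness

print(float_to_8bit(0.5))   # 128
print(float_to_8bit(4.2))   # 255: a very bright light source is flattened to plain white
```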

Full range / Limited / TV / PC ?

When encoding video (and decoding it), an important parameter is the color range. This parameter has its historical origin in the switch from analog TV to digital RGB screens. It defines the range of possible levels on each of the color channels* (red, green, blue).

Full range / PC

Digital computer screens use the full range of red, green and blue levels for color reproduction. With the most common 8 bits* per channel*, this means that each channel stores values between 0 and 255.

0 represents black, and 255 represents white.

Limited range / TV

Televisions are expected to use the so-called limited range of levels; this range was originally designed specifically to represent the contrast of film more correctly, and corresponds, with 8 bits* per channel*, to values between 16 and 235.

This means that in television, the value 16 represents black and the value 235 represents white. All values below 16 (blacker than black) are ignored, as well as all values above 235 (whiter than white)15.
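As a simple sketch (8 bits per channel, purely illustrative), converting between the two ranges just remaps 0–255 to 16–235 and back:

```python
def full_to_limited(value):
    return round(16 + value * (235 - 16) / 255)    # 0 -> 16 (black), 255 -> 235 (white)

def limited_to_full(value):
    return round((value - 16) * 255 / (235 - 16))  # 16 -> 0 (black), 235 -> 255 (white)

print(full_to_limited(0), full_to_limited(255))    # 16 235
print(limited_to_full(16), limited_to_full(235))   # 0 255
```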

Practical conclusion

It’s therefore necessary to know both what your hardware expects when playing a video, and what to do when encoding one.

Encoding

In the vast majority of cases, video standards recommend encoding in limited range / TV: videos are intended to be seen in television conditions (including on the Internet). This is the case for example for mp4 in h.264 or h.265, for mkv, and for all broadcast formats.

On the other hand, image formats (PNG, JPEG, openEXR, etc.), as well as intermediate video formats (those used during production rather than broadcast, such as ProRes), being intended for a computer environment, tend to use the full range / PC.

It’s important to respect these standards to be sure that the files are correctly interpreted by the viewers’ equipment, and always inquire about the formats recommended by the broadcasters.

Playback and display

When playing videos, it’s also necessary that the whole system is correctly configured; a common (not to say recurrent) problem on computers is that videos are left in limited range while the screen uses the full range.

During playback, the video source must be converted to match the screen or projector. Without conversion, a limited range video displayed on a full range screen will be “dull”: there will be no black or white, the range of the image going only from light to dark gray. Conversely, a full range video on a limited range TV will have a loss of information in both highlights and shadows, with large parts completely black or white (the image will be too contrasted).

Example of wrong display

So the hardware must be set up correctly. On a disc player or console, there must be a setting to specify whether the connected display uses the full range or the limited range (as a general rule, a computer screen or video projector uses the full range, a TV uses the limited range).

On a computer, things can be a bit more complex: you have to start by checking the settings of the graphics card driver, usually in a section called “video”, and specify full range / PC (unless it’s a TV that is connected to the computer)16. If the problem persists after setting this parameter, you have to check that the application used to play the video doesn’t perform a bad conversion (for example, QuickTime on Windows was known for this17); most of these applications should however let the graphics card handle this conversion and not cause any problem (this is the case for VLC, the Windows video player, Totem on Linux…).

Screenshot of Nvidia settings
Example of parameters via the settings of an Nvidia graphics card (under Linux). Note especially here the color range parameter, to be set to Full if the screen is a computer screen, and Limited if it is a TV.

Warning

On some hardware, an “Auto” option is available in addition to full / limited range. In this case the hardware tries to detect the type of the connected screen. Since this is a parameter that should only be changed when the screen is changed, it’s strongly recommended to set it manually.


Sources & References


  1. Or the one specified by the color space, which may define the white point brightness. 

  2. This is also what defines the famous 32 bits or 64 bits of processors (and software) (in 2024, almost all are called 64 bits): it’s the precision of the numbers they’re capable of using in their instructions. When computer simulations of light first appeared, the standard was still 32 bits, and 32 bpc were naturally used with these float values. To facilitate storage, however, certain formats such as openEXR suggest using 16 bpc in float mode, which is sometimes called half-float (and is more than sufficient for storage, even if the calculations are done with 32 bits per channel in the application). 

  3. The right terms are luminance*, luminosity* or luma*:
    - The luminance has a linear transfer curve.
    - The luma/luminosity/brightness has a gamma.
    When talking about luminance, YUV is used, while when talking about brightness it should be called Y’UV, but most of the time the prime is omitted.
    See chapter Transfer curves, linear space and gamma

  4. The general term YUV actually covers two families, each available in a luminance or a luminosity variant (Y or Y’):
    - In analog the right terms are YUV and Y’UV, or sometimes YPbPr and Y’PbPr.
    - In digital the exact terms are YCbCr or Y’CbCr, and may be YCC.
    But it’s generally the term YUV that is used in all these different cases… 

  5. In fact, YUV and RGB can be used interchangeably, but some standards and color spaces specify one or the other, or both. For example, sRGB is specifically intended to be used with RGB encoding, while Rec.709 specifies that it can be used in both RGB and YUV.

  6. This is the case for GIF which contains a maximum of 256 different colors in its palette. This principle can also be used for PNG and other formats. The aim is to reduce the overall size of the file by storing less information (only the palette with the colors described in RGB, and only one channel per pixel; this way the overall size can be reduced by a factor of three). 

  7. This list contains the most common subsampling schemes, but there are rarer ones (4:2:1, 4:1:1), and more complex or downright exotic ones (3:1.5:1.5, 3:1:1)… 

  8. By default, the subsampling of the alpha channel is the same as that of the luminance*

  9. This mode exists in the openEXR format: it’s the option noted “Luminance/Chroma”. 

  10. In current computing, where the octet* is also the Byte*, the smallest unit of memory, one cannot use “half bytes” (or any other fraction). Using an integer number of bytes per pixel means that the number of pixels in the image can be completely arbitrary; if the number of bytes per pixel isn’t an integer, several pixels will have to share bytes, and the number of pixels is constrained: the total number of bytes in the image must be an integer. This is why the number of rows and columns in an mp4 video must be even, for example. 

  11. It is easy to see why: let’s imagine that we have to divide a value of 127. The result will be rounded to 63 or 64. If other calculations follow, the precision drops very quickly and so does the quality. 

  12. In 2024, the standard for Ultra-High Definition (4K) Rec.2020, still little used in broadcasting but already standardized, which has a very wide gamut, recommends 12 bits in luminance for its “HDR” variation (i.e. Rec. 2100). 

  13. The Digital Cinema Package (DCP) (where the image is in fact encoded under the JPEG 2000 lossless standard), in use in 2024, encodes the colors in an XYZ space with 12 bpc.

  14. Read the next section about the difference between using integer and floating point values. 

  15. Some Blu-ray discs and game consoles take advantage of this limit to add speculars (brightness) beyond white and make them brighter. If the TV is compatible, it will display these “super whites”; otherwise it will simply ignore this information, without affecting the image. 

  16. For a long time, NVidia graphics card drivers under Windows were configured to display videos in limited range by default… 

  17. Although loved by animators for its ability to easily play videos frame by frame, Quicktime under Windows is to be avoided for its poor color management; its development has been abandoned by Apple anyway.
    DJV, available for Windows, Mac OS, Linux, as well as BSD, replaces it efficiently and integrates a professional color management.