5/1/13

Canon 5DMK3 clean HDMI output - first impressions (+ download uncompressed files)

Hi all,

Yesterday Canon released their long-awaited firmware update for the Canon 5DMK3.
To be honest, for me, the cross-type AF is not as interesting as the clean HDMI output.

So I've recorded some initial footage for testing:

Youtube:


Download uncompressed:
http://www.solidfiles.com/d/579e4310bc/

Youtube:


Download uncompressed:
http://www.solidfiles.com/d/5a88f92606/

(The files are huge so the ones I've uploaded are very short)

Equipment:
Camera: Canon 5D Mark 3
Profile: Neutral - 0,-4,-2,0
Lens: 24-105L F4 @ F8
Shutter speed: 1/160
Recording device: BlackMagic Shuttle

Brief review:
Canon states that the output is 8bpc @ 4:2:2, which is a little disappointing, especially since the HDMI standard supports 10bpc and I'm almost sure that the video processing pipeline in Canon cameras works at 12bpc or more.
I decided to go beyond Canon's stated specs and check whether the HDMI output actually carries 10bpc, in the following way:
- The BlackMagic Shuttle always records at 10bpc; if the input is 8bpc, the Shuttle bit-shifts the values (multiplies by 4: 0->0, 1->4, ..., 255->1020).
- I've extracted a single YUV frame and wrapped it as 16bpc (without any additional bit shift).
- In a hex editor you get, for example, the values below (in this case it is the U channel), and you can see that each value (little endian) divides by 4 without a remainder:
A801 = 424 -> 424/4 = 106
9C01 = 412 -> 412/4 = 103
etc...
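
If you want to repeat the check yourself, here is a minimal sketch of the idea in Python (the file name is just a placeholder for a frame wrapped as raw 16-bit little-endian samples, as described above):

import numpy as np

# Raw plane wrapped as 16-bit little-endian samples (file name is hypothetical).
samples = np.fromfile("frame_u_plane_16le.raw", dtype="<u2")

# A true 10-bit source uses the two low bits; an 8-bit source that the recorder
# bit-shifted (value * 4) leaves them always at zero.
multiples_of_four = np.all(samples % 4 == 0)

print("all values divide by 4:", multiples_of_four)
print("verdict:", "8bpc shifted to 10bpc" if multiples_of_four else "real 10bpc data")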

So yes, the conclusion is that the HDMI output of the 5DMK3 is 8bpc only.
Think how it would be if it were 10bpc... In short, the following operation theoretically would give an image without the ugly banding:

The question that should be asked now is: Does it really give 4:2:2 chroma sub-sampling?
Well, at the moment I don't know. Canon may upscale the chroma with an algorithm other than nearest-neighbor (x2 width), in which case you will never find cloned pixels (you can see a lot of typical duplicates in the hex viewer, but that is not a reliable indicator), especially with Canon's very soft 3x3 sensor binning and a flat picture style.
For the moment, let's believe it is true 4:2:2, and maybe in future posts I'll do a more in-depth test that will confirm or refute the claim :)

That's all for today,
Mark.



11/16/12

REC.601/709 and luminance range explained

WOW, the amount of confusion regarding this issue is phenomenal.
In my opinion this topic should have the highest priority for every video shooter, otherwise you'll be shooting yourself in the foot (i.e. ruining your footage - losing dynamic range and getting incorrect colors).

Foreword - RGB & YUV

RGB (8bpc) - 3 planes (Red, Green, Blue):
Each pixel has three components (red, green, blue), each component has a value from 0 to 255. The combination of these components produces the final color and luminance of the pixel.

YUV (8bpc) - 3 planes (Y, Cb, Cr):
- Y - Full-resolution plane that represents the luminance information only.
- U (Cb), V (Cr) - Full-resolution, or lower, planes that represent the chroma (color) information only. The neutral (colorless) point is at 128.


Compressed video will mostly be in YUV because of the ability to subsample the chroma - this saves a lot of bandwidth. Subsampling means encoding smaller chroma (Cb, Cr) planes and stretching them back during the conversion to RGB (when displaying). It relies on the fact that our eyes are not very sensitive to color information. Moreover, having a dedicated luma channel helps avoid DC (brightness) shifts while compressing the footage.
Let's take for example a raw RGB or YUV 4:4:4 (no subsampling) stream: 3 planes ("planes" being the unit of measure here).
YUV 4:2:2: 2 planes (full-res Y + half-res Cb + half-res Cr)
YUV 4:2:0: 1.5 planes (full-res Y + quarter-res Cb + quarter-res Cr)

YUV4:4:4/RGB = 1.5 * YUV4:2:2 = 2 * YUV4:2:0 -> We save up to half of the bandwidth with almost no visible loss of information (however, image processing algorithms are sensitive to it).
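
Just to make the numbers tangible, a quick back-of-the-envelope calculation for an 8bpc 1920x1080 frame:

width, height = 1920, 1080
luma = width * height                     # full-resolution Y plane, 1 byte per sample

frame_444 = luma * 3                      # Y + full-res Cb + full-res Cr (same size as RGB)
frame_422 = luma * 2                      # Y + two half-resolution chroma planes
frame_420 = int(luma * 1.5)               # Y + two quarter-resolution chroma planes

for name, size in [("4:4:4/RGB", frame_444), ("4:2:2", frame_422), ("4:2:0", frame_420)]:
    print(f"{name}: {size / 1e6:.2f} MB per frame")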


Compressing luma range

Let's take the Y channel of the YUV and say that instead of using the full 0-255 range we will compress it to 16-235, leaving everything below 16 and above 235 empty. When converting the frame back to RGB we stretch it back to the original full range. It shouldn't really concern you why this is done, but you should know that it is done - it's a standard that comes from the analog world of imaging. You can read about its origin here.
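
For clarity, here is a minimal sketch of both directions of that mapping (standard 16-235 luma scaling at 8bpc; chroma, which uses 16-240, is ignored here):

import numpy as np

def full_to_limited(y):
    # Compress full-range luma (0-255) into the 16-235 "legal" range.
    return np.round(16 + y.astype(np.float64) * (235 - 16) / 255).astype(np.uint8)

def limited_to_full(y):
    # Stretch limited-range luma (16-235) back to full range for display.
    out = (y.astype(np.float64) - 16) * 255 / (235 - 16)
    return np.clip(np.round(out), 0, 255).astype(np.uint8)

y_full = np.arange(256, dtype=np.uint8)
roundtrip = limited_to_full(full_to_limited(y_full))
# 256 input levels are squeezed into 220 codes, so some neighbouring levels collapse.
print("unique levels after round trip:", len(np.unique(roundtrip)))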

If you interpret full range as full range, or limited as limited, you're on the right track.
In the case of limited range interpreted as full range:
You're not going to lose details when displaying, but you will get an incorrect (shifted) picture. However, when re-encoding (transcoding) the footage, the encoder will probably compress the entire range once again. The same dynamic latitude on fewer grey levels may cause loss of information and banding (posterization).
It can be understood from this curve:


In the case of full range interpreted as limited range:
The amount of shadow detail that is lost can be clearly seen in the image above. When displaying, the image looks over-contrasted. It can be fixed by compressing the luma range or by forcing the monitor to show the full range. However, when transcoding the footage, the encoder (which thinks the range is already compressed) might drop everything outside that range (crushing it). The lost details can't be recovered later.
The curve below demonstrates what happens in this case:
BTW, in curves, any situation where more than one "in" maps to the same "out" value = loss of information.
Of course I must also mention the highlights region:
In conclusion, when we interpret full luma range as limited range we lose all the "stops" of dynamic range that exist in the grey levels below 16 and above 235.
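
To put a number on it, here is a tiny sketch (8bpc luma assumed) of what a limited-range decoder does to full-range material:

import numpy as np

y_full = np.arange(256, dtype=np.float64)             # full-range source, 0-255

# A limited-range decoder stretches 16-235 to 0-255 and clips everything else.
misinterpreted = np.clip((y_full - 16) * 255 / (235 - 16), 0, 255)

crushed_shadows = np.sum(misinterpreted == 0)          # codes 0-16 collapse to black
crushed_highlights = np.sum(misinterpreted == 255)     # codes 235-255 collapse to white
print(crushed_shadows, "shadow levels and", crushed_highlights, "highlight levels are lost")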


REC.601 vs. REC.709

RGB <=> YUV conversion has a formula. You can use different multipliers (color matrix) and get (nearly) the same image as long as you use the same multipliers for converting to YUV and converting back to RGB.
REC.601 and REC.709 (and the future REC.2020) are examples of such multipliers.
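
For the curious, these are the well-known luma coefficients behind the two matrices, and a tiny sketch of how Y is built from RGB with each of them:

import numpy as np

# Luma coefficients (Kr, Kg, Kb) for the two standards.
REC601 = (0.299, 0.587, 0.114)
REC709 = (0.2126, 0.7152, 0.0722)

def luma(rgb, coeffs):
    # rgb: float array (..., 3) in the 0-1 range; returns the Y' plane.
    kr, kg, kb = coeffs
    return kr * rgb[..., 0] + kg * rgb[..., 1] + kb * rgb[..., 2]

# A pure red pixel comes out noticeably brighter under REC.601 than under REC.709,
# which is why a matrix mismatch is most visible in reds and skin tones.
red = np.array([1.0, 0.0, 0.0])
print("Y(red) REC.601:", luma(red, REC601), "  REC.709:", luma(red, REC709))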
By default, a video decoder will convert to RGB using REC.601 coefficients for standard definition video and REC.709 coefficients for high definition video.
If we take a video that was encoded using the REC.601 matrix and decode it using the REC.709 matrix, the result will have wrong, shifted colors - it is especially noticeable in the red tones, i.e. it might screw up the skin tones. For instance:


What does our camera do (emphasis on HDSLR)?

Unfortunately, many manufacturers do not obey the standards, and sometimes the embedded metadata does not even match the actual parameters.

The most accurate way to know what your camera/DSLR does in terms of luma range and color matrix is to take a photo of a scene, then take a video of the same scene (with the same image settings). Overlay the video over the photo and tweak its luma range and color matrix until the video is identical to the photo. Now you can really be sure that the image is the same as the camera manufacturer meant it to be.
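
If you want to automate that comparison, here is a rough sketch of the idea (loading the photo and the raw video frame is left out, the helper names are just placeholders, and the chroma planes are assumed to be already upsampled to the luma resolution):

import numpy as np

def ycbcr_to_rgb(y, cb, cr, matrix, limited):
    # Convert 8bpc Y'CbCr planes to float RGB under a given interpretation.
    kr, kb = (0.299, 0.114) if matrix == "601" else (0.2126, 0.0722)
    if limited:
        y = (y - 16.0) / 219.0
        cb, cr = (cb - 128.0) / 224.0, (cr - 128.0) / 224.0
    else:
        y = y / 255.0
        cb, cr = (cb - 128.0) / 255.0, (cr - 128.0) / 255.0
    r = y + 2 * (1 - kr) * cr
    b = y + 2 * (1 - kb) * cb
    g = (y - kr * r - kb * b) / (1 - kr - kb)
    return np.clip(np.stack([r, g, b], axis=-1), 0, 1)

def best_interpretation(photo_rgb, y, cb, cr):
    # Try every range/matrix combination and keep the one closest to the photo.
    candidates = [(m, lim) for m in ("601", "709") for lim in (True, False)]
    scores = {c: np.mean(np.abs(ycbcr_to_rgb(y, cb, cr, *c) - photo_rgb)) for c in candidates}
    return min(scores, key=scores.get)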

A very popular example of the above is the Canon HDSLR line up to the 5DMK3:
Full luma range and REC.601.
This particular case is actually good for us because:
1. As long as we force the footage to be decoded as REC.601, we shouldn't have color problems.
2. The full range gives us more grey levels, so we can push more dynamic range without ruining the image. This is what the flat picture styles are intended to do.

How to export the correct image from After Effects?

Once we get the required look of the video in the After Effects preview window, we want it to look identical in any player that will be used: YouTube, Vimeo, a local media player, etc. Sadly, in most cases it differs.
So what should be done? (The following explanation is for the case when color management is off.)
In After Effects, add your final composition to a new composition (or just add an adjustment layer above everything) and drop in the "Color Profile Converter" effect.

Choose (for HD content): HDTV 16-235.


Please note that the image will look shifted in the preview, but it will be reconstructed properly by the media player's decoder.

If you turn on color management, do not set a compressed luma range as the project's working space. Compress the range only as the final step before exporting.

I hope this post helped,
Mark.

10/5/12

Why should we so appreciate All-I video in cameras?

WE SHOULDN'T.

(This article will stay at a high level, because digging into the details would be pointless here.)

Many manufacturers come out with a new encoder feature for their camera lines. They call it All-I, iFrame, etc.
To explain it, I’ll start with the available encoder frame types:

I (Intra) = Intra frame that draws the image from scratch and compresses it. Equivalent to a JPEG image.
P (Predictive) = Inter frame that subtracts itself from the previous "I" or "P" frame, and the compression is applied to the residual. "Forward prediction".
B (Bi-directional) = Inter frame that can subtract itself both from previous "I" or "P" frames (forward prediction) and from the following "I" or "P" frames (backward prediction). The compression is applied to the residual.
("B" and "P" frames can also change their behavior at the macroblock level to "skip" and "intra", if the encoder decides that it will save more bits.)

There are 3 main arrangements for GOPs (group of pictures - the distance between 2 “I” frames): I-only, IP, IBP.



I-Only (All-I)
In this case every frame is unique and encoded separately without any temporal prediction. Actually it is like Motion JPEG (MJPEG). The most primitive encoder.


IP

Between two "I" frames there will be a variable number of "P" frames. In cameras the distance between the "I"s is usually 15 frames (for 30 fps footage). Smart software encoders usually rely on scene changes to decide where to push "I" frames.

IBP

Between an "I" and a "P" frame, or between two "P" frames, there will be a constant number of "B" frames. Cameras usually use 2 "B" frames. Software encoders will usually push from 1 to 5 "B" frames. Increasing the number of "B" frames increases the complexity of both the encoder and the decoder exponentially.



Now that the above is understood, I won't explain why IP and IBP are vastly superior to I-only - I will show you an example.

The video below was captured absolutely uncompressed. I've compressed it using x264 with "camera-like settings" (with most of the limitations that camera hardware has). The video's motion is a continuous horizontal pan. The frame I've decided to grab is frame no. 94, because it is a "B" frame in an IbbP GOP, i.e. the most heavily compressed type, and by that point the bit rate control had stabilized.



Original images (Click to download):
Full Side-by-Side videos (Click to download):


So why are the manufacturers so proud of a 20-year regression in compression technology?
  1. Good for editing purposes - well, this is true. I-only is very light for today's workstations and enables real-time playback of multi-layered 1080p60 content. This is exactly why DNxHD and ProRes exist! But is it worth files 4 times bigger for the same quality? Personally, my workflow is to encode all the files to DNxHD 36 Mbps (terrible quality), edit with them, and then, when exporting, relink the whole timeline to the native files.
  2. They claim that IP and IBP may cause temporal artifacts (such as ghosting and trails). That is because they have crappy H.264 encoders (H.264 isn't one fixed encoder - the result depends heavily on how it is tuned and which tools are in use), and instead of improving their H.264 encoders they choose the cheap and easy way.
For more experienced users I would suggest encoding master videos using x264 in IP, at ~90 Mbps, with reduced deblocking, High 4:2:2 profile (instead of DNxHD/ProRes). The result is almost lossless!
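
As an illustration only (the exact flag names and units should be double-checked against your x264 build), such an encode could look roughly like this, wrapped here in a small Python call:

import subprocess

# Rough x264 settings in the spirit described above: IP only (no B-frames),
# ~90 Mbps, softer deblocking, High 4:2:2 profile. The file names and exact
# flag spellings are illustrative, not a recipe.
cmd = [
    "x264",
    "--profile", "high422",
    "--output-csp", "i422",
    "--bframes", "0",          # IP-only GOP structure
    "--keyint", "15",          # an "I" frame roughly every 15 frames
    "--bitrate", "90000",      # ~90 Mbps (x264 takes kbps)
    "--deblock", "-2:-2",      # weaker in-loop deblocking
    "--output", "master.264",
    "input.y4m",
]
subprocess.run(cmd, check=True)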



In summary,
Don't be afraid of H.264 IP/IBP and don't settle for I-only. We should demand improvements in cameras' encoders.

Mark.