1. Introduction
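When FFmpeg decodes an Annex B H.264 bitstream, it uses ff_h2645_extract_rbsp to obtain the "NALU Header + RBSP" of each NALU in the stream (see "FFmpeg source: analysis of the ff_h2645_extract_rbsp function"). If the NALU header shows that the NALU is an SPS, FFmpeg calls ff_h264_decode_seq_parameter_set to decode it and stores the decoded SPS attributes in a buffer addressed by a uint8_t * pointer. Casting that pointer to const SPS * then gives access to the attributes. This article covers the SPS struct and the internals of ff_h264_decode_seq_parameter_set.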
2. Declaration of the SPS struct
The SPS struct is declared in the header libavcodec/h264_ps.h (this article uses FFmpeg 5.0.3):
/**
 * Sequence parameter set
 */
typedef struct SPS {
    unsigned int sps_id;
    int profile_idc;
    int level_idc;
    int chroma_format_idc;
    int transform_bypass;              ///< qpprime_y_zero_transform_bypass_flag
    int log2_max_frame_num;            ///< log2_max_frame_num_minus4 + 4
    int poc_type;                      ///< pic_order_cnt_type
    int log2_max_poc_lsb;              ///< log2_max_pic_order_cnt_lsb_minus4
    int delta_pic_order_always_zero_flag;
    int offset_for_non_ref_pic;
    int offset_for_top_to_bottom_field;
    int poc_cycle_length;              ///< num_ref_frames_in_pic_order_cnt_cycle
    int ref_frame_count;               ///< num_ref_frames
    int gaps_in_frame_num_allowed_flag;
    int mb_width;                      ///< pic_width_in_mbs_minus1 + 1
    ///< (pic_height_in_map_units_minus1 + 1) * (2 - frame_mbs_only_flag)
    int mb_height;
    int frame_mbs_only_flag;
    int mb_aff;                        ///< mb_adaptive_frame_field_flag
    int direct_8x8_inference_flag;
    int crop;                          ///< frame_cropping_flag

    /* those 4 are already in luma samples */
    unsigned int crop_left;            ///< frame_cropping_rect_left_offset
    unsigned int crop_right;           ///< frame_cropping_rect_right_offset
    unsigned int crop_top;             ///< frame_cropping_rect_top_offset
    unsigned int crop_bottom;          ///< frame_cropping_rect_bottom_offset
    int vui_parameters_present_flag;
    AVRational sar;
    int video_signal_type_present_flag;
    int full_range;
    int colour_description_present_flag;
    enum AVColorPrimaries color_primaries;
    enum AVColorTransferCharacteristic color_trc;
    enum AVColorSpace colorspace;
    enum AVChromaLocation chroma_location;

    int timing_info_present_flag;
    uint32_t num_units_in_tick;
    uint32_t time_scale;
    int fixed_frame_rate_flag;
    int32_t offset_for_ref_frame[256];
    int bitstream_restriction_flag;
    int num_reorder_frames;
    int scaling_matrix_present;
    uint8_t scaling_matrix4[6][16];
    uint8_t scaling_matrix8[6][64];
    int nal_hrd_parameters_present_flag;
    int vcl_hrd_parameters_present_flag;
    int pic_struct_present_flag;
    int time_offset_length;
    int cpb_cnt;                          ///< See H.264 E.1.2
    int initial_cpb_removal_delay_length; ///< initial_cpb_removal_delay_length_minus1 + 1
    int cpb_removal_delay_length;         ///< cpb_removal_delay_length_minus1 + 1
    int dpb_output_delay_length;          ///< dpb_output_delay_length_minus1 + 1
    int bit_depth_luma;                   ///< bit_depth_luma_minus8 + 8
    int bit_depth_chroma;                 ///< bit_depth_chroma_minus8 + 8
    int residual_color_transform_flag;    ///< residual_colour_transform_flag
    int constraint_set_flags;             ///< constraint_set[0-3]_flag
    uint8_t data[4096];
    size_t data_size;
} SPS;
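In the SPS struct, the member array data holds the SPS's "NALU Header + RBSP", and data_size is the length of that "NALU Header + RBSP" in bytes. The remaining members hold the attributes that ff_h264_decode_seq_parameter_set decodes from the SPS. These members correspond to the SPS syntax elements described on pages 44 to 45 of the H.264 specification (T-REC-H.264-202108-I!!PDF-E.pdf); some, such as mb_width and mb_height, store a value derived from the syntax element rather than the raw value, as the comments in the struct indicate.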
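To make the relationship between mb_width, mb_height and the crop fields concrete, here is a minimal sketch of how a displayed picture size could be derived from them. MiniSPS and the two helper functions are hypothetical names invented for this example; they are not part of FFmpeg, and the sketch only assumes what the struct comments state: the crop fields are already in luma samples and mb_height already includes the (2 - frame_mbs_only_flag) factor.

```c
/* Hypothetical, trimmed-down stand-in for FFmpeg's SPS struct: only the
 * fields needed to compute the displayed picture size are kept. */
typedef struct MiniSPS {
    int mb_width;              /* pic_width_in_mbs_minus1 + 1 */
    int mb_height;             /* already scaled by (2 - frame_mbs_only_flag) */
    unsigned int crop_left;    /* cropping offsets, already in luma samples */
    unsigned int crop_right;
    unsigned int crop_top;
    unsigned int crop_bottom;
} MiniSPS;

/* Displayed width in luma samples: coded width (16 samples per macroblock)
 * minus the horizontal cropping. */
static int mini_sps_width(const MiniSPS *sps)
{
    return 16 * sps->mb_width - (int)(sps->crop_left + sps->crop_right);
}

/* Displayed height in luma samples: coded height minus vertical cropping. */
static int mini_sps_height(const MiniSPS *sps)
{
    return 16 * sps->mb_height - (int)(sps->crop_top + sps->crop_bottom);
}
```

For a typical 1080p stream, mb_width is 120 and mb_height is 68, which gives a coded size of 1920x1088; a crop_bottom of 8 luma samples brings the displayed height down to 1080.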
3. Declaration of ff_h264_decode_seq_parameter_set
ff_h264_decode_seq_parameter_set is declared in the header libavcodec/h264_ps.h:
typedef struct H264ParamSets {
    AVBufferRef *sps_list[MAX_SPS_COUNT];
    AVBufferRef *pps_list[MAX_PPS_COUNT];

    AVBufferRef *pps_ref;
    /* currently active parameters sets */
    const PPS *pps;
    const SPS *sps;

    int overread_warning_printed[2];
} H264ParamSets;

/**
 * Decode SPS
 */
int ff_h264_decode_seq_parameter_set(GetBitContext *gb, AVCodecContext *avctx,
                                     H264ParamSets *ps, int ignore_truncation);
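The function extracts every attribute from the binary SPS data in the buffer referenced by gb and stores the result in the buffer that ps->sps_list[sps_id]->data points to (sps_id is the SPS's seq_parameter_set_id).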
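Parameter gb: both an input and an output parameter. It is a pointer to a GetBitContext, which is used for bit-level reads (see "Bit-operation source in FFmpeg: analysis of the GetBitContext struct and the init_get_bits, get_bits1 and get_bits functions").
Before ff_h264_decode_seq_parameter_set is called:
gb->buffer points to a buffer that holds the SPS's "NALU Header + RBSP".
gb->buffer_end points just past the last byte of that buffer, which is why the function can compute the data length as gb->buffer_end - gb->buffer.
gb->index is 8, meaning the first byte of the SPS (the 8-bit NALU header) has already been read.
gb->size_in_bits equals the number of bits in the SPS's "NALU Header + SODB" (1 byte is 8 bits).
gb->size_in_bits_plus8 equals gb->size_in_bits + 8.
After ff_h264_decode_seq_parameter_set returns:
gb->index has advanced by the number of bits the SPS occupies, meaning the whole SPS has been read. The other members of gb are unchanged.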
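Parameter avctx: input parameter. It is used mainly for logging inside ff_h264_decode_seq_parameter_set. As the listing in section 4 shows, the function also reads a few of its fields (codec_tag, flags2, strict_std_compliance and debug), but for understanding the main flow it can be ignored.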
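Parameter ps: output parameter. After ff_h264_decode_seq_parameter_set returns, the buffer that ps->sps_list[sps_id]->data points to holds every attribute extracted from the SPS (sps_id is the SPS's seq_parameter_set_id). Code outside the function can then use a C struct-pointer cast, (const SPS*)ps->sps_list[sps_id]->data, to access the members of the SPS struct and read each attribute.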
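The cast pattern can be sketched in isolation as follows. Every type and function here (DemoBufferRef, DemoSPS, DemoParamSets, demo_get_sps) is a hypothetical stand-in written for this example so that it compiles without FFmpeg's internal headers; the real AVBufferRef, SPS and H264ParamSets have more members. The only idea taken from the article is that each list slot holds a byte buffer whose contents are one SPS struct.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for AVBufferRef: just a data pointer and a size. */
typedef struct DemoBufferRef {
    uint8_t *data;   /* payload; in this sketch it holds one DemoSPS */
    size_t   size;
} DemoBufferRef;

/* Hypothetical stand-in for SPS with only a few fields. */
typedef struct DemoSPS {
    unsigned int sps_id;
    int profile_idc;
    int level_idc;
} DemoSPS;

#define DEMO_MAX_SPS_COUNT 32

/* Hypothetical stand-in for H264ParamSets. */
typedef struct DemoParamSets {
    DemoBufferRef *sps_list[DEMO_MAX_SPS_COUNT];
} DemoParamSets;

/* Return a typed, read-only view of the stored SPS, or NULL when the id is
 * out of range or no SPS has been stored in that slot. */
static const DemoSPS *demo_get_sps(const DemoParamSets *ps, unsigned int sps_id)
{
    if (sps_id >= DEMO_MAX_SPS_COUNT || !ps->sps_list[sps_id])
        return NULL;
    return (const DemoSPS *)ps->sps_list[sps_id]->data;
}
```

Checking the slot for NULL before casting matters because a slot stays empty until an SPS with that id has been decoded successfully.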
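Parameter ignore_truncation: input parameter, usually 0. As the listing in section 4 shows, it only matters when the parser reads past the end of the data: with a nonzero value the overread is logged as a warning and decoding continues, otherwise it is logged as an error and the function fails. It can be ignored for the purposes of this article.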
4. Definition of ff_h264_decode_seq_parameter_set
ff_h264_decode_seq_parameter_set is defined in the source file libavcodec/h264_ps.c:
int ff_h264_decode_seq_parameter_set(GetBitContext *gb, AVCodecContext *avctx,
                                     H264ParamSets *ps, int ignore_truncation)
{
    AVBufferRef *sps_buf;
    int profile_idc, level_idc, constraint_set_flags = 0;
    unsigned int sps_id;
    int i, log2_max_frame_num_minus4;
    SPS *sps;
    int ret;

    sps_buf = av_buffer_allocz(sizeof(*sps));
    if (!sps_buf)
        return AVERROR(ENOMEM);
    sps = (SPS*)sps_buf->data;

    sps->data_size = gb->buffer_end - gb->buffer;
    if (sps->data_size > sizeof(sps->data)) {
        av_log(avctx, AV_LOG_DEBUG, "Truncating likely oversized SPS\n");
        sps->data_size = sizeof(sps->data);
    }
    memcpy(sps->data, gb->buffer, sps->data_size);

    profile_idc           = get_bits(gb, 8);
    constraint_set_flags |= get_bits1(gb) << 0;   // constraint_set0_flag
    constraint_set_flags |= get_bits1(gb) << 1;   // constraint_set1_flag
    constraint_set_flags |= get_bits1(gb) << 2;   // constraint_set2_flag
    constraint_set_flags |= get_bits1(gb) << 3;   // constraint_set3_flag
    constraint_set_flags |= get_bits1(gb) << 4;   // constraint_set4_flag
    constraint_set_flags |= get_bits1(gb) << 5;   // constraint_set5_flag
    skip_bits(gb, 2);                             // reserved_zero_2bits
    level_idc = get_bits(gb, 8);
    sps_id    = get_ue_golomb_31(gb);

    if (sps_id >= MAX_SPS_COUNT) {
        av_log(avctx, AV_LOG_ERROR, "sps_id %u out of range\n", sps_id);
        goto fail;
    }

    sps->sps_id               = sps_id;
    sps->time_offset_length   = 24;
    sps->profile_idc          = profile_idc;
    sps->constraint_set_flags = constraint_set_flags;
    sps->level_idc            = level_idc;
    sps->full_range           = -1;

    memset(sps->scaling_matrix4, 16, sizeof(sps->scaling_matrix4));
    memset(sps->scaling_matrix8, 16, sizeof(sps->scaling_matrix8));
    sps->scaling_matrix_present = 0;
    sps->colorspace = 2; //AVCOL_SPC_UNSPECIFIED

    if (sps->profile_idc == 100 ||  // High profile
        sps->profile_idc == 110 ||  // High10 profile
        sps->profile_idc == 122 ||  // High422 profile
        sps->profile_idc == 244 ||  // High444 Predictive profile
        sps->profile_idc ==  44 ||  // Cavlc444 profile
        sps->profile_idc ==  83 ||  // Scalable Constrained High profile (SVC)
        sps->profile_idc ==  86 ||  // Scalable High Intra profile (SVC)
        sps->profile_idc == 118 ||  // Stereo High profile (MVC)
        sps->profile_idc == 128 ||  // Multiview High profile (MVC)
        sps->profile_idc == 138 ||  // Multiview Depth High profile (MVCD)
        sps->profile_idc == 144) {  // old High444 profile
        sps->chroma_format_idc = get_ue_golomb_31(gb);
        if (sps->chroma_format_idc > 3U) {
            avpriv_request_sample(avctx, "chroma_format_idc %u",
                                  sps->chroma_format_idc);
            goto fail;
        } else if (sps->chroma_format_idc == 3) {
            sps->residual_color_transform_flag = get_bits1(gb);
            if (sps->residual_color_transform_flag) {
                av_log(avctx, AV_LOG_ERROR, "separate color planes are not supported\n");
                goto fail;
            }
        }
        sps->bit_depth_luma   = get_ue_golomb_31(gb) + 8;
        sps->bit_depth_chroma = get_ue_golomb_31(gb) + 8;
        if (sps->bit_depth_chroma != sps->bit_depth_luma) {
            avpriv_request_sample(avctx,
                                  "Different chroma and luma bit depth");
            goto fail;
        }
        if (sps->bit_depth_luma   < 8 || sps->bit_depth_luma   > 14 ||
            sps->bit_depth_chroma < 8 || sps->bit_depth_chroma > 14) {
            av_log(avctx, AV_LOG_ERROR, "illegal bit depth value (%d, %d)\n",
                   sps->bit_depth_luma, sps->bit_depth_chroma);
            goto fail;
        }
        sps->transform_bypass = get_bits1(gb);
        ret = decode_scaling_matrices(gb, sps, NULL, 1,
                                      sps->scaling_matrix4, sps->scaling_matrix8);
        if (ret < 0)
            goto fail;
        sps->scaling_matrix_present |= ret;
    } else {
        sps->chroma_format_idc = 1;
        sps->bit_depth_luma    = 8;
        sps->bit_depth_chroma  = 8;
    }

    log2_max_frame_num_minus4 = get_ue_golomb_31(gb);
    if (log2_max_frame_num_minus4 < MIN_LOG2_MAX_FRAME_NUM - 4 ||
        log2_max_frame_num_minus4 > MAX_LOG2_MAX_FRAME_NUM - 4) {
        av_log(avctx, AV_LOG_ERROR,
               "log2_max_frame_num_minus4 out of range (0-12): %d\n",
               log2_max_frame_num_minus4);
        goto fail;
    }
    sps->log2_max_frame_num = log2_max_frame_num_minus4 + 4;

    sps->poc_type = get_ue_golomb_31(gb);

    if (sps->poc_type == 0) { // FIXME #define
        unsigned t = get_ue_golomb_31(gb);
        if (t>12) {
            av_log(avctx, AV_LOG_ERROR, "log2_max_poc_lsb (%d) is out of range\n", t);
            goto fail;
        }
        sps->log2_max_poc_lsb = t + 4;
    } else if (sps->poc_type == 1) { // FIXME #define
        sps->delta_pic_order_always_zero_flag = get_bits1(gb);
        sps->offset_for_non_ref_pic           = get_se_golomb_long(gb);
        sps->offset_for_top_to_bottom_field   = get_se_golomb_long(gb);

        if (   sps->offset_for_non_ref_pic         == INT32_MIN
            || sps->offset_for_top_to_bottom_field == INT32_MIN
        ) {
            av_log(avctx, AV_LOG_ERROR,
                   "offset_for_non_ref_pic or offset_for_top_to_bottom_field is out of range\n");
            goto fail;
        }

        sps->poc_cycle_length = get_ue_golomb(gb);

        if ((unsigned)sps->poc_cycle_length >=
            FF_ARRAY_ELEMS(sps->offset_for_ref_frame)) {
            av_log(avctx, AV_LOG_ERROR,
                   "poc_cycle_length overflow %d\n", sps->poc_cycle_length);
            goto fail;
        }

        for (i = 0; i < sps->poc_cycle_length; i++) {
            sps->offset_for_ref_frame[i] = get_se_golomb_long(gb);
            if (sps->offset_for_ref_frame[i] == INT32_MIN) {
                av_log(avctx, AV_LOG_ERROR,
                       "offset_for_ref_frame is out of range\n");
                goto fail;
            }
        }
    } else if (sps->poc_type != 2) {
        av_log(avctx, AV_LOG_ERROR, "illegal POC type %d\n", sps->poc_type);
        goto fail;
    }

    sps->ref_frame_count = get_ue_golomb_31(gb);
    if (avctx->codec_tag == MKTAG('S', 'M', 'V', '2'))
        sps->ref_frame_count = FFMAX(2, sps->ref_frame_count);
    if (sps->ref_frame_count > MAX_DELAYED_PIC_COUNT) {
        av_log(avctx, AV_LOG_ERROR,
               "too many reference frames %d\n", sps->ref_frame_count);
        goto fail;
    }
    sps->gaps_in_frame_num_allowed_flag = get_bits1(gb);
    sps->mb_width                       = get_ue_golomb(gb) + 1;
    sps->mb_height                      = get_ue_golomb(gb) + 1;

    sps->frame_mbs_only_flag = get_bits1(gb);

    if (sps->mb_height >= INT_MAX / 2U) {
        av_log(avctx, AV_LOG_ERROR, "height overflow\n");
        goto fail;
    }
    sps->mb_height *= 2 - sps->frame_mbs_only_flag;

    if (!sps->frame_mbs_only_flag)
        sps->mb_aff = get_bits1(gb);
    else
        sps->mb_aff = 0;

    if ((unsigned)sps->mb_width  >= INT_MAX / 16 ||
        (unsigned)sps->mb_height >= INT_MAX / 16 ||
        av_image_check_size(16 * sps->mb_width,
                            16 * sps->mb_height, 0, avctx)) {
        av_log(avctx, AV_LOG_ERROR, "mb_width/height overflow\n");
        goto fail;
    }

    sps->direct_8x8_inference_flag = get_bits1(gb);

#ifndef ALLOW_INTERLACE
    if (sps->mb_aff)
        av_log(avctx, AV_LOG_ERROR,
               "MBAFF support not included; enable it at compile-time.\n");
#endif
    sps->crop = get_bits1(gb);
    if (sps->crop) {
        unsigned int crop_left   = get_ue_golomb(gb);
        unsigned int crop_right  = get_ue_golomb(gb);
        unsigned int crop_top    = get_ue_golomb(gb);
        unsigned int crop_bottom = get_ue_golomb(gb);
        int width  = 16 * sps->mb_width;
        int height = 16 * sps->mb_height;

        if (avctx->flags2 & AV_CODEC_FLAG2_IGNORE_CROP) {
            av_log(avctx, AV_LOG_DEBUG, "discarding sps cropping, original "
                                        "values are l:%d r:%d t:%d b:%d\n",
                   crop_left, crop_right, crop_top, crop_bottom);

            sps->crop_left   =
            sps->crop_right  =
            sps->crop_top    =
            sps->crop_bottom = 0;
        } else {
            int vsub   = (sps->chroma_format_idc == 1) ? 1 : 0;
            int hsub   = (sps->chroma_format_idc == 1 ||
                          sps->chroma_format_idc == 2) ? 1 : 0;
            int step_x = 1 << hsub;
            int step_y = (2 - sps->frame_mbs_only_flag) << vsub;

            if (crop_left  > (unsigned)INT_MAX / 4 / step_x ||
                crop_right > (unsigned)INT_MAX / 4 / step_x ||
                crop_top   > (unsigned)INT_MAX / 4 / step_y ||
                crop_bottom> (unsigned)INT_MAX / 4 / step_y ||
                (crop_left + crop_right ) * step_x >= width ||
                (crop_top  + crop_bottom) * step_y >= height
            ) {
                av_log(avctx, AV_LOG_ERROR, "crop values invalid %d %d %d %d / %d %d\n", crop_left, crop_right, crop_top, crop_bottom, width, height);
                goto fail;
            }

            sps->crop_left   = crop_left   * step_x;
            sps->crop_right  = crop_right  * step_x;
            sps->crop_top    = crop_top    * step_y;
            sps->crop_bottom = crop_bottom * step_y;
        }
    } else {
        sps->crop_left   =
        sps->crop_right  =
        sps->crop_top    =
        sps->crop_bottom =
        sps->crop        = 0;
    }

    sps->vui_parameters_present_flag = get_bits1(gb);
    if (sps->vui_parameters_present_flag) {
        int ret = decode_vui_parameters(gb, avctx, sps);
        if (ret < 0)
            goto fail;
    }

    if (get_bits_left(gb) < 0) {
        av_log_once(avctx, ignore_truncation ? AV_LOG_WARNING : AV_LOG_ERROR, AV_LOG_DEBUG,
                    &ps->overread_warning_printed[sps->vui_parameters_present_flag],
                    "Overread %s by %d bits\n", sps->vui_parameters_present_flag ? "VUI" : "SPS", -get_bits_left(gb));
        if (!ignore_truncation)
            goto fail;
    }

    /* if the maximum delay is not stored in the SPS, derive it based on the
     * level */
    if (!sps->bitstream_restriction_flag &&
        (sps->ref_frame_count || avctx->strict_std_compliance >= FF_COMPLIANCE_STRICT)) {
        sps->num_reorder_frames = MAX_DELAYED_PIC_COUNT - 1;
        for (i = 0; i < FF_ARRAY_ELEMS(level_max_dpb_mbs); i++) {
            if (level_max_dpb_mbs[i][0] == sps->level_idc) {
                sps->num_reorder_frames = FFMIN(level_max_dpb_mbs[i][1] / (sps->mb_width * sps->mb_height),
                                                sps->num_reorder_frames);
                break;
            }
        }
    }

    if (!sps->sar.den)
        sps->sar.den = 1;

    if (avctx->debug & FF_DEBUG_PICT_INFO) {
        static const char csp[4][5] = { "Gray", "420", "422", "444" };
        av_log(avctx, AV_LOG_DEBUG,
               "sps:%u profile:%d/%d poc:%d ref:%d %dx%d %s %s crop:%u/%u/%u/%u %s %s %"PRId32"/%"PRId32" b%d reo:%d\n",
               sps_id, sps->profile_idc, sps->level_idc,
               sps->poc_type,
               sps->ref_frame_count,
               sps->mb_width, sps->mb_height,
               sps->frame_mbs_only_flag ? "FRM" : (sps->mb_aff ? "MB-AFF" : "PIC-AFF"),
               sps->direct_8x8_inference_flag ? "8B8" : "",
               sps->crop_left, sps->crop_right,
               sps->crop_top, sps->crop_bottom,
               sps->vui_parameters_present_flag ? "VUI" : "",
               csp[sps->chroma_format_idc],
               sps->timing_info_present_flag ? sps->num_units_in_tick : 0,
               sps->timing_info_present_flag ? sps->time_scale : 0,
               sps->bit_depth_luma,
               sps->bitstream_restriction_flag ? sps->num_reorder_frames : -1
               );
    }

    /* check if this is a repeat of an already parsed SPS, then keep the
     * original one.
     * otherwise drop all PPSes that depend on it */
    if (ps->sps_list[sps_id] &&
        !memcmp(ps->sps_list[sps_id]->data, sps_buf->data, sps_buf->size)) {
        av_buffer_unref(&sps_buf);
    } else {
        remove_sps(ps, sps_id);
        ps->sps_list[sps_id] = sps_buf;
    }

    return 0;

fail:
    av_buffer_unref(&sps_buf);
    return AVERROR_INVALIDDATA;
}
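Inside ff_h264_decode_seq_parameter_set, the statement sps_buf = av_buffer_allocz(sizeof(*sps)) first allocates the AVBufferRef that sps_buf points to, together with its members buf and data, and zero-fills the memory. The block allocated for sps_buf->data is sizeof(*sps) bytes, that is, sizeof(SPS) bytes. For details on av_buffer_allocz, see "FFmpeg source: analysis of the buffer_create, av_buffer_create, av_buffer_default_free, av_buffer_alloc and av_buffer_allocz functions".
The statement sps = (SPS*)sps_buf->data makes every SPS attribute decoded afterwards land in the buffer sps_buf->data.
The statement sps->data_size = gb->buffer_end - gb->buffer computes the length of the SPS's "NALU Header + RBSP" in bytes.
The sps->data array is 4096 bytes long, so sizeof(sps->data) is 4096. If sps->data_size > sizeof(sps->data), the SPS's "NALU Header + RBSP" is larger than the array, and copying all of it would write past the end of the array. The following check therefore clamps data_size to the array size: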
    if (sps->data_size > sizeof(sps->data)) {
        av_log(avctx, AV_LOG_DEBUG, "Truncating likely oversized SPS\n");
        sps->data_size = sizeof(sps->data);
    }
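The statement memcpy(sps->data, gb->buffer, sps->data_size) then copies the SPS's "NALU Header + RBSP" into the sps->data array.
The statements below read, bit by bit, the SPS attributes whose descriptor is u(n), that is, n-bit unsigned integers. Because gb->index is 8 when ff_h264_decode_seq_parameter_set is entered, the NALU header has already been consumed, so these statements read the SPS attributes directly: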
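The clamp-then-copy step can be tried in isolation with the sketch below. DemoRawSPS and demo_store_raw are hypothetical names for this example; the 4096-byte capacity mirrors the data array in the SPS struct, and the return value is an addition of this sketch (FFmpeg only logs a debug message when it truncates).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SPS_DATA_SIZE 4096   /* same capacity as SPS.data in the listing */

/* Hypothetical stand-in holding only the raw-bytes part of the SPS struct. */
typedef struct DemoRawSPS {
    uint8_t data[DEMO_SPS_DATA_SIZE];
    size_t  data_size;
} DemoRawSPS;

/* Copy the bytes in [buf, buf_end) into dst, clamping the length to the
 * capacity of dst->data so that memcpy never writes past the array.
 * Returns 1 if the input was truncated, 0 otherwise. */
static int demo_store_raw(DemoRawSPS *dst, const uint8_t *buf,
                          const uint8_t *buf_end)
{
    int truncated = 0;

    dst->data_size = (size_t)(buf_end - buf);
    if (dst->data_size > sizeof(dst->data)) {
        dst->data_size = sizeof(dst->data);
        truncated = 1;
    }
    memcpy(dst->data, buf, dst->data_size);
    return truncated;
}
```

Note that only the stored copy is truncated; in the FFmpeg function the bit reader gb still walks the original, untruncated buffer.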
profile_idc = get_bits(gb, 8);
constraint_set_flags |= get_bits1(gb) << 0; // constraint_set0_flag
constraint_set_flags |= get_bits1(gb) << 1; // constraint_set1_flag
constraint_set_flags |= get_bits1(gb) << 2; // constraint_set2_flag
constraint_set_flags |= get_bits1(gb) << 3; // constraint_set3_flag
constraint_set_flags |= get_bits1(gb) << 4; // constraint_set4_flag
constraint_set_flags |= get_bits1(gb) << 5; // constraint_set5_flag
skip_bits(gb, 2); // reserved_zero_2bits
level_idc = get_bits(gb, 8);
For details on get_bits1 and get_bits, see "Bit-operation source in FFmpeg: analysis of the GetBitContext struct and the init_get_bits, get_bits1 and get_bits functions".
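The fixed-width reads above can be reproduced with a small stand-alone bit reader. This is a simplified sketch, not FFmpeg's GetBitContext: DemoBits, demo_get_bits, DemoSPSHead and demo_read_sps_head are hypothetical names, and the reader simply returns zero bits past the end of the data instead of using FFmpeg's padded-buffer approach. It starts at bit index 8 to skip the NALU header, as described for gb->index.

```c
#include <stdint.h>

/* Hypothetical minimal MSB-first bit reader (NOT FFmpeg's GetBitContext). */
typedef struct DemoBits {
    const uint8_t *buf;
    int size_in_bits;
    int index;              /* position of the next bit to read */
} DemoBits;

/* Read n bits as an unsigned integer, most significant bit first.
 * Bits past the end of the data are read as 0 in this sketch. */
static unsigned demo_get_bits(DemoBits *b, int n)
{
    unsigned v = 0;

    while (n-- > 0) {
        unsigned bit = 0;
        if (b->index < b->size_in_bits)
            bit = (b->buf[b->index >> 3] >> (7 - (b->index & 7))) & 1u;
        b->index++;
        v = (v << 1) | bit;
    }
    return v;
}

typedef struct DemoSPSHead {
    int profile_idc;
    int constraint_set_flags;   /* bit i holds constraint_set<i>_flag */
    int level_idc;
} DemoSPSHead;

/* nal points at "NALU Header + RBSP"; nal_size is its length in bytes. */
static DemoSPSHead demo_read_sps_head(const uint8_t *nal, int nal_size)
{
    DemoBits b = { nal, nal_size * 8, 8 };  /* index 8: NALU header consumed */
    DemoSPSHead h;
    int i;

    h.profile_idc = (int)demo_get_bits(&b, 8);
    h.constraint_set_flags = 0;
    for (i = 0; i < 6; i++)                 /* constraint_set0..5_flag */
        h.constraint_set_flags |= (int)demo_get_bits(&b, 1) << i;
    demo_get_bits(&b, 2);                   /* reserved_zero_2bits */
    h.level_idc = (int)demo_get_bits(&b, 8);
    return h;
}
```

For the bytes 0x67 0x42 0xC0 0x1E, this yields profile_idc 66, constraint_set0_flag and constraint_set1_flag set (constraint_set_flags equal to 3) and level_idc 30.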
The following statement reads an SPS attribute coded as an unsigned Exp-Golomb value:
sps_id = get_ue_golomb_31(gb);
For details on get_ue_golomb_31, see "Audio/video basics: H.264 series (7): decoding Exp-Golomb codes in the FFmpeg source".
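For readers who want to see the ue(v) rule itself, the sketch below decodes one unsigned Exp-Golomb value with a plain loop: count the leading zero bits, skip the 1 bit, read that many suffix bits, and return 2^leadingZeros - 1 + suffix. demo_get_ue is a hypothetical helper written for this article; FFmpeg's get_ue_golomb_31 is a table-driven routine for small values and is implemented differently.

```c
#include <stdint.h>

/* Decode one ue(v) value from buf, starting at bit *bit_pos (MSB-first).
 * On success, returns the value and advances *bit_pos past the code.
 * Returns -1 and leaves *bit_pos unchanged if the data runs out or the
 * code has more than 31 leading zero bits. */
static int64_t demo_get_ue(const uint8_t *buf, int size_in_bits, int *bit_pos)
{
    int pos = *bit_pos;
    int leading_zeros = 0;
    uint64_t suffix = 0;
    int i;

    /* Count zero bits up to the first 1 bit. */
    while (pos < size_in_bits &&
           !((buf[pos >> 3] >> (7 - (pos & 7))) & 1)) {
        leading_zeros++;
        pos++;
    }
    if (pos >= size_in_bits || leading_zeros > 31)
        return -1;
    pos++;                                  /* skip the terminating 1 bit */

    if (pos + leading_zeros > size_in_bits)
        return -1;
    for (i = 0; i < leading_zeros; i++, pos++)
        suffix = (suffix << 1) | ((buf[pos >> 3] >> (7 - (pos & 7))) & 1u);

    *bit_pos = pos;
    return (int64_t)(((uint64_t)1 << leading_zeros) - 1 + suffix);
}
```

With this rule the bit strings 1, 010, 011 and 00100 decode to 0, 1, 2 and 3 respectively, so a seq_parameter_set_id of 0 costs a single bit.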
Page 74 of the H.264 specification (T-REC-H.264-202108-I!!PDF-E.pdf) states that seq_parameter_set_id takes values from 0 to 31 inclusive.
Hence the following check:
    if (sps_id >= MAX_SPS_COUNT) {
        av_log(avctx, AV_LOG_ERROR, "sps_id %u out of range\n", sps_id);
        goto fail;
    }
The macro MAX_SPS_COUNT is defined in the header libavcodec/h264_ps.h:
#define MAX_SPS_COUNT 32
    if (sps->profile_idc == 100 ||  // High profile
        sps->profile_idc == 110 ||  // High10 profile
        sps->profile_idc == 122 ||  // High422 profile
        sps->profile_idc == 244 ||  // High444 Predictive profile
        sps->profile_idc ==  44 ||  // Cavlc444 profile
        sps->profile_idc ==  83 ||  // Scalable Constrained High profile (SVC)
        sps->profile_idc ==  86 ||  // Scalable High Intra profile (SVC)
        sps->profile_idc == 118 ||  // Stereo High profile (MVC)
        sps->profile_idc == 128 ||  // Multiview High profile (MVC)
        sps->profile_idc == 138 ||  // Multiview Depth High profile (MVCD)
        sps->profile_idc == 144) {  // old High444 profile
        //...
    }
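The check above corresponds to the profile_idc condition in the sequence parameter set data syntax of the H.264 specification.
The condition in the FFmpeg source is not identical to the one in the specification; for example, FFmpeg also accepts 144, which its own comment labels as the old High444 profile. The specification should be treated as authoritative here. So if the SPS attributes FFmpeg produces for some H.264 stream look wrong, check whether the SPS's profile_idc is one of the values the two conditions treat differently.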
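The difference can be made explicit by writing both conditions as small predicates and comparing them. This is a sketch under stated assumptions: the list in ffmpeg503_has_chroma_fields is copied from the FFmpeg 5.0.3 listing above, while the list in spec_has_chroma_fields is my transcription of the condition in the 2021-08 edition of the specification and should be checked against your own copy. Both function names are hypothetical.

```c
/* Assumed transcription of the profile_idc condition in the sequence
 * parameter set data syntax of H.264 (2021-08 edition). Verify against
 * the specification before relying on it. */
static int spec_has_chroma_fields(int profile_idc)
{
    switch (profile_idc) {
    case 100: case 110: case 122: case 244: case 44:
    case 83:  case 86:  case 118: case 128: case 138:
    case 139: case 134: case 135:
        return 1;
    default:
        return 0;
    }
}

/* Condition exactly as it appears in the FFmpeg 5.0.3 listing above. */
static int ffmpeg503_has_chroma_fields(int profile_idc)
{
    switch (profile_idc) {
    case 100: case 110: case 122: case 244: case 44:
    case 83:  case 86:  case 118: case 128: case 138:
    case 144:
        return 1;
    default:
        return 0;
    }
}
```

Under these assumptions the two predicates disagree for 134, 135, 139 and 144, and agree for the common profiles such as Baseline (66), Main (77) and High (100).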
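The function then keeps calling get_bits1, get_bits, get_ue_golomb and related functions to read the remaining SPS attributes into the struct that sps points to.
Finally, the statements below store the SPS in ps->sps_list[sps_id]. If the slot already holds a byte-for-byte identical SPS, the new buffer is released and the original is kept; otherwise the old entry is removed and replaced by the new one.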
    /* check if this is a repeat of an already parsed SPS, then keep the
     * original one.
     * otherwise drop all PPSes that depend on it */
    if (ps->sps_list[sps_id] &&
        !memcmp(ps->sps_list[sps_id]->data, sps_buf->data, sps_buf->size)) {
        av_buffer_unref(&sps_buf);
    } else {
        remove_sps(ps, sps_id);
        ps->sps_list[sps_id] = sps_buf;
    }
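The keep-or-replace logic can be sketched with plain malloc-based buffers. DemoBuf, demo_buf_new, demo_buf_free and demo_store_sps are hypothetical names for this example; they stand in for AVBufferRef, av_buffer_allocz, av_buffer_unref and the block above, and do not model reference counting or what remove_sps does beyond releasing the old entry. Unlike the FFmpeg code, this sketch also compares the sizes before calling memcmp, because its buffers are not guaranteed to have the same size.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for AVBufferRef, without reference counting. */
typedef struct DemoBuf {
    unsigned char *data;
    size_t size;
} DemoBuf;

/* Allocate a buffer holding a copy of src; returns NULL on failure. */
static DemoBuf *demo_buf_new(const void *src, size_t size)
{
    DemoBuf *b = malloc(sizeof(*b));
    if (!b)
        return NULL;
    b->data = malloc(size ? size : 1);
    if (!b->data) {
        free(b);
        return NULL;
    }
    memcpy(b->data, src, size);
    b->size = size;
    return b;
}

/* Free the buffer and reset the caller's pointer to NULL. */
static void demo_buf_free(DemoBuf **b)
{
    if (*b) {
        free((*b)->data);
        free(*b);
        *b = NULL;
    }
}

/* Takes ownership of new_buf. Returns 0 if the slot already held identical
 * bytes (new_buf is freed, the original is kept), 1 if the slot now holds
 * new_buf (any previous entry is freed). */
static int demo_store_sps(DemoBuf **slot, DemoBuf *new_buf)
{
    if (*slot && (*slot)->size == new_buf->size &&
        !memcmp((*slot)->data, new_buf->data, new_buf->size)) {
        demo_buf_free(&new_buf);
        return 0;
    }
    demo_buf_free(slot);        /* stand-in for remove_sps() */
    *slot = new_buf;
    return 1;
}
```

Keeping the original buffer when a repeated SPS arrives means that pointers already taken from the slot stay valid, which matters because encoders commonly resend the same SPS before every keyframe.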