ARTrack Reading Notes

Table of Contents

Environment Setup and Script Writing

Forward Pass

Network Structure


Environment Setup and Script Writing

Following the official instructions did not complete successfully, so the pip entries listed in the conda yaml file were installed manually:

conda create -n artrack python=3.9
# activate the environment and cd into the project root directory
conda activate artrack
pip install astor==0.8.1 configparser==5.2.0 \
    data==0.4 docker-pycreds==0.4.0 easydict==1.9 einops==0.4.1 formulaic==0.5.2 funcsigs==1.0.2 future==0.18.2 \
    gitdb==4.0.9 gitpython==3.1.27 interface-meta==1.3.0 iopath==0.1.9 jpeg4py==0.1.4 jsonpatch==1.32 jsonpointer==2.3 latex==0.7.0 \
    libarchive-c==2.9 linearmodels==4.29 lmdb==1.3.0 loguru==0.6.0 mat73==0.59 memory-profiler==0.60.0 msgpack==1.0.2 ninja==1.11.1 \
    opencv-python==4.5.5.64 pathtools==0.1.2 promise==2.3 property-cached==1.6.4 protobuf==3.20.0 pycocotools==2.0.4 pyhdfe==0.1.2 \
    ruamel-yaml-conda==0.15.100 sentry-sdk==1.5.8 setproctitle==1.2.2 setuptools-scm==7.1.0 shapely==1.8.1.post1 shortuuid==1.0.8 \
    shutilwhich==1.1.0 smmap==5.0.0 tables==3.6.1 tempdir==0.7.1 tensorboardx==2.5.1 thop==0.1.0.post2207010342 tikzplotlib==0.10.1 \
    timm==0.5.4 tomli==2.0.1 torch==1.11.0 torchfile==0.1.0 visdom==0.1.8.9 wandb==0.12.11 webcolors==1.12 yaspin==2.1.0

The default paths in the generated local file need to be rewritten:

python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir ./output

Download the trained model from the official repository, create the following directory, and place the model there:

ARTrack-main/output/checkpoints/train/artrack_seq/artrack_seq_256_full/ARTrackSeq_ep0060.pth.tar

Create a directory for the encoder's pretrained model and put the pretrained weights there; the path has to be updated in the yaml file, and the source script artrack_seq.py also needs two changes:

mkdir pretrained_models
# put the pretrained encoder weights (file name: mae_pretrain_vit_base.pth) into this directory
# in artrack_seq_256_full.yaml, rewrite the path as an absolute path:
PRETRAIN_PTH: "/root/data/zjx/Code-subject/ARTrack/ARTrack-main/pretrained_models"
# also change line ~100 of artrack_seq.py from
load_from = cfg.MODEL.PRETRAIN_PTH
# to
load_from = cfg.MODEL.PRETRAIN_PTH + '/' + cfg.MODEL.PRETRAIN_FILE
# and change line ~103 of artrack_seq.py from
missing_keys, unexpected_keys = model.load_state_dict(checkpoint["net"], strict=False)
# to
missing_keys, unexpected_keys = model.load_state_dict(checkpoint["model"], strict=False)

The repository does not ship a script for running on a video, so a custom one (tracking/run_video.py) is written here:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

import os
import sys
import random
import argparse
import multiprocessing
from glob import glob

import cv2
import numpy as np
import torch
import torch.nn as nn

prj_path = os.path.join(os.path.dirname(__file__), '..')
if prj_path not in sys.path:
    sys.path.append(prj_path)

from lib.test.evaluation.tracker import Tracker

torch.set_num_threads(1)

parser = argparse.ArgumentParser(description='Run tracker on sequence or dataset.')
parser.add_argument('tracker_name', type=str, help='Name of tracking method.')
parser.add_argument('tracker_param', type=str, help='Name of config file.')
parser.add_argument('--runid', type=int, default=None, help='The run id.')
parser.add_argument('--video_path', type=str, default='None', help='Path to the input video file.')
parser.add_argument('--sequence', type=str, default=None, help='Sequence number or name.')
parser.add_argument('--debug', type=int, default=0, help='Debug level.')
parser.add_argument('--threads', type=int, default=0, help='Number of threads.')
parser.add_argument('--num_gpus', type=int, default=8)
args = parser.parse_args()


def main():
    # run_video reads the video frame by frame, so each input is already an image
    colors = [random.randint(0, 255) for _ in range(3)]
    print('[INFO] Loading the model')
    # load config and build the tracker
    trackers = Tracker(args.tracker_name, args.tracker_param, None, args.runid)
    try:
        worker_name = multiprocessing.current_process().name
        worker_id = int(worker_name[worker_name.find('-') + 1:]) - 1
        gpu_id = worker_id % args.num_gpus
        torch.cuda.set_device(gpu_id)
    except Exception:
        pass
    trackers.run_video(args.video_path, None, None, None, False)


if __name__ == '__main__':
    main()

Run the script with:

python tracking/run_video.py artrack_seq artrack_seq_256_full --video_path /root/data/zjx/Code-subject/OSTrack-main/experiments/video/soccer1.avi 

Forward Pass

Template-region cropping is the same as in the OSTrack code. During initialization, a buffer self.store_result is created to hold the bbox coordinates of the N frames that are kept; it is initially filled entirely with the init bbox, and N is set to 7 here.

        for i in range(self.save_all - 1):
            self.store_result.append(info['init_bbox'].copy())

Search-region cropping is also the same as in OSTrack. The coordinates of the previous frames are then transformed: relative coordinates are computed using the previous frame's predicted box as the reference point. Because the current search region is cropped centered on the previous frame's predicted bbox, the center of the search region is in fact the center of that bbox. The previous predictions are on the original-image scale while the search region is on the crop-size scale, so it suffices to compute, on the original-image scale, the offsets of the earlier frames' predictions relative to the previous frame's prediction and multiply them by the resize factor, which maps the relative coordinates to the crop-size scale. After the transform, the previous frame's predicted bbox is effectively moved to the center of the search region, i.e. (crop_size/2, crop_size/2).

After the transform, the coordinates are divided by the crop size for normalization; the results may be smaller than 0 or larger than 1, because the coordinate transform can move boxes outside the region. The boxes are then converted from xywh to xyxy, keeping only values within the interval (-0.5, 1.5). The coordinates are then quantized: 0.5 is added first to avoid negative values, and the bbox is finally quantized to integers within [0, 2*(bins-1)]. The resulting coordinate input carrying the spatio-temporal context is:

seqs_out = seqs_out.unsqueeze(0)  # (1,28)
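
A minimal sketch of this relative-coordinate transform and quantization, based only on the description above; the function name build_coord_sequence and the arguments store_result, prev_box, resize_factor, crop_size and bins are illustrative assumptions, not the exact ARTrack code:

import numpy as np

def build_coord_sequence(store_result, prev_box, resize_factor, crop_size, bins):
    # store_result / prev_box: boxes in original-image scale, xywh (top-left + size)
    tokens = []
    for box in store_result:
        # offset of the box center relative to the previous frame's center,
        # scaled to the crop and shifted to the search-region center
        cx = (box[0] + box[2] / 2 - (prev_box[0] + prev_box[2] / 2)) * resize_factor + crop_size / 2
        cy = (box[1] + box[3] / 2 - (prev_box[1] + prev_box[3] / 2)) * resize_factor + crop_size / 2
        w, h = box[2] * resize_factor, box[3] * resize_factor
        # normalize by the crop size and convert xywh -> xyxy
        x1, y1 = (cx - w / 2) / crop_size, (cy - h / 2) / crop_size
        x2, y2 = (cx + w / 2) / crop_size, (cy + h / 2) / crop_size
        coords = np.clip(np.array([x1, y1, x2, y2]), -0.5, 1.5)
        # add 0.5 so everything is non-negative, then quantize into [0, 2*(bins-1)]
        tokens.extend(((coords + 0.5) * (bins - 1)).round().astype(int).tolist())
    return np.array(tokens)[None, :]  # shape (1, 4*N), e.g. (1, 28) for N = 7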

The template and the search region are fed into the ViT backbone for feature extraction, with a total downsampling factor of 16. The extracted patch sequence, the positional encodings, and the transformed bbox information of the previous frames are then fed into the subsequent Transformer.

It first goes through an encoder: the FeatureFusionEncoder class performs some preprocessing, and its basic building block is the FeatureFusion module. The encoder finally returns feature patches with the same shapes as z and x; a simplified sketch of one FeatureFusion layer is given below.
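
As an illustration, here is a simplified, hypothetical sketch of one FeatureFusion layer reconstructed from the module names in the printout in the Network Structure section; relative position biases, the untied positional encodings and drop-path are omitted, so this is not the exact implementation:

import torch.nn as nn

class FeatureFusionSketch(nn.Module):
    # self-attention within z and within x, cross-attention in both directions,
    # then an MLP on each branch (pre-norm residual connections assumed)
    def __init__(self, dim=768, num_heads=12):
        super().__init__()
        self.z_norm1, self.x_norm1 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.z_self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.x_self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.z_norm2, self.x_norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.z_x_cross_attention = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.x_z_cross_attention = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.z_norm3, self.x_norm3 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.z_mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.x_mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, z, x):
        zn, xn = self.z_norm1(z), self.x_norm1(x)
        z = z + self.z_self_attn(zn, zn, zn)[0]          # template branch self-attention
        x = x + self.x_self_attn(xn, xn, xn)[0]          # search branch self-attention
        zq, xq = self.z_norm2(z), self.x_norm2(x)
        z = z + self.z_x_cross_attention(zq, xq, xq)[0]  # z attends to x
        x = x + self.x_z_cross_attention(xq, zq, zq)[0]  # x attends to z
        z = z + self.z_mlp(self.z_norm3(z))
        x = x + self.x_mlp(self.x_norm3(x))
        return z, x                                      # same shapes as the inputs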

Next, the bbox coordinate sequence of the previous frames and the start token are concatenated to form the decoder's input sequence. Since only the four bbox coordinates need to be predicted, no extra end token is required and the output sequence length is simply 4.

1. The input sequence is passed through the word (vocabulary) embedding; the word-vector length equals the resolution of the feature patches obtained by downsampling the cropped image.
2. The initial input tgt, the template features, the search features, the positional encodings of the z patches and the x patches, the identity truncated Gaussian distribution, the truncated Gaussian distribution, the query embeddings, and the input-sequence mask are fed into the decoder.

The decoder is composed mainly of TargetQueryDecoderLayer modules, 6 layers in total; a simplified sketch of one layer is shown below.
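
Similarly, a simplified, hypothetical sketch of one TargetQueryDecoderLayer based on the module names in the printout: masked self-attention over the target queries, cross-attention into the encoder memory, then an MLP (drop-path omitted; not the exact implementation):

import torch.nn as nn

class TargetQueryDecoderLayerSketch(nn.Module):
    def __init__(self, dim=768, num_heads=12):
        super().__init__()
        self.norm_1 = nn.LayerNorm(dim)
        self.self_attn1 = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_2_query = nn.LayerNorm(dim)
        self.norm_2_memory = nn.LayerNorm(dim)
        self.multihead_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_3 = nn.LayerNorm(dim)
        self.mlpz = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, tgt, memory, tgt_mask=None):
        q = self.norm_1(tgt)
        tgt = tgt + self.self_attn1(q, q, q, attn_mask=tgt_mask)[0]   # causal self-attention
        tgt = tgt + self.multihead_attn(self.norm_2_query(tgt),       # queries: target tokens
                                        self.norm_2_memory(memory),   # keys/values: fused z/x features
                                        self.norm_2_memory(memory))[0]
        tgt = tgt + self.mlpz(self.norm_3(tgt))
        return tgt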

The decoder finally outputs a token sequence with the same shape as tgt, i.e. (1, length, 768), where length is the current length of tgt and grows as the sequence is predicted. Then (see the decoding sketch after this list):

1. Take the embedding of the last token of the query output and multiply it with the word-embedding weight matrix, which gives a score associated with each quantized coordinate value.
2. Apply softmax to obtain a probability distribution over the quantized coordinates.
3. Use argmax sampling, i.e. pick the position with the highest probability.
4. Append the predicted quantized coordinate to tgt and continue the loop.
5. This finally yields the quantized coordinates of the predicted bbox.
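
A minimal sketch of this greedy (argmax) autoregressive decoding loop; decoder, word_embeddings, memory and start_token are placeholders for the corresponding ARTrack components, and the exact interfaces are assumptions:

import torch

@torch.no_grad()
def greedy_decode(decoder, word_embeddings, memory, start_token, num_coords=4):
    tgt = torch.tensor([[start_token]])                  # (1, 1) start token
    for _ in range(num_coords):                          # predict x1, y1, x2, y2 one by one
        tgt_embed = word_embeddings(tgt)                 # (1, length, 768)
        out = decoder(tgt_embed, memory)                 # (1, length, 768)
        last = out[:, -1, :]                             # embedding of the last token
        logits = last @ word_embeddings.weight.t()       # similarity with every word vector
        probs = logits.softmax(dim=-1)                   # distribution over quantized values
        next_token = probs.argmax(dim=-1, keepdim=True)  # argmax sampling
        tgt = torch.cat([tgt, next_token], dim=1)        # feed the prediction back in
    return tgt[:, 1:]                                    # (1, 4) quantized bbox tokens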

After the network's prediction is obtained:

1. Dequantize the bbox coordinates.
2. Convert xyxy to xywh (center point plus width and height).
3. Map the box back to the original image scale and convert to xywh (top-left corner plus width and height).
4. Clip the bbox so that the part falling outside the image is removed.
5. Update the stored coordinate history: pop the oldest entry and append the current prediction behind the most recent one (the previous frame), much like a queue (see the sketch below).
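
A minimal sketch of the history update in step 5, assuming store_result is a plain Python list as in the initialization snippet earlier:

def update_store_result(store_result, new_box, save_all=7):
    store_result.pop(0)           # discard the oldest bbox
    store_result.append(new_box)  # the current prediction becomes the newest entry
    assert len(store_result) == save_all
    return store_result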

 

Network Structure

Printing the ARTrackSeq model gives the structure below: a ViT-Base backbone with 12 Blocks, and a Pix2Track head containing the word/position embeddings, a 3-layer FeatureFusionEncoder and a 6-layer TargetQueryDecoderBlock.

ARTrackSeq((backbone): VisionTransformer((patch_embed): PatchEmbed((proj): Conv2d(3, 768, kernel_size=(16, 16), stride=(16, 16))(norm): Identity())(pos_drop): Dropout(p=0.0, inplace=False)(blocks): Sequential((0): Block((norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)(attn): Attention((qkv): Linear(in_features=768, out_features=2304, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, bias=True)(proj_drop): Dropout(p=0.0, inplace=False))(drop_path): Identity()(norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)(mlp): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(drop1): Dropout(p=0.0, inplace=False)(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop2): Dropout(p=0.0, inplace=False)))(1): Block((norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)(attn): Attention((qkv): Linear(in_features=768, out_features=2304, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, bias=True)(proj_drop): Dropout(p=0.0, inplace=False))(drop_path): DropPath()(norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)(mlp): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(drop1): Dropout(p=0.0, inplace=False)(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop2): Dropout(p=0.0, inplace=False)))(2): Block((norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)(attn): Attention((qkv): Linear(in_features=768, out_features=2304, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, bias=True)(proj_drop): Dropout(p=0.0, inplace=False))(drop_path): DropPath()(norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)(mlp): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(drop1): Dropout(p=0.0, inplace=False)(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop2): Dropout(p=0.0, inplace=False)))(3): Block((norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)(attn): Attention((qkv): Linear(in_features=768, out_features=2304, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, bias=True)(proj_drop): Dropout(p=0.0, inplace=False))(drop_path): DropPath()(norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)(mlp): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(drop1): Dropout(p=0.0, inplace=False)(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop2): Dropout(p=0.0, inplace=False)))(4): Block((norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)(attn): Attention((qkv): Linear(in_features=768, out_features=2304, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, bias=True)(proj_drop): Dropout(p=0.0, inplace=False))(drop_path): DropPath()(norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)(mlp): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(drop1): Dropout(p=0.0, inplace=False)(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop2): Dropout(p=0.0, inplace=False)))(5): Block((norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)(attn): Attention((qkv): Linear(in_features=768, out_features=2304, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, bias=True)(proj_drop): Dropout(p=0.0, inplace=False))(drop_path): DropPath()(norm2): 
LayerNorm((768,), eps=1e-06, elementwise_affine=True)(mlp): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(drop1): Dropout(p=0.0, inplace=False)(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop2): Dropout(p=0.0, inplace=False)))(6): Block((norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)(attn): Attention((qkv): Linear(in_features=768, out_features=2304, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, bias=True)(proj_drop): Dropout(p=0.0, inplace=False))(drop_path): DropPath()(norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)(mlp): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(drop1): Dropout(p=0.0, inplace=False)(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop2): Dropout(p=0.0, inplace=False)))(7): Block((norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)(attn): Attention((qkv): Linear(in_features=768, out_features=2304, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, bias=True)(proj_drop): Dropout(p=0.0, inplace=False))(drop_path): DropPath()(norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)(mlp): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(drop1): Dropout(p=0.0, inplace=False)(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop2): Dropout(p=0.0, inplace=False)))(8): Block((norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)(attn): Attention((qkv): Linear(in_features=768, out_features=2304, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, bias=True)(proj_drop): Dropout(p=0.0, inplace=False))(drop_path): DropPath()(norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)(mlp): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(drop1): Dropout(p=0.0, inplace=False)(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop2): Dropout(p=0.0, inplace=False)))(9): Block((norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)(attn): Attention((qkv): Linear(in_features=768, out_features=2304, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, bias=True)(proj_drop): Dropout(p=0.0, inplace=False))(drop_path): DropPath()(norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)(mlp): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(drop1): Dropout(p=0.0, inplace=False)(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop2): Dropout(p=0.0, inplace=False)))(10): Block((norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)(attn): Attention((qkv): Linear(in_features=768, out_features=2304, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, bias=True)(proj_drop): Dropout(p=0.0, inplace=False))(drop_path): DropPath()(norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)(mlp): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(drop1): Dropout(p=0.0, inplace=False)(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop2): Dropout(p=0.0, inplace=False)))(11): Block((norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)(attn): Attention((qkv): Linear(in_features=768, out_features=2304, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, 
bias=True)(proj_drop): Dropout(p=0.0, inplace=False))(drop_path): DropPath()(norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)(mlp): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(drop1): Dropout(p=0.0, inplace=False)(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop2): Dropout(p=0.0, inplace=False))))(norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True))(pix_head): Pix2Track((word_embeddings): Embedding(802, 768, padding_idx=800, max_norm=1)(position_embeddings): Embedding(5, 768)(prev_position_embeddings): Embedding(28, 768)(encoder): FeatureFusionEncoder((layers): ModuleList((0): FeatureFusion((z_norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(x_norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(z_self_attn): SelfAttention((qkv): Linear(in_features=768, out_features=2304, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, bias=True)(proj_drop): Dropout(p=0.1, inplace=False))(x_self_attn): SelfAttention((qkv): Linear(in_features=768, out_features=2304, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, bias=True)(proj_drop): Dropout(p=0.1, inplace=False))(z_norm2_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(z_norm2_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(x_norm2_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(x_norm2_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(z_x_cross_attention): CrossAttention((q): Linear(in_features=768, out_features=768, bias=True)(kv): Linear(in_features=768, out_features=1536, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, bias=True)(proj_drop): Dropout(p=0.1, inplace=False))(x_z_cross_attention): CrossAttention((q): Linear(in_features=768, out_features=768, bias=True)(kv): Linear(in_features=768, out_features=1536, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, bias=True)(proj_drop): Dropout(p=0.1, inplace=False))(z_norm3): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(x_norm3): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(z_mlp): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop): Dropout(p=0.1, inplace=False))(x_mlp): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop): Dropout(p=0.1, inplace=False))(drop_path): Identity())(1): FeatureFusion((z_norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(x_norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(z_self_attn): SelfAttention((qkv): Linear(in_features=768, out_features=2304, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, bias=True)(proj_drop): Dropout(p=0.1, inplace=False))(x_self_attn): SelfAttention((qkv): Linear(in_features=768, out_features=2304, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, bias=True)(proj_drop): Dropout(p=0.1, inplace=False))(z_norm2_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(z_norm2_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(x_norm2_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(x_norm2_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(z_x_cross_attention): 
CrossAttention((q): Linear(in_features=768, out_features=768, bias=True)(kv): Linear(in_features=768, out_features=1536, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, bias=True)(proj_drop): Dropout(p=0.1, inplace=False))(x_z_cross_attention): CrossAttention((q): Linear(in_features=768, out_features=768, bias=True)(kv): Linear(in_features=768, out_features=1536, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, bias=True)(proj_drop): Dropout(p=0.1, inplace=False))(z_norm3): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(x_norm3): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(z_mlp): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop): Dropout(p=0.1, inplace=False))(x_mlp): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop): Dropout(p=0.1, inplace=False))(drop_path): Identity())(2): FeatureFusion((z_norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(x_norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(z_self_attn): SelfAttention((qkv): Linear(in_features=768, out_features=2304, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, bias=True)(proj_drop): Dropout(p=0.1, inplace=False))(x_self_attn): SelfAttention((qkv): Linear(in_features=768, out_features=2304, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, bias=True)(proj_drop): Dropout(p=0.1, inplace=False))(z_norm2_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(z_norm2_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(x_norm2_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(x_norm2_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(z_x_cross_attention): CrossAttention((q): Linear(in_features=768, out_features=768, bias=True)(kv): Linear(in_features=768, out_features=1536, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, bias=True)(proj_drop): Dropout(p=0.1, inplace=False))(x_z_cross_attention): CrossAttention((q): Linear(in_features=768, out_features=768, bias=True)(kv): Linear(in_features=768, out_features=1536, bias=True)(attn_drop): Dropout(p=0.0, inplace=False)(proj): Linear(in_features=768, out_features=768, bias=True)(proj_drop): Dropout(p=0.1, inplace=False))(z_norm3): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(x_norm3): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(z_mlp): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop): Dropout(p=0.1, inplace=False))(x_mlp): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop): Dropout(p=0.1, inplace=False))(drop_path): Identity()))(z_pos_enc): Untied2DPositionalEncoder((pos): Learned2DPositionalEncoder()(norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(pos_q_linear): Linear(in_features=768, out_features=768, bias=True)(pos_k_linear): Linear(in_features=768, out_features=768, bias=True))(x_pos_enc): Untied2DPositionalEncoder((pos): Learned2DPositionalEncoder()(norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(pos_q_linear): Linear(in_features=768, 
out_features=768, bias=True)(pos_k_linear): Linear(in_features=768, out_features=768, bias=True))(z_rel_pos_bias_table): RelativePosition2DEncoder()(x_rel_pos_bias_table): RelativePosition2DEncoder()(z_x_rel_pos_bias_table): RelativePosition2DEncoder()(x_z_rel_pos_bias_table): RelativePosition2DEncoder())(decoder): TargetQueryDecoderBlock((layers): ModuleList((0): TargetQueryDecoderLayer((norm_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(self_attn1): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True))(norm_2_query): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(norm_2_memory): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(multihead_attn): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True))(norm_3): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(mlpz): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop): Dropout(p=0.1, inplace=False))(drop_path): Identity())(1): TargetQueryDecoderLayer((norm_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(self_attn1): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True))(norm_2_query): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(norm_2_memory): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(multihead_attn): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True))(norm_3): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(mlpz): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop): Dropout(p=0.1, inplace=False))(drop_path): Identity())(2): TargetQueryDecoderLayer((norm_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(self_attn1): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True))(norm_2_query): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(norm_2_memory): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(multihead_attn): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True))(norm_3): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(mlpz): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop): Dropout(p=0.1, inplace=False))(drop_path): Identity())(3): TargetQueryDecoderLayer((norm_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(self_attn1): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True))(norm_2_query): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(norm_2_memory): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(multihead_attn): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True))(norm_3): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(mlpz): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop): Dropout(p=0.1, inplace=False))(drop_path): Identity())(4): TargetQueryDecoderLayer((norm_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(self_attn1): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=768, 
out_features=768, bias=True))(norm_2_query): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(norm_2_memory): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(multihead_attn): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True))(norm_3): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(mlpz): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop): Dropout(p=0.1, inplace=False))(drop_path): Identity())(5): TargetQueryDecoderLayer((norm_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(self_attn1): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True))(norm_2_query): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(norm_2_memory): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(multihead_attn): MultiheadAttention((out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True))(norm_3): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(mlpz): Mlp((fc1): Linear(in_features=768, out_features=3072, bias=True)(act): GELU()(fc2): Linear(in_features=3072, out_features=768, bias=True)(drop): Dropout(p=0.1, inplace=False))(drop_path): Identity()))(norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)))
)
