Open-source repository
GitHub - Doubiiu/CodeTalker: [CVPR 2023] CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
Pretrained models are provided.
Running it raises an error:
File "D:\Program Files\miniconda3\lib\site-packages\transformers\models\wav2vec2\modeling_wav2vec2.py", line 397, in forward
hidden_states = hidden_states.transpose(1, 2)
AttributeError: 'tuple' object has no attribute 'transpose'
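Cause: Wav2Vec2FeatureProjection returns two tensors as a tuple (the projected hidden states and the normalized, non-projected hidden states), while the calling code expects a single tensor.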
class Wav2Vec2FeatureProjection(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.layer_norm = nn.LayerNorm(config.conv_dim[-1], eps=config.layer_norm_eps)
        self.projection = nn.Linear(config.conv_dim[-1], config.hidden_size)
        self.dropout = nn.Dropout(config.feat_proj_dropout)

    def forward(self, hidden_states):
        # non-projected hidden states are needed for quantization
        norm_hidden_states = self.layer_norm(hidden_states)
        hidden_states = self.projection(norm_hidden_states)
        hidden_states = self.dropout(hidden_states)
        return hidden_states, norm_hidden_states
Temporary workaround:
Take only the first of the two tensors, i.e. pass hidden_states[0] to the encoder:
encoder_outputs = self.encoder(
    hidden_states[0],
    attention_mask=attention_mask,
    output_attentions=output_attentions,
    output_hidden_states=output_hidden_states,
    return_dict=return_dict,
)
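The hard-coded index works only when the projection really returns a tuple. A slightly more defensive variant is sketched below. The helper name `first_tensor` is hypothetical and is not part of transformers or CodeTalker; it simply returns the first element when given a tuple or list and passes anything else through unchanged.

```python
def first_tensor(projected):
    """Return the first element of a tuple/list, or the value itself otherwise.

    Hypothetical helper: meant to wrap the output of the feature projection
    so the same code works whether one tensor or a (tensor, tensor) tuple
    comes back.
    """
    if isinstance(projected, (tuple, list)):
        return projected[0]
    return projected
```

With such a helper, the encoder call could receive `first_tensor(hidden_states)` instead of `hidden_states[0]`.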
Error 2: rendering the mp4 fails with a permission error
E:\project\audio\audio2face\CodeTalker-main\demo\output2\tmpdxzolz5y.mp4: Permission denied
The cause was that the output path was somewhat complicated; switching to a simpler path name fixed it.
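If simplifying the path does not help, one other possibility may be worth checking. The file name in the error looks like one generated by Python's tempfile module, and on Windows a file created with tempfile.NamedTemporaryFile may not be openable a second time by name while it is still open. Whether the rendering code actually hits this is an assumption, not something confirmed in these notes. The sketch below uses a hypothetical helper, `make_closed_temp_path`, that creates the temporary file and closes it immediately so that other writers or external tools can open it.

```python
import os
import tempfile


def make_closed_temp_path(directory, suffix=".mp4"):
    """Create an empty temporary file in `directory` and return its path.

    Hypothetical helper: the OS-level handle is closed right away, so the
    path can be reopened by other code or processes. The caller is
    responsible for deleting the file afterwards.
    """
    fd, path = tempfile.mkstemp(suffix=suffix, dir=directory)
    os.close(fd)  # release the handle so the file is not held open
    return path
```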
Only a length of 20094 (about 12 seconds) can be generated in one run; anything longer fails during computation with:
File "D:\Program Files\miniconda3\lib\site-packages\torch\nn\functional.py", line 5359, in multi_head_attention_forward
raise RuntimeError(f"The shape of the 3D attn_mask is {attn_mask.shape}, but should be {correct_3d_size}.")
RuntimeError: The shape of the 3D attn_mask is torch.Size([4, 600, 600]), but should be (4, 601, 601).
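The message shows that the attention mask was built for 600 positions while the sequence had reached 601, so longer audio has to be processed in pieces. One possible workaround, which these notes do not describe and which is only a sketch, is to split the input into segments that each stay under the limit and run inference on each segment separately. The helper name `split_into_chunks` and the `max_len` parameter are hypothetical; the right value for `max_len` depends on your setup, and stitching the per-segment results together may leave visible discontinuities at the boundaries.

```python
def split_into_chunks(samples, max_len):
    """Split a sliceable sequence into consecutive pieces of at most max_len items.

    Hypothetical helper: `samples` can be a list or a 1-D array of audio
    samples; the last piece may be shorter than max_len.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [samples[i:i + max_len] for i in range(0, len(samples), max_len)]
```

Each returned piece can then be passed through the model on its own, and the resulting animations concatenated in order.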