TensorRT basics + converting a PyTorch LeNet to C++ TensorRT

Official documentation

API documentation

Docker images

Custom plugin repository

0. Installation

1. Install TensorRT

Download the .deb package from the official site; make sure it matches your CUDA version.

sudo dpkg -i nv-tensorrt-repo-ubuntu1604-cuda10.0-trt7.0.0.11-ga-20191216_1-1_amd64.deb
sudo apt update
sudo apt install tensorrt
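
To verify the installation, the installed TensorRT packages can be listed (a quick sanity check, assuming the .deb-based install above):

dpkg -l | grep TensorRT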

An engine plan's compatibility depends on the GPU's compute capability and the TensorRT version, not on the CUDA or cuDNN version.
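
Because of this, it helps to log the TensorRT version an application is actually compiled and linked against. A minimal sketch (my addition, not from the original post), assuming the headers installed above and linking against nvinfer:

#include <iostream>
#include "NvInfer.h"

int main()
{
    // Compile-time version from the headers
    std::cout << "header version: " << NV_TENSORRT_MAJOR << "."
              << NV_TENSORRT_MINOR << "." << NV_TENSORRT_PATCH << std::endl;
    // Runtime version of the linked libnvinfer:
    // getInferLibVersion() returns MAJOR * 1000 + MINOR * 100 + PATCH, e.g. 7000 for 7.0.0
    const int32_t v = getInferLibVersion();
    std::cout << "library version: " << v / 1000 << "." << (v / 100) % 10
              << "." << v % 100 << std::endl;
    return 0;
}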

2. Install OpenCV

sudo apt-get update
sudo apt install libopencv-dev

apt-get install tensorrt fails with:

https://github.com/NVIDIA/TensorRT/issues/792

tensorrt : Depends: libnvinfer7 (= 7.0.0-1+cuda10.0) but 7.2.2-1+cuda11.1 is to be installed
           Depends: libnvinfer-plugin7 (= 7.0.0-1+cuda10.0) but 7.2.2-1+cuda11.1 is to be installed
           Depends: libnvparsers7 (= 7.0.0-1+cuda10.0) but 7.2.2-1+cuda11.1 is to be installed
           Depends: libnvonnxparsers7 (= 7.0.0-1+cuda10.0) but 7.2.2-1+cuda11.1 is to be installed
           Depends: libnvinfer-bin (= 7.0.0-1+cuda10.0) but it is not going to be installed
           Depends: libnvinfer-dev (= 7.0.0-1+cuda10.0) but 7.2.2-1+cuda11.1 is to be installed
           Depends: libnvinfer-plugin-dev (= 7.0.0-1+cuda10.0) but 7.2.2-1+cuda11.1 is to be installed
           Depends: libnvparsers-dev (= 7.0.0-1+cuda10.0) but 7.2.2-1+cuda11.1 is to be installed
           Depends: libnvonnxparsers-dev (= 7.0.0-1+cuda10.0) but 7.2.2-1+cuda11.1 is to be installed
           Depends: libnvinfer-samples (= 7.0.0-1+cuda10.0) but it is not going to be installed
           Depends: libnvinfer-doc (= 7.0.0-1+cuda10.0) but it is not going to be installed

mv /etc/apt/sources.list.d/nvidia-ml.list /etc/apt/sources.list.d/nvidia-ml.list.bak

With the nvidia-ml source disabled, running apt-get install tensorrt again succeeds, since apt now resolves the matching 7.0.0 packages.

1. Optimization workflow:

TensorRT has five phases in total: creating the network, building the inference engine, serializing the engine, deserializing the engine, and executing inference.

Phases 1-3: a network written with the C++ API (or imported from a third-party format) is defined through an INetworkDefinition; the builder loads the model weights and applies its optimizations, and the engine is then serialized into a "plan" (a flow graph), which stores not only the network weights needed at compute time but also the kernel execution schedule.

Phases 4 and 5 are inference: deserialize the engine, create the execution context, and run inference.
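
To make the workflow concrete, here is a minimal end-to-end sketch of the five phases. It is a sketch only: it assumes the TensorRT 7 implicit-batch C++ API and the logging.h Logger helper that also appears in section 2.4, builds a trivial softmax-only network instead of a real model, and omits most error handling.

#include <iostream>
#include "NvInfer.h"
#include "logging.h"            // Logger helper, same as in section 2.4
#include "cuda_runtime_api.h"

using namespace nvinfer1;
static Logger gLogger;

int main()
{
    // Phase 1: create the network (input -> softmax over 4 channels)
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetworkV2(0U);
    ITensor* input = network->addInput("data", DataType::kFLOAT, Dims3{4, 1, 1});
    ISoftMaxLayer* sm = network->addSoftMax(*input);
    sm->getOutput(0)->setName("prob");
    network->markOutput(*sm->getOutput(0));

    // Phase 2: build the inference engine
    IBuilderConfig* config = builder->createBuilderConfig();
    builder->setMaxBatchSize(1);
    config->setMaxWorkspaceSize(1 << 20);
    ICudaEngine* built = builder->buildEngineWithConfig(*network, *config);

    // Phase 3: serialize the engine into a plan
    IHostMemory* plan = built->serialize();

    // Phase 4: deserialize the plan (normally done in a separate program)
    IRuntime* runtime = createInferRuntime(gLogger);
    ICudaEngine* engine = runtime->deserializeCudaEngine(plan->data(), plan->size(), nullptr);

    // Phase 5: run inference through an execution context
    IExecutionContext* context = engine->createExecutionContext();
    float in[4] = {1, 2, 3, 4}, out[4];
    void* buffers[2];
    cudaMalloc(&buffers[engine->getBindingIndex("data")], 4 * sizeof(float));
    cudaMalloc(&buffers[engine->getBindingIndex("prob")], 4 * sizeof(float));
    cudaMemcpy(buffers[engine->getBindingIndex("data")], in, 4 * sizeof(float), cudaMemcpyHostToDevice);
    context->execute(1, buffers);
    cudaMemcpy(out, buffers[engine->getBindingIndex("prob")], 4 * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < 4; i++) std::cout << out[i] << " ";
    std::cout << std::endl;

    cudaFree(buffers[0]); cudaFree(buffers[1]);
    context->destroy(); engine->destroy(); runtime->destroy();
    plan->destroy(); built->destroy(); network->destroy(); builder->destroy();
    return 0;
}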

As you can see, once TensorRT has the network's computation graph, it optimizes that graph.

When a deep-learning framework runs inference, it calls one or more functions for every layer. Because these all run on the GPU, each call triggers a CUDA kernel launch. Compared with the kernel launches and the per-layer tensor data reads, the kernel computation itself is fast and lightweight, so the program ends up limited by memory bandwidth and GPU utilization suffers.

TensorRT addresses this in three ways:

  1. Vertical kernel fusion: fusing sequentially ordered operations reduces kernel-launch overhead and avoids reading/writing intermediate results between layers. For example, a convolution, its bias add, and the following ReLU can be fused into a single kernel, referred to as CBR.

  2. Horizontal kernel fusion: TensorRT looks for layers that take the same input and have the same filter size but different weights, and runs them with one kernel instead of several. Structurally identical layers are merged into a single wider layer (e.g., one wide 1x1 CBR), reducing the number of CUDA kernels used.

  3. Concatenation elimination: by pre-allocating the output buffer and letting upstream layers write into it in a strided fashion, the explicit concat copy is avoided.

With these optimizations, TensorRT obtains a smaller, faster, and more efficient computation graph, with fewer layers and fewer kernel launches. Layer counts for common networks drop markedly after TensorRT optimization, which shows that TensorRT effectively simplifies the network structure and is what drives the performance gain.
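
This can be observed from the API itself: INetworkDefinition::getNbLayers() reports the layer count as defined, while ICudaEngine::getNbLayers() reports the count remaining after fusion. As a hedged two-line sketch (not part of the original code), the comparison could be added inside createLenetEngine from section 2.4, right after buildEngineWithConfig:

// Compare defined layers vs. layers remaining in the optimized engine
std::cout << "network layers: " << network->getNbLayers()
          << ", engine layers after fusion: " << engine->getNbLayers() << std::endl;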

2. Converting the PyTorch LeNet to TensorRT

2.1 PyTorch code:

lenet.py

# coding:utf-8
import torch
from torch import nn
from torch.nn import functional as F


class Lenet5(nn.Module):
    """for cifar10 dataset."""

    def __init__(self):
        super(Lenet5, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5, stride=1, padding=0)
        self.pool1 = nn.AvgPool2d(kernel_size=2, stride=2, padding=0)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5, stride=1, padding=0)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool1(x)
        x = F.relu(self.conv2(x))
        x = self.pool1(x)
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.softmax(self.fc3(x), dim=1)
        return x


def main():
    import os
    import time
    os.environ["CUDA_VISIBLE_DEVICES"] = "1"
    print('cuda device count: ', torch.cuda.device_count())
    torch.manual_seed(1234)
    net = Lenet5()
    net = net.to('cuda:0')
    net.eval()
    st_time = time.time()
    nums = 10000
    for i in range(nums):
        tmp = torch.ones(1, 1, 32, 32).to('cuda:0')
        out = net(tmp)
    print('lenet out:', out)
    end_time = time.time()
    print('==cost time{}'.format((end_time - st_time)))
    torch.save(net, "lenet5.pth")


if __name__ == '__main__':
    main()

Save the model weights as .pth and record the measured inference time as the PyTorch baseline.

2.2 Exporting .pth to .onnx

This step mainly makes it convenient to inspect the network structure.

The file is the same lenet.py as above, with one extra export function (the imports, Lenet5 definition, and main() are unchanged):

def model_onnx():
    input = torch.ones(1, 1, 32, 32, dtype=torch.float32).cuda()
    model = Lenet5()
    model = model.cuda()
    torch.onnx.export(model, input, "./lenet.onnx", verbose=True)


if __name__ == '__main__':
    # main()
    model_onnx()

Converting to ONNX ran into several problems; the export call below with explicit options resolved essentially all of them.

torch.onnx.export(model,                      # model being run
                  input,                      # model input (or a tuple for multiple inputs)
                  "./xxxx.onnx",              # where to save the model
                  opset_version=10,
                  verbose=False,
                  training=False,             # export in inference mode
                  do_constant_folding=True,   # fold constant ops at export time
                  input_names=['input'],
                  output_names=['output'])
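
Optionally, the exported file can be sanity-checked with TensorRT's ONNX parser before moving on to the hand-built network in section 2.4. A minimal sketch, assuming TensorRT 7 with nvonnxparser available, lenet.onnx in the working directory, and linking against nvinfer and nvonnxparser:

#include <iostream>
#include "NvInfer.h"
#include "NvOnnxParser.h"
#include "logging.h"            // same Logger helper used by lenet.cpp below

static Logger gLogger;

int main()
{
    nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(gLogger);
    // The ONNX parser requires an explicit-batch network definition
    const uint32_t flags = 1U << static_cast<uint32_t>(
        nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    nvinfer1::INetworkDefinition* network = builder->createNetworkV2(flags);
    nvonnxparser::IParser* parser = nvonnxparser::createParser(*network, gLogger);
    if (!parser->parseFromFile("lenet.onnx",
            static_cast<int>(nvinfer1::ILogger::Severity::kWARNING)))
    {
        std::cerr << "failed to parse lenet.onnx" << std::endl;
        return -1;
    }
    std::cout << "parsed OK, layer count: " << network->getNbLayers() << std::endl;
    parser->destroy();
    network->destroy();
    builder->destroy();
    return 0;
}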

2.3 Exporting .pth to .wts

Store the model weights in key/value form as a hex text file. inference.py:

import torch
from torch import nn
from lenet5 import Lenet5
import os
import struct


def main():
    print('cuda device count: ', torch.cuda.device_count())
    net = torch.load('lenet5.pth')
    net = net.to('cuda:0')
    net.eval()
    tmp = torch.ones(1, 1, 32, 32).to('cuda:0')
    out = net(tmp)
    print('lenet out:', out)

    f = open("lenet5.wts", 'w')
    print('==net.state_dict().keys():', net.state_dict().keys())
    # First line: number of weight tensors
    f.write("{}\n".format(len(net.state_dict().keys())))
    for k, v in net.state_dict().items():
        print('key: ', k)
        print('value: ', v.shape)
        vr = v.reshape(-1).cpu().numpy()
        # One line per tensor: name, element count, then hex-encoded float32 values
        f.write("{} {}".format(k, len(vr)))
        for vv in vr:
            f.write(" ")
            f.write(struct.pack(">f", float(vv)).hex())
        f.write("\n")
    f.close()


def test_struct():
    vv = 16
    print(struct.pack(">f", float(vv)))


if __name__ == '__main__':
    main()
    # test_struct()
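
The resulting lenet5.wts is plain text: the first line holds the tensor count, followed by one line per tensor with its name, element count, and each element as big-endian float32 hex. For this LeNet that is 10 tensors, and conv1.weight has 6*1*5*5 = 150 values. Schematically (the hex values below are illustrative placeholders, not real output):

10
conv1.weight 150 3f241712 be0a2c3d ...
conv1.bias 6 3d8f5c29 ...
...
fc3.bias 10 bc1e4f60 ...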

2.4 Converting .wts to .engine and running inference with the .engine

lenet.cpp 

#include <map>
#include <chrono>
#include <fstream>
#include <cassert>
#include <iostream>
#include "NvInfer.h"
#include "logging.h"
#include "cuda_runtime_api.h"

static const int INPUT_H = 32;
static const int INPUT_W = 32;
static const int BATCH_SIZE = 32;
static const int OUTPUT_SIZE = 10;
static const int INFER_NUMS = 10000;
const char* INPUT_BLOB_NAME = "data";
const char* OUTPUT_BLOB_NAME = "prob";

using namespace nvinfer1;

static Logger gLogger;

#define CHECK(status) \
    do \
    { \
        auto ret = (status); \
        if (ret != 0) \
        { \
            std::cerr << "Cuda failure: " << ret << std::endl; \
            abort(); \
        } \
    } while (0)

std::map<std::string, Weights> loadWeights(const std::string file)
{
    std::cout << "Loading weights: " << file << std::endl;
    std::map<std::string, Weights> weightMap;

    // Open weights file
    std::ifstream input(file);
    assert(input.is_open() && "Unable to load weight file.");

    // Read number of weight blobs
    int32_t count;
    input >> count;
    assert(count > 0 && "Invalid weight map file.");

    while (count--)
    {
        Weights wt{DataType::kFLOAT, nullptr, 0};
        uint32_t size;

        // Read name and size of the blob
        std::string name;
        input >> name >> std::dec >> size;
        wt.type = DataType::kFLOAT;

        // Load blob values (stored as big-endian float32 hex)
        uint32_t* val = reinterpret_cast<uint32_t*>(malloc(sizeof(uint32_t) * size));
        for (uint32_t x = 0, y = size; x < y; ++x)
        {
            input >> std::hex >> val[x];
        }
        wt.values = val;
        wt.count = size;
        weightMap[name] = wt;
    }
    return weightMap;
}

ICudaEngine* createLenetEngine(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt)
{
    // Start defining the network; 0U means an implicit-batch network
    INetworkDefinition* network = builder->createNetworkV2(0U);

    ITensor* input = network->addInput(INPUT_BLOB_NAME, dt, Dims3{1, INPUT_H, INPUT_W});
    assert(input);

    // Load the weights into weightMap
    std::map<std::string, Weights> weightMap = loadWeights("../lenet5.wts");

    // Convolution layer
    IConvolutionLayer* conv1 = network->addConvolution(*input, 6, DimsHW{5, 5}, weightMap["conv1.weight"], weightMap["conv1.bias"]);
    assert(conv1);
    conv1->setStrideNd(DimsHW{1, 1});  // set the stride

    // Activation layer
    IActivationLayer* relu1 = network->addActivation(*conv1->getOutput(0), ActivationType::kRELU);
    assert(relu1);

    // Pooling layer
    IPoolingLayer* pool1 = network->addPoolingNd(*relu1->getOutput(0), PoolingType::kAVERAGE, DimsHW{2, 2});
    assert(pool1);
    pool1->setStrideNd(DimsHW{2, 2});

    // Convolution layer
    IConvolutionLayer* conv2 = network->addConvolution(*pool1->getOutput(0), 16, DimsHW{5, 5}, weightMap["conv2.weight"], weightMap["conv2.bias"]);
    assert(conv2);
    conv2->setStrideNd(DimsHW{1, 1});

    // Activation layer
    IActivationLayer* relu2 = network->addActivation(*conv2->getOutput(0), ActivationType::kRELU);
    assert(relu2);

    // Pooling layer
    IPoolingLayer* pool2 = network->addPoolingNd(*relu2->getOutput(0), PoolingType::kAVERAGE, DimsHW{2, 2});
    assert(pool2);
    pool2->setStrideNd(DimsHW{2, 2});

    // Fully connected layer
    IFullyConnectedLayer* fc1 = network->addFullyConnected(*pool2->getOutput(0), 120, weightMap["fc1.weight"], weightMap["fc1.bias"]);
    assert(fc1);

    // Activation layer
    IActivationLayer* relu3 = network->addActivation(*fc1->getOutput(0), ActivationType::kRELU);
    assert(relu3);

    // Fully connected layer
    IFullyConnectedLayer* fc2 = network->addFullyConnected(*relu3->getOutput(0), 84, weightMap["fc2.weight"], weightMap["fc2.bias"]);
    assert(fc2);

    // Activation layer
    IActivationLayer* relu4 = network->addActivation(*fc2->getOutput(0), ActivationType::kRELU);
    assert(relu4);

    // Fully connected layer
    IFullyConnectedLayer* fc3 = network->addFullyConnected(*relu4->getOutput(0), OUTPUT_SIZE, weightMap["fc3.weight"], weightMap["fc3.bias"]);
    assert(fc3);

    // Classification (softmax) layer
    ISoftMaxLayer* prob = network->addSoftMax(*fc3->getOutput(0));
    assert(prob);
    prob->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*prob->getOutput(0));

    // Build the engine
    builder->setMaxBatchSize(maxBatchSize);
    config->setMaxWorkspaceSize(1 << 20);
    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);

    // Everything is now inside the engine, so the network can be destroyed
    network->destroy();

    // Release the host-side weight buffers
    for (auto& mem : weightMap)
    {
        free((void*)(mem.second.values));
    }
    return engine;
}

void APIToModel(unsigned int maxBatchSize, IHostMemory** modelStream)
{
    // Create the builder (the entry point, analogous to the model in PyTorch)
    IBuilder* builder = createInferBuilder(gLogger);
    IBuilderConfig* config = builder->createBuilderConfig();

    // Create the model by building the network layer by layer
    ICudaEngine* engine = createLenetEngine(maxBatchSize, builder, config, DataType::kFLOAT);
    assert(engine != nullptr);

    // Serialize the engine
    (*modelStream) = engine->serialize();

    // Destroy objects
    engine->destroy();
    builder->destroy();
}

void doInference(IExecutionContext& context, float* input, float* output, int batchSize)
{
    // Recover the engine from the context passed in
    const ICudaEngine& engine = context.getEngine();

    // There should be exactly two bindings: one input and one output
    assert(engine.getNbBindings() == 2);
    void* buffers[2];

    // Indices of the input/output tensors bound to this engine
    const int inputIndex = engine.getBindingIndex(INPUT_BLOB_NAME);
    const int outputIndex = engine.getBindingIndex(OUTPUT_BLOB_NAME);

    // Allocate device memory for the input/output tensors
    CHECK(cudaMalloc(&buffers[inputIndex], batchSize * INPUT_H * INPUT_W * sizeof(float)));
    CHECK(cudaMalloc(&buffers[outputIndex], batchSize * OUTPUT_SIZE * sizeof(float)));

    // Create a CUDA stream to manage concurrent copies and compute
    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Host to device: input is the host data; buffers[inputIndex] is the device input buffer
    CHECK(cudaMemcpyAsync(buffers[inputIndex], input, batchSize * INPUT_H * INPUT_W * sizeof(float), cudaMemcpyHostToDevice, stream));

    // Launch the CUDA kernels: asynchronous inference
    context.enqueue(batchSize, buffers, stream, nullptr);

    // Device to host: buffers[outputIndex] holds the model output; copy it back into output
    CHECK(cudaMemcpyAsync(output, buffers[outputIndex], batchSize * OUTPUT_SIZE * sizeof(float), cudaMemcpyDeviceToHost, stream));

    // Synchronize the stream before reading the results
    cudaStreamSynchronize(stream);

    // Release stream and buffers
    cudaStreamDestroy(stream);
    CHECK(cudaFree(buffers[inputIndex]));
    CHECK(cudaFree(buffers[outputIndex]));
}

int main(int argc, char** argv)
{
    if (argc != 2)
    {
        std::cerr << "arguments not right!" << std::endl;
        std::cerr << "./lenet -s   // serialize model to plan file" << std::endl;
        std::cerr << "./lenet -d   // deserialize plan file and run inference" << std::endl;
        return -1;
    }

    // Serialize the model into a .engine file
    if (std::string(argv[1]) == "-s")
    {
        // modelStream is a host memory block holding the serialized engine
        IHostMemory* modelStream{nullptr};
        APIToModel(1, &modelStream);
        assert(modelStream != nullptr);

        // Write it out as lenet.engine
        std::ofstream p("lenet.engine", std::ios::binary);
        if (!p)
        {
            std::cerr << "can not open plan file" << std::endl;
            return -1;
        }
        p.write(reinterpret_cast<const char*>(modelStream->data()), modelStream->size());

        // Destroy the object
        modelStream->destroy();
    }
    else if (std::string(argv[1]) == "-d")
    {
        char* trtModelStream{nullptr};
        size_t size{0};
        std::ifstream file("lenet.engine", std::ios::binary);
        if (file.good())
        {
            file.seekg(0, file.end);
            size = file.tellg();
            file.seekg(0, file.beg);
            trtModelStream = new char[size];
            assert(trtModelStream);
            file.read(trtModelStream, size);
            file.close();
        }
        else
        {
            return -1;
        }

        // Dummy input data
        float data[INPUT_H * INPUT_W];
        for (int i = 0; i < INPUT_W * INPUT_H; i++)
        {
            data[i] = 1.0;
        }

        // Create the IRuntime object for the execution environment
        IRuntime* runtime = createInferRuntime(gLogger);
        assert(runtime != nullptr);
        ICudaEngine* engine = runtime->deserializeCudaEngine(trtModelStream, size, nullptr);
        assert(engine != nullptr);

        // Create the execution context, used by doInference to launch the CUDA kernels
        IExecutionContext* context = engine->createExecutionContext();
        assert(context != nullptr);

        // Run inference INFER_NUMS times and keep the last result
        float prob[OUTPUT_SIZE];
        auto start = std::chrono::system_clock::now();  // start time
        for (int i = 0; i < INFER_NUMS; i++)
        {
            doInference(*context, data, prob, 1);
        }
        auto end = std::chrono::system_clock::now();    // end time
        std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms" << std::endl;

        context->destroy();
        engine->destroy();
        runtime->destroy();
        delete[] trtModelStream;

        std::cout << "prob:";
        for (int i = 0; i < OUTPUT_SIZE; i++)
        {
            std::cout << prob[i] << ",";
        }
    }
    else
    {
        return -1;
    }
    return 0;
}

CMakeLists.txt

cmake_minimum_required(VERSION 2.6)

project(lenet)

add_definitions(-std=c++11)

set(TARGET_NAME "lenet")

option(CUDA_USE_STATIC_CUDA_RUNTIME OFF)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_BUILD_TYPE Debug)

include_directories(${PROJECT_SOURCE_DIR}/include)
# include and link dirs of cuda and tensorrt, you need adapt them if yours are different
# cuda
include_directories(/usr/local/cuda/include)
link_directories(/usr/local/cuda/lib64)
# tensorrt (installed from the .deb package)
include_directories(/usr/include/x86_64-linux-gnu)
link_directories(/usr/lib/x86_64-linux-gnu)
# tensorrt (installed from the tar package)
#include_directories(/red_detection/tensorrt_learn/software/TensorRT-7.0.0.11/include)
#link_directories(/red_detection/tensorrt_learn/software/TensorRT-7.0.0.11/lib)

FILE(GLOB SRC_FILES ${PROJECT_SOURCE_DIR}/lenet.cpp ${PROJECT_SOURCE_DIR}/include/*.h)

add_executable(${TARGET_NAME} ${SRC_FILES})
target_link_libraries(${TARGET_NAME} nvinfer)
target_link_libraries(${TARGET_NAME} cudart)

add_definitions(-O2 -pthread)
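
With lenet.cpp, logging.h (under include/), and this CMakeLists.txt in place, a standard out-of-source build produces the lenet binary (assuming the include/link paths above match your setup):

mkdir build && cd build
cmake ..
make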

./lenet -s converts the weights into the .engine file

./lenet -d runs inference

Inference time:

Compared with PyTorch, inference is at least 4x faster, while the results are essentially identical.

Some excellent repositories:

https://github.com/wang-xinyu/tensorrtx

https://github.com/zerollzeng/tiny-tensorrt
