CUDA系列-Mem-9

这里写目录标题

  • Static Architecture
    • .Abstractions provided by CUSW_UNIT_MEM_MANAGER
      • Memory Object (CUmemobj)
    • Memory Descriptor(CUmemdesc)
    • Memory Block(CUmemblock)
      • Memory Bins
      • Suballocations in Memory Block
      • Functional description
    • Memory Manager

你可能觉得奇怪,这些不是cuda programming model中的内容啊,其实这是cuda runtimes ,还记得那份泄漏出来的代码吗?
This section describes static aspects of the CUSW_UNIT_MEM_MANAGER unit’s architecture.


Static Architecture

The CUSW_UNIT_MEM_MANAGER provides abstractions for allocating memory by other units of CUDA driver or from user space applications through CUSW_UNIT_CUDART. It also provides abstractions to share the memory allocated in the Host with the Device.

CUDA driver sees memory allocations in the form of “Memory objects”. A Memory object represents the chunk of memory allocated. It abstracts all layers involved in the memory management and provides APIs to other units of CUDA driver to allocate/free and map/unmap memory along with other auxiliary functionalities.

To avoid fragmentation of memory, the memory objects are allocated from a bigger fixed size “Memory Block”. Each Memory block has a “Memory Descriptor” which contains all the memory attributes associated with a memory block. The “Memory Manager” maintains all the memory blocks which are allocated in a given CUDA context.

The diagram below shows how CUSW_UNIT_MEM_MANAGER interfaces with different units of CUDA driver. Since memory management needs support from the underlying drivers, CUSW_UNIT_MEM_MANAGER depends on the NVRM driver to accomplish its tasks in memory management. In line with the top-level architectural philosophy (refer to the CUDA Architecture document), the unit also contains parts of Hardware Abstraction Layer (HAL) and Driver Model Abstraction layer (DMAL), which it uses internally.
在这里插入图片描述

.Abstractions provided by CUSW_UNIT_MEM_MANAGER

CUSW_UNIT_MEM_MANAGER primarily consists of three types of abstractions. The Memory object, Memory Descriptor and the Memory Manager. This section describes each abstraction in detail.

Memory Object (CUmemobj)

A memory object represents a memory allocation. It contains the size, device virtual address, host virtual address and other related information about a memory allocation. It is also possible for one context to share memory with other, in which case the memory object also contains the sharing information. Memory object abstracts all underlying implementation and is the only way for other units in the CUDA driver to interface with CUSW_UNIT_MEM_MANAGER.
Functional description
Memory Object abstraction provides functionalities to

Map/unmap an existing memory object to host memory.

Get parent block’s CUmemflags/CUmemdesc for a given memory object.

To get the data needed to share given memory object with another context.

To get the absolute device virtual address of memory object.

To get the memory object for a given Virtual Address or Range.

To get the user-visible device pointer/host pointer for a given memory object and vice versa.

To get the logical byte size of a given memory object.

To get the context of a given memory object.

To get the shared instance of a given memory object.

To check if the memory allocated in host/device and its associated cache attributes.

To check if the memory allocated is pinned/managed memory.

To execute the given cache operation on cpu side on the memory region pointed by given memory object.

To Mark/Unmark the given memory object to ensure that synchronous memory operations on the given memory object are always fully synchronous.

Memory Descriptor(CUmemdesc)

Memory descriptor contains memory attributes associated with a memory allocation.

While requesting for memory, the other units of CUDA driver can provide specific attributes for the memory allocation request. To get/set the attributes for a memory allocation, other units of the CUDA driver can use the interfaces provided by the Memory object.

Memory Block(CUmemblock)

Memory block is a superblock which encapsulates the physical allocation.

Each Memory Block is associated with a Memory Descriptor as explained above which describes the attributes of the memory block. It contains a OS specific structure(dmal) to hold OS driver specific handles and data. The size of a particular memory block is based on specified memory type(like pushbuffer, generic etc) and arch specific requirements for the MMU. Generally the size of a memory block is the size of HUGE Page supported by the Device. Memory block can be created with shared allocation from another context.

Memory Bins

An effective method of avoiding fragmentation of memory is by grouping similar sized allocations together. Since the size of the memory block is big, sub-allocations can be made inside a memory block. Memory bins try to achieve that by creating bins of varying size and associating each memory block with it. For example, 5 bins are created with size 1KB, 4KB, 16KB, 64KB and 256KB during initialization of memory manager. During a memory allocation request when a new memblock is created, based on the allocation request size one of the above memory bin is assigned to the memblock. Future similar sized allocations requests are serviced by suballocating from that memblock.

Suballocations in Memory Block

During a memory allocation request, efforts are made to find an already existing suitable Memory Block which has the same memory attributes(in the form of Memory descriptor) as that of the incoming request and has free area in which the requested size worth of memory can be safely allocated. If such a memory block is found and it belongs to the same memory bin as that of requested size then memory will be sub allocated in that block and a memory object is created to represent the same and returned, otherwise new block is created.

Functional description

Memory Block abstraction provides some functionalities which are used internally and not exposed to components outside CUDA memory manager.

Alloc/Free memblock

Allocate UVA for a given memblock

Map/Unmap given memblock to host

Map/Unmap given memblock to device

To get memory range based on the kind of device mapping chosen

To get the UVA address for a given memblock

Memory Manager

Memory allocations happen within a CUDA Context. So each CUDA context has an instance of Memory Manager which has data structures to track all memory allocations done in a given CUDA context. It also provides synchronization primitives to protect the common data structures from concurrent access.

CUSW_UNIT_MEM_MANAGER maintains different Virtual Address regions within which the VA is assigned for a particular memory allocation request. The choice of the VA region depends mainly on page size and the requested memory type.

The available memory ranges are

Dptr32 - It is the 32 bit device pointer range to which memory is mapped for special allocations that need 32 bit addresses due to some H/W unit requirements

Function Memory - Function memory range for CUDA GPU function code

Small Page Region - If the requested page size is 4KB then allocations are made in this region

Big Page Region - If the requested page size is device specific page size (64KB or 128KB depending on device architecture) then allocations are made in this region.

The class diagram below depicts the relationship between various data types involved in memory management in cuda driver.
在这里插入图片描述

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/bicheng/30400.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

使用SQLite

自学python如何成为大佬(目录):https://blog.csdn.net/weixin_67859959/article/details/139049996?spm1001.2014.3001.5501 与许多其他数据库管理系统不同,SQLite不是一个客户端/服务器结构的数据库引擎,而是一种嵌入式数据库,它的数据库就…

Centos 配置安装Mysql

linux安装配置mysql的方法主要有yum安装和配置安装两种,由于yum安装比较简单,但是会将文件分散到不同的目录结构下面,配置起来比较麻烦,这里主要研究一下配置安装mysql的方法 1、环境说明 centos 7.9 mysql 5.7.372、环境检查 …

反激式开关电源是如何工作的

反激的变压器可以看作一个带变压功能的电感,是一个buck-boost电路。 反击式开关变压器 反激式开关电源是指使用反激高频变压器隔离输入输出回路的开关电源。“反激”指的是在开关管接通的情况下,当输入为高电平时输出线路中串联的电感为放电状态&#x…

ABAP-03基础数据类型

基本数据类型 数据类型默认大小(byte)有效大小初始值说明示例C11-65535SPACE文本字符(串)‘Name’N11-65535‘00…0’数字文本‘0123’T66‘000000’时间(HHMMSS)‘123010’D88‘00000000’日期(yyyymmdd)‘20090901’I4-231~232…

算法基础精选题单 动态规划(dp)(递推+线性dp)(个人题解)

前言&#xff1a; 一些简单的dp问题。 正文&#xff1a; 题单&#xff1a;237题】算法基础精选题单_ACM竞赛_ACM/CSP/ICPC/CCPC/比赛经验/题解/资讯_牛客竞赛OJ_牛客网 (nowcoder.com) 递推&#xff1a; NC235911 走楼梯&#xff1a; #include<bits/stdc.h> using na…

在k8s上部署一个简单的应用

部署一个简单的应用 实验目标&#xff1a; 部署一个简单的 web 应用&#xff0c;比如 Nginx 或者一个自定义的 Node.js 应用。 实验步骤&#xff1a; 创建一个 Deployment。创建一个 Service 来暴露应用。验证应用是否可以通过 Service 访问。 今天我们来做一下昨天分享的可…

Debian12的#!bash #!/bin/bash #!/bin/env bash #!/usr/bin/bash #!/usr/bin/env bash

bash脚本开头可写成 #!/bin/bash , #!/bin/env bash , #!/usr/bin/bash , #!/usr/bin/env bash #!/bin/bash , #!/usr/bin/bash#!/bin/env bash , #!/usr/bin/env bash Debian12的 /bin 是 /usr/bin 的软链接, /sbin 是 /usr/sbin 的软链接, (Debian12默认没有ll命令,用的ls …

Python的pandas读取excel文件中的数据

一、前言 hello呀&#xff01;各位铁子们大家好呀&#xff0c;我是一个在软件测试行业摸爬滚打十几年的老江湖了&#xff0c;今天呢来和大家聊一聊用Python的pandas读取excel文件中的数据。 二、读取Excel文件 使用pandas的read_excel()方法&#xff0c;可通过文件路径直接读…

Techviz:XR协作工作流程,重塑远程电话会议新形式

在当今快速发展的数字环境中&#xff0c;无缝远程协作的需求正在成为企业多部门协同工作的重中之重&#xff0c;尤其是对于制造业、建筑和设计等行业的专业人士而言&#xff0c;这一需求更加迫切。传统的远程电话会议协作形式存在着延滞性&#xff0c;已经渐渐跟不上当今快节奏…

项目三OpenStack基础环境配置与API使用

任务一 了解OpenStack基础环境配置 1.1 •数据库服务器 1.2 •消息队列服务 •AMQP系统的组成 任务二 了解并使用OpenStack API 2.1 •什么是RESTful API • RESTful API 是目前比较成熟的 一套Internet应用程序的API软件架构 。 • 表现 层&#xff08; Representation …

汽车IVI中控开发入门及进阶(三十一):视频知识扫盲

有效的视频资源管理需要集成许多不同的底层技术,共同为用户提供给定应用程序的最佳体验。其中许多技术是从早期电视广播中使用的技术演变而来的。其他方法,如用于通过网络流式传输视频的压缩方法,相对较新且不断发展。 以下详细概述了与图形和视频处理和传输相关的一些基本…

云上宝库:三大厂商对象存储安全性及差异性比较

前言 看了几家云厂商的对象存储&#xff0c;使用上有相似也有差异&#xff0c;聊聊阿里云、腾讯云、京东云三家对象存储在使用中存在的风险以及防护措施。 0x01 云存储命名 阿里云对象存储OSS(Object Storage Service)&#xff0c;新用户免费试用三个月&#xff0c;存储包容…

安装idea后配置的全局配置

1、打开IDEA应用&#xff1a;Customize→All settings...&#xff0c;如果启动IDEA后&#xff0c;默认打开的是之前的项目&#xff0c;可以关闭当前项目&#xff1a;File→Close Project&#xff0c;就退到全局配置界面了。 2、打开全局配置界面&#xff1a;Editor→File Encod…

斯巴达(Spartanhost)VPS的性能评测

原创原文链接&#xff1a;详细斯巴达&#xff08;Spartanhost&#xff09;VPS的性能和购买价值评测 | BOBO Blog (soulcloser.com)https://www.soulcloser.com/3398/ 引言 最近看了全球的VPS商家&#xff0c;想搞台网站高性能的服务器&#xff0c;发现一个特别有意思的商家竟…

Mathtype7永久无限免费安装包下载地址2024最新方法步骤

亲爱的数学爱好者们&#xff0c;今天我要分享一个让数学表达变得超级简单的神器——Mathtype7最新破解版&#xff01;&#x1f389; 这不仅仅是个软件&#xff0c;而是打开高效学习和工作的钥匙。准备好了吗&#xff1f;让我们一起探索这个神奇的工具&#xff01; MathType最新…

通过ModelScope开源Embedding模型将图片转换为向量

本文介绍如何通过ModelScope魔搭社区中的视觉表征模型将图片转换为向量&#xff0c;并入库至向量检索服务DashVector中进行向量检索。 ModelScope魔搭社区旨在打造下一代开源的模型即服务共享平台&#xff0c;为泛AI开发者提供灵活、易用、低成本的一站式模型服务产品&#xf…

空间复杂度的相关概念

1. 空间复杂度 空间复杂度&#xff08;space complexity&#xff09;用于衡量算法占用内存空间随着数据量变大时的增长趋势。 统计哪些空间&#xff1a; ● 暂存数据&#xff1a;用于保存算法运行过程中的各种常量、变量、对象等。 ● 栈帧空间&#xff1a;用于保存调用函数…

keil MDK自动生成带版本bin文件

作为嵌入式单片机开发&#xff0c;在Keil MDK&#xff08;Microcontroller Development Kit&#xff09;中开发完编译完后&#xff0c;经常需要手动进行版本号添加用于发版&#xff0c;非常麻烦&#xff0c;如果是对外发行的话&#xff0c;更容易搞错&#xff0c;特此码哥提供一…

SpringCloud Alibaba Sentinel 流量控制之流控模式实践总结

官网文档&#xff1a;https://sentinelguard.io/zh-cn/docs/flow-control.html wiki地址&#xff1a;https://github.com/alibaba/Sentinel/wiki/%E6%B5%81%E9%87%8F%E6%8E%A7%E5%88%B6 本文版本&#xff1a;spring-cloud-starter-alibaba&#xff1a;2.2.0.RELEASE 如下图所…

企业如何选择合适的CRM工具?除Salesforce之外的10大主流选择

对比salesforce&#xff0c;其他10款优秀CRM&#xff1a;纷享销客CRM、Zoho CRM、腾讯企点、销售易、企业微信 (WeCom)、Odoo CR、OroCRM、金蝶、用友CRM、EspoCRM 虽然Salesforce以其全面的功能和强大的市场占有率在海外收获了许多客户&#xff0c;但Salesforce在国内市场的接…