CUDA Series: Mem-9

Contents

  • Static Architecture
    • Abstractions provided by CUSW_UNIT_MEM_MANAGER
      • Memory Object (CUmemobj)
      • Memory Descriptor (CUmemdesc)
      • Memory Block (CUmemblock)
        • Memory Bins
        • Suballocations in Memory Block
        • Functional description
      • Memory Manager

You may find it odd that none of this is part of the CUDA programming model. It is actually the CUDA runtime; remember that leaked code?
This section describes static aspects of the CUSW_UNIT_MEM_MANAGER unit’s architecture.


Static Architecture

The CUSW_UNIT_MEM_MANAGER provides abstractions that other units of the CUDA driver, or user-space applications going through CUSW_UNIT_CUDART, use to allocate memory. It also provides abstractions for sharing memory allocated on the Host with the Device.

The CUDA driver sees memory allocations in the form of “Memory objects”. A Memory object represents an allocated chunk of memory. It abstracts all the layers involved in memory management and provides APIs that other units of the CUDA driver use to allocate/free and map/unmap memory, along with other auxiliary functionality.

To avoid memory fragmentation, memory objects are allocated from a bigger, fixed-size “Memory Block”. Each Memory block has a “Memory Descriptor”, which contains all the memory attributes associated with that block. The “Memory Manager” maintains all the memory blocks allocated in a given CUDA context.

The diagram below shows how CUSW_UNIT_MEM_MANAGER interfaces with the other units of the CUDA driver. Because memory management needs support from the underlying drivers, CUSW_UNIT_MEM_MANAGER depends on the NVRM driver to carry out its tasks. In line with the top-level architectural philosophy (refer to the CUDA Architecture document), the unit also contains parts of the Hardware Abstraction Layer (HAL) and the Driver Model Abstraction Layer (DMAL), which it uses internally.

[Figure: interfaces between CUSW_UNIT_MEM_MANAGER and the other CUDA driver units]

Abstractions provided by CUSW_UNIT_MEM_MANAGER

CUSW_UNIT_MEM_MANAGER primarily consists of three types of abstractions: the Memory Object, the Memory Descriptor and the Memory Manager. This section describes each abstraction in detail.

Memory Object (CUmemobj)

A memory object represents a memory allocation. It contains the size, device virtual address, host virtual address and other related information about the allocation. One context can also share memory with another, in which case the memory object contains the sharing information as well. The memory object abstracts all of the underlying implementation and is the only way for other units in the CUDA driver to interface with CUSW_UNIT_MEM_MANAGER.

Functional description

The Memory Object abstraction provides functionality to:

Map/unmap an existing memory object to host memory.

Get the parent block’s CUmemflags/CUmemdesc for a given memory object.

Get the data needed to share a given memory object with another context.

Get the absolute device virtual address of a memory object.

Get the memory object for a given virtual address or range.

Get the user-visible device pointer/host pointer for a given memory object, and vice versa.

Get the logical size in bytes of a given memory object.

Get the context of a given memory object.

Get the shared instance of a given memory object.

Check whether the memory is allocated on the host or the device, and query its associated cache attributes.

Check whether the allocated memory is pinned or managed memory.

Execute a given cache operation on the CPU side on the memory region that a given memory object points to.

Mark/unmark a given memory object so that synchronous memory operations on it are always fully synchronous.
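The lookup of a memory object from a virtual address, listed above, can be illustrated with a small model. This is only a sketch in Python, not the driver's code: the `MemObj` fields mirror the attributes named in the text, while `MemObjIndex`, its method names and the sorted-list lookup are assumptions made for illustration.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MemObj:
    """Hypothetical stand-in for CUmemobj; field names are assumptions."""
    device_va: int                 # device virtual address of the allocation
    size: int                      # logical size in bytes
    host_va: Optional[int] = None  # host virtual address, None if not mapped


class MemObjIndex:
    """Keeps memory objects sorted by device VA so an address can be resolved."""

    def __init__(self) -> None:
        self._starts: List[int] = []
        self._objs: List[MemObj] = []

    def insert(self, obj: MemObj) -> None:
        # Keep both lists sorted by start address.
        i = bisect.bisect_left(self._starts, obj.device_va)
        self._starts.insert(i, obj.device_va)
        self._objs.insert(i, obj)

    def find(self, va: int) -> Optional[MemObj]:
        """Return the object whose [device_va, device_va + size) contains va."""
        i = bisect.bisect_right(self._starts, va) - 1
        if i < 0:
            return None
        obj = self._objs[i]
        return obj if va < obj.device_va + obj.size else None


index = MemObjIndex()
index.insert(MemObj(device_va=0x1000, size=0x400))
index.insert(MemObj(device_va=0x4000, size=0x1000, host_va=0x7F000000))
```

With this model, `index.find(0x13FF)` resolves to the first object, while `index.find(0x1400)` returns `None` because the address lies just past the end of that allocation.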

Memory Descriptor (CUmemdesc)

A memory descriptor contains the memory attributes associated with a memory allocation.

When requesting memory, other units of the CUDA driver can provide specific attributes for the allocation request. To get or set the attributes of a memory allocation, those units use the interfaces provided by the Memory object.

Memory Block (CUmemblock)

A memory block is a superblock that encapsulates the physical allocation.

Each Memory Block is associated with a Memory Descriptor, as explained above, which describes the attributes of the memory block. The block contains an OS-specific structure (dmal) that holds OS-driver-specific handles and data. The size of a particular memory block depends on the specified memory type (such as pushbuffer or generic) and on architecture-specific requirements of the MMU. Generally, the size of a memory block is the size of the huge page supported by the Device. A memory block can also be created from a shared allocation that belongs to another context.

Memory Bins

An effective way to avoid memory fragmentation is to group similar-sized allocations together. Because a memory block is large, sub-allocations can be made inside it. Memory bins achieve this grouping by creating bins of varying sizes and associating each memory block with one of them. For example, five bins with sizes 1KB, 4KB, 16KB, 64KB and 256KB are created during initialization of the memory manager. When a new memblock is created during a memory allocation request, one of these bins is assigned to the memblock based on the requested size. Future allocation requests of a similar size are serviced by suballocating from that memblock.
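As a rough illustration of the bin assignment, the sketch below uses the five example bin sizes from the text. The selection rule (the smallest bin that can hold the request) and the name `select_bin` are assumptions; the text does not state the exact policy, nor what happens to requests larger than the largest bin.

```python
from typing import Optional

KB = 1024
# Bin sizes taken from the example in the text.
BIN_SIZES = [1 * KB, 4 * KB, 16 * KB, 64 * KB, 256 * KB]


def select_bin(request_size: int) -> Optional[int]:
    """Return the smallest bin size that can hold request_size.

    Returns None when the request is larger than every bin
    (how the driver treats that case is not described in the text).
    """
    if request_size <= 0:
        raise ValueError("request_size must be positive")
    for bin_size in BIN_SIZES:
        if request_size <= bin_size:
            return bin_size
    return None
```

Under this assumed rule, a 3000-byte request would fall into the 4KB bin and share a memblock with other requests in the 1KB to 4KB range.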

Suballocations in Memory Block

During a memory allocation request, the manager first tries to find an existing Memory Block that has the same memory attributes (in the form of a Memory descriptor) as the incoming request and has enough free space to safely hold the requested size. If such a memory block is found and it belongs to the same memory bin as the requested size, the memory is suballocated in that block, and a memory object is created to represent it and returned. Otherwise, a new block is created.
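The search-then-create flow described above can be sketched as follows. This is a simplified Python model under stated assumptions: the 2 MiB block size, the bump-pointer suballocator, the tuple used as a descriptor and all class and method names are hypothetical, and freeing is left out entirely.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

BLOCK_SIZE = 2 * 1024 * 1024  # assumed block size, standing in for a huge page


@dataclass
class MemBlock:
    desc: Tuple[str, ...]  # stand-in for the CUmemdesc attributes
    bin_size: int          # memory bin this block is assigned to
    used: int = 0          # bytes handed out so far (bump pointer)

    def try_suballoc(self, size: int) -> int:
        """Return the offset of a new suballocation, or -1 if it does not fit."""
        if self.used + size > BLOCK_SIZE:
            return -1
        offset = self.used
        self.used += size
        return offset


@dataclass
class MemManager:
    blocks: List[MemBlock] = field(default_factory=list)

    def alloc(self, size: int, desc: Tuple[str, ...], bin_size: int) -> Tuple[MemBlock, int]:
        # Reuse a block only when descriptor and bin both match and it has room.
        for block in self.blocks:
            if block.desc == desc and block.bin_size == bin_size:
                offset = block.try_suballoc(size)
                if offset >= 0:
                    return block, offset
        # Otherwise create a new block and allocate from it.
        block = MemBlock(desc=desc, bin_size=bin_size)
        offset = block.try_suballoc(size)
        if offset < 0:
            raise MemoryError("request is larger than a memory block")
        self.blocks.append(block)
        return block, offset
```

Two 4KB requests with the same descriptor land in the same block at offsets 0 and 4096, while a request with a different descriptor forces a new block.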

Functional description

The Memory Block abstraction provides functionality that is used internally and is not exposed to components outside the CUDA memory manager:

Allocate/free a memblock

Allocate UVA for a given memblock

Map/unmap a given memblock to the host

Map/unmap a given memblock to the device

Get the memory range based on the kind of device mapping chosen

Get the UVA address for a given memblock

Memory Manager

Memory allocations happen within a CUDA context, so each CUDA context has its own instance of the Memory Manager, with data structures that track all memory allocations made in that context. The Memory Manager also provides synchronization primitives to protect these shared data structures from concurrent access.
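A minimal sketch of such a per-context tracker is shown below, assuming a single mutex guards the tracking table. The class name `ContextMemManager`, its methods and the dictionary layout are hypothetical; the text does not say which synchronization primitives or data structures the driver actually uses.

```python
import threading
from typing import Dict


class ContextMemManager:
    """Hypothetical per-context tracker: one instance per CUDA context."""

    def __init__(self) -> None:
        self._lock = threading.Lock()      # protects the tracking table
        self._allocs: Dict[int, int] = {}  # device VA -> size in bytes

    def track(self, va: int, size: int) -> None:
        with self._lock:
            if va in self._allocs:
                raise ValueError("address is already tracked")
            self._allocs[va] = size

    def untrack(self, va: int) -> int:
        """Stop tracking an allocation and return its size."""
        with self._lock:
            return self._allocs.pop(va)

    def total_bytes(self) -> int:
        with self._lock:
            return sum(self._allocs.values())
```

Because every access to the table goes through the lock, several threads working in the same context can record and release allocations without corrupting it.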

CUSW_UNIT_MEM_MANAGER maintains different Virtual Address regions within which the VA is assigned for a particular memory allocation request. The choice of the VA region depends mainly on page size and the requested memory type.

The available memory ranges are:

Dptr32 - The 32-bit device pointer range, to which memory is mapped for special allocations that need 32-bit addresses because of hardware unit requirements.

Function Memory - The memory range for CUDA GPU function code.

Small Page Region - Allocations are made in this region when the requested page size is 4KB.

Big Page Region - Allocations are made in this region when the requested page size is the device-specific page size (64KB or 128KB, depending on the device architecture).
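The choice among these regions can be sketched as a small decision function. The region names come from the list above; the function name, its parameters and the order in which the conditions are checked are assumptions, since the text only says that the choice depends mainly on page size and memory type.

```python
from enum import Enum

SMALL_PAGE_SIZE = 4 * 1024  # 4KB, as stated in the text


class VaRegion(Enum):
    DPTR32 = "Dptr32"
    FUNCTION_MEMORY = "Function Memory"
    SMALL_PAGE = "Small Page Region"
    BIG_PAGE = "Big Page Region"


def select_va_region(page_size: int,
                     big_page_size: int,
                     needs_32bit_va: bool = False,
                     is_function_code: bool = False) -> VaRegion:
    """Pick a VA region; the precedence of the checks is an assumption."""
    if needs_32bit_va:
        return VaRegion.DPTR32
    if is_function_code:
        return VaRegion.FUNCTION_MEMORY
    if page_size == SMALL_PAGE_SIZE:
        return VaRegion.SMALL_PAGE
    if page_size == big_page_size:
        return VaRegion.BIG_PAGE
    raise ValueError("unsupported page size for this device")
```

For a device whose big page is 64KB, `select_va_region(64 * 1024, 64 * 1024)` picks the Big Page Region, and a 4KB request picks the Small Page Region.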

The class diagram below depicts the relationships between the various data types involved in memory management in the CUDA driver.

[Figure: class diagram of the data types involved in memory management in the CUDA driver]
