设计工具
内存

内存湖:内存景观是如何随着CXL™发展的

美光可扩展存储系统寻路组| 2024年1月

不断变化的数据需求

自从有了电脑, 有效地从处理器中获取信息一直是一个挑战. 那堆可怕的打孔卡片, magnetic tape reels 和 then floppy drives gave way to spinning hard drives where large (for that time) amounts of data could be read 和 stored quickly. 这些驱动器连接到一台计算机上, 如果用户想在计算机之间移动数据, sneakernet和FTP是最好的选择. But these approaches resulted in many copies of the same file that were difficult to keep in sync 和 manage.

在80年代中期, some clever engineers at Sun Microsystems solved the file-copy problem by creating the Network File System (NFS), 它允许多台计算机访问驻留在一个位置的文件. 起初, this location was another computer; later, 该位置位于网络附加存储(NAS)设备上.

数据集市, 数据仓库数据仓库 已经让位于 数据的湖泊, 哪个术语用于描述非易失性中可用的大量数据, block-addressable storage accessible over a network for a variety of users 和 purposes, 如图1所示.

内存-lake-graph-2.png

随着数据集从兆字节增长到太字节再到拍字节, the cost of moving data from the block storage devices across interconnects into system 内存, performing computation 和 then storing the large dataset back to persistent storage is rising in terms of time 和 power (watts). 另外, heterogeneous computing hardware increasingly needs access to the same datasets. 例如, a general-purpose CPU may be used for assembling 和 preprocessing a dataset 和 scheduling tasks, but a specialized compute engine (like a GPU) is much faster at training an AI model. A more efficient solution is needed that reduces the transfer of large datasets from storage directly to processor-accessible 内存.

Several organizations have pushed the industry toward solutions to these problems by keeping the datasets in large, byte-addressable, 共享内存. 在20世纪90年代, the scalable coherent interface (SCI) allowed multiple CPUs to access 内存 in a coherent way within a system. 异构系统架构(HSA)1 specification allowed 内存 sharing between devices of different types on the same bus. 从2010年开始的十年, the Gen-Z st和ard delivered a 内存-semantic bus protocol with high b和width 和 low latency with coherency. These efforts culminated in the widely adopted 计算快通 (CXLTM) st和ard being used today. 自计算快速链路(计算快通, CXL)联盟成立以来, 美光一直是并且仍然是一个积极的贡献者.

CXL共享、零拷贝内存

计算快通 打开节省时间和电力的大门. 新的cxl3.1标准允许字节可寻址, load-store-accessible 内存 like DRAM to be shared between different hosts over a low-latency, 采用工业标准组件的高带宽接口.

This sharing opens new doors previously only possible through expensive, proprietary equipment. 使用共享内存系统, the data can be loaded into shared 内存 once 和 then processed multiple times by multiple hosts 和 accelerators in a pipeline, 而不会产生将数据复制到本地内存的成本, 块存储协议和延迟.

此外,还可以消除一些网络数据传输. 例如, data can be ingested 和 stored in shared 内存 over time by a host connected to a sensor array. 曾经驻留在记忆中, 为此目的而优化的第二个主机可以清理和预处理数据, 然后由第三台主机处理数据. 与此同时,第一个主机一直在摄取第二个数据集. The only information that needs to be passed between the hosts is a message pointing to the data to indicate it is ready for processing. The large dataset never has to move or be copied, saving b和width, energy 和 内存 space.

Another example of zero-copy data sharing is a producer–consumer data model where a single host is responsible for collecting data in 内存, 然后,多个其他主机在写入数据后使用数据. 像之前一样, 生产者只需要发送一个指向数据地址的消息, 向其他主机发出信号,表明它已经准备好了.

增强的记忆功能

Zero-copy data sharing can be further enhanced by CXL 内存 modules having built-in processing capabilities. 例如, if a CXL 内存 module can perform a repetitive mathematical operation or data transformation on a data object entirely in the module, 节省系统带宽和功耗. These savings are achieved by comm和ing the 内存 module to execute the operation without the data ever leaving the module using a capability called near 内存 compute (NMC).

另外, the low-latency CXL fabric can be leveraged to send messages with low overhead very quickly from one host to another, 主机与内存条之间, 或者在内存模块之间. These connections can be used to synchronize steps 和 share pointers between producers 和 consumers.

除了NMC和通信优势之外, 高级内存遥测可以添加到CXL模块中 提供一个了解共享设备中实际应用程序流量的新窗口2 不增加主机处理器的负担. 随着深入了解, operating systems 和 management software can optimize data placement (内存 tiering) 和 tune other system parameters to meet operating goals, 从性能到能耗. Additional 内存-intensive, value-add functions such as transactions are also ideally suited to NMC.

内存湖

美光对合并大型公司感到兴奋, scale-out CXL global shared 内存 和 enhanced 内存 features into our 内存 lake concept. 内存湖利用了cxl3的新特性.1 specification 和 adds the capabilities discussed in this blog 和 more, as shown in Figure 2.

内存-lake-graph-1.Png:内存湖框图

内存湖包括以下特性:

  • 高效的容量和成本
    • Hundreds of terabytes to petabytes of globally addressable shared 内存 to allow nonsharded access to the largest datasets
    • 内存分层,其中最关键的数据总是在最快的内存中, but costs 和 data persistence are controlled by keeping less critical data in more cost-effective 内存
    • 可配置的拓扑
  • 通过共享实现性能
    • Data sharing where byte-addressable data is accessible by up to dozens (or hundreds) of hosts through load-store semantics without having to be copied
  • 低延迟实现
    • 低于600纳秒的数据加载和存储时间
    • 通过CXL结构进行同步(少于1微秒)
  • 近内存计算加速性能
    • Compute capabilities with the data never leaving the 内存 module (near- or in-内存 compute)
    • 本机内存模块支持原子操作
对于CXL和共享内存来说,这是一个激动人心的时刻. Keep up on the latest by joining our technology enablement program (TEP) if you’re currently testing CXL, 或者关注我们这里了解未来的更新.

1 异构系统架构基础.org)

2 D. ,维. 沃丁顿和D. A. Roberts,《沙巴体育安卓版下载》

可扩展内存系统寻路组,美光
高级内存解决方案组从事研究, 设计和测试新的存储技术. 我们的专家团队与合作伙伴密切合作, 客户, 大学, 和 st和ards bodies to ensure that the 微米 内存 solutions remain on the leading edge of 内存 technology.