CL_DRAM_HBM_DMA CustomLogic Example

⚠️ Features using XDMA engine are currently unsupported on F2 instances

Table of Content

  1. Overview

  2. Functional Description

  3. Software

Overview

The CL_DRAM_HBM_DMA example demonstrates the use and connectivity for many of the Shell/CL interfaces and functionality, including:

  1. Register access over OCL AXI-Lite interface

  2. Mapping of the external DRAM channel to instance memory via PCIe AppPF BAR4, and the 512-bit sh_cl_dma_pcis_* AXI4 bus

  3. Mapping of HBM memory to instance memory via PCIe AppPF BAR4, and the 512-bit sh_cl_dma_pcis_* AXI4 bus

  4. Debugging using Virtual JTAG and Xilinx Integrated Logic Analyzer (ILA) cores.

  5. User-defined interrupts

  6. cl_sh_pcim_* AXI4 traffic for Host memory accesses from CL

System diagram

cl_dram_hbm_dma

Functional Description

Memory Interface

The cl_dram_hbm_dma demonstrates accessing 64GB of DDR memory and 16GB of single channel HBM memory over the sh_cl_dma_pcis AXI4 bus.

The CL provides access to the DDR memory through sh_ddr.sv which consists of the DDR controller required to interact with the external DDR DIMM on the FPGA card. The sh_ddr.sv is also connected to the Shell’s sh_cl_ddr_stat_* ports to enable Shell to manage DDR calibration upon CL AFI loads.

The CL also provides access to the on-chip HBM available in the FPGA. The HBM IP is configured for dual stack 16GB capacity which exposes 32 AXI3 (not AXI4) channels running on 450MHz clock. However, only one HBM channel is being used in this example design to demonstrate datapath connection to the HBM. The cl_hbm_axi4.sv does the AXI4-to-AXI3 conversion, crosses clock domain between the 250 MHz clk_main_a0 and HBM’s up to 450 MHz axi_clk for the datapath and feeds the cl_hbm_wrapper.sv. The cl_hbm_wrapper.sv instantiates the HBM IP, houses MMCM to generate 450MHz interface clock, and connects the datapath to Channel#0 of the HBM IP core.

NOTE: Since the CL utilizes only one HBM Channel which is fed by data source running at 250MHz clock, it does not accomplish the max available bandwidth from HBM. For demonstration of the max performance from HBM. refer to the cl_mem_perf example.

Design Changes

Individual controllers (DDR, HBM) can be enabled and disabled by simply updating the appropriate defines in the cl_dram_hbm_dma_defines.vh file. The sh_ddr.sv file contains the necessary logic to remove and tie-off appropriate interfaces when the DDR controller is disabled.

For the cl_dram_hbm_dma example memory ranges have been allocated for DDR and HBM. So, if a particular memory is disabled, then the developer should take care to handle transactions to that address range since there will be no controller to respond to the request. The address ranges for each controller is described below in the dma_pcis AXI4 bus section.

Test Changes

If a particular controller is disabled, make sure that the test is not accessing address space for that controller.

Please look at dma_pcis AXI4 bus section for address ranges.

If DDR is enabled, make sure that the DDRs are initialized using the poke_stat commands below.

eg: To initialize DDR

tb.poke_stat(.addr(8'h0c), .ddr_idx(0), .data(32'h0000_0000));

Make sure that the Host to Card and Card to Host DMA transfers only access enabled DDR controller address space.

dma_pcis AXI4 bus

sh_cl_dma_pcis exposes a address windows of 128GB matching AppPF BAR4.

This memory space is mapped to the 64GB DRAM space and 16GB HBM space. An axi_crossbar interconnects the incoming address requests to the target memories as shown below:

  • DDR base_addr=0x00_0000_0000 (range=64GB)

  • HBM base_addr=0x10_0000_0000 (range=16GB)

NOTE: Please refer to Supported DDR Modes for details on supported DDR configurations in sh_ddr.sv.

ocl_ AXI-Lite

The cl_ocl_ AXI-Lite bus is connected to cl_ocl_slv.sv module, and is used for register access to the Automatic Test Generator (ATG) etc.

The valid address map is as shown below:

Module

OCL Start Address

OCL End Address

Description

PCIM ATG

0x0000

0x00FF

PCIM Traffic Generator (refer to cl_tst.sv).

DDR ATG

0x0100

0x01FF

DDR Traffic Generator

HBM ATG

0x0200

0x02FF

HBM Traffic Generator

HBM Status

0x0300

0x0300

  • bit[31:3] - Reserved (RO)

  • bit[31:3] - Reserved (RO)

  • bit[2:1] - HBM Initialized

  • bit[0] - SW Reset for HBM. (1 = Reset HBM)

Any access to invalid address shall return 32'hDEADBEEF

sda_ AXI-Lite

The sh_cl_sda_ AXI-Lite bus is connected to cl_sda_slv.sv module, which provides 1KiB of scratch RAM.

Address bits [9:0] will be used to access the location of the RAM, but the upper bits of the address are ignored.

pcim_ AXI4

The cl_sh_pcim_ AXI4 bus is driven by Automatic Test Generator (ATG) and connected to cl_pcim_mstr.sv. It can be used to read/write from the host memory.

irq/ack

cl_int_slv.sv provides an example for generating the IRQ requests and checks if ACK has been received.

Virtual JTAG

3 ILA cores are integrated, one to monitoring the sh_c_dma_pcis bus, one to monitor the AXI4 signals on DDR and the third to monitor the AXI4 signals on HBM. An example usage is provided in cl_ila.sv. An example usage for Xilinx VIO is provided in cl_vio.sv

Clocks

CL_DRAM_HBM_DMA uses the main clk_main_a0. It’s frequency is set in cl_timing_user.xdc

Software

DMA accesses rely on the XDMA driver

The DRAM_HBM DMA example includes runtime software to demonstrate working DMA accesses. The runtime example is located in the runtime directory

There are three example tests in cl_dram_hbm_dma example.

test_dram_dma.c

This test runs a software test with data transfer with DDR enabled.

test_hbm_dma.c

This test runs a software test with data transfer with HBM enabled.

test_dram_hbm_dma.c

This test runs a software test with data transfer with both DDR and HBM enabled

Compile and run instructions

cd ${CL_DIR}/software/runtime
make all
sudo ./test_dram_hbm_dma

test_dram_hbm_dma_hwsw_cosim.c

This test runs a software test with HW/SW co-simulation enabled with both DDR and HBM enabled.

Compile and run instructions

cd ${CL_DIR}/software/runtime
make TEST=test_dram_hbm_dma_hwsw_cosim
sudo ./test_dram_hbm_dma_hwsw_cosim

The test can be simulated with XSIM as follows.

cd ${CL_DIR}/verif/scripts
make C_TEST=test_dram_hbm_dma_hwsw_cosim