AWS FPGA Hardware Development Kit (HDK)#
HDK Overview#
The HDK design flow enables developers to create RTL-based accelerator designs for F2 instances using AMD Vivado. HDK designs must be integrated with Small Shell, which does not include a built-in Direct Memory Access (DMA) engine and offers full resources in the top Super Logic Region (SLR) of the FPGA to developers.
Getting Started#
Quick Start HW and SW Example: Host-to-FPGA Communication via the OCL Interface#
The test_aws_clk_gen.c software runtime example utilizes the OCL AXI interface to program the AWS Clock Generation IP within the CL_MEM_PERF AFI.
The example can be run by following these steps:

1. Build and ingest the CL_MEM_PERF example by following the Build Accelerator AFI using HDK Design Flow section below
2. Load the AGFI generated by the create-fpga-image command
3. Follow the CL_MEM_PERF software runtime compilation instructions and execute ./test_aws_clk_gen
Build Accelerator AFI using HDK Design Flow#
This section provides a step-by-step guide to build an F2 AFI using the HDK design flow. The flow starts with an existing Customer Logic (CL) example design. Steps 1 through 3 demonstrate how to set up the HDK development environment. Steps 4 through 5 show the commands used to generate CL Design Checkpoint (DCP) files and other build artifacts. Steps 6 and 7 demonstrate how to submit the DCP file to generate an AFI for use on F2 instances.
Step 1. Setup Development Environment#
Developers can use either the AWS-provided developer AMI for F2 or their on-premises development environment for this demo.
Step 2. Clone Developer Kit Repository#
git clone https://github.com/aws/aws-fpga.git
Step 3. Setup Environment for HDK Design Flow#
The hdk_setup.sh script needs to be sourced for each terminal and takes ~2 minutes to complete when first run.
cd aws-fpga
source hdk_setup.sh
After the setup completes successfully, you should see AWS HDK setup PASSED.

Sourcing hdk_setup.sh does the following:
Verifies a supported Vivado installation
Sets up all environment variables required by the HDK design flow
Generates IP simulation models for CL examples
Downloads all required shell files from a shared S3 bucket
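As a quick sanity check before kicking off a long build, you can verify that the environment was sourced in the current shell. The helper below is a hypothetical sketch, not part of the developer kit; it assumes hdk_setup.sh exports AWS_FPGA_REPO_DIR (as referenced elsewhere in this guide) and that Vivado is on your PATH.

```shell
# Hypothetical helper -- not part of the developer kit.
# Checks that hdk_setup.sh was sourced in this shell and Vivado is reachable.
hdk_env_ok() {
  if [ -z "${AWS_FPGA_REPO_DIR:-}" ]; then
    echo "hdk_setup.sh has not been sourced in this shell"
    return 1
  fi
  if ! command -v vivado >/dev/null 2>&1; then
    echo "vivado is not on PATH"
    return 1
  fi
  echo "HDK environment looks good"
}
```

Calling a check like this at the top of your own build wrapper scripts lets them fail fast instead of partway into a multi-hour Vivado run.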
Step 4. Build CL Design Check Point (DCP)#
After the HDK design environment is set up, you are ready to build a design example. Run the following commands to build CL DCP files in Vivado. This tutorial uses the cl_sde example; the same steps apply to any other CL example.
cd hdk/cl/examples/cl_sde
export CL_DIR=$(pwd)
cd build/scripts
./aws_build_dcp_from_cl.py -c cl_sde
The Shell supplies two base clocks to the CL: a 250MHz clk_main_a0
clock and a 100MHz clk_hbm_ref clock. However, the CL can run at
higher frequencies using locally generated clocks. The F2 Developer Kit
offers an AWS Clock Generation (AWS_CLK_GEN)
IP that you can leverage in your design
to generate CL clocks with the frequencies specified in the Clock Recipes
User Guide.
Run the command below to build a DCP with desired clock recipes:
cd hdk/cl/examples/cl_mem_perf
export CL_DIR=$(pwd)
cd build/scripts
./aws_build_dcp_from_cl.py -c cl_mem_perf --aws_clk_gen --clock_recipe_a A1 --clock_recipe_b B2 --clock_recipe_c C0 --clock_recipe_hbm H2
NOTE: The cl_sde example does not contain the AWS_CLK_GEN component. This command uses the cl_mem_perf example to demonstrate the AWS_CLK_GEN usage.
A few more notes on aws_build_dcp_from_cl.py:

- Use the --mode small_shell option to build CL designs with Small Shell.
- Use the --cl <CL name> option to build a different CL design. This defaults to cl_dram_hbm_dma.
- Use the --aws_clk_gen option to annotate the use of the AWS clock generation block and customer clock recipes.
- Use the --no-encrypt option to disable encryption of the design's source code and DCPs. Encryption, enabled by default, may impede debugging because errors from the encrypted envelope do not provide meaningful information.
- The script also allows developers to pass different Vivado directives:
  - --place <directive>: Defaults to the SSI_SpreadLogic_high placement strategy. Please refer to the Vivado User Guide for supported directives.
  - --phy_opt <directive>: Defaults to the AggressiveExplore physical optimization strategy. Please refer to the Vivado User Guide for supported directives.
  - --route <directive>: Defaults to the AggressiveExplore routing strategy. Please refer to the Vivado User Guide for supported directives.

Run ./aws_build_dcp_from_cl.py --help to see more build options available for building CL designs.
Step 5. Explore Build Artifacts#
While Vivado is running, a build log file
YYYY_MM_DD-HHMMSS.vivado.log will be created in
$CL_DIR/build/scripts to track the build’s progress. DCP build times
will vary based on the design size and complexity. The examples in the
development kit take between 30 to 90 minutes to build. After the design
is finished building, the following information will be shown at the
bottom of the log file:
tail <YYYY_MM_DD-HHMMSS.vivado.log>
...
AWS FPGA: (16:05:44): Finished building design checkpoints for customer design cl_sde
...
INFO: [Common 17-206] Exiting Vivado at ...
Generated post-route DCP and design manifest files are archived into a
tarball file <YYYY_MM_DD-HHMMSS>.Developer_CL.tar and saved in the
$CL_DIR/build/checkpoints/ directory. All design timing reports are
saved in the $CL_DIR/build/reports/ directory.
⚠️ If Vivado cannot achieve timing closure for the design, the
post-route DCP file name will be marked with .VIOLATED as an
indicator. Developers need to refer to the DCPs and timing reports for
detailed timing failures.
⚠️ The build process will generate a DCP tarball file regardless of the design's timing closure state. However, in the case of a DCP with timing failures, the design's functionality is no longer guaranteed. Therefore, the AFI created from such a DCP should be used for testing purposes ONLY. The following warning is shown in this case:
!!! WARNING: Detected a post-route DCP with timing failure for AFI creation. Design functionalities are NOT guaranteed.
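That timing check can be automated by looking for the .VIOLATED marker in the checkpoints directory after a build. The helper below is a sketch based on the file-naming convention described above; it is not part of the kit.

```shell
# Hypothetical helper -- not part of the developer kit.
# Reports whether the post-route DCP in a checkpoints directory met timing,
# based on the .VIOLATED marker in the file name described above.
check_timing_closure() {
  local checkpoints_dir="$1"
  if ls "${checkpoints_dir}"/*VIOLATED* >/dev/null 2>&1; then
    echo "TIMING FAILED: post-route DCP is marked .VIOLATED"
    return 1
  fi
  echo "TIMING OK"
}
```

For example, `check_timing_closure $CL_DIR/build/checkpoints` after a build would let a CI script decide whether the resulting tarball is safe to submit for AFI creation.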
Step 6. Submit Generated DCP for AFI Creation#
Once developers have built their DCP, they may submit their FPGA design for AFI creation.
Before doing so, an IAM role capable of S3 and EC2 access must be attached to the instance where the DCP will be submitted from. See the Setting Up IAM Roles for Use with the AWS EC2 FPGA Development Kit guide for the required permissions.
Execute the create_afi.py utility from anywhere within the aws-fpga repository:

$AWS_FPGA_REPO_DIR/hdk/scripts/create_afi.py

This may require a Python virtual environment, which can be started with:

source $AWS_FPGA_REPO_DIR/hdk/scripts/start_venv.sh

OR: Upload the DCP to S3 and specify all fields to the aws ec2 create-fpga-image utility according to the instructions in Manual AFI Creation
NOTE: Additional information about AFIs and surrounding tools can be found in the Amazon FPGA Images (AFIs) Guide
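For the manual path, aws ec2 create-fpga-image takes the S3 location of the DCP tarball plus a logs prefix. The helper below only assembles the command line so you can review it before running it yourself; the bucket, keys, and AFI name are placeholders you must replace with your own values.

```shell
# Sketch only: assemble (but do not run) an `aws ec2 create-fpga-image`
# command line. The bucket, keys, and name below are placeholders.
make_create_afi_cmd() {
  local name="$1" bucket="$2" dcp_key="$3" logs_key="$4"
  printf 'aws ec2 create-fpga-image --name %s --input-storage-location Bucket=%s,Key=%s --logs-storage-location Bucket=%s,Key=%s' \
    "$name" "$bucket" "$dcp_key" "$bucket" "$logs_key"
}

# Print the command for review; run it manually once it looks right.
make_create_afi_cmd my-afi my-bucket dcp/Developer_CL.tar logs/
```

Printing the command before executing it is a deliberate choice here: an AFI submission is slow and hard to undo, so it pays to eyeball the S3 paths first.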
Step 7. Load Accelerator AFI on F2 Instance#
Now that your AFI is available, it can be tested on an F2 instance. The instance can be launched using any preferred AMI, private or public, from the AWS EC2 AMI Catalog. AWS recommends using AMIs with similar OS and kernel versions to those of our developer AMIs.
Now you need to install the FPGA Management tools by sourcing the
sdk_setup.sh script:
cd aws-fpga
source sdk_setup.sh
Once the tools are installed, you can load the AFI onto a slot on the F2 instance. It is a good practice to clear any previously loaded AFI from that slot:
$ sudo fpga-clear-local-image -S 0
AFI 0 No AFI cleared 1 ok 0 0x10212415
AFIDEVICE 0 0x1d0f 0x9048 0000:00:1e.0
You can also invoke the fpga-describe-local-image command to learn
which AFI, if any, is loaded onto a particular slot. For example, if the
slot is cleared (slot 0 in this example), you should get an output
similar to the following:
$ sudo fpga-describe-local-image -S 0 -H
Type FpgaImageSlot FpgaImageId StatusName StatusCode ErrorName ErrorCode ShVersion
AFI 0 No AFI cleared 1 ok 0 0x10212415
Type FpgaImageSlot VendorId DeviceId DBDF
AFIDEVICE 0 0x1d0f 0x9048 0000:00:1e.0
If the fpga-describe-local-image command returns a busy status, the
FPGA is still performing the previous operation in the background.
Please wait until the status is cleared, as shown above.
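That wait can be scripted. The helper below is a hypothetical sketch (not part of the management tools): it polls fpga-describe-local-image until the slot's status no longer reports busy. Run it with root privileges, since the management tools require them.

```shell
# Hypothetical helper -- polls fpga-describe-local-image until the slot's
# status is no longer "busy". Run as root (the management tools require it).
wait_fpga_slot_ready() {
  local slot="$1"
  while fpga-describe-local-image -S "$slot" | grep -qw busy; do
    sleep 1
  done
  echo "slot $slot ready"
}
```

A call like `wait_fpga_slot_ready 0` between a clear and a load keeps a provisioning script from racing the previous operation.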
Now, let’s load your AFI onto the FPGA on slot 0:
$ sudo fpga-load-local-image -S 0 -I agfi-0925b211f5a81b071
AFI 0 agfi-0925b211f5a81b071 loaded 0 ok 0 0x10212415
AFIDEVICE 0 0x1d0f 0x9048 0000:00:1e.0
NOTE: The FPGA Management tools use the AGFI ID (not the AFI ID).
Now, you can verify that the AFI was loaded properly by running
fpga-describe-local-image again. The output shows the FPGA in the
loaded state after the FPGA image "load" operation. The -R option
performs a PCI device remove and rescan in order to expose the unique
AFI Vendor and Device ID.
Type FpgaImageSlot FpgaImageId StatusName StatusCode ErrorName ErrorCode ShVersion
AFI 0 agfi-0925b211f5a81b071 loaded 0 ok 0 0x10212415
Type FpgaImageSlot VendorId DeviceId DBDF
AFIDEVICE 0 0x1d0f 0x9048 0000:00:1e.0
Step 8. Validate your AFI using Example Runtime Software#
Each CL example includes a runtime software binary, located in the
$CL_DIR/software/runtime/ subdirectory. Executing the software
requires the corresponding AFI to be loaded onto the FPGA. This step
demonstrates runtime software execution using the CL_SDE example.
# Ensure the $CL_DIR is pointing to the CL_SDE example directory
$ cd $CL_DIR/software/runtime/
$ make
...
Logical Core 1 (socket 0) forwards packets on 1 streams:
RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
io packet forwarding packets/burst=32
nb forwarding cores=1 - nb forwarding ports=1
port 0: RX queue number: 1 Tx queue number: 1
Rx offloads=0x0 Tx offloads=0x0
RX queue: 0
RX desc=0 - RX free threshold=0
RX threshold registers: pthresh=0 hthresh=0 wthresh=0
RX Offloads=0x0
TX queue: 0
TX desc=0 - TX free threshold=0
TX threshold registers: pthresh=0 hthresh=0 wthresh=0
TX offloads=0x0 - TX RS bit threshold=0
Press enter to exit
Telling cores to stop...
Waiting for lcores to finish...
---------------------- Forward statistics for port 0 ----------------------
RX-packets: 10771136 RX-dropped: 0 RX-total: 10771136
TX-packets: 8160479 TX-dropped: 2610689 TX-total: 10771168
----------------------------------------------------------------------------
+++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
RX-packets: 10771136 RX-dropped: 0 RX-total: 10771136
TX-packets: 8160479 TX-dropped: 2610689 TX-total: 10771168
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Done.
Stopping port 0...
Stopping ports...
Done
Shutting down port 0...
Closing ports...
Done
Bye...
AFI PCIe IDs#
Customers can customize the PCIe IDs for generated AFIs, including
Vendor ID (VID), Device ID (DID), Subsystem Vendor ID (SVID) and
Subsystem Device ID (SSID), to facilitate the proper driver binding.
These PCIe IDs are required for the AFI generation process and must be
defined in the
cl_id_defines.vh file
under each example. Here is an excerpt from the CL_SDE design:
// CL_SH_ID0
// - PCIe Vendor/Device ID Values
// 31:16: PCIe Device ID
// 15: 0: PCIe Vendor ID
// - A Vendor ID value of 0x8086 is not valid.
// - If using a Vendor ID value of 0x1D0F (Amazon) then valid
// values for Device ID's are in the range of 0xF000 - 0xF0FF.
// - A Vendor/Device ID of 0 (zero) is not valid.
`define CL_SH_ID0 32'hF002_1D0F
// CL_SH_ID1
// - PCIe Subsystem/Subsystem Vendor ID Values
// 31:16: PCIe Subsystem ID
// 15: 0: PCIe Subsystem Vendor ID
// - A PCIe Subsystem/Subsystem Vendor ID of 0 (zero) is not valid
`define CL_SH_ID1 32'h1D51_FEDC
When a DCP tarball file gets generated, the IDs are included in the manifest file within the tarball:
pci_device_id=0xF002
pci_vendor_id=0x1D0F
pci_subsystem_id=0x1D51
pci_subsystem_vendor_id=0xFEDC
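The packing of CL_SH_ID0/CL_SH_ID1 into those four manifest fields can be checked with a quick shell sketch (hypothetical, not part of the kit): the upper 16 bits of each define hold the device/subsystem ID and the lower 16 bits hold the vendor ID. Once the AFI is loaded, a command like lspci -d 1d0f:f002 -nn should then list the device under those IDs.

```shell
# Sketch: split a 32-bit CL_SH_IDx value into its two 16-bit PCIe IDs,
# matching the bit layout documented in cl_id_defines.vh above.
split_pcie_id() {
  id32="$1"   # e.g. F0021D0F (no 32'h prefix, underscore removed)
  upper=$(printf '%s' "$id32" | cut -c1-4)
  lower=$(printf '%s' "$id32" | cut -c5-8)
  echo "upper=0x$upper lower=0x$lower"
}

split_pcie_id F0021D0F   # CL_SH_ID0: device ID (upper) / vendor ID (lower)
split_pcie_id 1D51FEDC   # CL_SH_ID1: subsystem ID / subsystem vendor ID
```

Running this against the CL_SDE defines reproduces the four manifest values shown above.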
CL Examples#
All examples have the following features:
Simulation model, tests, and scripts
Xilinx Vivado implementation scripts for generating bitstream
cl_sde#
The cl_sde example integrates the Streaming Data Engine (SDE) IP block into the FPGA custom logic to demonstrate the Virtual Ethernet application.
See cl_sde for more information
cl_dram_hbm_dma#
The cl_dram_hbm_dma example demonstrates the use and connectivity for many of the Shell/CL interfaces and functionality. The OCL (AXI-Lite) interface is used for general configuration, the PCIS (AXI4) interface is used for data traffic from the host to DDR and HBM DRAM channels in the CL (initiated by the host), and the PCIM (AXI4) interface is used for data traffic between the host and the CL (initiated by the CL).
See cl_dram_hbm_dma for more information
cl_mem_perf#
The cl_mem_perf example is an F2 reference design that demonstrates fine-tuned data paths to HBM and DDR to achieve maximum throughput to the memories. The example also demonstrates datapath connectivity between the host, the AWS Shell, the Custom Logic (CL) region in the FPGA, and the HBM and DDR DIMM on the FPGA card.
See cl_mem_perf for more information
CL_TEMPLATE to Create your own design#
CL_TEMPLATE helps customers create a new Custom Logic (CL) example. Users can update the design, verification, and build flow to meet their needs without having to tear down a separate example. We recommend going through the other CL examples before creating a new CL.
All of the design files and tests can be compiled, simulated, built, and deployed on hardware without any modifications. Users can add or update design files, add new verification tests, and add new build directives to meet their needs.
A full guide on creating your own CL design can be found in CL_TEMPLATE
To create a new CL example:
export NEW_CL_NAME='New CL Name'
cd hdk/cl/examples
./create_new_cl.py --new_cl_name ${NEW_CL_NAME}
CL Example Hierarchy#
The following sections describe common functionality across all CL examples. CL_TEMPLATE can be used as a reference for the features available in all CL examples, as well as for what's required to verify, test, and build a design.
Design#
All CL examples store the design files under /hdk/cl/examples/$CL_DIR/design/

For example: /hdk/cl/examples/CL_TEMPLATE/design/
All IP designs available by default are stored in /hdk/common/ip/cl_ip/
More can be added from the Xilinx Vivado IP catalog
Verification#
All CL examples utilize infrastructure found under /hdk/common/verif/
- Simulation libraries are generated under /hdk/common/verif/ip_simulation_libraries/
- All examples should list out the tests in /hdk/cl/examples/$CL_DIR/verif/tests/ and Makefile.tests
  - For example: $AWS_FPGA_REPO_DIR/hdk/cl/examples/CL_TEMPLATE/verif/tests/
All HDK examples support an SH_DDR with 64GB access and an optional user-controlled auto-precharge mode. Users can select the DDR access modes as follows:
export TEST_NAME=test_ddr
# To Run simulations with a 64 GB DDR DIMM
make TEST=${TEST_NAME} USE_64GB_DDR_DIMM=1
# To Run simulations with a 64 GB DDR DIMM and DDR core with user controlled auto-precharge mode
make TEST=${TEST_NAME} USE_AP_64GB_DDR_DIMM=1
NOTE: Please refer to Supported_DDR_Modes.md for details on supported DDR configurations.
After adding new design IPs, make sure to add the new simulation
libraries to COMMON_LIBLISTS in
$AWS_FPGA_REPO_DIR/hdk/common/verif/tb/scripts/Makefile.common.inc

⚠️ This is required for XSIM and Questa simulations
These libraries can be found in $AWS_FPGA_REPO_DIR/hdk/common/ip/cl_ip/cl_ip.ip_user_files/sim_scripts under
"IP_NAME"/"SIMULATOR"/"IP_NAME".sh
After adding new IPs to $AWS_FPGA_REPO_DIR/hdk/common/ip/, the simulation libraries need to be recompiled. Run:

make regenerate_sim_libs <XSIM/VCS/QUESTA>=1
Software#
All software runtime code can be found under the software directory.
Build#
All CL examples utilize infrastructure found under $AWS_FPGA_REPO_DIR/hdk/common/shell_stable/build
Users can modify the following files to meet their build requirements:
synth_CL_NAME.tcl - top level script that reads design, IP, and constraint files
cl_synth_user.xdc - synthesis build constraints specific to that example
cl_timing_user.xdc - timing build constraints specific to that example
small_shell_cl_pnr_user.xdc - place and route constraints specific to that example’s small shell build
For more information on synth_CL_NAME.tcl see:
After adding new design IPs, make sure to add the new .xci files to your synthesis TCL script.
HDK Common Library#
This directory includes the shell versions, scripts, timing constraints and compile settings required during the AFI generation process.
Developers should not modify or remove these files.
/shell_stable#
The shell_stable directory contains all the IPs, constraints, and scripts for each shell release.
/verif#
The verif directory includes reference verification modules used as Bus Functional Models (BFMs) that serve as the external interfaces when simulating the CL. The verification-related files common to all the CL examples are located in this directory. It contains the models, include, scripts, and tb directories.
The verif models directory includes simple models of the DRAM interface around the FPGA, shell, and card. You can also find Xilinx protocol checkers in this directory.
The verif scripts directory includes scripts needed to generate DDR models and other scripts needed for HDK setup.
The verif include directory includes sh_dpi_tasks.vh needed for DPI-C.
The verif tb directory includes top level test bench related files common for all the CL examples.
The verif ip_simulation_libraries directory is created during runtime and includes the simulation libraries and CL IP compilation for all supported simulators.
/ip#
The ip directory includes basic IP that is used by CLs.
/lib#
The lib directory includes basic "library" elements that may be used by CLs.
aws_clk_gen.sv - Generate clocks and resets to the CL design
aws_clk_regs.sv - Houses all the Control/Status Regs for AWS_CLK_GEN design
axi_clock_conv.sv - AXI-4 bus clock converter
axil_to_cfg_cnv.sv - Convert AXIL transaction into a simple CFG bus
axis_flop_fifo.sv - Flop based FIFO for AXI-Stream protocol
bram_1w1r.sv - BRAM (1 write/1 read port) RTL model.
bram_wr2.sv - BRAM (2 read/write ports) RTL model.
ccf_ctl.v - Clock crossing FIFO control block (pointers, address generation, etc…)
cdc_async_fifo.sv - Async FF-based FIFO for CDC
cdc_sync.sv - Single- or Multi-bit Synchronizer based on Xilinx XPM
flop_ccf.sv - Flop based clock crossing FIFO.
flop_fifo.sv - Flop based FIFO.
flop_fifo_in.sv - Flop based FIFO, where input is flopped by common flops (can be used for input signal registering).
ft_fifo.v - Flow through FIFO.
ft_fifo_p.v - Flow through FIFO to be used with pipelined RAM.
gray.inc - Gray code
hbm_wrapper.sv - Wrapper for HBM IP
interfaces.sv - Generic interfaces (AXI-4, AXI-L, etc…)
lib_pipe.sv - Pipeline block.
macros.svh - Instantiation macros (AXI-4, AXI-L, etc…)
mgt_acc_axl.sv - Used by AWS provided sh_ddr.sv
mgt_gen_axl.sv - Used by AWS provided sh_ddr.sv
ram_fifo_ft.sv - Ram based FIFO
rr_arb.sv - Round robin arbiter.
srl_fifo.sv - Shift register based fifo.
sync.v - Synchronizer
xpm_fifo.sv - Synchronous clock FIFO
Next Steps#
Review the cl_dram_hbm_dma and cl_sde examples
Run RTL Simulations on the example designs
Dive deep into Shell interface specifications and PCIe Memory map
Create your own designs or port F1 designs to F2 systems
Try the Virtual JTAG XVC debug flow and understand the shell timeout behavior
(Optional) After creating your accelerator design, create your own runtime AMI
Additional HDK Documentation#
- AWS Shell Interface Specification
- AWS EC2 F2 Shell Errata
- Shell Floorplan Reference
- AXI Slave Timeouts (DMA_PCIS)
- AWS_CLK_GEN - CL Clock Generator
- F2 Clock Recipes User Guide
- AWS CLI FPGA Commands
- AWS FPGA PCIe Memory Map
- Supported DDR Configurations in sh_ddr.sv
- RTL Simulation Guide for HDK Design Flow
- Amazon FPGA Images (AFIs) Guide
- Listing Your AFI on AWS Marketplace
- AWS AFI Manifest File Specification
- Enabling on-premises development with Xilinx tools
- XDMA Driver Installation Instructions
- Virtual JTAG for Real-time FPGA Debug
- Vivado IP Integrator Setup
- AWS FPGA IP for IP Integrator Overview
- AWS GUI Workflow with Vivado IP Integrator Quick Start Examples
- HLx GUI Flows with Vivado IP Integrator