Running on actual hardware
Creating a SNAP FPGA image
Once you have successfully tested your AFU in the simulator, it is time to actually deploy it to the FPGA.
To do that — assuming you've executed
make config before — just run
make image inside the hardware directory. Synthesizing the image is a memory intensive process and can take any time from 50 minutes to a couple of hours depending on the complexity of your action.
TIMING_LABLIMITenvironment variable to a large negative value (default: -250 ps) before starting the build. This avoids build failures at a late stage in the process, but may produce images that are not suited for production use. However, you might have success trying them out in a lab environment.
Once the build process has successfully finished, you can find the resulting bitstream files in the
build/Image folder under the
hardware directory. The version ending in *.bit can be flashed using the JTAG programmer; the *.bin file is meant for the capi-flash-script (see the below sections).
Installing the FPGA in the POWER machine
We are not responsible for damage you do to your own hardware.
- Find a CAPI capable PCIe port which you can use to install the FPGA. For the S824L OpenPOWER machine we used, this information can be obtained in the 'Internal I/O subsystem' section of its technical overview document (page 59 of the PDF in this case).
- Install the FPGA
- Attach the JTAG programmer to the FPGA and connect its USB cable to an Intel-based machine that will be used to initialize the FPGA with Xilinx' hw_server. Its activity LED will only come on after the Xilinx software has successfully established a connection to the card, so don't worry if it stays off once the systems have booted.
Initializing the FPGA using a JTAG Programmer
The official documentation on this topic can be found here.
After the FPGA was installed, usually the Linux OS on the POWER machine will not detect it as a CAPI-enabled device. This is because the image that comes pre-installed on the factory partition of the FPGA doesn't support CAPI; instead a suitable image needs to be flashed onto the user partition by using the external USB programmer.
/dev/cxl, new images can be flashed directly from the POWER host using the CAPI tooling provided by IBM. This is described in the next section.
If you would like to set up a headless Vivado installation on the Intel machine connected to the JTAG programmer, you can instead choose to install a lighter weight version of Vivado that includes the hardware server tool
In any case, the process accessing the programmer (either Vivado or
hw_server) might need to run with root privileges. Open Vivado's Hardware Manager and 'open' the target device either on the local server or remotely. Once it has successfully connected to the FPGA, the programmer's activity LED should come on.
You will now need a device image (*.bit) that includes at least the PSL component. One way to obtain such an image is to run
make image for one of the SNAP examples or your own action.
snap_settings, you might notice the
FACTORY_IMAGEswitch. Because you don't want to overwrite the FPGA's factory partition (but its user partition) you should leave it disabled.
Because the user partition's contents will be cleared on each power cycle of the POWER machine if the card was not yet detected as a CAPI device, a certain procedure is required to initialize it: The image needs to be flashed after the system was powered on, but before the OS performs the PCI Express walk.
Therefore, set up a ping to the host OS and trigger a reboot (a normal OS reboot should suffice, although sometimes an ipmi power cycle is necessary). Once the ping is lost, start the programming process in Vivado. There should be enough time to fully program the card before the host OS has completed booting.
Programming the FPGA directly from the POWER Host
Once the FPGA is detected as a CAPI-device it is a lot easier and faster to flash new images onto it. The JTAG programmer and the programming server are no longer needed. You can obtain the necessary tool from GitHub:
~ $ git clone https://github.com/ibm-capi/capi-utils ~/capi-utils $ cd capi-utils ~/capi-utils $ make ~/capi-utils $ make install
Afterwards, start the tool like this and follow the instructions:
~ $ capi-flash-script my_image.bin
Testing the breadth-first search action on hardware
Assuming you have built the bitstream (using
make image) and set up the FPGA as described above, the
hls_bfs action should by now be programmed onto the card.
You will now need a version of SNAP on the POWER server:
Because now the 'real' version of libcxl is needed (compared to the 'mock' version used earlier by the PSL Simulation Engine), install it (e.g. on Ubuntu):
~ $ apt install libcxl-dev
Clone SNAP, build the actions and tooling
~ $ git clone https://github.com/open-power/snap ~/snap/software $ make ~/snap/actions $ make
Before you can use the action for the first time, it needs to be detected by SNAP. Use the
snap_mainttool to trigger this:
~/snap/software/tools $ ./snap_maint -v
Now you can execute the breadth-first search on hardware!
~/snap/actions/hls_bfs/sw $ ./snap_bfs -v
If you want to compare the accelerated implementation's performance against the software version, use the
SNAP_CONFIGenvironment variable to enable the software mode of the
~/snap/actions/hls_bfs/sw $ SNAP_CONFIG=0x1 ./snap_bfs -v