2024 ICCAD

HLSPilot: LLM-based High-Level Synthesis

Author: Chenwei Xiong, Cheng Liu, Huawei Li, Xiaowei Li

Affiliation: SKLP, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Dept. of Computer Science, University of Chinese Academy of Sciences, Beijing, China

Abstract:

Large language models (LLMs) have catalyzed an upsurge in automatic code generation, garnering significant attention for register transfer level (RTL) code generation. Despite the potential of RTL code generation with natural language, it remains error-prone and limited to relatively small modules because of the substantial semantic gap between natural language expressions and hardware design intent. In response to these limitations, we propose a methodology that reduces the semantic gap by utilizing C/C++ to generate hardware designs via High-Level Synthesis (HLS) tools. Basically, we build a set of C-to-HLS optimization strategies catering to various code patterns, such as nested loops and local arrays. Then, we apply these strategies to sequential C/C++ code through in-context learning, which provides the LLMs with exemplary C/C++ to HLS prompts. With this approach, HLS designs can be generated effectively. Since LLMs still face challenges in determining the optimized pragma parameters precisely, we integrate a design space exploration (DSE) tool for pragma parameter tuning. Furthermore, we also employ profiling tools to pinpoint the performance bottlenecks within a program and selectively convert bottleneck components to HLS code for hardware acceleration. By combining the LLM-based profiling, C/C++ to HLS translation, and DSE, we have established HLSPilot, the first LLM-enabled high-level synthesis framework, which can fully automate high-level application acceleration on hybrid CPU-FPGA architectures. According to our experiments on real-world application benchmarks, HLSPilot achieves comparable performance in general and can even outperform manually crafted counterparts, thereby underscoring the substantial promise of LLM-assisted hardware designs.
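The two mechanical pieces of the flow described above (few-shot prompt assembly for C-to-HLS rewriting, then a DSE sweep over pragma parameters) can be sketched roughly as below. This is an illustrative sketch, not HLSPilot's actual API: `build_prompt`, `estimated_latency`, and the exemplar pair are all hypothetical stand-ins.

```python
# Hypothetical sketch of an HLSPilot-style flow: assemble an in-context
# learning prompt from exemplar (sequential C, optimized HLS C) pairs,
# then exhaustively sweep pragma parameters with a toy latency model.
from itertools import product

EXEMPLARS = [  # illustrative few-shot pair, not from the paper
    ("for (i = 0; i < N; i++) c[i] = a[i] + b[i];",
     "for (i = 0; i < N; i++) {\n#pragma HLS PIPELINE II=1\n  c[i] = a[i] + b[i];\n}"),
]

def build_prompt(kernel_src):
    """Pair sequential C with pragma-annotated HLS C as few-shot examples."""
    shots = "\n\n".join(f"Input:\n{c}\nOutput:\n{h}" for c, h in EXEMPLARS)
    return f"{shots}\n\nInput:\n{kernel_src}\nOutput:\n"

def estimated_latency(unroll, ii, trip_count=1024):
    """Toy analytical model: cycles ~ ceil(trip_count / unroll) * II."""
    return (trip_count + unroll - 1) // unroll * ii

def dse(unroll_opts, ii_opts):
    """Exhaustive sweep over pragma parameter combinations."""
    return min(product(unroll_opts, ii_opts),
               key=lambda p: estimated_latency(*p))

best = dse(unroll_opts=[1, 2, 4, 8], ii_opts=[1, 2])
```

In the real framework the analytical model is replaced by actual HLS tool feedback, which is what makes a dedicated DSE engine worthwhile.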

 

 

 

 

2023 ICCAD

Robust GNN-Based Representation Learning for HLS.

Author: Atefeh Sohrabizadeh, Yunsheng Bai, Yizhou Sun, Jason Cong

Affiliation: Computer Science Department, University of California, Los Angeles, USA

Abstract:

The efficient and timely optimization of microarchitecture for a target application is hindered by the long evaluation runtime of a design candidate, creating a serious burden. To tackle this problem, researchers have started using learning algorithms such as graph neural networks (GNNs) to accelerate the process by developing a surrogate of the target tool. However, challenges arise when developing such models for HLS tools due to the program's long dependency range and the deeply coupled input program and transformations (i.e., pragmas). To address them, in this paper, we present HARP (Hierarchical Augmentation for Representation with Pragma optimization) with a novel hierarchical graph representation of the HLS design, which introduces auxiliary nodes to include high-level hierarchical information about the design. Additionally, HARP decouples the representation of the program and its transformations and includes a neural pragma transformer (NPT) approach to facilitate a more systematic treatment of this process. Our proposed graph representation and model architecture of HARP not only enhance the performance of the model and the design space exploration based on it, but also improve the model's transfer learning capability, enabling easier adaptation to new environments. All materials are available at https://github.com/UCLA-VAST/HARP.
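The "auxiliary node" idea can be pictured with a toy construction: alongside base nodes for statements, add one node per loop or function block that connects to all of its members, so the graph carries the program hierarchy explicitly. This is an illustrative sketch, not HARP's data pipeline; `build_hierarchical_graph` and its inputs are hypothetical.

```python
# Toy hierarchical program graph: base statement nodes plus auxiliary
# block-summary nodes, mimicking the idea of injecting high-level
# hierarchy into an otherwise flat dataflow graph.
from collections import defaultdict

def build_hierarchical_graph(stmts, block_of):
    """stmts: statement ids; block_of: stmt id -> enclosing block name."""
    members = defaultdict(list)
    for s in stmts:
        members[block_of[s]].append(s)
    aux_nodes = sorted(members)
    # connect each auxiliary node to every statement it encloses
    edges = [(f"aux:{blk}", s) for blk in aux_nodes for s in members[blk]]
    return aux_nodes, edges

aux, edges = build_hierarchical_graph(
    stmts=["s0", "s1", "s2"],
    block_of={"s0": "loop_i", "s1": "loop_i", "s2": "top"},
)
```

A GNN message-passing layer over such a graph can then propagate loop-level context to every statement in one hop, which is the intuition behind the longer effective dependency range.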

 

 

 

 

2022 HPCA

ScaleHLS: A New Scalable High-Level Synthesis Framework on Multi-Level Intermediate Representation.

Author: Hanchen Ye, Cong Hao, Jianyi Cheng, Hyunmin Jeong, Jack Huang, Stephen Neuendorffer, Deming Chen

Affiliation: Georgia Institute of Technology; Imperial College London; Xilinx Inc.; University of Illinois at Urbana-Champaign

Abstract:

High-level synthesis (HLS) has been widely adopted as it significantly improves hardware design productivity and enables efficient design space exploration (DSE). Existing HLS tools are built using compiler infrastructures largely based on a single-level abstraction, such as LLVM. However, as HLS designs typically come with intrinsic structural or functional hierarchies, different HLS optimization problems are often better solved at different levels of abstraction. This paper proposes ScaleHLS, a new scalable and customizable HLS framework, on top of a multi-level compiler infrastructure called MLIR. ScaleHLS represents HLS designs at multiple representation levels and provides an HLS-dedicated analysis and transform library to solve the optimization problems at the suitable levels. Using this library, we provide a DSE engine to generate optimized HLS designs automatically. In addition, we develop an HLS C front-end and a C/C++ emission back-end to translate HLS designs into and from MLIR, enabling an end-to-end compilation flow. Experimental results show that, compared to baseline designs without manual directive insertion or code rewriting, which are optimized only by Xilinx Vivado HLS, ScaleHLS improves performance with remarkable quality of results: up to 768.1× better on computation-kernel-level programs and up to 3825.0× better on neural network models.
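The "solve each problem at the right level" idea can be illustrated with a deliberately tiny two-level IR, far simpler than MLIR: a loop-level op where trip counts are visible (so unrolling is a one-line rewrite), lowered afterwards to a flat statement-level form where schedule-style rewrites would apply. All names here are illustrative, not ScaleHLS constructs.

```python
# Toy two-level IR: transform at the loop level, then lower to statements.
def unroll(loop, factor):
    """High-level transform: easy because the trip count is explicit."""
    op, trip, body = loop
    assert op == "loop" and trip % factor == 0
    return ("loop", trip // factor, body * factor)

def lower(loop):
    """Lowering: expand the loop op into a flat statement-level op list."""
    _, trip, body = loop
    return [stmt for _ in range(trip) for stmt in body]

ir_high = ("loop", 8, ["mul", "add"])   # loop-level representation
ir_high = unroll(ir_high, factor=4)     # solved where the structure is visible
ir_low = lower(ir_high)                 # statement-level representation
```

Doing the same unroll directly on the 16-statement flat form would require rediscovering the loop structure first, which is exactly the cost a multi-level IR avoids.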

 

 

 

 

2022 DATE

PowerGear: Early-Stage Power Estimation in FPGA HLS via Heterogeneous Edge-Centric GNNs

Author: Zhe Lin, Zike Yuan, Jieru Zhao, Wei Zhang, Hui Wang, and Yonghong Tian

Affiliation: Peng Cheng Laboratory, China; The University of Auckland, New Zealand; Shanghai Jiao Tong University, China; The Hong Kong University of Science and Technology, Hong Kong, China; Peking University, China

Abstract:

Power estimation is the basis of many hardware optimization strategies. However, it is still challenging to offer accurate power estimation at an early stage such as high-level synthesis (HLS). In this paper, we propose PowerGear, a graph-learning-assisted power estimation approach for FPGA HLS, which features high accuracy, efficiency, and transferability. PowerGear comprises two main components: a graph construction flow and a customized graph neural network (GNN) model. Specifically, in the graph construction flow, we introduce buffer insertion, datapath merging, graph trimming, and feature annotation techniques to transform HLS designs into graph-structured data, which encode both intra-operation micro-architectures and inter-operation interconnects annotated with switching activities. Furthermore, we propose a novel power-aware heterogeneous edge-centric GNN model, which effectively learns heterogeneous edge semantics and structural properties of the constructed graphs via edge-centric neighborhood aggregation, and fits the formulation of dynamic power. Compared with on-board measurement, PowerGear estimates total and dynamic power for new HLS designs with errors of 3.60% and 8.81%, respectively, outperforming both prior research efforts and the commercial product Vivado. In addition, PowerGear demonstrates a speedup of 4× over the Vivado power estimator. Finally, we present a case study in which PowerGear is exploited to facilitate design space exploration for FPGA HLS, leading to a performance gain of up to 11.2% compared with methods using state-of-the-art predictive models.
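"Edge-centric" aggregation, in contrast to the usual node-centric scheme, keeps features on edges and lets each edge update its embedding from the edges incident to its source node. The sketch below is a heavily simplified illustration of that idea only; the function name and the sum-based update rule are illustrative, not the paper's exact formulation.

```python
# Toy edge-centric neighborhood aggregation: features live on edges;
# edge i aggregates over the edges that feed into its source node.
def edge_centric_step(edges, feat):
    """edges: list of (src, dst) node pairs; feat: edge index -> vector."""
    new_feat = {}
    for i, (src, dst) in enumerate(edges):
        # neighbors of edge i: edges that terminate where edge i starts
        nbrs = [j for j, (s, d) in enumerate(edges) if d == src]
        agg = ([sum(feat[j][k] for j in nbrs) for k in range(len(feat[i]))]
               if nbrs else [0.0] * len(feat[i]))
        # combine own feature with the aggregated neighborhood feature
        new_feat[i] = [a + b for a, b in zip(feat[i], agg)]
    return new_feat

edges = [(0, 1), (1, 2)]               # a two-edge chain: 0 -> 1 -> 2
feat = {0: [1.0, 2.0], 1: [3.0, 4.0]}  # per-edge feature vectors
out = edge_centric_step(edges, feat)
```

Keeping features on edges matches the physical intuition that dynamic power is dominated by interconnect activity, i.e., by the edges rather than the operation nodes.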

 

 

 

 

2021 DATE

Correlated multi-objective multi-fidelity optimization for HLS directives design.

Author: Qi Sun, Tinghuan Chen, Siting Liu, Jianli Chen, Hao Yu, and Bei Yu

Affiliation: The Chinese University of Hong Kong; Synopsys; Fudan University; SUSTech

Abstract:

High-level synthesis (HLS) tools have gained great attention in recent years because they emancipate engineers from complicated and heavy hardware description language writing and facilitate the implementation of modern applications (e.g., deep learning models) on Field-Programmable Gate Arrays (FPGAs) by using high-level languages and HLS directives. However, finding good HLS directives is challenging, due to the time-consuming design processes, the balances among different design objectives, and the diverse fidelities (accuracies of data) of the performance values between consecutive FPGA design stages. To find good HLS directives, a novel automatic optimization algorithm is proposed to explore the Pareto designs of the multiple objectives while making full use of the data with different fidelities from different FPGA design stages. Firstly, a non-linear Gaussian process (GP) is proposed to model the relationships among the different FPGA design stages. Secondly, for the first time, the GP model is enhanced into a correlated GP (CGP) by considering the correlations between the multiple design objectives, to find better Pareto designs. Furthermore, we extend our model into a deep version, deep CGP (DCGP), by using deep neural networks to improve the kernel functions of the Gaussian process models, improving the characterization capability of the models and learning better feature representations. We test our design method on public benchmarks (including general matrix multiplication and sparse matrix-vector multiplication) and the deep learning-based object detection model iSmart2 on FPGA. Experimental results show that our methods outperform the baselines significantly and facilitate deep learning designs on FPGA.
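The multi-fidelity idea, stripped of the GP machinery, is that cheap low-fidelity numbers (e.g., HLS-report latencies) correlate with expensive high-fidelity ones (post-implementation measurements), so a mapping fitted on a few paired samples can rank many unevaluated candidates. The sketch below substitutes a plain linear correction for the paper's correlated GP, purely for illustration; all data values are made up.

```python
# Minimal multi-fidelity sketch: calibrate cheap estimates against a few
# expensive measurements, then score new candidates with the correction.
def fit_linear(lo, hi):
    """Least-squares fit hi ~ a*lo + b from paired samples."""
    n = len(lo)
    mx, my = sum(lo) / n, sum(hi) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(lo, hi))
         / sum((x - mx) ** 2 for x in lo))
    return a, my - a * mx

lo_paired = [100.0, 200.0, 300.0]  # HLS-reported latencies (low fidelity)
hi_paired = [130.0, 230.0, 330.0]  # measured latencies (high fidelity)
a, b = fit_linear(lo_paired, hi_paired)
corrected = [a * x + b for x in [150.0, 250.0]]  # unevaluated candidates
```

A GP replaces the fixed line with a distribution over mappings, which additionally yields uncertainty estimates to guide which candidate to measure next.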

 

 

 

 

2020 DAC

Machine Learning to Set Meta-Heuristic Specific Parameters for High-Level Synthesis Design Space Exploration

Author: Zi Wang and Benjamin Carrión Schäfer

Affiliation: Department of Electrical and Computer Engineering, The University of Texas at Dallas

Abstract:

Raising the level of VLSI design abstraction to C leads to many advantages compared to the use of low-level Hardware Description Languages (HDLs). One key advantage is that it allows the generation of micro-architectures with different trade-offs by simply setting unique combinations of synthesis options. Because the number of these synthesis options is typically very large, exhaustive enumeration is not possible; hence, heuristics are required. Meta-heuristics like Simulated Annealing (SA), Genetic Algorithms (GA), and Ant Colony Optimization (ACO) have been shown to lead to good results for these types of multi-objective optimization problems. The main problem with these meta-heuristics is that they are very sensitive to their hyper-parameter settings, e.g., in the GA case, the mutation and crossover rates and the number of parent pairs. To address this, in this work we present a machine-learning-based approach to automatically set the search parameters for these three meta-heuristics such that a new, unseen behavioral description given in C can be effectively explored. Moreover, we present an exploration technique that combines SA, GA, and ACO, and show that our proposed exploration method outperforms any single meta-heuristic.
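To make the hyper-parameter sensitivity concrete, here is a minimal GA over binary synthesis options (e.g., unroll on/off per loop), with the mutation and crossover rates exposed as exactly the knobs the paper's ML model would set per design. The fitness function is a toy stand-in for a real HLS evaluation; everything here is illustrative.

```python
# Tiny GA over binary pragma choices, parameterized by the hyper-parameters
# (crossover_rate, mutation_rate) that the paper proposes to set via ML.
import random

def fitness(ind):
    return sum(ind)  # toy objective: count of enabled options

def ga(n_bits=8, pop_size=20, gens=30, crossover_rate=0.9,
       mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        nxt = pop[:2]                         # elitism: keep the best two
        while len(nxt) < pop_size:
            p1, p2 = rng.sample(pop[:10], 2)  # select among the top half
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_bits)
                child = p1[:cut] + p2[cut:]   # one-point crossover
            else:
                child = p1[:]
            # per-bit mutation with probability mutation_rate
            child = [b ^ (rng.random() < mutation_rate) for b in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = ga()
```

A mutation rate near 0 stalls exploration while one near 0.5 degenerates to random search, which is why per-design tuning of these rates pays off.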

 

 

 

 

2020 ICCAD

Accurate Operation Delay Prediction for FPGA HLS Using Graph Neural Networks

Author: Ecenur Ustun, Chenhui Deng, Debjit Pal, Zhijing Li, and Zhiru Zhang

Affiliation: School of Electrical and Computer Engineering, Cornell University, Ithaca, NY

Abstract:

Modern heterogeneous FPGA architectures incorporate a variety of hardened blocks, such as DSP blocks and carry blocks, for boosting the performance of arithmetic-intensive designs. Since hardened blocks can be configured in different ways, a variety of datapath patterns can be mapped into these blocks. We observe that existing high-level synthesis (HLS) tools often fail to capture some of these operation mapping patterns, leading to limited estimation accuracy in terms of resource usage and delay. To address this deficiency, we propose to exploit graph neural networks (GNNs) to automatically learn operation mapping patterns. We apply GNN models that are trained on microbenchmarks directly to realistic designs through inductive learning. Experimental results show that our approach can effectively infer various valid mapping patterns on both microbenchmarks and realistic designs. Furthermore, the proposed framework is exploited to improve the accuracy of delay estimation in HLS.

 

 

 

 

2019 FPL

Pyramid: Machine Learning Framework to Estimate the Optimal Timing and Resource Usage of a High-Level Synthesis Design

Author: Hosein Mohammadi Makrani, Farnoud Farahmand, Hossein Sayadi, Sara Bondi, Sai Manoj Pudukotai Dinakarrao, Houman Homayoun, and Setareh Rafatirad

Affiliation: George Mason University; California State University, Long Beach

Abstract:

The emergence of High-Level Synthesis (HLS) tools shifted the paradigm of hardware design by making the process of mapping high-level programming languages to hardware designs, such as C to VHDL/Verilog, feasible. HLS tools offer a plethora of techniques to optimize designs for both area and performance, but the resource usage and timing reports of HLS tools mostly deviate from the post-implementation results. In addition, to evaluate a hardware design's performance, it is critical to determine the maximum achievable clock frequency. Obtaining such information using the static timing analysis provided by CAD tools is difficult, due to the multitude of tool options. Moreover, a binary search to find the maximum frequency is tedious, time-consuming, and often does not obtain the optimal result. To address these challenges, we propose a framework, called Pyramid, that uses machine learning to accurately estimate the optimal performance and resource utilization of an HLS design. For this purpose, we first create a database of C-to-FPGA results from a diverse set of benchmarks. To find the achievable maximum clock frequency, we use Minerva, an automated hardware optimization tool. Minerva determines close-to-optimal tool settings using static timing analysis and a heuristic algorithm, targeting either optimal throughput or throughput-to-area. Pyramid uses the database to train an ensemble machine learning model that maps the HLS-reported features to the results of Minerva. To this end, Pyramid recalibrates the results of HLS to bridge the accuracy gap, enabling developers to estimate the throughput or throughput-to-area of a hardware design with more than 95% accuracy and alleviating the need to perform the actual implementation.
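The recalibration idea, reduced to its simplest form, is an ensemble regressor that maps an HLS-reported metric to the post-implementation value. The sketch below uses bootstrap-aggregated one-dimensional linear regressors as an illustration; Pyramid's actual features and model are richer, and all data values here are made up.

```python
# Illustrative bagged regression mapping HLS-reported Fmax to measured Fmax.
import random

def fit_line(xs, ys):
    """Least-squares fit ys ~ a*xs + b."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def bagged_predict(xs, ys, query, n_models=25, seed=1):
    """Average the predictions of n_models bootstrap-resampled regressors."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]  # bootstrap
        if len({xs[i] for i in idx}) < 2:
            continue  # skip degenerate resamples with zero variance
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(a * query + b)
    return sum(preds) / len(preds)

hls_freq  = [200.0, 250.0, 300.0, 350.0]  # HLS-estimated Fmax (MHz)
real_freq = [160.0, 205.0, 250.0, 295.0]  # post-implementation Fmax (MHz)
pred = bagged_predict(hls_freq, real_freq, query=320.0)
```

Ensembling over resamples is what makes such a recalibrator robust to the noise in individual C-to-FPGA runs.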

 

 

 

 

2019 DATE

Machine learning based routing congestion prediction in FPGA high-level synthesis

Author: Jieru Zhao, Tingyuan Liang, Sharad Sinha, and Wei Zhang

Affiliation: Department of ECE, Hong Kong University of Science and Technology; Department of CSE, Indian Institute of Technology Goa

Abstract:

High-level synthesis (HLS) shortens the development time of hardware designs and enables faster design space exploration at a higher abstraction level. Optimization of complex applications in HLS is challenging due to the effects of implementation issues such as routing congestion. Routing congestion estimation is absent or inaccurate in existing HLS design methods and tools. Early and accurate congestion estimation is of great benefit to guide optimization in HLS and improve the efficiency of implementation. However, routability, a serious concern in FPGA designs, has been difficult to evaluate in HLS without analyzing post-implementation details after place and route. To this end, we propose a novel method to predict routing congestion in HLS using machine learning and to map the expected congested regions in the design to the relevant high-level source code. This is greatly beneficial for early identification of routability-oriented bottlenecks in the high-level source code without running the time-consuming register-transfer level (RTL) implementation flow. Experiments demonstrate that our approach accurately estimates vertical and horizontal routing congestion with errors of 6.71% and 10.05%, respectively. Using a face detection application as a case study, we show that by discovering the bottlenecks in high-level source code, routing congestion can be resolved easily and quickly compared to the effort involved in RTL-level implementation and design feedback.

 

 

 

 

2018 FCCM

Fast and Accurate Estimation of Quality of Results in High-Level Synthesis with Machine Learning

Author: Steve Dai, Yuan Zhou, Hang Zhang, Ecenur Ustun, Evangeline F. Y. Young, and Zhiru Zhang.

Affiliation: School of Electrical and Computer Engineering, Cornell University, Ithaca, NY, USA; Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, NT, Hong Kong

Abstract:

While high-level synthesis (HLS) offers sophisticated techniques to optimize designs for area and performance, HLS-estimated resource usage and timing often deviate significantly from the actual quality of results (QoR) achieved by FPGA-targeted designs. Inaccurate HLS estimates prevent designers from performing meaningful design space exploration without resorting to the time-consuming downstream implementation process. To address this challenge, we first build a large collection of C-to-FPGA results from a diverse set of realistic HLS applications and identify relevant features from HLS reports for estimating post-implementation metrics. We then leverage these features and data to train and compare a number of promising machine learning models to effectively and efficiently bridge the accuracy gap. Experiments demonstrate that our proposed approach is able to dramatically reduce the estimation errors for different families of FPGA devices. By extracting domain-specific insights from our experiments, we explore the implications of our models and the predictive influence of various features for enabling fast and accurate QoR estimation in HLS. We have released our dataset to springboard future efforts in this area.

 

 

 

 

2015 DATE

Dynamic power and performance back-annotation for fast and accurate functional hardware simulation

Author: Dongwook Lee, Lizy K. John, and Andreas Gerstlauer

Affiliation: Department of Electrical and Computer Engineering, The University of Texas at Austin

Abstract:

Virtual platform prototypes are widely used for early design space exploration at the system level. There is, however, a lack of accurate and fast power and performance models of hardware components at such high levels of abstraction. In this paper, we present an approach that extends fast functional hardware models with the ability to produce detailed, cycle-level timing and power estimates. Our approach is based on back-annotating behavioral hardware descriptions with a dynamic power and performance model that allows capturing cycle-accurate and data-dependent activity without a significant loss in simulation speed. By integrating with existing high-level synthesis (HLS) flows, back-annotation is fully automated for custom hardware synthesized by HLS. We further leverage state-of-the-art machine learning techniques to synthesize abstract power models, where we introduce a structural decomposition technique to reduce model complexities and increase estimation accuracy. We have applied our back-annotation approach to several industrial-strength design examples under various architecture configurations. Results show that our models predict average power consumption to within 1% and cycle-by-cycle power dissipation to within 10% of a commercial gate-level power estimation tool, all while running several orders of magnitude faster.

 

 

 

 

2014 ESLsyn

Machine-learning based simulated annealer method for high level synthesis design space exploration

Author: Anushree Mahapatra and Benjamin Carrion Schafer.

Affiliation: Department of Electronics and Information Engineering, The Hong Kong Polytechnic University

Abstract:

This paper presents a modified simulated annealing technique, based on machine learning, for effective multi-objective design space exploration in High-Level Synthesis (HLS). In this work, we present a more efficient simulated annealer, called Fast Simulated Annealer (FSA), which is based on a decision tree machine learning algorithm. Our proposed exploration method uses a standard simulated annealer to generate a training set and uses this set to build a decision tree. Based on the outcome of the decision tree, the algorithm fixes the synthesis directives (pragmas) that contribute to minimizing or maximizing one of the cost function objectives and continues the annealing procedure using the decision tree. Experimental results show that the execution time of our proposed tree-based simulated annealing algorithm is on average 36% faster than the standard annealer and can be up to 48% faster, while leading to similar results.
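The FSA control flow (anneal, learn which directives to fix, then anneal over the remaining free directives) can be sketched as below. This is a conceptual illustration only: the cost function is a toy, and a one-step "flip test" stands in for the paper's decision tree.

```python
# FSA-style two-phase annealing sketch over binary synthesis directives.
import math
import random

def cost(cfg):
    # toy cost: directive 0 should be 1, directive 1 should be 0, rest weak
    return (1 - cfg[0]) + cfg[1] + 0.1 * sum(cfg[2:])

def anneal(cfg, frozen, steps, temp, rng):
    """Standard SA, but directives in `frozen` are never perturbed."""
    cur = cost(cfg)
    for t in range(steps):
        i = rng.randrange(len(cfg))
        if i in frozen:
            continue
        cand = cfg[:]
        cand[i] ^= 1
        d = cost(cand) - cur
        # accept improvements always, worsenings with decaying probability
        if d < 0 or rng.random() < math.exp(-d / (temp / (t + 1))):
            cfg, cur = cand, cost(cand)
    return cfg

rng = random.Random(0)
cfg = anneal([0, 1, 1, 1], frozen=set(), steps=50, temp=1.0, rng=rng)
# decision-rule stand-in: freeze directives whose flip would not help
frozen = {i for i in range(2)
          if cost(cfg) <= cost(cfg[:i] + [cfg[i] ^ 1] + cfg[i + 1:])}
cfg = anneal(cfg, frozen=frozen, steps=50, temp=0.2, rng=rng)
```

Freezing decided directives shrinks the search space for the second phase, which is where the reported runtime savings come from.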

 

 

 

 

2013 DAC

On learning-based methods for design-space exploration with high-level synthesis

Author: Hung-Yi Liu and Luca P. Carloni

Affiliation: Department of Computer Science, Columbia University

Abstract:

This paper makes several contributions to address the challenge of supervising HLS tools for design space exploration (DSE). We present a study on the application of learning-based methods to the DSE problem and propose a learning model for HLS that is superior to the best models described in the literature. In order to speed up the convergence of the DSE process, we leverage transductive experimental design, a technique that we introduce to the CAD community for the first time. Finally, we consider a practical variant of the DSE problem and present a solution based on randomized selection with a strong theoretical guarantee.
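The sample-selection intuition behind transductive experimental design is to pick the training configurations to synthesize so that they cover the whole candidate pool, rather than sampling them at random. The sketch below simplifies this to a greedy max-min (farthest-point) rule, which is not the paper's exact criterion; the knob encoding is illustrative.

```python
# Greedy farthest-point selection of knob configurations to synthesize.
def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def select_designs(pool, k):
    chosen = [pool[0]]  # seed with an arbitrary design from the pool
    while len(chosen) < k:
        # pick the candidate farthest from everything chosen so far
        nxt = max(pool, key=lambda p: min(dist(p, c) for c in chosen))
        chosen.append(nxt)
    return chosen

pool = [(u, p) for u in (1, 2, 4, 8) for p in (0, 1)]  # (unroll, pipeline)
train = select_designs(pool, k=3)
```

Because every pool member stays close to some selected point, a model trained on the selected set generalizes to the rest of the pool, which is the transductive part of the argument.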

 

 

AI+EDA

High-Level Synthesis