2024 TCAS I

DNN-Based Optimization to Significantly Speed Up and Increase the Accuracy of Electronic Circuit Design.

Author: Sayed Alireza Sajjadi, Sayed Alireza Sadrossadat, Ali Moftakharzadeh, Morteza Nabavi, Mohamad Sawan

Affiliation: Electrical Engineering Department, Yazd University, Yazd, Iran, Research and Development Group, Sidense Company, Ottawa

Abstract:

Efficient design and optimization of flip-flops can significantly affect overall circuit performance. Flip-flops are used throughout digital systems, so they influence the overall power consumption and timing of emerging systems on chip (SoCs). In this paper, the modeling, design, and optimization of a transmission-gate-based master-slave positive-edge-triggered flip-flop (TGFF) in 16 nm complementary metal-oxide-semiconductor (CMOS) technology are proposed. The proposed deep neural network (DNN)-based optimization method first builds accurate models of different performance metrics from training data obtained with transistor-level models. The resulting DNN models are over 100 times faster than the transistor-level models. These accurate DNN-based models are then used to optimize design goals such as dynamic and static power, setup time, and propagation delay (data to output). Using these fast, accurate models significantly speeds up the design procedure and leads to a considerably better-optimized design. Moreover, because a DNN is a universal approximator that can capture any nonlinear input-output relationship, the proposed method can optimize circuits for any performance metric, even when no analytical formula is available. Finally, circuit design based on the proposed method is automated, which simplifies the work of circuit designers.
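The surrogate-then-optimize loop described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's method:

- A hypothetical `slow_simulation` function stands in for a transistor-level run.
- A quadratic least-squares surrogate replaces the paper's DNN.

The loop structure is the same: sample the expensive model, fit a cheap model, optimize on the cheap model, and verify once.

```python
import random

def slow_simulation(w):
    # Hypothetical stand-in for an expensive transistor-level run:
    # a cost (e.g., a power-delay figure) as a function of one transistor width w.
    return (w - 2.0) ** 2 + 1.0

def fit_quadratic(xs, ys):
    # Least-squares fit y ~ a*x^2 + b*x + c via the 3x3 normal equations.
    feats = [[x * x, x, 1.0] for x in xs]
    n = 3
    M = [[sum(f[i] * f[j] for f in feats) for j in range(n)]
         + [sum(f[i] * y for f, y in zip(feats, ys))] for i in range(n)]
    # Gaussian elimination with partial pivoting, then back substitution.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        coef[r] = (M[r][n] - sum(M[r][k] * coef[k] for k in range(r + 1, n))) / M[r][r]
    return coef

def surrogate_optimize(n_samples=12, lo=0.5, hi=4.0, seed=0):
    rng = random.Random(seed)
    # 1) Collect a small training set from the expensive simulator.
    xs = [rng.uniform(lo, hi) for _ in range(n_samples)]
    ys = [slow_simulation(x) for x in xs]
    a, b, c = fit_quadratic(xs, ys)
    # 2) Optimize on the cheap surrogate with a dense grid search.
    grid = [lo + i * (hi - lo) / 1000 for i in range(1001)]
    best = min(grid, key=lambda w: a * w * w + b * w + c)
    # 3) Verify the chosen point with one final expensive run.
    return best, slow_simulation(best)
```

The grid search only touches the surrogate, so its 1001 evaluations cost almost nothing. The expensive model is called just 13 times: 12 samples plus one verification.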

 

 

 

 

2024 ISPD

FastTuner: Transferable Physical Design Parameter Optimization using Fast Reinforcement Learning.

Author: Hao-Hsiang Hsiao, Yi-Chen Lu, Pruek Vanna-Iampikul, Sung Kyu Lim

Affiliation: Georgia Institute of Technology Atlanta, GA, USA

Abstract:

Current state-of-the-art Design Space Exploration (DSE) methods in Physical Design (PD), including Bayesian optimization (BO) and Ant Colony Optimization (ACO), mainly rely on black-box rather than parametric (e.g., neural network) approaches to improve end-of-flow Power, Performance, and Area (PPA) metrics. These methods often fail to generalize to unseen designs because netlist features are not properly leveraged. To overcome this issue, we develop a Reinforcement Learning (RL) agent that leverages Graph Neural Networks (GNNs) and Transformers to perform "fast" DSE on unseen designs by sequentially encoding netlist features across different PD stages. In particular, an attention-based encoder-decoder framework is devised for "conditional" parameter tuning, and a PPA estimator is introduced to predict end-of-flow PPA metrics for RL reward estimation. Extensive studies across 7 industrial designs in the TSMC 28nm technology node demonstrate that the proposed framework, FastTuner, significantly outperforms existing state-of-the-art DSE techniques in both optimization quality and runtime, with improvements of up to 79.38% in Total Negative Slack (TNS), 12.22% in total power, and 50x in runtime.
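The sketch below shows only the core idea behind using a PPA estimator as the RL reward. A policy samples PD parameter settings and is rewarded by a cheap estimator instead of a full place-and-route run. FastTuner's GNN/Transformer encoders and its conditional attention decoder are not reproduced. Instead, this is plain REINFORCE with a factorized softmax policy over two hypothetical parameters, and `estimate_cost` is a hypothetical lookup-table estimator.

```python
import math
import random

# Hypothetical toy "PPA estimator": cost of each choice for two PD parameters
# (e.g., a placement option and a clock-tree option). Lower is better.
COST_P1 = [0.0, 0.5, 1.0]
COST_P2 = [1.0, 0.0, 0.5]

def estimate_cost(config):
    a1, a2 = config
    return COST_P1[a1] + COST_P2[a2]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def reinforce_tune(steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [[0.0] * 3, [0.0] * 3]  # one categorical policy per parameter
    baseline = 0.0                   # running mean of rewards (variance reduction)
    for _ in range(steps):
        probs = [softmax(l) for l in logits]
        # Sample one choice per parameter from the current policy.
        config = [rng.choices(range(3), weights=p)[0] for p in probs]
        reward = -estimate_cost(config)  # estimator output replaces a full PD run
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # Policy-gradient step: d log pi(a) / d logits = onehot(a) - probs.
        for k, (a, p) in enumerate(zip(config, probs)):
            for j in range(3):
                grad = (1.0 if j == a else 0.0) - p[j]
                logits[k][j] += lr * advantage * grad
    # Greedy configuration under the learned policy.
    return [max(range(3), key=lambda j: l[j]) for l in logits]
```

With this toy table, the lowest-cost configuration is `[0, 1]`, and the trained policy is expected to favor it.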

 

 

 

 

2024 DATE

DeepSeq: Deep Sequential Circuit Learning

Author: Sadaf Khan, Zhengyuan Shi, Min Li, Qiang Xu

Affiliation: The Chinese University of Hong Kong

Abstract:

In this work, we propose DeepSeq, a novel representation learning framework for sequential netlists. It employs a graph neural network (GNN) with customized propagation to capture temporal correlations. To ensure effective learning, we propose a multi-task training objective with two sets of strongly related supervision: logic probability and transition probability at each logic gate. A novel dual attention aggregation mechanism is introduced to learn both tasks efficiently. Experimental results validate DeepSeq’s superiority over other GNN models in sequential circuit learning, and it achieves accurate reliability and power estimation across diverse circuits and workloads.
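The two supervision signals can be obtained from cycle-accurate simulation, which is where DeepSeq-style labels would come from. The sketch below assumes the following:

- The circuit is a hypothetical one: a T flip-flop `q` driven by input `x`, plus an AND gate `g = q & x`.
- The workload is a fixed, hypothetical input sequence.
- Transition probability is counted as any toggle between consecutive cycles. The paper's exact label definition may differ.

```python
def simulate(inputs):
    """Cycle-by-cycle simulation of a hypothetical toy sequential circuit."""
    q = 0  # flip-flop state after reset
    trace = {"x": [], "q": [], "g": []}
    for x in inputs:
        g = q & x            # combinational AND gate
        trace["x"].append(x)
        trace["q"].append(q)
        trace["g"].append(g)
        q = q ^ x            # T flip-flop: state toggles when x == 1
    return trace

def gate_statistics(trace):
    """Per-node (logic probability, transition probability) labels."""
    stats = {}
    for node, values in trace.items():
        logic_prob = sum(values) / len(values)  # fraction of cycles at logic 1
        toggles = sum(1 for a, b in zip(values, values[1:]) if a != b)
        transition_prob = toggles / (len(values) - 1)  # fraction of cycle boundaries with a toggle
        stats[node] = (logic_prob, transition_prob)
    return stats
```

For the workload `[1, 1, 0, 1, 0, 0, 1, 1]`, gate `g` is high in 2 of 8 cycles and toggles at 4 of 7 boundaries. These per-gate pairs are the kind of targets a GNN would be trained to predict without running the simulation.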

 

 

 

 

2023 ISCAS

GRASPE: Accurate Post Synthesis Power Estimation from RTL using Graph Representation Learning

Author: Rakesh M B, Pabitra Das, Anant Terkar, Amit Acharyya

Affiliation: Department of Electrical Engineering, IIT, Hyderabad, India, Department of Computer Science and Engineering, IIIT, Dharwad, India

Abstract:

In this paper, we propose GRASPE, a graph representation learning-based methodology that accurately estimates post-synthesis average power consumption from the RTL to shorten time to market in ASIC design. Our methodology uses a novel graph neural network (GNN) architecture that works on unoptimized and unmapped post-translated netlist files. During training, GRASPE learns to propagate average toggle rates, with feature values embedded as vectors on each logic cell. During testing, it predicts the average toggle rates of a new design. We attain mean improvements of 19.84% and 4.42% in average toggle-rate prediction, and 14.12% and 2.67% in average power estimation, over the commercial RTL power estimation tool and a Graph Convolutional Network used as the GNN, respectively. GRASPE is also 17.96X faster than the commercial gate-level power estimation tool. We further compare GRASPE with the state-of-the-art GRANNITE on inference latency and average power estimation and demonstrate average improvements of 3.985X and 1.28%, respectively.
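Predicted toggle rates become an average power number through the standard switching-power relation. Each toggle of a net charges or discharges its capacitance, dissipating about 0.5·C·V² per toggle. The sketch below is a simplified model, not GRASPE's power flow. It counts only net switching power and ignores internal and leakage power. The capacitances, supply voltage, and frequency in the example are hypothetical.

```python
def average_dynamic_power(toggle_rates, caps_farad, vdd, freq_hz):
    """Switching power: sum over nets of 0.5 * TR * C * Vdd^2 * f.

    toggle_rates: average toggles per clock cycle for each net
    caps_farad:   load capacitance of each net
    """
    return sum(0.5 * tr * c * vdd ** 2 * freq_hz
               for tr, c in zip(toggle_rates, caps_farad))
```

Example: two nets with toggle rates 0.2 and 0.5 and loads of 2 fF and 4 fF, at 0.8 V and 1 GHz, give 0.128 µW + 0.640 µW = 0.768 µW. Because power is linear in the toggle rates, the toggle-rate error largely determines the power error.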

 

 

 

 

2023 ICCAD

MasterRTL: A Pre-Synthesis PPA Estimation Framework for any RTL Design

Author: Wenji Fang, Yao Lu, Shang Liu, Qijun Zhang, Ceyu Xu, Lisa Wu Wills, Hongce Zhang, and Zhiyao Xie

Affiliation: Hong Kong University of Science and Technology, Duke University

Abstract:

In the modern VLSI design flow, the register-transfer level (RTL) stage is a critical point where designers define precise design behavior with hardware description languages (HDLs) such as Verilog. Because an RTL design is written as HDL code, the standard way to evaluate its quality requires time-consuming synthesis with EDA tools. This process significantly impedes design optimization at the early RTL stage. Recent ML-based solutions have emerged, but they fail to maintain high accuracy across arbitrary RTL designs. In this work, we propose an innovative pre-synthesis PPA estimation framework named MasterRTL. It first converts the HDL code into a new bit-level design representation named the simple operator graph (SOG). Because it uses only single-bit simple operators, the SOG proves to be a general representation that unifies different design types and styles. The SOG is also closer to the target gate-level netlist, which narrows the gap between the RTL representation and the netlist. In addition to the SOG representation, MasterRTL proposes new ML methods for modeling timing, power, and area separately at the RTL stage. Compared with state-of-the-art solutions, experiments on a comprehensive dataset of 90 different designs show correlation improvements of 0.33, 0.22, and 0.15 for total negative slack (TNS), worst negative slack (WNS), and power, respectively.

 

 

 

 

2022 ISCA

SNS's not a Synthesizer: a Deep-Learning-Based Synthesis Predictor

Author: Ceyu Xu, Chris Kjellqvist, Lisa Wu Wills

Affiliation: Duke University Durham, North Carolina, USA

Abstract:

The number of transistors that can fit on one monolithic chip has reached billions to tens of billions in this decade, thanks to Moore's Law. With every technology generation, the transistor count per chip grows at a pace that brings an exponential increase in design time, including the synthesis runs used for design space exploration. Such long delays in obtaining synthesis results hinder an efficient chip development process and significantly affect time-to-market. In addition, these large-scale integrated circuits tend to have larger, higher-dimensional design spaces, making it prohibitively expensive to obtain the physical characteristics of all possible designs with traditional synthesis tools. In this work, we propose a deep-learning-based synthesis predictor called SNS (SNS's not a Synthesizer). SNS predicts the area, power, and timing of a broad range of designs two to three orders of magnitude faster than the Synopsys Design Compiler, with an average RRSE (root relative squared error) of 0.4998. We further evaluate SNS in two representative case studies to demonstrate its capabilities and validity: a general-purpose out-of-order CPU based on the open-source RISC-V BOOM design, and an accelerator based on an in-house Chisel implementation of DianNao.
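RRSE normalizes the squared prediction error by the squared error of always predicting the mean of the ground truth. The formula is RRSE = sqrt(Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)²). A value of 1.0 means no better than the mean predictor, and 0.0 means a perfect prediction. A value near 0.5 therefore means the RMS error is about half that of the mean predictor. The sketch below implements this standard definition; the sample values in the test are hypothetical.

```python
import math

def rrse(y_true, y_pred):
    """Root relative squared error of predictions against ground truth."""
    mean = sum(y_true) / len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model's squared error
    sst = sum((t - mean) ** 2 for t in y_true)               # mean predictor's squared error
    return math.sqrt(sse / sst)
```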

 

 

 

 

2022 ICCAD

Applying GNNs to Timing Estimation at RTL

Author: Daniela Sánchez Lopera and Wolfgang Ecker

Affiliation: Infineon Technologies AG, Technical University of Munich

Abstract:

In the Electronic Design Automation (EDA) flow, signoff checks such as timing analysis are performed only after physical synthesis. Any timing violations found then cause re-iterations of the design flow. Timing estimation at early design stages, such as the Register Transfer Level (RTL), would therefore improve the quality of results and reduce flow iterations. Machine learning has been used to estimate the timing behavior of chip components. However, existing solutions map EDA objects to Euclidean data, ignoring the fact that EDA objects are naturally represented as graphs. Recent advances in Graph Neural Networks (GNNs) motivate mapping EDA objects to graphs for design metric prediction at different stages. This paper maps RTL designs to directed, featured graphs with multidimensional node and edge features. These graphs are the input to GNNs that estimate component delays and slews. An in-house hardware generation framework and open-source EDA tools for ASIC synthesis are used to collect training data. Experiments on unseen circuits show that GNN-based models are promising for timing estimation, even when the features come from early RTL implementations. Based on the estimated delays, critical areas of the design can be detected, and suitable RTL micro-architectures can be chosen without running long design iterations.
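Per-component delay estimates can be turned into a critical path, which is one way "critical areas" could be detected. This works like a simple static timing pass: arrival times are propagated in topological order, then the latest-arriving path is traced back. The sketch below assumes the node delays have already been estimated, for example by a GNN. The graph and delay values are hypothetical, and slews are ignored.

```python
from collections import deque

def critical_path(delays, edges):
    """delays: {node: estimated delay}; edges: (src, dst) pairs forming a DAG."""
    preds = {n: [] for n in delays}
    succs = {n: [] for n in delays}
    indeg = {n: 0 for n in delays}
    for src, dst in edges:
        preds[dst].append(src)
        succs[src].append(dst)
        indeg[dst] += 1
    # Kahn's algorithm visits nodes in topological order.
    queue = deque(n for n in delays if indeg[n] == 0)
    arrival, worst_pred = {}, {}
    while queue:
        n = queue.popleft()
        if preds[n]:
            # The latest-arriving fan-in sets this node's start time.
            worst_pred[n] = max(preds[n], key=arrival.get)
            start = arrival[worst_pred[n]]
        else:
            worst_pred[n], start = None, 0.0
        arrival[n] = start + delays[n]
        for s in succs[n]:
            indeg[s] -= 1
            if indeg[s] == 0:
                queue.append(s)
    # Walk back from the latest-arriving node to recover the critical path.
    node = max(arrival, key=arrival.get)
    total = arrival[node]
    path = []
    while node is not None:
        path.append(node)
        node = worst_pred[node]
    return path[::-1], total
```

In a toy datapath where a multiplier (3.0) and an adder (1.0) both feed a mux, the multiplier branch is reported as critical. That branch would be the candidate for an alternative RTL micro-architecture.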

 

 

 

 

2022 ICCAD

How Good Is Your Verilog RTL Code? A Quick Answer from Machine Learning

Author: Prianka Sengupta, Aakash Tyagi, Yiran Chen, Jiang Hu

Affiliation: Texas A&M University, College Station, Texas, USA, Duke University, Durham, N Carolina, USA

Abstract:

Hardware Description Language (HDL) is a common entry point for designing digital circuits. Differences in HDL coding style and design choices can lead to considerably different design quality and performance-power tradeoffs. In general, the impact of HDL coding is not clear until logic synthesis, or even layout, is completed. However, running synthesis merely to get feedback on HDL code is not computationally economical, especially in early design phases when the code is modified frequently. Furthermore, in late stages of design convergence burdened with high-impact engineering change orders (ECOs), design iterations become prohibitively expensive. To this end, we propose a machine learning approach to assessing Verilog-based Register-Transfer Level (RTL) designs without going through synthesis. It allows designers to quickly evaluate the performance-power tradeoff among different RTL design options. Experimental results show that our technique achieves an average prediction accuracy of 95% with respect to post-placement analysis and is 6 orders of magnitude faster than running logic synthesis and placement.

 

 

 

 

2021 ICCAD

Fast and Accurate PPA Modeling with Transfer Learning

Author: W. Rhett Davis, Paul D. Franzon, Luis Francisco, Billy Huggins, Rajeev Jain

Affiliation: Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC, USA, Qualcomm, San Diego, CA, USA

Abstract:

The power, performance, and area (PPA) of digital blocks can vary 10:1 depending on their synthesis, place, and route tool recipes. With the rapid increase in the number of PVT corners and logic complexity approaching 10M gates, industry has an acute need to minimize the human resources, compute servers, and EDA licenses needed to reach a Pareto-optimal recipe. First, we present models for fast, accurate PPA prediction that can reduce manual optimization iterations with EDA tools. Second, we investigate techniques to automate PPA optimization using evolutionary algorithms. For PPA prediction, a baseline model is trained on a known design using Latin hypercube sample runs of the EDA tool, and transfer learning is then used to train the model for an unseen design. For a known design, the baseline needed 150 training runs to reach 95% accuracy. With transfer learning, the same accuracy was achieved on a different (unseen) design in only 15 runs, indicating that transfer learning is a viable way to generalize PPA models. The PPA optimization technique, based on evolutionary algorithms, effectively combines PPA modeling and optimization. Our approach reached the same PPA solution as human designers in the same number of runs or fewer for a Cortex-M0 system design. This shows potential for automating recipe optimization without needing more runs than a human designer would.
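Latin hypercube sampling, used above to choose the EDA tool runs for training, can be sketched as follows. Each knob's range is split into as many equal strata as there are runs, and every stratum of every knob is used exactly once. This makes a small budget cover each dimension evenly. The two knobs in the test are hypothetical (a clock-period target in ns and a placement utilization), and this is the textbook method, not necessarily the exact sampler used in the paper.

```python
import random

def latin_hypercube(n_samples, bounds, seed=0):
    """Return n_samples points; bounds is a list of (lo, hi) per tool knob."""
    rng = random.Random(seed)
    columns = []
    for lo, hi in bounds:
        strata = list(range(n_samples))
        rng.shuffle(strata)  # random pairing of strata across knobs
        width = (hi - lo) / n_samples
        # One uniformly jittered point inside each stratum.
        columns.append([lo + (s + rng.random()) * width for s in strata])
    return [list(point) for point in zip(*columns)]
```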

 

 

 

 

2020 DAC

GRANNITE: Graph Neural Network Inference for Transferable Power Estimation

Author: Yanqing Zhang, Haoxing Ren, Brucek Khailany

Affiliation: NVIDIA, Santa Clara, CA, USA, NVIDIA, Austin, TX, USA

Abstract:

This paper introduces GRANNITE, a novel GPU-accelerated graph neural network (GNN) model for fast, accurate, and transferable vector-based average power estimation. During training, GRANNITE learns how to propagate average toggle rates through combinational logic: a netlist is represented as a graph, register states and unit inputs from RTL simulation are used as features, and combinational gate toggle rates are used as labels. A trained GNN model can then infer average toggle rates for a new workload of interest, or for new netlists, from RTL simulation results in a few seconds. Compared to traditional power analysis using gate-level simulation, GRANNITE achieves a >18.7X speedup with an error of <5.5% across a diverse set of benchmark circuits. Compared to a GPU-accelerated conventional probabilistic switching activity estimation approach, GRANNITE achieves much better accuracy (25.9% lower error on average) at similar runtimes.
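The family of conventional probabilistic methods that GRANNITE is compared against can be illustrated in its simplest textbook form. This is not the specific GPU-accelerated baseline from the paper. Signal probabilities are pushed through gates assuming their inputs are independent. The toggle rate is then estimated as 2p(1 − p), assuming successive cycles are independent. Both assumptions ignore reconvergent fanout and temporal correlation, which is the kind of error a learned model can reduce. The netlist and input probabilities in the test are hypothetical.

```python
def propagate_probabilities(netlist, input_probs):
    """netlist: list of (out, op, inputs) in topological order.

    Returns (signal probability, estimated toggle rate) per net.
    """
    p = dict(input_probs)
    for out, op, ins in netlist:
        vals = [p[i] for i in ins]
        if op == "NOT":
            p[out] = 1.0 - vals[0]
        elif op == "AND":
            p[out] = vals[0] * vals[1]           # assumes independent inputs
        elif op == "OR":
            p[out] = 1.0 - (1.0 - vals[0]) * (1.0 - vals[1])
        elif op == "XOR":
            p[out] = vals[0] + vals[1] - 2.0 * vals[0] * vals[1]
        else:
            raise ValueError(f"unsupported gate {op}")
    # Toggle rate per cycle if successive cycles are independent: 2 p (1 - p).
    toggle = {net: 2.0 * v * (1.0 - v) for net, v in p.items()}
    return p, toggle
```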

 

 

 

 

2019 DAC

PRIMAL: Power Inference using Machine Learning.

Author: Yuan Zhou, Haoxing Ren, Yanqing Zhang, Ben Keller, Brucek Khailany, and Zhiru Zhang

Affiliation: Cornell University, NVIDIA Corporation

Abstract:

This paper introduces PRIMAL, a novel learning-based framework that enables fast and accurate power estimation for ASIC designs. PRIMAL trains machine learning (ML) models with design verification testbenches to characterize the power of reusable circuit building blocks. The trained models can then generate detailed power profiles of the same blocks under different workloads. We evaluate several established ML models on this task, including ridge regression, gradient tree boosting, multi-layer perceptron, and convolutional neural network (CNN). For average power estimation, ML-based techniques achieve an average error below 1% across a diverse set of realistic benchmarks, outperforming a commercial RTL power estimation tool in both accuracy and speed (15x faster). For cycle-by-cycle power estimation, PRIMAL is on average 50x faster than a commercial gate-level power analysis tool, with an average error below 5%. In particular, our CNN-based method achieves a 35x speedup and an error of 5.2% for cycle-by-cycle power estimation of a RISC-V processor core. Furthermore, our case study on a NoC router shows that PRIMAL achieves a small estimation error of 4.5% using cycle-approximate traces from SystemC simulation.
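Ridge regression, the simplest model PRIMAL evaluates, has the closed form w = (XᵀX + λI)⁻¹Xᵀy. The sketch below solves it with a small Gaussian elimination so that it needs only the standard library. It assumes each row of X holds a bias term and hypothetical per-cycle features, such as register toggle counts, and that y holds the per-cycle power from a gate-level tool. PRIMAL's actual feature construction is not reproduced, and for simplicity the bias term is also penalized here.

```python
def ridge_fit(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam*I) w = X^T y."""
    d = len(X[0])
    # Augmented matrix [X^T X + lam*I | X^T y].
    M = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)]
         + [sum(row[i] * t for row, t in zip(X, y))] for i in range(d)]
    # Gaussian elimination with partial pivoting.
    for c in range(d):
        piv = max(range(c, d), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, d):
            f = M[r][c] / M[c][c]
            for k in range(c, d + 1):
                M[r][k] -= f * M[c][k]
    # Back substitution.
    w = [0.0] * d
    for r in range(d - 1, -1, -1):
        w[r] = (M[r][d] - sum(M[r][k] * w[k] for k in range(r + 1, d))) / M[r][r]
    return w
```

A larger λ shrinks the weights, trading some bias for robustness when features such as toggle counts of related registers are strongly correlated.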

 

 

 

 

AI+EDA

PPA prediction toward synthesis, placement, and routing