2024 TCAD

Statistical Hardware Design With Multimodel Active Learning

Author: Alireza Ghaffari, Masoud Asgharian, Yvon Savaria

Affiliation: Department of Mathematics and Statistics, McGill University, Montreal, Canada; Department of Electrical Engineering, Polytechnique Montreal, Montreal, Canada

Abstract:

With the rising complexity of the numerous novel applications that serve our modern society comes a strong need to design efficient computing platforms. Designing efficient hardware is, however, a complex multi-objective problem that deals with multiple parameters and their interactions. Given the large number of parameters and objectives involved in hardware design, synthesizing all possible combinations is not a feasible way to find the optimal solution. One promising approach to tackle this problem is statistical modeling of a desired hardware performance metric. Here, we propose a model-based active learning approach to solve this problem. Our proposed method uses Bayesian models to characterize various aspects of hardware performance. We also use transfer learning and Gaussian regression bootstrapping techniques in conjunction with active learning to create more accurate models. Our proposed statistical modeling method provides hardware models that are sufficiently accurate to perform design space exploration (DSE) and performance prediction simultaneously. We use our proposed method to perform DSE and performance prediction for various hardware setups, such as micro-architecture designs and OpenCL kernels for FPGA targets. Our experiments show that the number of samples required to create performance models is significantly reduced while the predictive power of our statistical models is maintained. For instance, in our performance prediction setting, the proposed method needs 65% fewer samples to create the model, and in the DSE setting, it can find the best parameter settings after exploring fewer than 50 samples.
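
As a generic illustration of the uncertainty-driven sampling loop that Bayesian active-learning methods of this kind share, here is a minimal Python sketch that queries the most uncertain design at each step. The design space, the synthesize() stand-in, and all constants are invented for illustration and are not the paper's actual setup.

# Minimal sketch of GP-based active learning for a hardware performance model,
# assuming a black-box synthesize(cfg) that returns the measured metric
# (e.g., latency) for one parameter configuration. Names are illustrative.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
design_space = rng.uniform(0, 1, size=(500, 4))    # 500 candidate configs, 4 params

def synthesize(cfg):                               # stand-in for a real synthesis run
    return cfg @ [3.0, -1.0, 2.0, 0.5] + rng.normal(0, 0.05)

labeled_idx = list(rng.choice(len(design_space), size=5, replace=False))
y = [synthesize(design_space[i]) for i in labeled_idx]

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(20):                                # active-learning budget
    gp.fit(design_space[labeled_idx], y)
    _, sigma = gp.predict(design_space, return_std=True)
    sigma[labeled_idx] = -np.inf                   # never re-query a labeled point
    nxt = int(np.argmax(sigma))                    # query the most uncertain config
    labeled_idx.append(nxt)
    y.append(synthesize(design_space[nxt]))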

 

 

 

 

2024 DAC

Knowing The Spec to Explore The Design via Transformed Bayesian Optimization

Author: Donger Luo, Qi Sun, Xinheng Li, Chen Bai, Bei Yu, Hao Geng

Affiliation: ShanghaiTech University, Zhejiang University, Chinese University of Hong Kong

Abstract:

AI chips are scaling rapidly in the era of large language models (LLMs). In contrast, existing chip design space exploration (DSE) methods, which aim at discovering Pareto-front designs that are optimal yet often infeasible or unproducible, are hindered by their neglect of design specifications. In this paper, we propose a novel spec-driven transformed Bayesian optimization framework to find the expected optimal RISC-V SoC architecture designs for LLM tasks. The highlights of our framework lie in a tailored transformed Gaussian process (GP) model prioritizing the specified target metrics and a customized acquisition function (EHRM) for multi-objective optimization. Extensive experiments on large-scale RISC-V SoC architecture design explorations for LLMs, such as Transformer, BERT, and GPT-1, demonstrate that our method not only effectively finds designs matching the QoR values from the spec, but also outperforms the state-of-the-art approach by 34.59% in ADRS with only 66.67% of its runtime overhead.
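
The paper's transformed GP and EHRM acquisition are not reproduced here; as a loose sketch of spec-driven search, the snippet below transforms raw metrics into a distance-to-spec loss and minimizes it with a plain GP and an expected-improvement acquisition. The evaluate() function, the spec vector, and all constants are illustrative assumptions.

# Hedged sketch of spec-driven optimization: metrics are transformed so that
# meeting the spec's target values (QoR) is rewarded, then a GP with an
# expected-improvement acquisition proposes the next design.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)
candidates = rng.uniform(0, 1, size=(300, 5))      # encoded SoC configurations
spec = np.array([0.7, 0.2])                        # target (perf, power) from the spec

def evaluate(x):                                   # stand-in for RTL simulation
    return x[:2] + rng.normal(0, 0.02, 2)          # returns (perf, power)

def spec_loss(metrics):                            # distance to the spec targets;
    return np.linalg.norm(metrics - spec)          # smaller is better

X = [candidates[i] for i in rng.choice(300, 5, replace=False)]
y = [spec_loss(evaluate(x)) for x in X]
gp = GaussianProcessRegressor(normalize_y=True)
for _ in range(15):
    gp.fit(np.array(X), np.array(y))
    mu, sd = gp.predict(candidates, return_std=True)
    best = min(y)
    z = (best - mu) / np.maximum(sd, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sd * norm.pdf(z)   # expected improvement
    x_next = candidates[int(np.argmax(ei))]
    X.append(x_next)
    y.append(spec_loss(evaluate(x_next)))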

 

 

 

 

2024 DAC

Explainable Fuzzy Neural Network with Multi-Fidelity Reinforcement Learning for Micro-Architecture Design Space Exploration

Author: Hanwei Fan, Ya Wang, Sicheng Li, Tingyuan Liang, Wei Zhang

Affiliation: Hong Kong University of Science and Technology (HKUST); Alibaba Group

Abstract:

With the continuous advancement of processors, modern micro-architecture designs have become increasingly complex. The vast design space presents significant challenges for human designers, making design space exploration (DSE) algorithms an important tool for µ-arch design. In recent years, efforts have been made in the development of DSE algorithms, and promising results have been achieved. However, the existing DSE algorithms, e.g., Bayesian optimization and ensemble learning, suffer from poor interpretability, hindering designers' understanding of the decision-making process. To address this limitation, we propose utilizing fuzzy neural networks to induce and summarize knowledge and insights from the DSE process, enhancing the interpretability and controllability of DSE results. Furthermore, to improve efficiency, we introduce a multi-fidelity reinforcement learning approach, which primarily conducts exploration using inexpensive but imprecise data, thereby substantially diminishing the reliance on costly data. Experimental results show that our method achieves excellent results with a very limited sample budget and surpasses the current state of the art.
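
A rough sketch of the multi-fidelity principle the abstract describes: explore broadly with a cheap, imprecise evaluator and spend the accurate, expensive evaluator only on a shortlist. Both evaluators below are toy stand-ins, not the paper's fuzzy-neural or RL machinery.

# Explore cheaply, then verify the most promising candidates expensively.
import numpy as np

rng = np.random.default_rng(2)
configs = rng.integers(1, 8, size=(1000, 6))       # candidate µ-arch parameter vectors

def cheap_eval(c):                                 # e.g., analytical model / short trace
    return c.sum() + rng.normal(0, 2.0)            # fast but noisy IPC proxy

def costly_eval(c):                                # e.g., full cycle-accurate simulation
    return c.sum() + rng.normal(0, 0.1)

low_fi = np.array([cheap_eval(c) for c in configs])      # 1000 cheap runs
shortlist = configs[np.argsort(low_fi)[-10:]]            # top-10 by cheap score
high_fi = np.array([costly_eval(c) for c in shortlist])  # only 10 costly runs
best = shortlist[int(np.argmax(high_fi))]
print("best config:", best)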

 

 

 

 

2024 ISPD

Computing Architecture for Large Language Models (LLMs) and Large Multimodal Models (LMMs)

Author: Bor-Sung Liang

Affiliation: Corporate Strategy & Strategic Technology, MediaTek, Hsinchu, Taiwan; Department of Computer Science and Information Engineering, EECS, National Taiwan University

Abstract:

Large language models (LLMs) have achieved remarkable performance in many AI applications, but they require very large parameter counts, ranging from several billion to a trillion parameters, which results in huge computation requirements for both training and inference. Generally speaking, LLMs adopt more parameters to explore "Emergent Abilities" of AI models, whereas LLMs with fewer parameters aim to reduce the computing burden and democratize generative AI applications. To fulfill the huge computation requirements, Domain Specific Architecture is important for co-optimizing AI models, hardware, and software designs, and for making trade-offs among different design parameters. There are also trade-offs between AI computation throughput and energy efficiency across different types of AI computing systems. Large Multimodal Models (LMMs), also called Multimodal Large Language Models, integrate multiple data types as input. Multimodal information can provide rich user and environment information that allows LMMs to deliver a better user experience. LMMs are also a trend for mobile devices, because mobile devices connect to many sensors for video, audio, touch, gyroscopes, navigation systems, etc. Recently, there has been a trend toward running smaller LLMs/LMMs (near or below 10 billion parameters), such as Llama 2, Gemini Nano, and Phi-2, on edge devices. This shines a light on applying LLMs/LMMs in mobile devices, and several companies have provided experimental solutions on edge devices such as smartphones and PCs. Even with reduced model sizes, LLMs/LMMs still require more computing resources than previous mobile processor workloads and face challenges in memory size, bandwidth, and power efficiency. Moreover, device-side LLMs/LMMs in mobile processors can collaborate with cloud-side LLMs/LMMs in the data center to deliver better performance: they can offload computing from cloud-side models to provide seamless responses, act as agents that prompt cloud-side LLMs/LMMs, or be fine-tuned locally on user data to preserve privacy. These LLM/LMM trends and new usage scenarios will shape future computing architecture design. In this talk, we discuss these issues, and especially their impact on mobile processor design.

 

 

 

 

2024 ASP-DAC

ARS-Flow: A Design Space Exploration Flow for Accelerator-Rich Systems Based on Active Learning

Author: Shuaibo Huang, Yuyang Ye, Hao Yan, and Longxing Shi

Affiliation: National ASIC System Engineering Technology Research Center, Southeast University, Nanjing, China

Abstract:

Surrogate-model-based design space exploration (DSE) is the mainstream method for searching for optimal microarchitecture designs. However, it is hard to build accurate models for accelerator-rich systems within a limited sample budget due to their high-dimensional design spaces. Moreover, existing methods easily fall into local optima or converge slowly. To solve these two problems, we propose a DSE flow based on active learning, named ARS-Flow. It features a Pareto-region-oriented stochastic resampling method (PRSRS) and a multi-objective genetic algorithm with self-adaptive hyperparameter control (SAMOGA). Taking the gem5-SALAM system for illustration, the proposed method can build more accurate models and find better microarchitecture designs with acceptable runtime costs.
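
The SAMOGA and PRSRS details are not spelled out in this abstract; as a minimal sketch of one common realization of self-adaptive hyperparameter control, the genetic algorithm below lets each individual carry and mutate its own mutation rate. The toy fitness is single-objective, whereas the paper optimizes multiple objectives.

# Genetic algorithm with self-adaptive mutation: the mutation rate is part of
# each individual and evolves alongside the genes.
import numpy as np

rng = np.random.default_rng(3)
POP, DIM, GENS = 40, 8, 50

def fitness(x):                                    # toy single objective
    return -np.sum((x - 0.5) ** 2)

genes = rng.uniform(0, 1, (POP, DIM))
rates = np.full(POP, 0.1)                          # per-individual mutation rate
for _ in range(GENS):
    f = np.array([fitness(g) for g in genes])
    parents = np.argsort(f)[-POP // 2:]            # truncation selection
    children, child_rates = [], []
    for _ in range(POP - len(parents)):
        a, b = rng.choice(parents, 2, replace=False)
        child = np.where(rng.random(DIM) < 0.5, genes[a], genes[b])  # uniform crossover
        r = rates[a] * np.exp(rng.normal(0, 0.2))  # mutate the mutation rate itself
        child += rng.normal(0, r, DIM) * (rng.random(DIM) < r)
        children.append(np.clip(child, 0, 1))
        child_rates.append(r)
    genes = np.vstack([genes[parents], children])
    rates = np.concatenate([rates[parents], child_rates])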

 

 

 

 

2023 ICCAD

PANDA: Architecture-Level Power Evaluation by Unifying Analytical and Machine Learning Solutions

Author: Qijun Zhang, Shiyu Li, Guanglei Zhou, Jingyu Pan, Chen-Chia Chang, Yiran Chen, and Zhiyao Xie

Affiliation: Hong Kong University of Science and Technology, Duke University

Abstract:

Power efficiency is a critical design objective in modern microprocessor design. To evaluate the impact of architecture-level design decisions, an accurate yet efficient architecture-level power model is desired. However, widely adopted data-independent analytical power models like McPAT and Wattch have been criticized for their unreliable accuracy. While some machine learning (ML) methods have been proposed for architecture-level power modeling, they rely on sufficient known designs for training and perform poorly when the number of available designs is limited, which is typically the case in realistic scenarios. In this work, we derive a general formulation that unifies existing architecture-level power models. Based on this formulation, we propose PANDA, an innovative architecture-level solution that combines the advantages of analytical and ML power models. It achieves unprecedentedly high accuracy on unknown new designs even when very few designs are available for training, which is a common challenge in practice. Besides being an excellent power model, it can also predict area, performance, and energy accurately. PANDA further supports power prediction for unknown new technology nodes. In our experiments, besides validating the superior performance and wide range of functionalities of PANDA, we also propose an application scenario in which PANDA proves able to identify high-performance design configurations given a power constraint.
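
A hedged sketch of the unifying idea: keep a designer-supplied analytical term per component and let a small ML model learn only a correction factor from a few labeled designs. The analytical formula, features, and numbers below are placeholders, not PANDA's actual formulation.

# Combine an analytical power term with an ML-learned multiplicative correction,
# trained from very few known designs.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(4)
# columns: fetch width, ROB size, cache KB (toy parameters)
designs = rng.integers(1, 9, size=(12, 3)).astype(float)   # only 12 known designs

def analytical_power(d):                        # placeholder linear-in-resources term
    return 0.5 * d[0] + 0.01 * d[1] + 0.02 * d[2]

true_power = np.array([analytical_power(d) * (1.2 + 0.05 * d[0]) for d in designs])

base = np.array([analytical_power(d) for d in designs])
ratio_model = GradientBoostingRegressor(n_estimators=50)
ratio_model.fit(designs, true_power / base)     # learn only the correction factor

new = np.array([[4.0, 128.0, 32.0]])
pred = analytical_power(new[0]) * ratio_model.predict(new)[0]
print(f"predicted power: {pred:.2f} (toy units)")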

 

 

 

 

2023 MICRO

Micro-Armed Bandit: Lightweight & Reusable Reinforcement Learning for Microarchitecture Decision-Making

Author: Gerasimos Gerogiannis, Josep Torrellas

Affiliation: University of Illinois at Urbana-Champaign, USA

Abstract:

Online Reinforcement Learning (RL) has been adopted as an effective mechanism in various decision-making problems in microarchitecture. Its high adaptability and the ability to learn at runtime are attractive characteristics in microarchitecture settings. However, although hardware RL agents are effective, they suffer from two main problems. First, they have high complexity and storage overhead. This complexity stems from decomposing the environment into a large number of states and then, for each of these states, bookkeeping many action values. Second, many RL agents are engineered for a specific application and are not reusable. In this work, we tackle both of these shortcomings by designing an RL agent that is both lightweight and reusable across different microarchitecture decision-making problems. We find that, in some of these problems, only a small fraction of the action space is useful in a given time window. We refer to this property as temporal homogeneity in the action space. Motivated by this property, we design an RL agent based on Multi-Armed Bandit algorithms, the simplest form of RL. We call our agent Micro-Armed Bandit. We showcase our agent in two use cases: data prefetching and instruction fetch in simultaneous multithreaded (SMT) processors. For prefetching, our agent outperforms the non-RL prefetchers Bingo and MLOP by 2.6% and 2.3% (geometric mean), respectively, and attains performance similar to the state-of-the-art RL prefetcher Pythia, with a dramatically lower storage requirement of only 100 bytes. For SMT instruction fetch, our agent outperforms the Hill Climbing method by 2.2% (geometric mean).
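
Since the agent is built on multi-armed bandits, a tiny UCB1 agent conveys the storage argument well: per arm it keeps only a count and a running mean. The reward model below is a toy stand-in for, e.g., prefetcher feedback; it is not the paper's code.

# Minimal UCB1 bandit agent: state is just one count and one mean per arm.
import math, random

class MicroBandit:
    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.t = 0

    def select(self):                           # UCB1: mean + exploration bonus
        self.t += 1
        for a, c in enumerate(self.counts):
            if c == 0:
                return a                        # try every arm once first
        return max(range(len(self.counts)),
                   key=lambda a: self.means[a] +
                   math.sqrt(2 * math.log(self.t) / self.counts[a]))

    def update(self, arm, reward):              # incremental mean update
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

agent = MicroBandit(n_arms=4)                   # e.g., 4 prefetch-degree choices
for _ in range(1000):
    arm = agent.select()
    agent.update(arm, reward=random.gauss([0.1, 0.5, 0.3, 0.2][arm], 0.1))
print("learned means:", [round(m, 2) for m in agent.means])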

 

 

 

 

2023 ISCA

MapZero: Mapping for Coarse-Grained Reconfigurable Architectures with Reinforcement Learning and Monte-Carlo Tree Search

Author: Xiangyu Kong, Yi Huang, Jianfeng Zhu, Xingchen Man, Yang Liu, Chunyang Feng, Pengfei Gou, Minggui Tang, Shaojun Wei, Leibo Liu

Affiliation: School of Integrated Circuits, Beijing National Research Center for Information Science and Technology (BNRist), Tsinghua University, Beijing, China; Innovation Institute of High Performance Server, GBA, Guangzhou, Guangdong, China; HEXIN Technologies Co., Ltd., Guangzhou, Guangdong, China

Abstract:

Coarse-grained reconfigurable architecture (CGRA) has become a promising candidate for data-intensive computing due to its flexibility and high energy efficiency. CGRA compilers map data flow graphs (DFGs) extracted from applications onto CGRAs, playing a fundamental role in fully exploiting hardware resources for acceleration. Yet existing compilers are time-consuming and cannot guarantee optimal results due to the traversal search of the enormous search spaces brought about by the spatio-temporal flexibility of CGRA structures and the complexity of DFGs. Inspired by the remarkable progress of reinforcement learning (RL) and Monte-Carlo tree search (MCTS) on real-world problems, we consider constructing a compiler that can learn from past experiences and comprehensively understand the target DFG and CGRA. In this paper, we propose an architecture-aware compiler for CGRAs based on RL and MCTS, called MapZero, a framework to automatically extract the characteristics of the DFG and CGRA hardware and map operations onto varied CGRA fabrics. We apply a Graph Attention Network to generate an adaptive embedding for DFGs and also model the functionality and interconnection status of the CGRA, aiming to train an RL agent to perform placement and routing intelligently. Experimental results show that MapZero can generate superior-quality mappings and reduce compilation time by hundreds of times compared to state-of-the-art methods. MapZero can find high-quality mappings very quickly even when the feasible solution space is rather small and all other compilers fail. We also demonstrate the scalability and broad applicability of our framework.
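
A skeletal Monte-Carlo tree search with UCT selection, the search backbone the paper combines with a learned RL policy. The legal_moves and rollout callables are abstract placeholders for CGRA placement-and-routing moves and MapZero's policy evaluation, respectively.

# Generic MCTS skeleton: select (UCT), expand, simulate, backpropagate.
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def uct(node, c=1.4):                           # favor high value + low visit count
    return (node.value / (node.visits + 1e-9) +
            c * math.sqrt(math.log(node.parent.visits + 1) / (node.visits + 1e-9)))

def mcts(root, legal_moves, rollout, iters=200):
    for _ in range(iters):
        node = root
        while node.children:                    # 1) selection
            node = max(node.children, key=uct)
        for mv in legal_moves(node.state):      # 2) expansion
            node.children.append(Node(mv, parent=node))
        leaf = random.choice(node.children) if node.children else node
        reward = rollout(leaf.state)            # 3) simulation (an RL policy in MapZero)
        while leaf:                             # 4) backpropagation
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    return max(root.children, key=lambda n: n.visits).state

# toy usage: states are integers, moves decrement toward 0
best = mcts(Node(5), lambda s: [s - 1] if s > 0 else [], lambda s: -abs(s), iters=50)
print("best first move:", best)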

 

 

 

 

2023 DAC

Graph Representation Learning for Microarchitecture Design Space Exploration

Author: Xiaoling Yi, Jialin Lu, Xiankui Xiong, Dong Xu, Li Shang, Fan Yang

Affiliation: ZTE Corporation, China; State Key Laboratory of Integrated Chips and Systems, Fudan University, Shanghai, China

Abstract:

Design optimization of modern microprocessors is a complex task due to the exponential growth of the design space. This work presents GRL-DSE, an automatic microarchitecture search framework based on graph embeddings. GRL-DSE uses graph representation learning to build a compact and continuous embedding space. Multi-objective Bayesian optimization using an ensemble surrogate model conducts microarchitecture design space exploration in the graph embedding space to efficiently and holistically optimize performance-power-area (PPA) objectives. Experimental studies on RISC-V BOOM show that GRL-DSE outperforms previous techniques by 74.59% on Pareto-front quality and outperforms manual designs in terms of PPA.
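
As a loose sketch of searching in a learned embedding space, the snippet below substitutes PCA for the paper's graph representation learning and runs a simple GP-based search over the compact space; the encodings, the scalarized PPA score, and the UCB acquisition are all illustrative assumptions.

# Embed designs into a compact continuous space, then search there.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(11)
designs = rng.integers(0, 4, size=(500, 20)).astype(float)  # discrete µ-arch encodings
z = PCA(n_components=4).fit_transform(designs)              # compact continuous space

ppa = designs @ rng.uniform(-1, 1, 20)                      # toy scalarized PPA score
seen = list(rng.choice(500, 8, replace=False))
gp = GaussianProcessRegressor(normalize_y=True)
for _ in range(20):                                         # search over the embedding
    gp.fit(z[seen], ppa[seen])
    mu, sd = gp.predict(z, return_std=True)
    score = mu + sd                                         # simple UCB acquisition
    score[seen] = -np.inf
    seen.append(int(np.argmax(score)))
print("best design index:", seen[int(np.argmax(ppa[seen]))])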

 

 

 

 

2023 ICCAD

IT-DSE: Invariance Risk Minimized Transfer Microarchitecture Design Space Exploration

Author: Ziyang Yu, Chen Bai, Shoubo Hu, Ran Chen, Taohai He, Mingxuan Yuan, Bei Yu, and Martin Wong

Affiliation: The Chinese University of Hong Kong; Huawei Noah’s Ark Lab; HiSilicon

Abstract:

The microarchitecture design of processors faces growing complexity due to expanding design spaces and time-intensive verification processes. Utilizing historical design task data can improve the search process, but managing distribution discrepancies between different source tasks is essential for enhancing the search method's generalization ability. In light of this, we introduce IT-DSE, a microarchitecture search framework whose surrogate model is pre-trained to absorb knowledge from previous design tasks. The Feature Tokenizer-Transformer (FT-Transformer) serves as the backbone, facilitating feature extraction from source tasks even with varied design spaces. Concurrently, the invariant risk minimization (IRM) paradigm bolsters generalization ability under data distribution discrepancies. Further, IT-DSE exploits a combination of multi-objective Bayesian optimization and a model ensemble to discover Pareto-optimal designs. Experimental results indicate that IT-DSE effectively harnesses the knowledge of existing microarchitecture designs and uncovers designs that outperform previous methods in terms of power, performance, and area (PPA).
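
A hedged sketch of the IRMv1 penalty commonly used to implement invariant risk minimization across environments (here, historical design tasks): each environment contributes the squared gradient of its risk with respect to a fixed dummy scale. The linear model and random data are toys, not IT-DSE's FT-Transformer; PyTorch is assumed.

# IRMv1-style training loop: empirical risk plus an invariance penalty per task.
import torch

def irm_penalty(logits, y):
    w = torch.tensor(1.0, requires_grad=True)    # fixed dummy classifier scale
    loss = torch.nn.functional.mse_loss(logits * w, y)
    (grad,) = torch.autograd.grad(loss, [w], create_graph=True)
    return grad ** 2

model = torch.nn.Linear(6, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
envs = [(torch.randn(64, 6), torch.randn(64, 1)) for _ in range(3)]  # 3 source tasks
for step in range(100):
    risk, penalty = 0.0, 0.0
    for x, y in envs:
        out = model(x)
        risk = risk + torch.nn.functional.mse_loss(out, y)
        penalty = penalty + irm_penalty(out, y)
    loss = risk + 10.0 * penalty                 # IRM weight is a hyperparameter
    opt.zero_grad()
    loss.backward()
    opt.step()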

 

 

 

 

2022 ASP-DAC

RADARS: Memory-Efficient Reinforcement Learning Aided Differentiable Neural Architecture Search

Author: Zheyu Yan, Weiwen Jiang, Xiaobo Sharon Hu, Yiyu Shi

Affiliation: University of Notre Dame; George Mason University

Abstract:

Differentiable neural architecture search (DNAS) is known for its capacity in the automatic generation of superior neural networks. However, DNAS based methods suffer from memory usage explosion when the search space expands, which may prevent them from running successfully on even advanced GPU platforms. On the other hand, reinforcement learning (RL) based methods, while being memory efficient, are extremely time-consuming. Combining the advantages of both types of methods, this paper presents RADARS, a scalable RL aided DNAS framework that can explore large search spaces in a fast and memory-efficient manner. RADARS iteratively applies RL to prune undesired architecture candidates and identifies a promising subspace to carry out DNAS. Experiments using a workstation with 12 GB GPU memory show that on CIFAR-10 and ImageNet datasets, RADARS can achieve up to 3.41% higher accuracy with 2.5X search time reduction compared with a state-of-the-art RL-based method, while the two DNAS baselines cannot complete due to excessive memory usage or search time. To the best of the authors’ knowledge, this is the first DNAS framework that can handle large search spaces with bounded memory usage.
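
A very rough sketch of the alternation the abstract describes: an RL-style controller (a toy epsilon-greedy scorer here) picks a promising subspace, and the differentiable search (a placeholder function) runs only inside it, keeping memory bounded. Nothing below reproduces RADARS' actual controller or DNAS engine.

# Controller prunes the space to one subspace; "DNAS" runs only inside it.
import random

random.seed(0)
subspaces = {f"sub{i}": random.random() for i in range(16)}   # id -> unknown quality
q = {s: 0.0 for s in subspaces}                               # controller estimates

def dnas(subspace):                        # placeholder for differentiable search;
    return subspaces[subspace] + random.gauss(0, 0.05)        # returns best accuracy

for step in range(40):
    if random.random() < 0.2:              # explore a random subspace
        s = random.choice(list(subspaces))
    else:                                  # exploit the best-looking one
        s = max(q, key=q.get)
    acc = dnas(s)                          # run bounded-memory search inside it
    q[s] += 0.5 * (acc - q[s])             # update the controller's estimate
print("chosen subspace:", max(q, key=q.get))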

 

 

 

 

2022 TCAS-I

A Fast, Accurate, and Comprehensive PPA Estimation of Convolutional Hardware Accelerators

Author: Leonardo Rezende Juracy, Alexandre de Morais Amory, Fernando Gehm Moraes

Affiliation: Pontifical Catholic University of Rio Grande do Sul (PUCRS), Porto Alegre, Brazil; Scuola Superiore Sant'Anna, Italy

Abstract:

Convolutional Neural Networks (CNNs) are widely adopted for Machine Learning (ML) tasks, such as classification and computer vision. GPUs became the reference platforms for both the training and inference phases of CNNs because their architectures are well matched to CNN operators. However, GPUs are power-hungry architectures. A path to enabling the deployment of CNNs in energy-constrained devices is adopting hardware accelerators for the inference phase. However, the literature presents gaps regarding analyses and comparisons of these accelerators for evaluating Power-Performance-Area (PPA) trade-offs. Typically, the literature estimates PPA from the number of operations executed during the inference phase, such as the number of MACs, which may not be a good proxy for PPA. Thus, it is necessary to deliver accurate hardware estimates, enabling design space exploration (DSE) to deploy CNNs according to the design constraints. This work proposes a fast and accurate DSE approach for CNNs using an analytical model fitted from the physical synthesis of hardware accelerators. The model is integrated with CNN frameworks, such as TensorFlow, to generate accurate results. The analytical model estimates area, performance, power, energy, and memory accesses. The observed average error of the analytical model, compared to data obtained from physical synthesis, is smaller than 7%.
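
A minimal sketch of fitting an analytical PPA model from a handful of physical-synthesis points: regress power and area against layer descriptors, then query the fitted model instead of re-running synthesis. All descriptors and numbers are invented for illustration.

# Fit linear power/area models from a few synthesis results via least squares.
import numpy as np

# columns: MACs (millions), input channels, output channels
layers = np.array([[2.0, 3, 16], [9.0, 16, 32], [37.0, 32, 64], [150.0, 64, 128]])
power_mw = np.array([12.0, 45.0, 180.0, 700.0])           # from physical synthesis
area_um2 = np.array([8e3, 3.1e4, 1.2e5, 4.9e5])

X = np.column_stack([layers, np.ones(len(layers))])       # linear model + intercept
p_coef, *_ = np.linalg.lstsq(X, power_mw, rcond=None)
a_coef, *_ = np.linalg.lstsq(X, area_um2, rcond=None)

new_layer = np.array([70.0, 64, 64, 1.0])                 # query without synthesis
print(f"power ≈ {new_layer @ p_coef:.0f} mW, area ≈ {new_layer @ a_coef:.0f} µm²")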

 

 

 

 

2021 ICCAD

A microarchitecture power modeling framework for modern CPUs

Author: Jianwang Zhai, Chen Bai, Binwu Zhu, Yici Cai, Qiang Zhou, and Bei Yu

Affiliation: Tsinghua University; The Chinese University of Hong Kong

Abstract:

Energy efficiency has become a core concern for modern CPUs, and it is difficult for existing power models to balance speed, generality, and accuracy. This paper introduces McPAT-Calib, a microarchitecture power modeling framework that combines McPAT with machine learning (ML) calibration methods. McPAT-Calib can quickly and accurately estimate the power of different benchmarks running on different CPU configurations, providing an effective evaluation tool for the design of modern CPUs. First, McPAT-7nm is introduced to support analytical power modeling for the 7nm technology node. Then, a wide range of modeling features are identified, and automatic feature selection and advanced regression methods are used to calibrate the McPAT-7nm modeling results, which greatly improves generality and accuracy. Moreover, a sampling algorithm based on active learning (AL) is leveraged to effectively reduce the labeling cost. We use up to 15 configurations of the 7nm RISC-V Berkeley Out-of-Order Machine (BOOM) along with 80 benchmarks to extensively evaluate the proposed framework. Compared with state-of-the-art microarchitecture power models, McPAT-Calib reduces the mean absolute percentage error (MAPE) of shuffle-split cross-validation by 5.95%. More importantly, the MAPE is reduced by 6.14% and 3.64% for evaluations of unknown CPU configurations and benchmarks, respectively. The AL sampling algorithm can reduce the demand for labeled samples by 50%, while the accuracy loss is only 0.44%.
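
A sketch of the calibration idea, assuming the analytical (McPAT-style) estimate can be treated as one feature among activity features when regressing toward reference power; all data below are synthetic.

# Calibrate a biased analytical estimate with a regression over it + features.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n = 80
events = rng.uniform(0, 1, (n, 6))                  # per-benchmark activity features
mcpat = events @ [2, 1, 3, 0.5, 1, 1] + 4.0         # biased analytical estimate
truth = 0.7 * mcpat + events[:, 0] * 2 - 1.0        # "gate-level" reference power

X = np.column_stack([mcpat, events])                # analytical estimate is a feature
calib = Ridge(alpha=1.0)
print("CV R^2:", cross_val_score(calib, X, truth, cv=5).mean())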

 

 

 

 

2021 ICCAD

BOOM-Explorer: RISC-V BOOM microarchitecture design space exploration framework

Author: Chen Bai, Qi Sun, Jianwang Zhai, Yuzhe Ma, Bei Yu, and Martin D. F. Wong

Affiliation: The Chinese University of Hong Kong; Tsinghua University

Abstract:

The microarchitecture design of a processor has become increasingly difficult due to the large design space and time-consuming verification flow. Previously, researchers relied on prior knowledge and cycle-accurate simulators to analyze the performance of different microarchitecture designs, but lacked sufficient discussion of methodologies for striking a good balance between power and performance. This work proposes an automatic framework to explore microarchitecture designs of the RISC-V Berkeley Out-of-Order Machine (BOOM), termed BOOM-Explorer, achieving a good trade-off between power and performance. Firstly, the framework utilizes an advanced microarchitecture-aware active learning (MicroAL) algorithm to generate a diverse and representative initial design set. Secondly, a Gaussian process model with deep kernel learning functions (DKL-GP) is built to characterize the design space. Thirdly, correlated multi-objective Bayesian optimization is leveraged to explore Pareto-optimal designs. Experimental results show that BOOM-Explorer can search for designs that dominate previous art and designs developed by senior engineers in terms of power and performance, within a much shorter time.
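
The MicroAL initialization is microarchitecture-aware; as a generic stand-in for building a "diverse and representative initial design set", the sketch below clusters the design space and picks the design nearest each centroid.

# Cluster-then-pick initialization for a sample-efficient exploration budget.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(10)
space = rng.uniform(0, 1, (2000, 9))          # encoded BOOM-like configurations
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(space)
init_set = []
for k in range(8):                            # nearest design to each centroid
    members = np.where(km.labels_ == k)[0]
    d = np.linalg.norm(space[members] - km.cluster_centers_[k], axis=1)
    init_set.append(members[int(np.argmin(d))])
print("initial designs to simulate:", init_set)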

 

 

 

 

2019 MLCAD

Learning-based CPU power modeling

Author: Ajay Krishna Ananda Kumar and Andreas Gerstlauer

Affiliation: Department of Electrical and Computer Engineering, The University of Texas at Austin

Abstract:

With the end of Dennard scaling, energy efficiency has become an important metric driving future processor architectures, particularly in the fields of mobile and embedded devices. To support rapid, power-aware micro-architectural design space exploration, it is important to accurately quantify the power consumption of processors early in the design flow and at a high level of abstraction. Existing CPU power models rely on either generic analytical power models or simple regression-based techniques that suffer from large inaccuracies. More recently, machine learning techniques have been proposed to build accurate power models. However, existing approaches still require slow RTL simulations or have only been demonstrated for fixed-function accelerators at higher levels of abstraction. In this work, we present a machine learning-based approach for power modeling of programmable CPUs at the micro-architecture level. Our models provide cycle-accurate and hierarchical power estimates down to sub-block granularity. Using only high-level information that can be obtained from micro-architecture simulations, we extract representative features and develop low-complexity learning formulations that require a small number of gate-level simulations for training. Results show that our hierarchically composed model predicts the cycle-by-cycle power consumption of a RISC-V processor core to within 2.2% of gate-level power estimation on average.
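
A sketch of hierarchical composition under the assumption that each sub-block gets its own small regressor on its own activity features, with core power predicted as the sum of sub-block predictions. The blocks, features, and weights below are synthetic.

# Per-sub-block regressors summed to a core-level, per-cycle power estimate.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)
cycles = 2000
blocks = {"decode": 3, "alu": 4, "lsu": 5}          # block -> number of features
feats = {b: rng.uniform(0, 1, (cycles, k)) for b, k in blocks.items()}
block_power = {b: f @ rng.uniform(0.1, 1.0, f.shape[1]) for b, f in feats.items()}

models = {b: LinearRegression().fit(feats[b], block_power[b]) for b in blocks}
core_pred = sum(models[b].predict(feats[b]) for b in blocks)   # per-cycle total
core_true = sum(block_power.values())
err = np.mean(np.abs(core_pred - core_true) / core_true)
print(f"mean relative error: {err:.2%}")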

 

 

 

 

2018 ICCAD

Machine learning for design space exploration and optimization of manycore systems

Author: Ryan Gary Kim, Janardhan Rao Doppa, and Partha Pratim Pande

Affiliation: Department of Electrical and Computer Engineering, Colorado State University; School of Electrical Engineering and Computer Science, Washington State University

Abstract:

In the emerging data-driven science paradigm, computing systems ranging from IoT and mobile to manycores and datacenters play distinct roles. These systems need to be optimized for the objectives and constraints dictated by the needs of the application. In this paper, we describe how machine learning techniques can be leveraged to improve the computational efficiency of hardware design optimization. This includes generic methodologies that are applicable to any hardware design space. As examples, we discuss a guided design space exploration framework to accelerate application-specific manycore system design and advanced imitation learning techniques to improve on-chip resource management. We present experimental results for application-specific manycore system design optimization and dynamic power management to demonstrate the efficacy of these methods over traditional EDA approaches.

 

 

 

 

2017 ISLPED

A learning bridge from architectural synthesis to physical design for exploring power-efficient high-performance adders

Author: Subhendu Roy, Yuzhe Ma, Jin Miao, Bei Yu

Affiliation: Cadence Design Systems, San Jose, CA, USA; CSE Department, The Chinese University of Hong Kong, NT, Hong Kong

Abstract:

In spite of the maturity of modern electronic design automation (EDA) tools, designs optimized at the architectural stage may become sub-optimal after going through the physical design flow. Adder design has been such a long-studied fundamental problem in the VLSI industry, yet designers cannot achieve optimal solutions by running EDA tools on the set of available prefix adder architectures. In this paper, we enhance a state-of-the-art prefix adder synthesis algorithm to obtain a much wider solution space in the architectural domain. On top of that, a machine learning based design space exploration methodology is applied to predict the Pareto frontier of the adders in the physical domain, which is infeasible by exhaustively running EDA tools for innumerable architectural solutions. Experimental results demonstrate that our framework can achieve a near-optimal delay vs. power/area Pareto frontier over a wide design space, bridging the gap between architectural and physical designs.
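
For the Pareto-frontier step, a small helper of the standard kind: given model-predicted (delay, power) pairs for many candidate adders, keep only non-dominated points (both objectives minimized). The predictions here are random placeholders.

# O(n^2) non-dominated filter over predicted (delay, power) pairs.
import numpy as np

def pareto_front(points):
    keep = []
    for i, p in enumerate(points):
        dominated = any(np.all(q <= p) and np.any(q < p)
                        for j, q in enumerate(points) if j != i)
        if not dominated:
            keep.append(i)
    return np.array(keep)

preds = np.random.default_rng(7).uniform(0, 1, (200, 2))   # (delay, power) per adder
front = pareto_front(preds)
print(f"{len(front)} of {len(preds)} predicted designs are Pareto-optimal")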

 

 

 

 

2015 HPCA

GPGPU performance and power estimation using machine learning

Author: Gene Wu, Joseph L Greathouse, Alexander Lyashevsky, Nuwan Jayasena, and Derek Chiou

Affiliation: Electrical and Computer Engineering, The University of Texas at Austin; AMD Research, Advanced Micro Devices, Inc.

Abstract:

Graphics Processing Units (GPUs) have numerous configuration and design options, including core frequency, number of parallel compute units (CUs), and available memory bandwidth. At many stages of the design process, it is important to estimate how application performance and power are impacted by these options. This paper describes a GPU performance and power estimation model that uses machine learning techniques on measurements from real GPU hardware. The model is trained on a collection of applications that are run at numerous different hardware configurations. From the measured performance and power data, the model learns how applications scale as the GPU's configuration is changed. Hardware performance counter values are then gathered when running a new application on a single GPU configuration. These dynamic counter values are fed into a neural network that predicts which scaling curve from the training data best represents this kernel. This scaling curve is then used to estimate the performance and power of the new application at different GPU configurations. Over an 8× range of the number of CUs, a 3.3× range of core frequencies, and a 2.9× range of memory bandwidth, our model's performance and power estimates are accurate to within 15% and 10% of real hardware, respectively. This is comparable to the accuracy of cycle-level simulators. However, after an initial training phase, our model runs as fast as, or faster than, the program running natively on real hardware.
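
A sketch of the scaling-curve mechanism, with a nearest-neighbor lookup standing in for the paper's neural classifier over performance counters: store normalized scaling curves for training kernels, match a new kernel by its counter signature, and reuse the matched curve. All curves and counters are invented.

# Match a new kernel to a stored scaling curve via its counter signature.
import numpy as np

rng = np.random.default_rng(8)
cu_counts = np.array([8, 16, 32, 64])
# training kernels: perf at each CU count, normalized to the 8-CU point
train_curves = np.array([[1, 1.9, 3.5, 6.0],     # bandwidth-light kernel
                         [1, 1.4, 1.7, 1.8],     # bandwidth-bound kernel
                         [1, 2.0, 4.0, 8.0]])    # perfectly scaling kernel
train_counters = rng.uniform(0, 1, (3, 10))      # per-kernel HW counter signatures

new_counters = train_counters[1] + rng.normal(0, 0.01, 10)   # resembles kernel 1
nearest = int(np.argmin(np.linalg.norm(train_counters - new_counters, axis=1)))
base_perf = 100.0                                # measured on the 8-CU config
print(dict(zip(cu_counts, np.round(base_perf * train_curves[nearest], 1))))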

 

 

 

 

2006 HPCA

Construction and use of linear regression models for processor performance analysis

Author: P.J. Joseph, Kapil Vaswani, M.J. Thazhuthaveetil

Affiliation: Department of Computer Science & Automation; Indian Institute of Science, Bangalore, India.

Abstract:

Processor architects have the challenging task of evaluating a large design space consisting of several interacting parameters and optimizations. In order to assist architects in making crucial design decisions, we build linear regression models that relate processor performance to micro-architectural parameters, using simulation-based experiments. We obtain good approximate models through an iterative process in which Akaike's information criterion is used to extract a good linear model from a small set of simulations, and limited further simulation is guided by the model using D-optimal experimental designs. The iterative process is repeated until the desired error bounds are achieved. We used this procedure to establish the relationship of the CPI performance response to 26 key micro-architectural parameters using a detailed cycle-by-cycle superscalar processor simulator. The resulting models provide a significance ordering of all micro-architectural parameters and their interactions, and explain the performance variations of micro-architectural techniques.
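
A minimal sketch of AIC-guided model building in the spirit of the paper's iterative process: greedily add the parameter whose inclusion most lowers AIC, computed from the residual sum of squares of an OLS fit; the D-optimal experiment-selection step is omitted. The data are synthetic stand-ins for simulator CPI responses.

# Forward stepwise selection of micro-architectural parameters by AIC.
import numpy as np

rng = np.random.default_rng(9)
n, p = 60, 8
X = rng.uniform(0, 1, (n, p))                      # candidate parameters
cpi = 1.0 + 2 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(0, 0.05, n)

def aic(cols):
    A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, cpi, rcond=None)
    rss = np.sum((cpi - A @ beta) ** 2)
    return n * np.log(rss / n) + 2 * (len(cols) + 1)

chosen = []
while True:
    scores = {c: aic(chosen + [c]) for c in range(p) if c not in chosen}
    if not scores:
        break
    best_c, best_a = min(scores.items(), key=lambda kv: kv[1])
    if best_a >= aic(chosen):                      # stop when AIC no longer improves
        break
    chosen.append(best_c)
print("selected parameters:", chosen)              # expect columns 0 and 3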

 

AI+EDA

Architecture or microarchitecture design optimization