Michael J. Schulte

Mike Ignatowski

Vignesh Adhinarayanan

Kishore Punniyamurthy

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

2021

What Made Us Stronger: An Inside Look Back at the History of AMD Microprocessor Development.

Dave Christie

Mike Clark

Mike Schulte

IEEE Micro, 2021

2020

Approximate Computing: From Circuits to Applications [Scanning the Issue].

Weiqiang Liu

Fabrizio Lombardi

Thiruvengadam Vijayaraghavan

Proc. IEEE, 2020

2017

Design and Analysis of an APU for Exascale Computing.

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

Accelerating Matrix Processing with GPUs.

Proceedings of the 24th IEEE Symposium on Computer Arithmetic, 2017

2015

Achieving Exascale Capabilities through Heterogeneous Computing.

IEEE Micro, 2015

2014

Low-Cost Per-Core Voltage Domain Support for Power-Constrained High-Performance Processors.

IEEE Trans. Very Large Scale Integr. Syst., 2014

Energy-Efficient Pixel-Arithmetic.

IEEE Trans. Computers, 2014

Process variation-aware workload partitioning algorithms for GPUs supporting spatial-multitasking.

Paula Aguilera

Jungseob Lee

Katherine Morrow

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

Memory scheduling towards high-throughput cooperative heterogeneous computing.

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013

High-Energy Physics.

Proceedings of the Handbook of Signal Processing Systems, 2013

Instruction Set Extensions for Matrix Decompositions on Software Defined Radio Architectures.

Murugappan Senthilvelan

J. Signal Process. Syst., 2013

Binary Integer Decimal-Based Floating-Point Multiplication.

IEEE Trans. Computers, 2013

Modular Design of High-Throughput, Low-Latency Sorting Units.

Henry J. Duwe III

Madhu Saravana Sibi Govindan

IEEE Trans. Computers, 2013

Automating Stressmark Generation for Testing Processor Voltage Fluctuations.

William Lloyd Bircher

IEEE Micro, 2013

Exploiting GPU peak-power and performance tradeoffs through reduced effective pipeline latency.

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

REEL: Reducing effective execution latency of floating point operations.

Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Performance boosting under reliability and power constraints.

Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2013

Dynamic bandwidth scaling for embedded DSPs with 3D-stacked DRAM and wide I/Os.

Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2013

Power-efficient computing for compute-intensive GPGPU applications.

Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Reevaluating the latency claims of 3D stacked memories.

Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013

2012

A study of decimal left shifters for binary numbers.

Javier Hormigo

Madhu Saravana Sibi Govindan

Inf. Comput., 2012

AUDIT: Stress Testing the Automatic Way.

William Lloyd Bircher

Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Something old and something new: P-states can borrow microarchitecture techniques too.

Proceedings of the International Symposium on Low Power Electronics and Design, 2012

The case for GPGPU spatial multitasking.

Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

Cost-effective power delivery to support per-core voltage domains for power-constrained processors.

Proceedings of the 49th Annual Design Automation Conference 2012, 2012

A Linear Algebra Core Design for Efficient Level-3 BLAS.

Ardavan Pedram

Robert A. van de Geijn

Andreas Gerstlauer

Proceedings of the 23rd IEEE International Conference on Application-Specific Systems, 2012

Virtual Floating-Point Units for Low-Power Embedded Processors.

Proceedings of the 23rd IEEE International Conference on Application-Specific Systems, 2012

Session MP6a: Computer arithmetic (invited).

Proceedings of the Conference Record of the Forty Sixth Asilomar Conference on Signals, 2012

Workload and power budget partitioning for single-chip heterogeneous processors.

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Lossless and lossy memory I/O link compression for improving performance of GPGPU workloads.

Vijay Sathish

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011

Hardware Designs for Binary Integer Decimal-Based Rounding.

Samuel Tsen

IEEE Trans. Computers, 2011

Modular high-throughput and low-latency sorting units for FPGAs in the Large Hadron Collider.

Proceedings of the IEEE 9th Symposium on Application Specific Processors, 2011

Analyzing the performance and energy impact of 3D memory integration on embedded DSPs.

Daniel W. Chang

Proceedings of the 2011 International Conference on Embedded Computer Systems: Architectures, 2011

Scratchpad memory optimizations for digital signal processing applications.

Proceedings of the Design, Automation and Test in Europe, 2011

Energy-efficient floating-point arithmetic for software-defined radio architectures.

Proceedings of the 22nd IEEE International Conference on Application-specific Systems, 2011

A decimal floating-point fused multiply-add unit with a novel decimal leading-zero anticipator.

Proceedings of the 22nd IEEE International Conference on Application-specific Systems, 2011

Truncated-matrix multipliers with coefficient shifting.

E. George Walters III

Proceedings of the Conference Record of the Forty Fifth Asilomar Conference on Signals, 2011

Session MP7b: Model-based design optimization.

Proceedings of the Conference Record of the Forty Fifth Asilomar Conference on Signals, 2011

Session MP8a4: DSP algorithms and architectures.

Proceedings of the Conference Record of the Forty Fifth Asilomar Conference on Signals, 2011

Energy-efficient floating-point arithmetic for digital signal processors.

Proceedings of the Conference Record of the Forty Fifth Asilomar Conference on Signals, 2011

Improving Throughput of Power-Constrained GPUs Using Dynamic Voltage/Frequency and Core Scaling.

Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010

Instruction set extensions for the advanced encryption standard on a multithreaded software defined radio platform.

Christopher D. Jenkins

Int. J. High Perform. Syst. Archit., 2010

A survey of hardware designs for decimal arithmetic.

IBM J. Res. Dev., 2010

CORDIC-based LMMSE equalizer for Software Defined Radio.

Murugappan Senthilvelan

Proceedings of the 2010 International Conference on Embedded Computer Systems: Architectures, 2010

ARAL-CR: An adaptive reasoning and learning cognitive radio platform.

Proceedings of the 2010 International Conference on Embedded Computer Systems: Architectures, 2010

ERCBench: An Open-Source Benchmark Suite for Embedded and Reconfigurable Computing.

Daniel W. Chang

Christopher D. Jenkins

Proceedings of the International Conference on Field Programmable Logic and Applications, 2010

Galois field hardware architectures for network coding.

Aishwarya Nagarajan

Parameswaran Ramanathan

Proceedings of the 2010 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2010

High-Energy Physics.

Proceedings of the Handbook of Signal Processing Systems, 2010

2009

Hardware Designs for Decimal Floating-Point Addition and Related Operations.

IEEE Trans. Computers, 2009

Low-Power Multiple-Precision Iterative Floating-Point Multiplier with SIMD Support.

Dimitri Tan

Carl Lemonds

IEEE Trans. Computers, 2009

Decimal Floating-Point Multiplication.

IEEE Trans. Computers, 2009

Instruction set extensions for software defined radio.

Microprocess. Microsystems, 2009

The Emerging Landscape of Computer Performance Evaluation.

Adv. Comput., 2009

Performance analysis of decimal floating-point libraries and its impact on decimal hardware and software solutions.

Proceedings of the 27th International Conference on Computer Design, 2009

FPGA Design Analysis of the Clustering Algorithm for the CERN Large Hadron Collider.

Proceedings of the FCCM 2009, 2009

A Combined Decimal and Binary Floating-Point Multiplier.

Proceedings of the 20th IEEE International Conference on Application-Specific Systems, 2009

A Decimal Floating-Point Adder with Decoded Operands and a Decimal Leading-Zero Anticipator.

Proceedings of the 19th IEEE Symposium on Computer Arithmetic, 2009

2008

Improved combined binary/decimal fixed-point multipliers.

Proceedings of the 26th International Conference on Computer Design, 2008

Implementing communications systems on an SDR SoC.

Proceedings of the IEEE International Conference on Acoustics, 2008

2007

A Decimal Floating-Point Divider Using Newton-Raphson Iteration.

J. VLSI Signal Process., 2007

The Sandbridge SB3011 Platform.

EURASIP J. Embed. Syst., 2007

A New Era of Performance Evaluation.

Sean M. Pieper

JoAnn M. Paul

Computer, 2007

Software Solutions for Converting a MIMO-OFDM Channel into Multiple SISO-OFDM Channels.

Mihai Sima

Murugappan Senthilvelan

Proceedings of the Third IEEE International Conference on Wireless and Mobile Computing, 2007

Trends in Low Power Handset Software Defined Radio.

Proceedings of the Embedded Computer Systems: Architectures, 2007

Benchmarks and performance analysis of decimal floating-point applications.

Proceedings of the 25th International Conference on Computer Design, 2007

Hardware design of a Binary Integer Decimal-based floating-point adder.

Proceedings of the 25th International Conference on Computer Design, 2007

Floating-point division algorithms for an x86 microprocessor with a rectangular multiplier.

Dimitri Tan

Carl Lemonds

Proceedings of the 25th International Conference on Computer Design, 2007

A parallel IEEE P754 decimal floating-point multiplier.

Proceedings of the 25th International Conference on Computer Design, 2007

Hardware Design of a Binary Integer Decimal-based IEEE P754 Rounding Unit.

Proceedings of the IEEE International Conference on Application-Specific Systems, 2007

Architecture Support for Reconfigurable Multithreaded Processors in Programmable Communication Systems.

Proceedings of the IEEE International Conference on Application-Specific Systems, 2007

Decimal Floating-Point Adder and Multifunction Unit with Injection-Based Rounding.

Proceedings of the 18th IEEE Symposium on Computer Arithmetic (ARITH-18 2007), 2007

Decimal Floating-Point Multiplication Via Carry-Save Addition.

Proceedings of the 18th IEEE Symposium on Computer Arithmetic (ARITH-18 2007), 2007

2006

Reciprocal and Reciprocal Square Root Units with Operand Modification and Multiplication.

J. VLSI Signal Process., 2006

A Low-Power Multithreaded Processor for Software Defined Radio.

J. VLSI Signal Process., 2006

Generation and visualization of four-dimensional MR angiography data using an undersampled 3-D projection trajectory.

IEEE Trans. Medical Imaging, 2006

Integer Multipliers with Overflow Detection.

Mustafa Gök

Mark G. Arnold

IEEE Trans. Computers, 2006

Dual-mode floating-point multiplier architectures with parallel operations.

J. Syst. Archit., 2006

An Overview of Reconfigurable Hardware in Embedded Systems.

EURASIP J. Embed. Syst., 2006

2005

Guest Editorial.

Shuvra S. Bhattacharyya

Robert Schreiber

J. VLSI Signal Process., 2005

High-Speed Multioperand Decimal Adders.

Robert D. Kenney

IEEE Trans. Computers, 2005

Sandbridge Software Tools.

Proceedings of the Embedded Computer Systems: Architectures, 2005

A combined two's complement and floating-point comparator.

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2005), 2005

Future wireless convergence platforms.

Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2005

Instruction set extensions for software defined radio on a multithreaded processor.

Proceedings of the 2005 International Conference on Compilers, 2005

Decimal Floating-Point Square Root Using Newton-Raphson Iteration.

Proceedings of the 16th IEEE International Conference on Application-Specific Systems, 2005

Instruction Set Extensions for Reed-Solomon Encoding and Decoding.

Proceedings of the 16th IEEE International Conference on Application-Specific Systems, 2005

Efficient Function Approximation Using Truncated Multipliers and Squarers.

E. George Walters III

Proceedings of the 17th IEEE Symposium on Computer Arithmetic (ARITH-17 2005), 2005

Decimal Multiplication with Efficient Partial Product Generation.

Eric M. Schwarz

Proceedings of the 17th IEEE Symposium on Computer Arithmetic (ARITH-17 2005), 2005

2004

Intrinsic Compiler Support for Interval Arithmetic.

Numer. Algorithms, 2004

A Low-Power Multithreaded Processor for Baseband Communication Systems.

Proceedings of the Computer Systems: Architectures, 2004

The 4D Cluster Visualization project.

Proceedings of the Medical Imaging 2004: Visualization, 2004

A 64-bit Decimal Floating-Point Adder.

John D. Thompson

Nandini Karra

Proceedings of the 2004 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2004), 2004

A Subword-Parallel Multiplication and Sum-of-Squares Unit.

Shankar Krithivasan

Proceedings of the 2004 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2004), 2004

Multioperand Decimal Addition.

Robert D. Kenney

Proceedings of the 2004 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2004), 2004

A High-Frequency Decimal Multiplier.

Robert D. Kenney

Proceedings of the 22nd IEEE International Conference on Computer Design: VLSI in Computers & Processors (ICCD 2004), 2004

A Static Low-Power, High-Performance 32-bit Carry Skip Adder.

Proceedings of the 2004 Euromicro Symposium on Digital Systems Design (DSD 2004), Architectures, Methods and Tools, 31 August, 2004

Sandblaster low power DSP [parallel DSP arithmetic microarchitecture].

Proceedings of the IEEE 2004 Custom Integrated Circuits Conference, 2004

Decimal Floating-Point Division Using Newton-Raphson Iteration.

Proceedings of the 15th IEEE International Conference on Application-Specific Systems, 2004

A Low-Power Carry Skip Adder with Fast Saturation.

Proceedings of the 15th IEEE International Conference on Application-Specific Systems, 2004

2003

A Quadruple Precision and Dual Double Precision Floating-Point Multiplier.

Proceedings of the 2003 Euromicro Symposium on Digital Systems Design (DSD 2003), 2003

Combined Multiplication and Sum-of-Squares Units.

Louis Marquette

Shankar Krithivasan

E. George Walters III

Proceedings of the 14th IEEE International Conference on Application-Specific Systems, 2003

Decimal Multiplication Via Carry-Save Addition.

Proceedings of the 14th IEEE International Conference on Application-Specific Systems, 2003

The Interval Logarithmic Number System.

Mark G. Arnold

Jesus Garcia

Proceedings of the 16th IEEE Symposium on Computer Arithmetic (Arith-16 2003), 2003

2002

Guest Editorial.

Graham A. Jullien

J. VLSI Signal Process., 2002

A Java-Enabled DSP.

C. John Glossner

Stamatis Vassiliadis

Proceedings of the Embedded Processor Design Challenges: Systems, Architectures, Modeling, and Simulation, 2002

2001

Combined IEEE Compliant and Truncated Floating Point Multipliers for Reduced Power Dissipation.

Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

Design Alternatives for Parallel Saturating Multioperand Adders.

Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

FPGA Resource Reduction Through Truncated Multiplication.

Don McCarley

Proceedings of the Field-Programmable Logic and Applications, 2001

Analysis of Column Compression Multipliers.

K'Andrea C. Bickerstaff

Proceedings of the 15th IEEE Symposium on Computer Arithmetic (Arith-15 2001), 2001

2000

A Family of Variable-Precision Interval Arithmetic Processors.

IEEE Trans. Computers, 2000

Integer Multiplication with Overflow Detection or Saturation.

IEEE Trans. Computers, 2000

A New Approach to DSP Intrinsic Functions.

Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS-33), 2000

Parallel saturating multioperand adders.

Proceedings of the 2000 International Conference on Compilers, 2000

A Hardware Algorithm for Variable-Precision Logarithm.

Javier Hormigo

Julio Villalba

Proceedings of the 12th IEEE International Conference on Application-Specific Systems, 2000

1999

The Symmetric Table Addition Method for Accurate Function Approximation.

J. VLSI Signal Process., 1999

Approximating Elementary Functions with Symmetric Bipartite Tables.

IEEE Trans. Computers, 1999

The Interval-Enhanced GNU Fortran Compiler.

Reliab. Comput., 1999

Parallel Saturating Fractional Arithmetic Units.

Navindra Yadav

Proceedings of the 9th Great Lakes Symposium on VLSI (GLS-VLSI '99), 1999

High-Speed Inverse Square Roots.

Proceedings of the 14th IEEE Symposium on Computer Arithmetic (Arith-14 '99), 1999

1998

Single-Number Interval I/O.

Proceedings of the Developments in Reliable Computing, 1998

A Combined Interval and Floating Point Multiplier.

Proceedings of the 8th Great Lakes Symposium on VLSI (GLS-VLSI '98), 1998

1997

Accurate Function Approximations by Symmetric Table Lookup and Addition.

Proceedings of the 1997 International Conference on Application-Specific Systems, 1997

Symmetric Bipartite Tables for Accurate Function Approximation.

Proceedings of the 13th Symposium on Computer Arithmetic (ARITH-13 '97), 1997

1996

Hardware interval multipliers.

RITA, 1996

Variable-precision, interval arithmetic coprocessors.

Reliab. Comput., 1996

Software for high radix on-line arithmetic.

Thomas W. Lynch

Reliab. Comput., 1996

1995

Parallel reduced area multipliers.

K'Andrea C. Bickerstaff

J. VLSI Signal Process., 1995

A software interface and hardware design for variable-precision interval arithmetic.

Reliab. Comput., 1995

A High Radix On-Line Arithmetic for Credible and Accurate Computing.

Thomas W. Lynch

J. Univers. Comput. Sci., 1995

A coprocessor for accurate and reliable numerical computations.

Proceedings of the 1995 International Conference on Computer Design (ICCD '95), 1995

A Processor for Staggered Interval Arithmetic.

Proceedings of the International Conference on Application Specific Array Processors (ASAP'95), 1995

Hardware Design and Arithmetic Algorithms for a Variable-Precision, Interval Arithmetic Coprocessor.

Proceedings of the 12th Symposium on Computer Arithmetic (ARITH-12 '95), 1995

The K5 transcendental functions.

Proceedings of the 12th Symposium on Computer Arithmetic (ARITH-12 '95), 1995

1994

Hardware Designs for Exactly Rounded Elementary Functions.

IEEE Trans. Computers, 1994

Optimal initial approximations for the Newton-Raphson division algorithm.

J. Omar

Computing, 1994

A variable-precision interval arithmetic processor.

Proceedings of the International Conference on Application Specific Array Processors, 1994

1993

Reduced area multipliers.

K'Andrea C. Bickerstaff

Proceedings of the International Conference on Application-Specific Array Processors, 1993

Exact rounding of certain elementary functions.
