William J. Dally
ORCID: 0000-0003-4632-2876
Affiliations:
- Stanford University, USA
- NVIDIA
According to our database, William J. Dally authored at least 264 papers between 1985 and 2024.
Awards
ACM Fellow
ACM Fellow 2002, "For contributions to the architecture and design of interconnection networks and parallel computing."
IEEE Fellow
IEEE Fellow 2002, "For contributions to parallel computing and interconnection networks".
Bibliography
2024
A 0.190-pJ/bit 25.2-Gb/s/wire Inverter-Based AC-Coupled Transceiver for Short-Reach Die-to-Die Interfaces in 5-nm CMOS.
IEEE J. Solid State Circuits, April, 2024
Leveraging Micro-Bump Pitch Scaling to Accelerate Interposer Link Bandwidths for Future High-Performance Compute Applications.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2024
2023
A Novel High-Efficiency Three-Phase Multilevel PV Inverter With Reduced DC-Link Capacitance.
IEEE Trans. Ind. Electron., 2023
A 0.297-pJ/Bit 50.4-Gb/s/Wire Inverter-Based Short-Reach Simultaneous Bi-Directional Transceiver for Die-to-Die Interface in 5-nm CMOS.
IEEE J. Solid State Circuits, 2023
A 95.6-TOPS/W Deep Learning Inference Accelerator With Per-Vector Scaled 4-bit Quantization in 5 nm.
IEEE J. Solid State Circuits, 2023
Retrospective: EIE: Efficient Inference Engine on Sparse and Compressed Neural Network.
CoRR, 2023
GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023
2022
GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture.
Dataset, October, 2022
LNS-Madam: Low-Precision Training in Logarithmic Number System Using Multiplicative Weight Update.
IEEE Trans. Computers, 2022
BaM: A Case for Enabling Fine-grain High Throughput GPU-Orchestrated Access to Storage.
CoRR, 2022
A 0.297-pJ/bit 50.4-Gb/s/wire Inverter-Based Short-Reach Simultaneous Bidirectional Transceiver for Die-to-Die Interface in 5nm CMOS.
Proceedings of the IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits 2022), 2022
A 17-95.6 TOPS/W Deep Learning Inference Accelerator with Per-Vector Scaled 4-bit Quantization for Transformers in 5nm.
Proceedings of the IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits 2022), 2022
Proceedings of the IEEE/ACM International Workshop on Performance Modeling, 2022
Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training.
Proceedings of the International Conference on Machine Learning, 2022
2021
GetMobile Mob. Comput. Commun., 2021
Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update.
CoRR, 2021
VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference.
CoRR, 2021
Commun. ACM, 2021
SPAA'21 Panel Paper: Architecture-Friendly Algorithms versus Algorithm-Friendly Architectures.
Proceedings of the SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures, 2021
VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference.
Proceedings of the Fourth Conference on Machine Learning and Systems, 2021
2020
IEEE Trans. Computers, 2020
A 0.32-128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Inference Accelerator With Ground-Referenced Signaling in 16 nm.
IEEE J. Solid State Circuits, 2020
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020
Optimal Operation of a Plug-in Hybrid Vehicle with Battery Thermal and Degradation Model.
Proceedings of the 2020 American Control Conference, 2020
2019
A 1.17-pJ/b, 25-Gb/s/pin Ground-Referenced Single-Ended Serial Link for Off- and On-Package Communication Using a Process- and Temperature-Adaptive Voltage Regulator.
IEEE J. Solid State Circuits, 2019
A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator with Ground-Reference Signaling in 16nm.
Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019, 2019
Proceedings of the Second Conference on Machine Learning and Systems, SysML 2019, 2019
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
Proceedings of the International Conference on Computer-Aided Design, 2019
Darwin-WGA: A Co-processor Provides Increased Sensitivity in Whole Genome Alignments with High Speedup.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019
A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Accelerator Designed with a High-Productivity VLSI Methodology.
Proceedings of the 2019 IEEE Hot Chips 31 Symposium (HCS), 2019
Proceedings of the 56th Annual Design Automation Conference 2019, 2019
A 2-to-20 GHz Multi-Phase Clock Generator with Phase Interpolators Using Injection-Locked Oscillation Buffers for High-Speed IOs in 16nm FinFET.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2019
Proceedings of the 25th IEEE International Symposium on Asynchronous Circuits and Systems, 2019
2018
Proceedings of the 2018 IEEE Symposium on VLSI Circuits, 2018
A 1.17pJ/b 25Gb/s/pin ground-referenced single-ended serial link for off- and on-package communication in 16nm CMOS using a process- and temperature-adaptive voltage regulator.
Proceedings of the 2018 IEEE International Solid-State Circuits Conference, 2018
Proceedings of the 6th International Conference on Learning Representations, 2018
Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training.
Proceedings of the 6th International Conference on Learning Representations, 2018
Proceedings of the 55th Annual Design Automation Conference, 2018
Ground-referenced signaling for intra-chip and short-reach chip-to-chip interconnects.
Proceedings of the 2018 IEEE Custom Integrated Circuits Conference, 2018
Darwin: A Genomics Co-processor Provides up to 15,000X Acceleration on Long Read Assembly.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018
2017
CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution Near In-Order Energy with Near Out-of-Order Performance.
ACM Trans. Archit. Code Optim., 2017
CoRR, 2017
CoRR, 2017
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017
Proceedings of the Workshop on Trends in Machine-Learning (and impact on computer architecture), 2017
Proceedings of the 5th International Conference on Learning Representations, 2017
Proceedings of the 5th International Conference on Learning Representations, 2017
Proceedings of the 5th International Conference on Learning Representations, 2017
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017
2016
ACM Trans. Archit. Code Optim., 2016
A 28 nm 2 Mbit 6 T SRAM With Highly Configurable Low-Voltage Write-Ability Assist Implementation and Capacitor-Based Sense-Amplifier Input Offset Compensation.
IEEE J. Solid State Circuits, 2016
CoRR, 2016
CoRR, 2016
Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding.
Proceedings of the 4th International Conference on Learning Representations, 2016
8.6 A 6.5-to-23.3fJ/b/mm balanced charge-recycling bus in 16nm FinFET CMOS at 1.7-to-2.6Gb/s/wire with clock forwarding and low-crosstalk contraflow wiring.
Proceedings of the 2016 IEEE International Solid-State Circuits Conference, 2016
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016
Deep compression and EIE: Efficient inference engine on compressed deep neural network.
Proceedings of the 2016 IEEE Hot Chips 28 Symposium (HCS), 2016
2015
IEEE Trans. Parallel Distributed Syst., 2015
Proceedings of the International Conference for High Performance Computing, 2015
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015
2014
Proceedings of the International Conference for High Performance Computing, 2014
Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014
2013
A 0.54 pJ/b 20 Gb/s Ground-Referenced Single-Ended Short-Reach Serial Link in 28 nm CMOS for Advanced Packaging Applications.
IEEE J. Solid State Circuits, 2013
Proceedings of the International Conference for High Performance Computing, 2013
A 0.54pJ/b 20Gb/s ground-referenced single-ended short-haul serial link in 28nm CMOS for advanced packaging applications.
Proceedings of the 2013 IEEE International Solid-State Circuits Conference, 2013
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2013
Proceedings of the 50th Annual Design Automation Conference 2013, 2013
2012
A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors.
ACM Trans. Comput. Syst., 2012
Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012
Proceedings of the 30th International IEEE Conference on Computer Design, 2012
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012
2011
IEEE Comput. Archit. Lett., 2011
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011
Proceedings of the 2011 IEEE International Test Conference, 2011
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
The utility of fast active messages on many-core chips: Efficient supercomputing project.
Proceedings of the 2011 IEEE Hot Chips 23 Symposium (HCS), 2011
2010
Buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures.
Proceedings of the SPAA 2010: Proceedings of the 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2010
Proceedings of the NOCS 2010, 2010
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010
Proceedings of the 24th International Conference on Supercomputing, 2010
Proceedings of the 39th International Conference on Parallel Processing, 2010
Proceedings of the 2010 International Conference on Compilers, 2010
Proceedings of the 16th IEEE International Symposium on Asynchronous Circuits and Systems, 2010
2009
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009
2008
IEEE J. Solid State Circuits, 2008
IEEE Comput. Archit. Lett., 2008
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008
Proceedings of the Euro-Par 2008, 2008
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008
2007
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007
Proceedings of the First International Symposium on Networks-on-Chips, 2007
Proceedings of the 2007 IEEE International Solid-State Circuits Conference, 2007
Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007
Proceedings of the 21st Annual International Conference on Supercomputing, 2007
Tradeoff between data-, instruction-, and thread-level parallelism in stream processors.
Proceedings of the 21st Annual International Conference on Supercomputing, 2007
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007
2006
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006
Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006
Proceedings of the 20th Annual International Conference on Supercomputing, 2006
Proceedings of the 24th International Conference on Computer Design (ICCD 2006), 2006
Proceedings of the IEEE 2006 Custom Integrated Circuits Conference, 2006
Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006
2005
A 20-Gb/s 0.13-μm CMOS serial link transmitter using an LC-PLL to directly drive the output multiplexer.
IEEE J. Solid State Circuits, 2005
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005
Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005
Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005
Proceedings of the 42nd Design Automation Conference, 2005
2004
IEEE J. Solid State Circuits, 2004
IEEE Comput. Archit. Lett., 2004
Proceedings of the 2004 workshop on Computer architecture education, 2004
Proceedings of the SPAA 2004: Proceedings of the Sixteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2004
Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, 2004
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004
Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004
2003
IEEE/ACM Trans. Netw., 2003
IEEE J. Solid State Circuits, 2003
Jitter transfer characteristics of delay-locked loops - theories and design techniques.
IEEE J. Solid State Circuits, 2003
Proceedings of the SPAA 2003: Proceedings of the Fifteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2003
Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003
Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003
Proceedings of the IEEE Custom Integrated Circuits Conference, 2003
2002
A low-power multiplying DLL for low-jitter multigigahertz clock generation in highly integrated digital chips.
IEEE J. Solid State Circuits, 2002
Proceedings of the Fourteenth Annual ACM Symposium on Parallel Algorithms and Architectures, 2002
Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002
Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002
Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002
Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002
Proceedings of the 10th Annual IEEE Symposium on High Performance Interconnects (HOTIC 2002), August 21, 2002
Proceedings of the 2002 ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, 2002
2001
Proceedings of the 2001 International Symposium on Circuits and Systems, 2001
Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001
Proceedings of the 38th Design Automation Conference, 2001
Digital systems engineering.
Cambridge University Press, ISBN: 978-0-521-59292-5, 2001
2000
IEEE J. Solid State Circuits, 2000
Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000
Proceedings of the High Performance Computing, Third International Symposium, 2000
Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000
Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000
Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000
Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000
Proceedings of the 2000 ACM SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, 2000
Proceedings of the 37th Conference on Design Automation, 2000
Proceedings of the ASPLOS-IX Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, 2000
1999
Proceedings of the 18th Conference on Advanced Research in VLSI (ARVLSI '99), 1999
1998
Proceedings of the Rendering Techniques '98, Proceedings of the Eurographics Workshop in Vienna, Austria, June 29, 1998
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998
Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998
Proceedings of the 25 Years of the International Symposia on Computer Architecture (Selected Papers)., 1998
Proceedings of the 25 Years of the International Symposia on Computer Architecture (Selected Papers)., 1998
The effects of explicitly parallel mechanisms on the multi-ALU processor cluster pipeline.
Proceedings of the International Conference on Computer Design: VLSI in Computers and Processors, 1998
1997
Extended Ephemeral Logging: Log Storage Management for Applications with Long Lived Transactions.
ACM Trans. Database Syst., 1997
1995
Thread prioritization: A thread scheduling mechanism for multiple-context parallel processors.
Future Gener. Comput. Syst., 1995
Proceedings of the Fifth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1995
Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture (HPCA 1995), 1995
Proceedings of the 16th Conference on Advanced Research in VLSI (ARVLSI '95), 1995
1994
SIGARCH Comput. Archit. News, 1994
The Reliable Router: A Reliable and High-Performance Communication Substrate for Parallel Computers.
Proceedings of the Parallel Computer Routing and Communication, 1994
Proceedings of the Hot Interconnects II, 1994
Proceedings of the Third International Conference on Information and Knowledge Management (CIKM'94), Gaithersburg, Maryland, USA, November 29, 1994
Proceedings of the ASPLOS-VI Proceedings, 1994
Proceedings of the Multithreaded Computer Architecture, 1994
Proceedings of the Automatic Parallelization: New Approaches to Code Generation, 1994
Issues in the Design and Implementation of Instruction Processors for Multicomputers (Position Statement).
Proceedings of the Multithreaded Computer Architecture, 1994
1993
IEEE Trans. Parallel Distributed Syst., 1993
Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, 1993
Evaluation of Mechanisms for Fine-Grained Parallel Programs in the J-Machine and the CM-5.
Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993
Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993
1992
IEEE Trans. Computers, 1992
The message-driven processor: a multicomputer processing node with efficient mechanisms.
IEEE Micro, 1992
Proceedings of the 19th Annual International Symposium on Computer Architecture. Gold Coast, 1992
Proceedings of the 1992 IEEE International Conference on Computer Design: VLSI in Computer & Processors, 1992
Proceedings of the 1992 IEEE International Conference on Computer Design: VLSI in Computer & Processors, 1992
Proceedings of the 1992 IEEE International Conference on Computer Design: VLSI in Computer & Processors, 1992
1991
IEEE Trans. Computers, 1991
Experiences Implementing Dataflow on a General-Purpose Parallel Computer.
Proceedings of the International Conference on Parallel Processing, 1991
Proceedings of the 1991 IEEE International Conference on Computer Design: VLSI in Computer & Processors, 1991
1990
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1990
IEEE Trans. Computers, 1990
Proceedings of the Second ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1990
Proceedings of the 1990 IEEE International Conference on Computer Design: VLSI in Computers and Processors, 1990
1989
Proceedings of the ACM SIGPLAN'89 Conference on Programming Language Design and Implementation (PLDI), 1989
Proceedings of the PARLE '89: Parallel Architectures and Languages Europe, 1989
The J-Machine: A Fine-Grain Concurrent Computer.
Proceedings of the Information Processing 89, Proceedings of the IFIP 11th World Computer Congress, San Francisco, USA, August 28, 1989
Proceedings of the 26th ACM/IEEE Design Automation Conference, 1989
Proceedings of the ASPLOS-III Proceedings, 1989
1988
Proceedings of the 1988 ACM SIGPLAN Workshop on Object-based Concurrent Programming, 1988
Proceedings of the 15th Annual International Symposium on Computer Architecture, 1988
Mechanisms for Concurrent Computing.
Proceedings of the International Conference on Fifth Generation Computer Systems, 1988
Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, 1988
1987
IEEE Trans. Computers, 1987
Proceedings of the 24th ACM/IEEE Design Automation Conference. Miami Beach, FL, USA, June 28, 1987
1986
1985
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1985
Proceedings of the 12th Annual Symposium on Computer Architecture, 1985