John Paul Shen

According to our database, John Paul Shen authored at least 122 papers between 1980 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
TNNGen: Automated Design of Neuromorphic Sensory Processing Units for Time-Series Clustering.
IEEE Trans. Circuits Syst. II Express Briefs, May, 2024

Corrigendum: SemNet: Learning semantic attributes for human activity recognition with deep belief networks.
Frontiers Big Data, 2024

Realtime Person Identification via Gait Analysis.
CoRR, 2024

OzMAC: An Energy-Efficient Sparsity-Exploiting Multiply-Accumulate-Unit Design for DL Inference.
CoRR, 2024

Commercial Evaluation of Zero-Skipping MAC Design for Bit Sparsity Exploitation in DL Inference.
Proceedings of the 32nd IFIP/IEEE International Conference on Very Large Scale Integration, 2024

NeRTCAM: CAM-Based CMOS Implementation of Reference Frames for Neuromorphic Processors.
Proceedings of the Neuro Inspired Computational Elements Conference, 2024

Exploration of Unary Arithmetic-Based Matrix Multiply Units for Low Precision DL Accelerators.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2024

Realtime Person Identification via Gait Analysis Using IMU Sensors on Edge Devices.
Proceedings of the International Conference on Neuromorphic Systems, 2024

TNN-CIM: An In-SRAM CMOS Implementation of TNN-Based Synaptic Arrays with STDP Learning.
Proceedings of the 6th IEEE International Conference on AI Circuits and Systems, 2024

2023
IDIoT: Multimodal Framework for Ubiquitous Identification and Assignment of Human-carried Wearable Devices.
ACM Trans. Internet Things, May, 2023

tubGEMM: Energy-Efficient and Sparsity-Effective Temporal-Unary-Binary Based Matrix Multiply Unit.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2023

tuGEMM: Area-Power-Efficient Temporal Unary GEMM Architecture for Low-Precision Edge AI.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2023

2022
SemNet: Learning semantic attributes for human activity recognition with deep belief networks.
Frontiers Big Data, 2022

Towards a Design Framework for TNN-Based Neuromorphic Sensory Processing Units.
CoRR, 2022

TNN7: A Custom Macro Suite for Implementing Highly Optimized Designs of Neuromorphic TNNs.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022

2021
A Microarchitecture Implementation Framework for Online Learning with Temporal Neural Networks.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2021

Unsupervised Clustering of Time Series Signals Using Neuromorphic Energy-Efficient Temporal Neural Networks.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Generating Realistic Ride-Hailing Datasets Using GANs.
ACM Trans. Spatial Algorithms Syst., 2020

Direct CMOS Implementation of Neuromorphic Temporal Neural Networks for Sensory Processing.
CoRR, 2020

Sky Segmentation for Enhanced Depth Reconstruction and Bokeh Rendering with Efficient Architectures.
Proceedings of the Computational Imaging XVIII, Burlingame, 2020

2019
Deep Speaker Embedding for Speaker-Targeted Automatic Speech Recognition.
Proceedings of the NLPIR 2019: The 3rd International Conference on Natural Language Processing and Information Retrieval, Tokushima, Japan, June 28, 2019

AttriNet: learning mid-level features for human activity recognition with deep belief networks.
Proceedings of the 2019 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2019 ACM International Symposium on Wearable Computers, 2019

Audio-visual TED corpus: enhancing the TED-LIUM corpus with facial information, contextual text and object recognition.
Proceedings of the 2019 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2019 ACM International Symposium on Wearable Computers, 2019

2018
Improving Bag-Of-Words: Capturing Local Information for Motion-Based Activity Recognition.
Proceedings of the 2018 ACM International Joint Conference and 2018 International Symposium on Pervasive and Ubiquitous Computing and Wearable Computers, 2018

2017
On the Real-time Vehicle Placement Problem.
CoRR, 2017

SurfaceVibe: vibration-based tap & swipe tracking on ubiquitous surfaces.
Proceedings of the 16th ACM/IEEE International Conference on Information Processing in Sensor Networks, 2017

Data Driven Analysis of the Potentials of Dynamic Ride Pooling.
Proceedings of the 10th ACM SIGSPATIAL Workshop on Computational Transportation Science, 2017

Space-Time Graph Modeling of Ride Requests Based on Real-World Data.
Proceedings of the Workshops of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2008
Mitosis: A Speculative Multithreaded Processor Based on Precomputation Slices.
IEEE Trans. Parallel Distributed Syst., 2008

2006
Die Stacking (3D) Microarchitecture.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

Multiple Instruction Stream Processor.
Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006

2005
Mitigating Amdahl's Law through EPI Throttling.
Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

A Dependency Chain Clustered Microarchitecture.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

2004
A case for shared instruction cache on chip multiprocessors running OLTP.
SIGARCH Comput. Archit. News, 2004

Helper Threads via Virtual Multithreading.
IEEE Micro, 2004

Best of Both Latency and Throughput.
Proceedings of the 22nd IEEE International Conference on Computer Design: VLSI in Computers & Processors (ICCD 2004), 2004

Hardware Support for Prescient Instruction Prefetch.
Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004

Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors.
Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004

Helper threads via virtual multithreading on an experimental Itanium® 2 processor-based platform.
Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, 2004

2003
A framework for modeling and optimization of prescient instruction prefetch.
Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 2003

Scaling and Characterizing Database Workloads: Bridging the Gap between Research and Practice.
Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

2002
Post-Pass Binary Adaptation for Software-Based Speculative Precomputation.
Proceedings of the 2002 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2002

Branch Behavior of a Commercial OLTP Workload on Intel IA32 Processors.
Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002

Memory Latency-Tolerance Approaches for Itanium Processors: Out-of-Order Execution vs. Speculative Precomputation.
Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002

Non-Vital Loads.
Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002

Quantitative Evaluation of the Register Stack Engine and Optimizations for Future Itanium Processors.
Proceedings of the 6th Annual Workshop on Interaction between Compilers and Computer Architecture (INTERACT-6 2002), 2002

2001
Coming challenges in microarchitecture and architecture.
Proc. IEEE, 2001

Dynamic speculative precomputation.
Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

Speculative precomputation: long-range prefetching of delinquent loads.
Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001

Clear and Present Tensions in Microprocessor Design.
Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

Parallel Cachelets.
Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

Register Renaming and Scheduling for Dynamic Execution of Predicated Code.
Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

Relating buffer-oriented microarchitecture validation to high-level pipeline functionality.
Proceedings of the Sixth IEEE International High-Level Design Validation and Test Workshop 2001, 2001

2000
A Buffer-Oriented Methodology for Microarchitecture Validation.
J. Electron. Test., 2000

Effectiveness of Microarchitecture Test Program Generation.
IEEE Des. Test Comput., 2000

PipeRench implementation of the instruction path coprocessor.
Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

Completion time multiple branch prediction for enhancing trace cache performance.
Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

Instruction path coprocessors.
Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

1999
An integrated functional performance simulator.
IEEE Micro, 1999

Superscalar Processor Validation at the Microarchitecture Level.
Proceedings of the 12th International Conference on VLSI Design (VLSI Design 1999), 1999

System-Level Issues for Software Thread Integration: Guest Triggering and Host Selection.
Proceedings of the 20th IEEE Real-Time Systems Symposium, 1999

The Block-Based Trace Cache.
Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999

Reducing branch misprediction penalties via dynamic control independence detection.
Proceedings of the 13th international conference on Supercomputing, 1999

Mispredicted Path Cache Effects.
Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999

1998
Exploiting Value Locality to Exceed the Dataflow Limit.
Int. J. Parallel Program., 1998

Calibration of Microprocessor Performance Models.
Computer, 1998

Techniques for Software Thread Integration in Real-Time Embedded Systems.
Proceedings of the 19th IEEE Real-Time Systems Symposium, 1998

Load Execution Latency Reduction.
Proceedings of the 12th international conference on Supercomputing, 1998

Hardware to Software Migration with Real-Time Thread Integration.
Proceedings of the 24th EUROMICRO '98 Conference, 1998

Efficacy and Performance Impact of Value Prediction.
Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, 1998

1997
Post-pass partitioning of signal processing programs.
Int. J. Parallel Program., 1997

Superspeculative Microarchitecture for Beyond AD 2000.
Computer, 1997

Compiler Support for Low-Cost Synchronization Among Threads.
Proceedings of the Parallel Computing: Fundamentals, 1997

A Framework for Statistical Modeling of Superscalar Processor Performance.
Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture (HPCA '97), 1997

The Performance Potential of Value and Dependence Prediction.
Proceedings of the Euro-Par '97 Parallel Processing, 1997

A Realistic Study on Multithreaded Superscalar Processor Design.
Proceedings of the Euro-Par '97 Parallel Processing, 1997

1996
Exceeding the Dataflow Limit via Value Prediction.
Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, 1996

Can Trace-Driven Simulators Accurately Predict Superscalar Performance?
Proceedings of the 1996 International Conference on Computer Design (ICCD '96), 1996

Value Locality and Load Value Prediction.
Proceedings of ASPLOS-VII, 1996

The Intrinsic Bandwidth Requirements of Ordinary Programs.
Proceedings of ASPLOS-VII, 1996

Automatic partitioning of signal processing programs for symmetric multiprocessors.
Proceedings of the Fifth International Conference on Parallel Architectures and Compilation Techniques, 1996

1995
VMW: A Visualization-Based Microarchitecture Workbench.
Computer, 1995

A limit study of local memory requirements using value reuse profiles.
Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995

Performance Evaluation of the PowerPC 620 Microarchitecture.
Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

Systematic Validation of Pipeline Interlock for Superscalar Microarchitectures.
Proceedings of the Digest of Papers: FTCS-25, 1995

1994
Exploiting Instruction-Level Parallelism for Integrated Control-Flow Monitoring.
IEEE Trans. Computers, 1994

Theoretical modeling of superscalar processor performance.
Proceedings of the 27th Annual International Symposium on Microarchitecture, San Jose, California, USA, November 30, 1994

Speculative Disambiguation: A Compilation Technique for Dynamic Memory Disambiguation.
Proceedings of the 21st Annual International Symposium on Computer Architecture. Chicago, 1994

A PDG-based Tool and its Use in Analyzing Program Control Dependences.
Proceedings of the Parallel Architectures and Compilation Techniques, 1994

1993
Instruction-level experimental evaluation of the Multiflow TRACE 14/300 VLIW computer.
J. Supercomput., 1993

EXPLORER: a retargetable and visualization-based trace-driven simulator for superscalar processors.
Proceedings of the 26th Annual International Symposium on Microarchitecture, 1993

Balancing Fine- and Medium-Grained Parallelism in Scheduling Loops for the XIMD Architecture.
Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism, 1993

Architecture-Compatible Code Boosting for Performance Enhancement of the IBM RS/6000.
Proceedings of the 1993 International Conference on Computer Design: VLSI in Computers & Processors, 1993

1992
Direct Methods for Synthesis of Self-Monitoring State Machines.
Proceedings of the Digest of Papers: FTCS-22, 1992

1991
An Instruction-Level Performance Analysis of the Multiflow TRACE 14/300.
Proceedings of the 24th Annual IEEE/ACM International Symposium on Microarchitecture, 1991

Implementation Optimization Techniques for Architecture Synthesis of Application-Specific Processors.
Proceedings of the 24th Annual IEEE/ACM International Symposium on Microarchitecture, 1991

Instruction Level Profiling and Evaluation of the IBM/6000.
Proceedings of the 18th Annual International Symposium on Computer Architecture. Toronto, 1991

Exploiting Instruction-Level Resource Parallelism for Transparent, Integrated Control-Flow Monitoring.
Proceedings of the 1991 International Symposium on Fault-Tolerant Computing, 1991

A Variable Instruction Stream Extension to the VLIW Architecture.
Proceedings of ASPLOS-IV, 1991

1990
Continuous signature monitoring: low-cost concurrent detection of processor control errors.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1990

Evaluation and Synthesis of Self-Monitoring State Machines.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 1990

Architecture Synthesis of High-Performance Application-Specific Processors.
Proceedings of the 27th ACM/IEEE Design Automation Conference. Orlando, 1990

1988
A CMOS fault extractor for inductive fault analysis.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1988

Flexible processors: a promising application-specific processor design approach.
Proceedings of the 21st Annual Workshop and Symposium on Microprogramming and Microarchitecture, 1988, San Diego, California, USA, November 28, 1988

Organization of array data for concurrent memory access.
Proceedings of the 21st Annual Workshop and Symposium on Microprogramming and Microarchitecture, 1988, San Diego, California, USA, November 28, 1988

Continuous Signature Monitoring: Efficient Concurrent-Detection of Processor Control Errors.
Proceedings of the International Test Conference 1988, 1988

Extraction and Simulation of Realistic CMOS Faults Using Inductive Fault Analysis.
Proceedings of the International Test Conference 1988, 1988

The White Dwarf: A High-Performance Application-Specific Processor.
Proceedings of the 15th Annual International Symposium on Computer Architecture, 1988

1987
Processor Control Flow Monitoring Using Signatured Instruction Streams.
IEEE Trans. Computers, 1987

Interprocessor Traffic Scheduling Algorithm for Multiple-Processor Networks.
IEEE Trans. Computers, 1987

1986
Fault-tolerance and performance analysis of beta-networks.
Parallel Comput., 1986

Highlights of CMU Research on CAD, CAM, CAT of VLSI Circuits.
Proceedings of the Fall Joint Computer Conference, November 2-6, 1986, Dallas, Texas, USA, 1986

1985
Inductive Fault Analysis of MOS Integrated Circuits.
IEEE Des. Test, 1985

Automated Design for Testability of Semicustom Integrated Circuits.
Proceedings of the International Test Conference 1985, 1985

1984
Fault-Tolerance of Dynamic-Full-Access Interconnection Networks.
IEEE Trans. Computers, 1984

The Design of Easily Testable VLSI Array Multipliers.
IEEE Trans. Computers, 1984

Systematic Characterization of Physical Defects for Fault Analysis of MOS IC Cells.
Proceedings of the International Test Conference 1984, 1984

1983
On-Line Self-Monitoring Using Signatured Instruction Streams.
Proceedings of the International Test Conference 1983, 1983

Easily-Testable (N, K) Shuffle/Exchange Networks.
Proceedings of the International Conference on Parallel Processing, 1983

The design of two easily-testable VLSI array multipliers.
Proceedings of the 6th IEEE Symposium on Computer Arithmetic, 1983

1982
Fault tolerance analysis of several interconnection networks.
Proceedings of the International Conference on Parallel Processing, 1982

1980
Fault Tolerance of a Class of Connecting Networks.
Proceedings of the 7th Annual Symposium on Computer Architecture, 1980

