2025
PromptTSS: A Prompting-Based Approach for Interactive Multi-Granularity Time Series Segmentation.
CoRR, June, 2025
Time-IMM: A Dataset and Benchmark for Irregular Multimodal Multivariate Time Series.
CoRR, June, 2025
2024
Efficient Inference of Transformers on Bare-Metal Devices with RISC-V Vector Processors.
Proceedings of the 22nd IEEE Interregional NEWCAS Conference, 2024
TimeDRL: Disentangled Representation Learning for Multivariate Time-Series.
Proceedings of the 40th IEEE International Conference on Data Engineering, 2024
A 40-nm 13.88-TOPS/W FC-DNN Engine for 16-bit Intelligent Audio Processing Featuring Weight-Sharing and Approximate Computing.
Proceedings of the 36th IEEE Hot Chips Symposium, 2024
2023
LLM4TS: Two-Stage Fine-Tuning for Time-Series Forecasting with Pre-Trained LLMs.
CoRR, 2023
Adaptive Similarity-Aware Hyperparameter Tuners for Classification Tasks.
IEEE Access, 2023
2017
ULV-Turbo Cache for an Instantaneous Performance Boost on Asymmetric Architectures.
IEEE Trans. Very Large Scale Integr. Syst., 2017
Energy-Efficient TCAM Search Engine Design Using Priority-Decision in Memory Technology.
IEEE Trans. Very Large Scale Integr. Syst., 2017
A Flexible Wildcard-Pattern Matching Accelerator via Simultaneous Discrete Finite Automata.
IEEE Trans. Very Large Scale Integr. Syst., 2017
Leak Stopper: An Actively Revitalized Snoop Filter Architecture with Effective Generation Control.
ACM Trans. Design Autom. Electr. Syst., 2017
eTag: Tag-Comparison in Memory to Achieve Direct Data Access based on eDRAM to Improve Energy Efficiency of DRAM Cache.
IEEE Trans. Circuits Syst. I Regul. Pap., 2017
A Resistance Drift Compensation Scheme to Reduce MLC PCM Raw BER by Over 100× for Storage Class Memory Applications.
IEEE J. Solid State Circuits, 2017
A 3T1R Nonvolatile TCAM Using MLC ReRAM for Frequent-Off Instant-On Filters in IoT and Big-Data Processing.
IEEE J. Solid State Circuits, 2017
2016
High-Performance Deadlock-Free ID Assignment for Advanced Interconnect Protocols.
IEEE Trans. Very Large Scale Integr. Syst., 2016
Zero-Counting and Adaptive-Latency Cache Using a Voltage-Guardband Breakthrough for Energy-Efficient Operations.
IEEE Trans. Circuits Syst. II Express Briefs, 2016
A ReRAM-Based 4T2R Nonvolatile TCAM Using RC-Filtered Stress-Decoupled Scheme for Frequent-OFF Instant-ON Search Engines Used in IoT and Big-Data Processing.
IEEE J. Solid State Circuits, 2016
Cross-matching caches: Dynamic timing calibration and bit-level timing-failure mask caches to reduce timing discrepancies with low voltage processors.
Integr., 2016
Variable-length VLIW encoding for code size reduction in embedded processors.
Proceedings of the 29th IEEE International System-on-Chip Conference, 2016
7.4 A 256b-wordlength ReRAM-based TCAM with 1ns search-time and 14× improvement in wordlength-energyefficiency-density product using 2.5T1R cell.
Proceedings of the 2016 IEEE International Solid-State Circuits Conference, 2016
7.3 A resistance-drift compensation scheme to reduce MLC PCM raw BER by over 100× for storage-class memory applications.
Proceedings of the 2016 IEEE International Solid-State Circuits Conference, 2016
2015
Soft-Error-Tolerant Design Methodology for Balancing Performance, Power, and Reliability.
IEEE Trans. Very Large Scale Integr. Syst., 2015
A latency-elastic and fault-tolerant cache for improving performance and reliability on low voltage operation.
Proceedings of the VLSI Design, Automation and Test, 2015
Lifetime-aware LRU promotion policy for last-level cache.
Proceedings of the VLSI Design, Automation and Test, 2015
Adaptive granularity and coordinated management for timely prefetching in multi-core systems.
Proceedings of the VLSI Design, Automation and Test, 2015
17.5 A 3T1R nonvolatile TCAM using MLC ReRAM with Sub-1ns search time.
Proceedings of the 2015 IEEE International Solid-State Circuits Conference, 2015
Energy-efficient non-volatile TCAM search engine design using priority-decision in memory technology for DPI.
Proceedings of the 52nd Annual Design Automation Conference, 2015
Low-cost low-power droop-voltage-aware delay-fault-prevention designs for DVS caches.
Proceedings of the 2015 IEEE 11th International Conference on ASIC, 2015
2014
Reconfigurable vertical profiling framework for the android runtime system.
ACM Trans. Embed. Comput. Syst., 2014
ReRAM-based 4T2R nonvolatile TCAM with 7x NVM-stress reduction, and 4x improvement in speed-wordlength-capacity for normally-off instant-on filter-based search engines used in big-data processing.
Proceedings of the Symposium on VLSI Circuits, 2014
Leveraging Data Lifetime for Energy-Aware Last Level Non-Volatile SRAM Caches using Redundant Store Elimination.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014
DAPs: Dynamic Adjustment and Partial Sampling for Multithreaded/Multicore Simulation.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014
2013
Variation-aware and adaptive-latency accesses for reliable low voltage caches.
Proceedings of the 21st IEEE/IFIP International Conference on VLSI and System-on-Chip, 2013
Cross-layer dynamic prefetching allocation strategies for high-performance multicores.
Proceedings of the 2013 International Symposium on VLSI Design, Automation, and Test, 2013
A configurable bus-tracer for error reproduction in post-silicon validation.
Proceedings of the 2013 International Symposium on VLSI Design, Automation, and Test, 2013
A 0.48V 0.57nJ/pixel video-recording SoC in 65nm CMOS.
Proceedings of the 2013 IEEE International Solid-State Circuits Conference, 2013
2012
A Scalable High-Performance Virus Detection Processor Against a Large Pattern Set for Embedded Network Security.
IEEE Trans. Very Large Scale Integr. Syst., 2012
NUDA: A Non-Uniform Debugging Architecture and Nonintrusive Race Detection for Many-Core Systems.
IEEE Trans. Computers, 2012
IMITATOR: A deterministic multicore replay system with refining techniques.
Proceedings of Technical Program of 2012 VLSI Design, Automation and Test, 2012
2011
Maintaining performance on power gating of microprocessor functional units by using a predictive pre-wakeup strategy.
ACM Trans. Archit. Code Optim., 2011
Hierarchical circuit-switched NoC for multicore video processing.
Microprocess. Microsystems, 2011
Load and storage balanced posting file partitioning for parallel information retrieval.
J. Syst. Softw., 2011
2010
Adaptive Pipeline voltage Scaling in High Performance Microprocessor.
J. Circuits Syst. Comput., 2010
RunAssert: A non-intrusive run-time assertion for parallel programs debugging.
Proceedings of the Design, Automation and Test in Europe, 2010
2009
VisoMT: A Collaborative Multithreading Multicore Processor for Multimedia Applications With a Fast Data Switching Mechanism.
IEEE Trans. Circuits Syst. Video Technol., 2009
An Adaptively Dividable Dual-Port BiTCAM for Virus-Detection Processors in Mobile Devices.
IEEE J. Solid State Circuits, 2009
VeriC: A semi-hardware description language to bridge the gap between ESL design and RTL models.
Proceedings of the 10th International Symposium on Quality of Electronic Design (ISQED 2009), 2009
dIP: A Non-intrusive Debugging IP for Dynamic Data Race Detection in Many-Core.
Proceedings of the 10th International Symposium on Pervasive Systems, 2009
NUDA: a non-uniform debugging architecture and non-intrusive race detection for many-core.
Proceedings of the 46th Design Automation Conference, 2009
No cache-coherence: a single-cycle ring interconnection for multi-core L1-NUCA sharing on 3D chips.
Proceedings of the 46th Design Automation Conference, 2009
2008
Tailoring circuit-switched network-on-chip to application-specific system-on-chip by two optimization schemes.
ACM Trans. Design Autom. Electr. Syst., 2008
Low-power algorithm for automatic topology generation for application-specific networks on chips.
IET Comput. Digit. Tech., 2008
2007
Efficient segment-based video transcoding proxy for mobile multimedia services.
J. Syst. Archit., 2007
Reducing Branch Misprediction Penalties Via Adaptive Pipeline Scaling.
Proceedings of the High Performance Embedded Architectures and Compilers, 2007
An Embedded Coherent-Multithreading Multimedia Processor and Its Programming Model.
Proceedings of the 44th Design Automation Conference, 2007
2006
On a design of crossroad switches for low-power on-chip communication architectures.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2006), 2006
Design of customized functional units for the VLIW-based multi-threading processor core targeted at multimedia applications.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2006), 2006
Collaborative Multithreading: An Open Scalable Processor Architecture for Embedded Multimedia Applications.
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006
Evaluation and design trade-offs between circuit-switched and packet-switched NOCs for application-specific SOCs.
Proceedings of the 43rd Design Automation Conference, 2006
Fast Run-Time Power Monitoring Methodology for Embedded Systems.
Proceedings of the 2006 International Conference on Embedded Systems & Applications, 2006
2005
Flexible Heterogeneous Multicore Architectures for Versatile Media Processing Via Customized Long Instruction Words.
IEEE Trans. Circuits Syst. Video Technol., 2005
Design techniques for single-low-VDD CMOS systems.
IEEE J. Solid State Circuits, 2005
Development of Architecture and Software Technologies in High-Performance Low-Power SoC Design.
Proceedings of the 11th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2005), 2005
A low-power crossroad switch architecture and its core placement for network-on-chip.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005
Efficient Segment-Based Video Transcoding Proxy for Mobile Multimedia Services.
Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, 2005
System-Level Power-Aware Scheduling by Operation-based Prediction.
Proceedings of the 2005 International Conference on Pervasive Systems and Computing, 2005
Crossroad System-on-Chip Communication Architecture for Low Power Embedded Systems.
Proceedings of The 2005 International Conference on Embedded Systems and Applications, 2005
2004
Branch-and-bound task allocation with task clustering-based pruning.
J. Parallel Distributed Comput., 2004
Scalable locality-aware event dispatching mechanism for network servers.
IEE Proc. Softw., 2004
A parameterized power-aware IP core generator for the 2-D 8×8 DCT/IDCT.
Proceedings of the 2004 International Symposium on Circuits and Systems, 2004
A power-aware IP core generator for the one-dimensional discrete Fourier transform.
Proceedings of the 2004 International Symposium on Circuits and Systems, 2004
Unified bus encoding by stream reconstruction with variable strides.
Proceedings of the 2004 International Symposium on Circuits and Systems, 2004
A power-aware IP core design for the variable-length DCT/IDCT targeting at MPEG4 shape-adaptive transforms.
Proceedings of the 2004 International Symposium on Circuits and Systems, 2004
2003
Variable-size data item placement for load and storage balancing.
J. Syst. Softw., 2003
Energy Efficient Caching-on-Cache Architectures for Embedded Systems.
J. Inf. Sci. Eng., 2003
Inverted file compression through document identifier reassignment.
Inf. Process. Manag., 2003
Flexible Heterogeneous Multicore Architectures for Media Processing via Customized Long Instruction Words.
Proceedings of the IFIP VLSI-SoC 2003, 2003
A Tree-Based inverted File for Fast Ranked-Document Retrieval.
Proceedings of the International Conference on Information and Knowledge Engineering. IKE'03, June 23, 2003
2002
Decoupling of data and tag arrays for on-chip caches.
Microprocess. Microsystems, 2002
Posting file partitioning and parallel information retrieval.
J. Syst. Softw., 2002
Dynamic voltage leveling scheduling for real-time embedded systems on low-power variable speed processors.
Proceedings of the International Conference on Compilers, 2002
2001
Compressing inverted files in scalable information systems by binary decision diagram encoding.
Proceedings of the 2001 ACM/IEEE conference on Supercomputing, 2001
2000
Dynamic memory management for real-time embedded Java chips.
Proceedings of the 7th International Workshop on Real-Time Computing and Applications Symposium (RTCSA 2000), 2000
1999
Segmented bus design for low-power systems.
IEEE Trans. Very Large Scale Integr. Syst., 1999
SBA: a server-initiated playback scheme supporting variable bit rate control.
IEEE Trans. Consumer Electron., 1999
1998
Supporting Highly-Speculative Execution via Adaptive Branch Trees.
Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, Las Vegas, Nevada, USA, January 31, 1998
1997
Reducing memory penalty by a programmable prefetch engine for on-chip caches.
Microprocess. Microsystems, 1997
1996
Techniques for The Efficient Analysis of Cache Performance.
J. Inf. Sci. Eng., 1996
Efficient trace-sampling simulation techniques for cache performance analysis.
Proceedings of the 29th Annual Simulation Symposium (SS '96), 1996
1995
Effective Hardware Based Data Prefetching for High-Performance Processors.
IEEE Trans. Computers, 1995
An effective programmable prefetch engine for on-chip caches.
Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995
1994
A Performance Study of Software and Hardware Data Prefetching Schemes.
Proceedings of the 21st Annual International Symposium on Computer Architecture. Chicago, 1994
An Evaluation of Hardware and Software Data Prefetching.
Proceedings of the Applications in Parallel and Distributed Computing, 1994
1992
Reducing Memory Latency via Non-blocking and Prefetching Caches.
Proceedings of ASPLOS-V, 1992
1991
An effective on-chip preloading scheme to reduce data access penalty.
Proceedings of Supercomputing '91, 1991