A novel RISC-V core for the networking processing processor with bit-level custom instructions and thread-aware fetching architecture.

Bibliographic Details
Title: A novel RISC-V core for the networking processing processor with bit-level custom instructions and thread-aware fetching architecture.
Authors: Chen, Jiakun (220246696@seu.edu.cn); Fu, Yuanming (220246773@seu.edu.cn); Lian, Yuyu; Han, Jianhui (han.jianhui@sanechips.com.cn); Pi, Jianyuan (pi.jianyuan@sanechips.com.cn); Ling, Ming (trio@seu.edu.cn)
Source: Integration: The VLSI Journal. Jul 2026, Vol. 109, no pagination.
Subjects: Simultaneous multithreading processors, High performance processors, Field programmable gate arrays, Data packeting
Abstract: This work presents a RISC-V-based network processor core augmented with bit-level custom instructions and an interleaved multithreading architecture to address the requirements of high-performance and flexible packet processing. For representative Switch.P4 workloads, the proposed bit-level instructions reduce the dynamic instruction count by up to 75.2% and the overall execution time by up to 72.5% compared with conventional RV32I implementations. The architecture further integrates a thread-aware instruction prefetching mechanism while preserving full compatibility with the standard C toolchain. FPGA-based prototyping demonstrates that the proposed design achieves a 30.74% improvement in performance and a 19.02% gain in area efficiency over the baseline design without thread-aware fetch optimization. Moreover, power analysis in the TSMC 12 nm process at 1 GHz shows that the fetch filtering mechanism (a core component of the thread-aware architecture) reduces the total power consumption of the core by 6.4%, achieving a balanced optimization of performance, area, and power.
• A novel RISC-V network processor core with bit-level custom instructions.
• Proposed thread-aware instruction prefetching improves fetch efficiency.
• Bit-level instructions reduce instruction count by up to 75.2%.
• FPGA prototyping shows 30.74% performance and 19.02% area efficiency gains.
• Compatible with standard RISC-V toolchain and P4 workloads.
[ABSTRACT FROM AUTHOR]
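The "up to" percentage reductions quoted in the abstract can also be read as ratios. The short sketch below is not taken from the paper: the helper name `reduction_to_factor` is hypothetical, and it assumes only the standard identity that a reduction of r% corresponds to a factor of 1 / (1 - r/100). The two figures are best-case values and may not come from the same workload.

```python
def reduction_to_factor(reduction_percent: float) -> float:
    """Convert a percentage reduction into an 'N times fewer/faster' factor.

    Hypothetical helper for illustration; not part of the paper.
    """
    if not 0 <= reduction_percent < 100:
        raise ValueError("reduction must be in the range [0, 100)")
    return 1.0 / (1.0 - reduction_percent / 100.0)


# Best-case figures quoted in the abstract (relative to RV32I).
instruction_factor = reduction_to_factor(75.2)  # dynamic instruction count
speedup = reduction_to_factor(72.5)             # overall execution time

print(f"{instruction_factor:.2f}x fewer dynamic instructions")
print(f"{speedup:.2f}x speedup in execution time")
```

Under that reading, the best case corresponds to roughly a fourfold cut in dynamic instructions and a speedup of about 3.6x.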
Copyright of Integration: The VLSI Journal is the property of Elsevier B.V. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
Description
ISSN: 0167-9260
DOI: 10.1016/j.vlsi.2026.102749