J.S.S.M. Wong | TU Delft Repository

BCIM

Efficient Implementation of Binary Neural Network Based on Computation in Memory

Journal article (2024) - M.Z. Zahedi (author), Taha Shahroodi (author), Carlos Escuin (author), GN Gaydadjiev (author), G. Gaydadjiev (author), GN Gaydadjiev (author), G. Gaydadjiev (author), J.S.S.M. Wong (author), Said Hamdioui (author), S. Hamdioui (author)

Applications of Binary Neural Networks (BNNs) are promising for embedded systems with hard constraints on energy and computing power. Contrary to conventional neural networks using floating-point datatypes, BNNs use binarized weights and activations to reduce memory and computati ...

High-Performance Data Mapping for BNNs on PCM-Based Integrated Photonics

Conference paper (2024) - Taha Shahroodi (author), Raphael Cardoso (author), J.S.S.M. Wong (author), Alberto Bosio (author), Ian O'Connor (author), S. Hamdioui (author), Said Hamdioui (author)

State-of-the-Art (SotA) hardware implementations of Deep Neural Networks (DNNs) incur high latencies and costs. Binary Neural Networks (BNNs) are potential alternative solutions to realize faster implementations without losing accuracy. In this paper, we first present a new data ...

Efficient Signed Arithmetic Multiplication on Memristor-based Crossbar

Journal article (2023) - M.Z. Zahedi (author), Taha Shahroodi (author), J.S.S.M. Wong (author), S. Hamdioui (author), Said Hamdioui (author)

The vast potential of memristor-based computation-in-memory (CIM) engines has mainly triggered the mapping of best-suited applications. Nevertheless, with additional support, existing applications can also benefit from CIM. In particular, this paper proposes an energy and area-ef ...

SparseMEM

Energy-efficient Design for In-memory Sparse-based Graph Processing

Conference paper (2023) - M.Z. Zahedi (author), Geert Custers (author), Taha Shahroodi (author), G. Gaydadjiev (author), GN Gaydadjiev (author), G. Gaydadjiev (author), GN Gaydadjiev (author), J.S.S.M. Wong (author), S. Hamdioui (author), Said Hamdioui (author)

Performing analysis on large graph datasets in an energy-efficient manner has posed a significant challenge; not only due to excessive data movements and poor locality, but also due to the non-optimal use of high sparsity of such datasets. The latter leads to a waste of resources ...

Lightspeed Binary Neural Networks using Optical Phase-Change Materials

Conference paper (2023) - Taha Shahroodi (author), Rafaela Cardoso (author), M.Z. Zahedi (author), J.S.S.M. Wong (author), Alberto Bosio (author), Ian O'Connor (author), Said Hamdioui (author), S. Hamdioui (author)

This paper investigates the potential of a compute-in-memory core based on optical Phase Change Materials (oPCMs) to speed up and reduce the energy consumption of the Matrix-Matrix-Multiplication operation. The paper also proposes a new data mapping for Binary Neural Networks (BN ...

SieveMem: A Computation-in-Memory Architecture for Fast and Accurate Pre-Alignment

Conference paper (2023) - Taha Shahroodi (author), Michael Miao (author), M.Z. Zahedi (author), J.S.S.M. Wong (author), S. Hamdioui (author), Said Hamdioui (author)

The high execution time of DNA sequence alignment negatively affects many genomic studies that rely on sequence alignment results. Pre-alignment filtering was introduced as a step before alignment to reduce the execution time of short-read sequence alignment greatly. With its suc ...

System Design for Computation-in-Memory

From Primitive to Complex Functions

Conference paper (2022) - M.Z. Zahedi (author), Taha Shahroodi (author), Geert Custers (author), A. Singh (author), J.S.S.M. Wong (author), S. Hamdioui (author), Said Hamdioui (author)

In recent years, we are witnessing a trend moving away from conventional computer architectures towards Computation-In-Memory (CIM) based on emerging memristor devices. This is due to the fact that the performance and energy efficiency of traditional computer architectures can no ...

MNEMOSENE

Tile Architecture and Simulator for Memristor-based Computation-in-memory

Journal article (2022) - M.Z. Zahedi (author), M.F.M. Abu Lebdeh (author), Christopher Bengel (author), Dirk Wouters (author), S. Menzel (author), Manuel Le Gallo (author), Abu Sebastian (author), J.S.S.M. Wong (author), S. Hamdioui (author), Said Hamdioui (author)

In recent years, we are witnessing a trend toward in-memory computing for future generations of computers that differs from traditional von-Neumann architecture in which there is a clear distinction between computing and memory units. Considering that data movements between the c ...

In recent years, we are witnessing a trend toward in-memory computing for future generations of computers that differs from traditional von-Neumann architecture in which there is a clear distinction between computing and memory units. Considering that data movements between the central processing unit (CPU) and memory consume several orders of magnitude more energy compared to simple arithmetic operations in the CPU, in-memory computing will lead to huge energy savings as data no longer needs to be moved around between these units. In an initial step toward this goal, new non-volatile memory technologies, e.g., resistive RAM (ReRAM) and phase-change memory (PCM), are being explored. This has led to a large body of research that mainly focuses on the design of the memory array and its peripheral circuitry. In this article, we mainly focus on the tile architecture (comprising a memory array and peripheral circuitry) in which storage and compute operations are performed in the (analog) memory array and the results are produced in the (digital) periphery. Such an architecture is termed compute-in-memory-periphery (CIM-P). More precisely, we derive an abstract CIM-tile architecture and define its main building blocks. To bridge the gap between higher-level programming languages and the underlying (analog) circuit designs, an instruction-set architecture is defined that is intended to control and, in turn, sequence the operations within this CIM tile to perform higher-level more complex operations. Moreover, we define a procedure to pipeline the CIM-tile operations to further improve the performance. To simulate the tile and perform design space exploration considering different technologies and parameters, we introduce the fully parameterized first-of-its-kind CIM tile simulator and compiler. Furthermore, the compiler is technology-aware when scheduling the CIM-tile instructions. Finally, using the simulator, we perform several preliminary design space explorations regarding the three competing technologies, ReRAM, PCM, and STT-MRAM concerning CIM-tile parameters, e.g., the number of ADCs. Additionally, we investigate the effect of pipelining in relation to the clock speeds of the digital periphery assuming the three technologies. In the end, we demonstrate that our simulator is also capable of reporting energy consumption for each building block within the CIM tile after the execution of in-memory kernels considering the data-dependency on the energy consumption of the memory array. All the source codes are publicly available.

@en

KrakenOnMem

A Memristor-Augmented HW/SW Framework for Taxonomic Profiling

Conference paper (2022) - Taha Shahroodi (author), M.Z. Zahedi (author), A. Singh (author), J.S.S.M. Wong (author), S. Hamdioui (author), Said Hamdioui (author)

State-of-the-art taxonomic profilers that comprise the first step in larger-context metagenomic studies have proven to be computationally intensive, i.e., while accurate, they come at the cost of high latency and energy consumption. Table Lookup operation is a primary bottleneck ...

Demeter

A Fast and Energy-Efficient Food Profiler Using Hyperdimensional Computing in Memory

Journal article (2022) - Taha Shahroodi (author), M.Z. Zahedi (author), Can Firtina (author), Mohammed Alser (author), J.S.S.M. Wong (author), Onur Mutlu (author), Said Hamdioui (author), S. Hamdioui (author)

Food profiling is an essential step in any food monitoring system needed to prevent health risks and potential frauds in the food industry. Significant improvements in sequencing technologies are pushing food profiling to become the main computational bottleneck. State-of-the-art ...

FPGA-based Deep Learning Accelerator for RF Applications

Conference paper (2021) - H. den Boer (author), R.W.D. Muller (author), J.S.S.M. Wong (author), V. Voogt (author)

A key obstacle within the design of cognitive radios has always been the spectrum sensing component that implements the function automatic modulation classification (AMC). With the transition to software-defined radios (SDRs) followed by the introduction of field-programmable gat ...

Tile Architecture and Hardware Implementation for Computation-in-Memory

Conference paper (2021) - M.Z. Zahedi (author), Remon van Duijnen (author), J.S.S.M. Wong (author), Said Hamdioui (author), S. Hamdioui (author)

Computation-in-memory (CIM) shows great promise for specific applications by employing emerging (non-volatile) memory technologies such as memristors for both storage and compute, greatly reducing energy consumption, and improving performance. Based on our own observations, we ca ...

Efficient organization of digital periphery to support integer datatype for memristor-based cim

Conference paper (2020) - M.Z. Zahedi (author), M. Mayahinia (author), M.F.M. Abu Lebdeh (author), J.S.S.M. Wong (author), S. Hamdioui (author), Said Hamdioui (author)

Von Neumann-based architectures suffer from costly communication between CPU and memory. This communication imposes several orders of magnitude more power and performance overheads compared to the arithmetic operations performed by the processor. This overhead becomes critical fo ...

Machine Learning-Based Processor Adaptability Targeting Energy, Performance, and Reliability

Conference paper (2019) - A.L. Sartor (author), Pedro Henrique Exenberger Becker (author), J.S.S.M. Wong (author), Radu Marculescu (author), Antonio Carlos Schneider Beck (author)

Adaptive processors can dynamically change their hardware configuration by tuning several knobs that optimize a given metric, according to the current application. However, the complexity of choosing the best setup at runtime increases exponentially as more adaptive resources bec ...

CIM-SIM

Computation in Memory SIMuIator

Conference paper (2019) - Ali Banagozar (author), K. Vadivel (author), Sander Stuijk (author), Henk Corporaal (author), J.S.S.M. Wong (author), M.F.M. Abu Lebdeh (author), J. Yu (author), Said Hamdioui (author), S. Hamdioui (author)

Computation-in-memory reverses the trend in von-Neumann processors by bringing the computation closer to the data, to even within the memory array, as opposed to introducing new memory hierarchies to keep (frequently used) data closer to a central processing unit (CPU). In recent ...

Memristive Device Based Circuits for Computation-in-Memory Architectures

Conference paper (2019) - M.F.M. Abu Lebdeh (author), Uljana Reinsalu (author), H.A. Du Nguyen (author), J.S.S.M. Wong (author), S. Hamdioui (author), Said Hamdioui (author)

Emerging computing applications (such as big-data and Internet-of-things) are extremely demanding in terms of storage, energy and computational efficiency, while today’s architectures and device technologies are facing major challenges making them incapable to meet these demands. ...

Dynamic Trade-off among Fault Tolerance, Energy Consumption, and Performance on a Multiple-issue VLIW Processor

Journal article (2018) - A.L. Sartor (author), Pedro H. E. Becker (author), J.J. Hoozemans (author), J.S.S.M. Wong (author), A.C. Schneider Beck Filho (author)

In the design of modern-day processors, energy consumption and fault tolerance have gained significant importance next to performance. This is caused by battery constraints, thermal design limits, and higher susceptibility to errors as transistor feature sizes are decreasing. How ...

A low-cost BRAM-Based function reuse for configurable soft-core processors in FPGAs

Conference paper (2018) - Pedro H. E. Becker (author), A.L. Sartor (author), Marcelo Brandalero (author), T. Trevisan Jost (author), J.S.S.M. Wong (author), Luigi Carro (author), Antonio C.S. Beck (author)

Many modern FPGA-based soft-processor designs must include dedicated hardware modules to satisfy the requirements of a wide range of applications. Not seldom they all do not fit in the FPGA target, so their functionalities must be mapped into the much slower software domain. Howe ...

Evaluating Auto-adaptation Methods for Fine-grained Adaptable Processors

Conference paper (2018) - J.J. Hoozemans (author), J. van Straten (author), Zaid Al-Ars (author), Z. Al-Ars (author), J.S.S.M. Wong (author)

To achieve energy savings while maintaining adequate performance, system designers and programmers wish to create the best possible match between program behavior and the underlying hardware. Well-known current approaches include DVFS and task migrations in heterogeneous platform ...

ISA-DTMR

Selective Protection in Configurable Heterogeneous Multicores

Conference paper (2018) - Augusto G. Erichsen (author), A.L. Sartor (author), J. Dellagostin Souza (author), Monica M. Pereira (author), J.S.S.M. Wong (author), Antonio C.S. Beck (author)

The well-known Triple Modular Redundancy (TMR), when applied to processors to mitigate the occurrence of faults, implies that all applications have the same level of criticality (since they are all equally protected) and are executed in a homogeneous environment, which naturally ...