Indirect memory accesses frequently appear in applications where memory bandwidth is a critical bottleneck. Prior indirect memory access pro
Indirect memory accesses frequently appear in applications where memory bandwidth is a critical bottleneck. Prior indirect memory access proposals, such as indirect prefetchers, runahead execution, fetchers, and decoupled access/execute architectures, primarily focus on improving memory access latency by loading data ahead of computation but still rely on the DRAM controllers to reorder memory requests and enhance memory bandwidth utilization. DRAM controllers have limited visibility to future memory accesses due to the small capacity of request buffers and the restricted memory-level parallelism of conventional core and memory systems. We introduce DX100, a programmable data access accelerator for indirect memory accesses. DX100 is shared across cores to offload bulk indirect memory accesses and associated address calculation operations. DX100 reorders, interleaves, and coalesces memory requests to improve DRAM row-buffer hit rate and memory bandwidth utilization. DX100 provides a general-purpose ISA to support diverse access types, loop patterns, conditional accesses, and address calculations. To support this accelerator without significant programming efforts, we discuss a set of MLIR compiler passes that automatically transform legacy code to utilize DX100. Experimental evaluations on 12 benchmarks spanning scientific computing, database, and graph applications show that DX100 achieves performance improvements of 2.6x over a multicore baseline and 2.0x over the state-of-the-art indirect prefetcher. Comment: Proceedings of the 52nd Annual International Symposium on Computer Architecture (ISCA 2025)