Memory Hierarchy Visibility in Parallel Programming Languages — Paul Keir, Codeplay Research
The choice of which levels in a memory hierarchy to expose within a programming language or API can be critical. Expose too many, and you put both programmability and performance portability at risk.
Heterogeneous computing and GPGPU aim to repurpose the data-parallel capability of graphics and commodity hardware for general calculations. GPGPU APIs, which now include OpenCL SYCL, Apple's Metal, and Qualcomm's MARE, must all decide on a suitable abstraction for hardware memory levels. Established GPGPU APIs such as CUDA, C++ AMP, and OpenCL offer language support for four levels of volatile memory. However, while GPUs are now essentially ubiquitous, the diminished role of discrete graphi