HIP parallel primitives for developing performant GPU-accelerated code on ROCm
https://github.com/ROCm/rocPRIM