Kokkos

Kokkos is a C++ library ecosystem for writing performance-portable code for modern high-performance computing (HPC) architectures. Originating at Sandia National Laboratories under U.S. Department of Energy programs, it provides a single programming model that abstracts both where work is executed (execution spaces) and where data is stored (memory spaces), so that code can run efficiently on multi-core CPUs, GPUs, and future accelerators without being rewritten for every new architecture. The project is openly developed and widely used in scientific and engineering codes that seek to maximize hardware utilization while containing development and maintenance costs. Its design emphasizes practical portability, enabling researchers to run scalable simulations and analyses across diverse systems, from local clusters to national facilities. For background on related programming approaches, see C++ and High-performance computing.

Kokkos sits at the intersection of software engineering discipline and high-stakes computational science. By providing templated abstractions for parallel execution and data management, it helps authors avoid writing architecture-specific kernels while still leaving performance-critical details to specialized backends. Core concepts include execution spaces (where work runs) and memory spaces (where data resides), along with the multi-dimensional array type Kokkos::View. The project also emphasizes portability of algorithms through a common interface, with parallel patterns such as parallel_for and parallel_reduce that map to the chosen backend. For context on the broader tooling ecosystem, see KokkosKernels and Trilinos.

History

Kokkos emerged from the need to address the rising fragmentation of HPC platforms, where advances in processor technology and accelerator devices outpaced single-language approaches. It was developed within Sandia National Laboratories and related DOE HPC initiatives as a portable programming model that could serve both national labs and industry partners. Over time, it matured into a widely adopted component of the Trilinos project, a large collection of scientific software packages that rely on portability layers to support multiple architectures. This transition helped Kokkos become a foundational layer for performance-focused codes used in physics, chemistry, materials science, and engineering applications. The codebase is publicly available under a permissive license and actively maintained by a broad community of researchers and practitioners, with contributions from universities, national labs, and industry partners. The practical aim has been to reduce duplication of effort across platforms while keeping close to the performance characteristics demanded by scientific workloads.

Design and architecture

Kokkos provides a set of abstractions that separate algorithms from their execution and memory details. The primary abstractions are:

  • ExecutionSpace: an abstraction for where work runs, covering backends such as CPU multi-core runtimes and accelerators like GPUs.
  • MemorySpace: an abstraction for where data lives, including host memory and device memory.
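  • Views: a templated multi-dimensional array type whose memory space and data layout are template parameters, so the same indexing code works on all supported backends.
  • Parallel patterns and execution policies: the patterns parallel_for, parallel_reduce, and parallel_scan describe what kind of work is performed, while policies such as RangePolicy and MDRangePolicy describe the iteration space over which it is performed.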
  • TeamPolicy and hierarchical execution models: a way to express nested parallelism that maps to hardware threads and blocks on GPUs.
  • Deep copy and synchronization utilities: mechanisms for moving data between memory spaces safely and efficiently.
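The separation of algorithm from execution can be illustrated with a minimal sketch in plain standard C++. It does not use the Kokkos API: the tags `SerialSpace` and `BlockedSpace` and the functions `toy_parallel_for` and `axpy` are hypothetical stand-ins, chosen only to show how a loop body written once can be dispatched to different execution strategies through a type parameter. In Kokkos itself, the analogous roles are played by execution-space types and `Kokkos::parallel_for`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical execution-space tags (stand-ins, not Kokkos types).
struct SerialSpace {};  // visits indices 0..n-1 in order
struct BlockedSpace {   // visits indices block by block
  std::size_t block_size = 4;
};

// Overloads dispatch on the tag type; the loop body never learns how it
// is scheduled.
template <class Body>
void toy_parallel_for(SerialSpace, std::size_t n, Body body) {
  for (std::size_t i = 0; i < n; ++i) body(i);
}

template <class Body>
void toy_parallel_for(BlockedSpace space, std::size_t n, Body body) {
  assert(space.block_size > 0);
  // Blocks are walked from last to first to stress that iteration order
  // is unspecified; a correct body must not depend on it.
  std::size_t num_blocks = (n + space.block_size - 1) / space.block_size;
  for (std::size_t b = num_blocks; b-- > 0;) {
    std::size_t begin = b * space.block_size;
    std::size_t end = std::min(n, begin + space.block_size);
    for (std::size_t i = begin; i < end; ++i) body(i);
  }
}

// The algorithm is written once and is generic over the execution space.
template <class Space>
std::vector<double> axpy(Space space, double a, const std::vector<double>& x,
                         const std::vector<double>& y) {
  assert(x.size() == y.size());
  std::vector<double> out(x.size());
  toy_parallel_for(space, x.size(),
                   [&](std::size_t i) { out[i] = a * x[i] + y[i]; });
  return out;
}
```

Calling `axpy(SerialSpace{}, ...)` and `axpy(BlockedSpace{}, ...)` on the same inputs yields the same result because each iteration writes only its own element; that independence between iterations is what lets a backend reorder or parallelize the loop.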

Backends are provided to map the abstract constructs to concrete hardware. Notable implementations include:

  • CUDA for NVIDIA GPUs
  • HIP for AMD GPUs
  • OpenMP and Serial for CPU-based execution
  • Additional backends such as SYCL, OpenMPTarget, HPX, and C++ threads, at varying levels of maturity

The design emphasizes a single source code base that remains close to standard C++ while exploiting backend-specific optimizations when available. This approach is intended to reduce the maintenance burden of supporting many architecture-specific code paths and to minimize the total cost of ownership for large HPC codes. For context on how these ideas relate to broader programming practices, see C++ and OpenMP.

Kokkos also includes a set of higher-level libraries, notably KokkosKernels, which provides a collection of linear algebra and graph algorithms implemented in a portable way, and interfaces with Trilinos packages for broader scientific computing workflows. The ecosystem is designed to interoperate with other HPC tools, compilers, and performance analysis utilities, with integration points that allow existing codes to adopt Kokkos incrementally.

Programming model and core components

In Kokkos, developers write code using generic templates and Kokkos-provided constructs. The typical workflow involves:

  • Defining data structures with Kokkos::View to manage memory across host and device spaces.
  • Writing algorithms with Kokkos::parallel_for and Kokkos::parallel_reduce to express work in a backend-agnostic way.
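  • Selecting an ExecutionSpace and MemorySpace appropriate for the target hardware. Backends are enabled when Kokkos is configured and built, and the space used by a given kernel or View is a compile-time (template) choice; runtime options typically control settings such as thread count or device selection.
  • Porting code incrementally, reusing as much of the existing algorithmic structure as possible, while letting the selected backend supply specialized implementations.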
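The reduction step of this workflow can be sketched in plain standard C++. The code below is not the Kokkos API: `toy_parallel_reduce` and `dot` are hypothetical names, and the chunks run one after another rather than concurrently. It mirrors one detail of `Kokkos::parallel_reduce`: the body receives an index and a reference to a partial result, and the partial results are combined afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a sum reduction over [0, n). The range is split
// into chunks; each chunk accumulates into its own partial sum, and the
// partial sums are combined at the end. A real backend would process the
// chunks concurrently; here they run sequentially.
template <class Body>
double toy_parallel_reduce(std::size_t n, std::size_t num_chunks, Body body) {
  if (num_chunks == 0) num_chunks = 1;
  std::vector<double> partial(num_chunks, 0.0);
  for (std::size_t c = 0; c < num_chunks; ++c) {
    std::size_t begin = c * n / num_chunks;
    std::size_t end = (c + 1) * n / num_chunks;
    for (std::size_t i = begin; i < end; ++i) body(i, partial[c]);
  }
  double total = 0.0;
  for (double p : partial) total += p;
  return total;
}

// Dot product written against the reduction interface only. The body adds
// to the accumulator it is handed and touches no other shared state.
double dot(const std::vector<double>& x, const std::vector<double>& y,
           std::size_t num_chunks) {
  assert(x.size() == y.size());
  return toy_parallel_reduce(
      x.size(), num_chunks,
      [&](std::size_t i, double& local) { local += x[i] * y[i]; });
}
```

Because the body only updates the accumulator passed to it, the number of chunks can change without changing the algorithm. With floating-point data the result may differ in the last bits between chunk counts, since the order of additions changes.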

This model allows numerical codes to express parallelism in a way that is not tied to a single vendor's toolkit. By providing a uniform API, Kokkos helps ensure that performance-sensitive software remains portable as hardware evolves. For deeper dives into array abstractions and memory semantics, see Kokkos::View and MemorySpace concepts; practical examples often involve lattice-based simulations or finite element methods implemented on top of Kokkos backends.

The ecosystem supports a range of scientific domains by enabling data layouts and access patterns that align with hardware capabilities. This has particular relevance for large-scale simulations where data movement is a dominant cost, and where keeping the same code base across CPU and accelerator backends reduces implementation risk and time to solution. The design philosophy aligns with broader goals in Performance portability discussions and relates to how modern HPC projects balance abstraction with raw throughput.
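How a data layout can be swapped without touching the algorithm is sketched below in plain standard C++. The types `ToyLayoutRight`, `ToyLayoutLeft`, and `ToyView2D` are hypothetical stand-ins, not Kokkos classes; they follow the usual meaning of the corresponding Kokkos tags `Kokkos::LayoutRight` (row-major, last index contiguous) and `Kokkos::LayoutLeft` (column-major, first index contiguous).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout tags: each maps a 2D index (i, j) to a 1D offset.
struct ToyLayoutRight {  // row-major: j varies fastest in memory
  static std::size_t offset(std::size_t i, std::size_t j, std::size_t rows,
                            std::size_t cols) {
    (void)rows;
    return i * cols + j;
  }
};
struct ToyLayoutLeft {  // column-major: i varies fastest in memory
  static std::size_t offset(std::size_t i, std::size_t j, std::size_t rows,
                            std::size_t cols) {
    (void)cols;
    return i + j * rows;
  }
};

// Hypothetical 2D array whose layout is a template parameter.
template <class Layout>
class ToyView2D {
 public:
  ToyView2D(std::size_t rows, std::size_t cols)
      : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

  double& operator()(std::size_t i, std::size_t j) {
    assert(i < rows_ && j < cols_);
    return data_[Layout::offset(i, j, rows_, cols_)];
  }

  std::size_t rows() const { return rows_; }
  std::size_t cols() const { return cols_; }
  const std::vector<double>& raw() const { return data_; }

 private:
  std::size_t rows_, cols_;
  std::vector<double> data_;
};

// Written once against (i, j) indexing; the layout decides where each
// element lands in memory.
template <class Layout>
void fill(ToyView2D<Layout>& v) {
  for (std::size_t i = 0; i < v.rows(); ++i)
    for (std::size_t j = 0; j < v.cols(); ++j)
      v(i, j) = static_cast<double>(10 * i + j);
}
```

For a 2×3 array filled this way, the row-major buffer holds 0, 1, 2, 10, 11, 12 and the column-major buffer holds 0, 10, 1, 11, 2, 12, while `v(i, j)` returns the same value in both. Choosing the layout per target lets the same indexing code produce memory access patterns that suit either CPU caches or GPU memory coalescing.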

Adoption and impact

Kokkos has seen broad uptake in both government-funded and academic HPC programs as well as industry partnerships. It forms a foundational layer in many codes that require cross-architecture portability and has been adopted by major HPC software stacks such as Trilinos and various physics and engineering simulation packages. By enabling one code path to target CPUs and GPUs, Kokkos helps institutions leverage diverse hardware deployments without reinventing core algorithms for each system. This translates into tangible cost savings and faster research turnaround, especially in environments that regularly switch between clusters, accelerators, and future exascale platforms.

The project’s governance emphasizes open collaboration, with contributions from universities, national labs like Sandia National Laboratories, and industry collaborators. Its licensing structure is designed to encourage widespread use in both open-source and mission-critical contexts, helping ensure long-term sustainability of critical HPC software. The result is a programming model that is not tied to a single vendor or device family, while still delivering competitive performance on state-of-the-art hardware. For related ecosystems and software strategies, see Open-source software and High-performance computing.

Controversies and debates

As with any large, multi-organization software effort tied to national-scale computing goals, Kokkos has been at the center of technical and institutional debates. Key points of discussion include:

  • Portability versus peak performance: Critics sometimes argue that abstraction layers introduce overhead or prevent access to vendor-specific optimizations. Proponents respond that Kokkos is designed to minimize overhead and to expose performance-critical paths to backend-specific implementations, yielding competitive results across architectures while reducing duplication of effort in code bases.

  • Governance and funding: The model of government-sponsored research infrastructure can raise concerns about how priorities are set and how resources are allocated. Supporters point to the value of stable, long-term investment in infrastructure that benefits a broad user community and accelerates scientific discovery, while critics may call for clearer performance benchmarks and accountability in how funds are used.

  • Vendor lock-in versus open competition: Some argue that portability abstractions might suppress innovation in specialized backends. Advocates for Kokkos counter that the platform remains architecture-agnostic, with backends chosen to optimize performance and with a community-driven development process that welcomes new backends and contributions from multiple parties.

  • Diversity and inclusion discussions in technical communities: Critics of any field sometimes argue that emphasis on representation can distract from technical excellence. From a practical perspective, supporters of HPC software note that Kokkos has benefited from contributions across institutions and that performance-focused collaboration tends to produce robust, well-documented software. Those who criticize the focus on inclusivity of any kind often frame their critique as a matter of meritocratic efficiency; in practice, the Kokkos development model continues to prioritize code quality, performance, and broad participation, arguing that open, merit-based collaboration yields the strongest software and the best return on investment.

  • Woke criticisms and their practical value: Proponents of performance-first approaches often view ideological critiques as distractions from measurable outcomes like runtime efficiency, scalability, and code maintainability. They argue that Kokkos’s strengths lie in its ability to deliver portable performance and to reduce the cost of maintaining multi-architecture software stacks. Critics who frame the project primarily in identity or policy terms tend to overlook the core value proposition: enabling scientists and engineers to run the same algorithms efficiently across diverse hardware. In this view, arguments centered on hardware performance, reliability, and long-term sustainability carry the most weight, while considerations labeled as “woke” are seen as unrelated to the technical merits of the platform and, in practice, as often overstated.

See also