-------------------------------------------------------------------------------------

**"A swept rule framework for performing efficient stencil calculations on GPUs for unstructured meshes"**

Mohammad Islam

MIT ACDL

**Abstract:** Stencil calculations lie at the heart of many scientific computing applications, such as the solution of linear systems of equations via iterative methods (e.g. Jacobi iteration). Although straightforward to implement, stencil updates are difficult to perform efficiently on high-performance computing architectures because they involve large amounts of data access relative to computation, which makes them memory bound. In this talk, we present a swept rule algorithm for performing stencil calculations efficiently on a general unstructured mesh using a GPU. The algorithm extends the original swept rule for breaking the latency barrier when solving time-dependent PDEs on parallel CPUs. The general idea is to determine several partitions of the unstructured mesh that maximize the number of updates that can be performed on each degree of freedom (DOF) without requiring communication between subdomains. Data associated with each subdomain can then be allocated to the high-bandwidth shared memory of distinct GPU blocks, so that the data access required for the stencil updates is performed efficiently. We illustrate some test cases where the algorithm accelerates Jacobi iteration relative to a standard GPU implementation that uses only GPU global memory. In particular, we find a 4-6x speedup in performing Jacobi iterations on a linear system when applying the algorithm to a large structured problem, and currently a 2x speedup on a small unstructured linear system. The algorithm requires only knowledge of the mesh connectivity, making it applicable to a general unstructured mesh.
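
For readers unfamiliar with the stencil update mentioned in the abstract, the following is a minimal CPU sketch of Jacobi iteration written purely in terms of mesh connectivity. It is not the speaker's swept rule implementation and does not use a GPU; the function name `jacobi_stencil`, the neighbor-list layout, and the 1D chain test problem are all assumptions made for illustration.

```python
import numpy as np

def jacobi_stencil(neighbors, weights, diag, b, x0, num_iters):
    """Jacobi iteration using only per-DOF connectivity (hypothetical helper).

    neighbors[i] lists the DOFs coupled to DOF i, weights[i] holds the
    matching off-diagonal matrix entries, and diag[i] is the diagonal entry.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(num_iters):
        x_new = np.empty_like(x)
        for i, (nbrs, w) in enumerate(zip(neighbors, weights)):
            # Each update reads only the previous values at neighboring DOFs.
            x_new[i] = (b[i] - np.dot(w, x[nbrs])) / diag[i]
        x = x_new
    return x

# Illustrative problem (an assumption): a 1D chain of 8 DOFs, i.e. the
# tridiagonal system with 2 on the diagonal and -1 on the off-diagonals.
n = 8
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]
weights = [[-1.0] * len(nbrs) for nbrs in neighbors]
diag = np.full(n, 2.0)
b = np.ones(n)

x = jacobi_stencil(neighbors, weights, diag, b, np.zeros(n), num_iters=2000)

# Assemble the equivalent dense matrix only to check the residual.
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
residual = np.linalg.norm(A @ x - b)
```

Every update touches several stored values but performs only a handful of floating-point operations, which is the memory-bound behavior the abstract describes.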

-------------------------------------------------------------------------------------

**"The transport map unadjusted Langevin algorithm"**

Benjamin Zhang

MIT ACDL

**Abstract:** Langevin dynamics (LD) are often used to sample high-dimensional, non-Gaussian distributions whose densities are known up to a normalizing constant. In particular, there has been recent interest in the unadjusted Langevin algorithm (ULA), in which a single realization of LD is used to estimate expectations with respect to the target distribution. When the target distribution is not strongly log-concave, the method is known to converge slowly, which reduces the efficiency of the resulting estimator. Meanwhile, transport maps provide a way to couple complex non-Gaussian target distributions with simple reference ones, and they can be used to generate cheap samples from the target distribution. However, transport maps between random variables must be approximated, so the resulting estimators are often biased. We present a method that combines transport maps with the unadjusted Langevin algorithm and demonstrate its advantages over the standard ULA. Given a map and a target distribution, we show that when the pushforward of the target through the map is strongly log-concave, the output process exhibits geometric convergence in the 2-Wasserstein distance, even when the target distribution itself is not strongly log-concave. Moreover, we show that in continuous time, applying a transport map to LD yields a Riemannian manifold Langevin dynamics with a metric defined by the map.
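
As background for the abstract, here is a minimal sketch of the standard ULA, without any transport map; it is not the speaker's method. The function name `ula`, the step size, the chain length, and the standard Gaussian target are assumptions chosen only for illustration.

```python
import numpy as np

def ula(grad_log_pi, x0, step, num_steps, rng):
    """Unadjusted Langevin algorithm (hypothetical helper).

    Iterates x <- x + step * grad_log_pi(x) + sqrt(2 * step) * xi with
    xi ~ N(0, I), and applies no Metropolis accept/reject correction.
    """
    x = np.asarray(x0, dtype=float).copy()
    samples = np.empty((num_steps, x.size))
    for k in range(num_steps):
        noise = rng.standard_normal(x.size)
        x = x + step * grad_log_pi(x) + np.sqrt(2.0 * step) * noise
        samples[k] = x
    return samples

# Illustrative target (an assumption): a standard Gaussian in 2D, for which
# the gradient of the log-density is simply -x.
rng = np.random.default_rng(0)
samples = ula(lambda x: -x, np.zeros(2), step=0.1, num_steps=50000, rng=rng)

# A single trajectory is used to estimate expectations under the target.
mean_est = samples.mean(axis=0)
var_est = samples.var(axis=0)
```

Because the discretization is never corrected, the chain targets a slightly perturbed distribution: for this Gaussian example the stationary variance is 1 / (1 - step / 2) rather than 1.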