LitP: Language-integrated Tensor Parallelism

Authors

Philipp Kramer
Eastern Switzerland University of Applied Sciences, Switzerland

Keywords:

Programming Model, DSL, SIMD, Data-Parallel, Data-Flow, Reactive System, GPU, Compilation

Synopsis

This is a Chapter in:

Book:
Intelligent Computing and Consumer Support Applications

Series:
Chronicle of Computing

Chapter Abstract:

LitP is a new data-parallel programming model for .NET and C#. It is designed specifically to run data-parallel computations in a managed runtime system and can be used to solve a wide range of problems. Programs are written in vectorized form but abstract away the low-level architectural details of the target hardware, which allows for a good compromise between simplicity and performance. The presented library currently targets GPUs and CPUs, but is in principle well suited to any kind of general-purpose parallel processing unit.
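The abstract does not show LitP's concrete C# API, so as a rough illustration only, the following NumPy sketch shows what "vectorized form" means in general: the program is expressed as whole-tensor operations with no explicit loops or thread indices, leaving the mapping onto SIMD lanes, CPU threads, or GPU kernels to the runtime. All names here are ours, not LitP's.

```python
import numpy as np

# Whole-tensor operations: no loops, no thread or lane indices.
x = np.arange(8, dtype=np.float64)   # [0, 1, ..., 7]
y = 2.0 * x + 1.0                    # element-wise multiply-add: [1, 3, ..., 15]
total = y.sum()                      # reduction over the whole tensor -> 64.0
```

A data-parallel runtime is free to execute such element-wise maps and reductions on whichever processing unit is available, which is the portability property the abstract describes.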


Cite this paper as:
Kramer P. (2023) LitP: Language-integrated Tensor Parallelism. In: Tiako P.F. (ed) Intelligent Computing and Consumer Support Applications. Chronicle of Computing. OkIP. https://doi.org/10.55432/978-1-6692-0003-1_4

Presented at:
The 2022 OkIP International Conference on Advances in High-Performance Computing (AHPC) in Oklahoma City, Oklahoma, USA, and Online, on October 3-6, 2022

Contact:
Philipp Kramer
philipp.kramer@ost.ch

References

2016. Aparapi CUDA programming model. http://aparapi.com. [Online; accessed 06-April-2021].

Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016).

Pritesh Agrawal. 2012. Parallelizing LINQ Program for GPGPU. (2012).

Marco Aldinucci, Sonia Campa, Marco Danelutto, Peter Kilpatrick, and Massimo Torquati. 2012. Targeting distributed systems in fastflow. In European Conference on Parallel Processing. Springer, 47–56.

Altimesh. 2017. Hybridizer. http://www.altimesh.com/hybridizeressentials/. [Online; accessed 06-April-2021].

Anaconda. 2012. Numba. http://numba.pydata.org. [Online; accessed 06-April-2021].

Cédric Augonnet, Samuel Thibault, Raymond Namyst, and Pierre-André Wacrenier. 2011. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurrency and Computation: Practice and Experience 23, 2 (2011), 187–198.

John Cheng, Max Grossman, and Ty McKercher. 2014. Professional CUDA C Programming. John Wiley & Sons.

Amit Choudhury. 2014. A simple approximation to the area under standard normal curve. Mathematics and Statistics 2, 3 (2014), 147–149.

Florent Duguet and Guillaume de Roujoux. 2014. Altimesh Hybridizer™: Enabling Accelerators in .NET and more. In GPU Technology Conference.

Jiří Filipovič, Matúš Madzin, Jan Fousek, and Luděk Matyska. 2015. Optimizing CUDA code by kernel fusion: application on BLAS. The Journal of Supercomputing 71, 10 (2015), 3934–3957.

Franz Franchetti, Tze Meng Low, Doru Thom Popovici, Richard M Veras, Daniele G Spampinato, Jeremy R Johnson, Markus Püschel, James C Hoe, and José MF Moura. 2018. SPIRAL: Extreme performance portability. Proc. IEEE 106, 11 (2018), 1935–1968.

K. Gregory and A. Miller. 2012. C++ AMP: Accelerated Massive Parallelism with Microsoft Visual C++. Published with the authorization of Microsoft Corporation by O'Reilly Media.

Kazuaki Ishizaki, Akihiro Hayashi, Gita Koblents, and Vivek Sarkar. 2015. Compiling and optimizing Java 8 programs for GPU execution. In 2015 International Conference on Parallel Architecture and Compilation (PACT). IEEE, 419–431.

David B. Kirk and W. Hwu Wen-Mei. 2016. Programming Massively Parallel Processors: A Hands-on Approach. Morgan Kaufmann.

Andreas Kloeckner. 2009. PyCUDA. https://documen.tician.de/pycuda/. [Online; accessed 06-April-2021].

Andreas Kloeckner. 2009. PyOpenCL. https://documen.tician.de/pyopencl/. [Online; accessed 06-April-2021].

Philipp Kramer, Daniel Egloff, and L. Blaser. 2016. The Alea reactive dataflow system for GPU parallelization. In Proc. of the HLGPU 2016 Workshop, HiPEAC.

Calle Lejdfors and Lennart Ohlsson. 2007. PyGPU: A high-level language for high-speed image processing. In IADIS International Conference Applied Computing 2007. 66–81.

Erik Meijer, Brian Beckman, and Gavin Bierman. 2006. LINQ: reconciling object, relations and XML in the .NET framework. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data. 706–706.

Biagio Peccerillo and Sandro Bartolini. 2018. PHAST: A portable high-level modern C++ programming library for GPUs and multi-cores. IEEE Transactions on Parallel and Distributed Systems 30, 1 (2018), 174–189.

Quantalea. [n.d.]. Alea GPU. https://developer.nvidia.com/blog/accelerate-net-applications-alea-gpu/. [Online; accessed 06-April-2021].

Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. 2013. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. Acm Sigplan Notices 48, 6 (2013), 519–530.

Christopher J Rossbach, Yuan Yu, Jon Currey, Jean-Philippe Martin, and Dennis Fetterly. 2013. Dandelion: a compiler and runtime for heterogeneous systems. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. 49–68.

Markus Steinberger, Michael Kenzel, Pedro Boechat, Bernhard Kerbl, Mark Dokter, and Dieter Schmalstieg. 2014. Whippletree: Task-based scheduling of dynamic workloads on the GPU. ACM Transactions on Graphics (TOG) 33, 6 (2014), 1–11.

Michel Steuwer, Philipp Kegel, and Sergei Gorlatch. 2011. SkelCL: A portable skeleton library for high-level GPU programming. In 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum. IEEE, 1176–1182.

Nessos Information Technologies. 2014. GpuLinq. https://github.com/nessos/GpuLinq. [Online; accessed 08-April-2021].

Raoul-Gabriel Urma, Mario Fusco, and Alan Mycroft. 2014. Java 8 in Action. Manning Publications.

Mohamed Wahib and Naoya Maruyama. 2014. Scalable kernel fusion for memory-bound GPU applications. In SC’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 191–202.

Sandra Wienke, Paul Springer, Christian Terboven, and Dieter an Mey. 2012. OpenACC—first experiences with real-world applications. In European Conference on Parallel Processing. Springer, 859–870.

Haicheng Wu, Gregory Diamos, Srihari Cadambi, and Sudhakar Yalamanchili. 2012. Kernel weaver: Automatically fusing database primitives for efficient gpu computation. In 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE, 107–118.

Erik Wynters. 2016. Fast and easy parallel processing on GPUs using C++ AMP. Journal of Computing Sciences in Colleges 31, 6 (2016), 27–33.

J.Y. Xu. 2008. OpenCL: the open standard for parallel programming of heterogeneous systems. (2008).

Published

September 21, 2023

Online ISSN

2831-350X

Print ISSN

2831-3496
