Use of multi-GPU systems for large FFTs: with applications in ultrasound simulations

dc.contributor.authorNandapalan, Nimalan
dc.date.accessioned2013-05-20T05:32:46Z
dc.date.available2013-05-20T05:32:46Z
dc.date.issued2013
dc.description.abstractUltrasound simulations are a type of application that is both computation and communication intensive. With better performance, implementations of these simulations can be used to design new ultrasound probes, develop better signal processing techniques, train new ultrasonographers, plan treatment, and support many other uses [11]. The pseudo-spectral technique can be used effectively to express the wave-propagation model used in these simulations, and is characterised by its use of the Fast Fourier Transform (FFT). The FFT can account for over half of the time spent by ultrasound simulations, with the remainder consisting of embarrassingly parallel arithmetic [28]. The use of a Graphics Processing Unit (GPU) for general computations like the FFT has become ubiquitous, with favourable performance. The current trend in the design of the Central Processing Unit (CPU) of most systems has seen a shift from single-core to multi-core processing, with multi-core processors now being assembled into multi-socket configurations. GPUs are already massively multi-core processors, typically with three or four times as many cores. The question remains: will GPUs follow a similar trend and incorporate multiple devices in individual sockets when implemented? The purpose of the work in this thesis is to assess the viability of multi-GPU systems for ultrasound simulations in terms of cost and performance, compared to other system designs that offer similar computational resources. Current machine hardware is capable of supporting multiple GPUs through peripheral devices and offers a glimpse of the potential of future machines; however, relatively little work has been reported on the use of such systems for ultrasound simulations and the FFT algorithm. In this thesis, to address this issue, we benchmark and model the device-to-device communication potential of an existing multi-GPU system. 
Four different methods are considered, namely: via CPU, pointer swapping, hybrid-staged, and kernel. The results reveal that the pointer-swapping and kernel-based methods of managing communication can be up to twice as efficient as the other methods. The communication methods identified in the benchmarks are then used as the basis for a number of important generic communication functions, which are in turn used to implement a distributed 3D FFT algorithm as required by the ultrasound simulation. The multi-GPU distributed 3D FFT with four GPUs was found to be up to 18% faster than an existing FFT implementation on a six-core CPU. This multi-GPU distributed 3D FFT implementation is then used in an ultrasound simulation as a proof-of-concept case study for the thesis. By overlapping communication and computation between the CPU and GPU resources, a speedup of 8% is observed.en_AU
dc.identifier.otherb35684550
dc.identifier.urihttp://hdl.handle.net/1885/10053
dc.language.isoen_AUen_AU
dc.subjectGPU FFT Ultrasounden_AU
dc.titleUse of multi-GPU systems for large FFTs: with applications in ultrasound simulationsen_AU
dc.typeThesis (Masters)en_AU
dcterms.valid2013en_AU
local.contributor.affiliationResearch School of Computer Scienceen_AU
local.contributor.supervisorRendell, Alistair
local.description.notesSupervisor: Alistair Rendell, Supervisor's Email Address: Alistair.Rendell@anu.edu.auen_AU
local.description.refereedYesen_AU
local.identifier.doi10.25911/5d78d836f0192
local.mintdoimint
local.type.degreeMaster of Philosophy (MPhil)en_AU

Downloads

Original bundle

Name: Nandapalan_N_2013.pdf
Size: 2.58 MB
Format: Adobe Portable Document Format
Description: Whole Thesis