Sliced Coordinate List Implementation Analysis on Sparse Matrix-Vector Multiplication Using Compute Unified Device Architecture

Authors

  • Fitriyani Fitriyani, Telkom University

DOI:

https://doi.org/10.21108/IJOICT.2016.21.71

Abstract

Matrices are among the most widely used representations of real-world data. Many such matrices are very large but sparse, so the information they carry is small relative to their size, and processing them within a short time demands heavy computational resources. One way to process a sparse matrix efficiently is to store it in a specialized sparse format such as the Sliced Coordinate List (SCOO). The SCOO format has been developed together with an implementation on the Compute Unified Device Architecture (CUDA). In this research, the performance of the SCOO implementation on a GPU with CUDA is compared with that of another sparse matrix format, the Coordinate List (COO), in terms of memory usage and execution time. The results show that although the SCOO implementation uses 1.000529 times more memory than the COO format, its serial performance is 3.18 times faster than serial COO; moreover, when the SCOO implementation is run in parallel on a GPU with CUDA, it is about 123.8 times faster than parallel COO, or 77 times faster than parallel COO implemented with CUSP, an available library for CUDA.
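To illustrate the formats being compared, the sketch below (not the paper's code) multiplies a sparse matrix stored in COO format by a dense vector on the GPU: one CUDA thread per nonzero, with atomicAdd resolving rows that several threads update at once. The example matrix, values, and launch parameters are illustrative only.

// Minimal COO sparse matrix-vector product y = A*x on the GPU.
// One thread per nonzero; rows can repeat across threads, so the
// accumulation into y must be atomic.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void spmv_coo(const int *row, const int *col,
                         const float *val, const float *x,
                         float *y, int nnz) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nnz) {
        // Each thread contributes one partial product to its row.
        atomicAdd(&y[row[i]], val[i] * x[col[i]]);
    }
}

int main() {
    // 3x3 example with 4 nonzeros stored as (row, col, value) triples.
    const int nnz = 4, n = 3;
    int   h_row[nnz] = {0, 1, 1, 2};
    int   h_col[nnz] = {0, 0, 2, 1};
    float h_val[nnz] = {2.0f, 1.0f, 4.0f, 3.0f};
    float h_x[n] = {1.0f, 2.0f, 3.0f}, h_y[n] = {0};

    int *d_row, *d_col; float *d_val, *d_x, *d_y;
    cudaMalloc(&d_row, nnz * sizeof(int));
    cudaMalloc(&d_col, nnz * sizeof(int));
    cudaMalloc(&d_val, nnz * sizeof(float));
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMemcpy(d_row, h_row, nnz * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_col, h_col, nnz * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_val, h_val, nnz * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_y, 0, n * sizeof(float));

    spmv_coo<<<(nnz + 255) / 256, 256>>>(d_row, d_col, d_val, d_x, d_y, nnz);
    cudaMemcpy(h_y, d_y, n * sizeof(float), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i) printf("y[%d] = %f\n", i, h_y[i]);  // expect 2, 13, 6
    cudaFree(d_row); cudaFree(d_col); cudaFree(d_val);
    cudaFree(d_x); cudaFree(d_y);
    return 0;
}

Roughly speaking, SCOO improves on this baseline by sorting the nonzeros into row slices small enough that the atomic accumulation can happen in fast on-chip shared memory rather than in global memory (Dang & Schmidt, 2013); the speedups reported in the abstract come from that reorganization.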


References

Dang, H. V. & Schmidt, B. (2013). CUDA-enabled Sparse Matrix-Vector Multiplication on GPUs Using Atomic Operations. Parallel Computing, 39(11), 737-750. http://dx.doi.org/10.1016/j.parco.2013.09.005

Bronson, R., Costa, G. B., & Saccoman, J. T. (2014). Linear Algebra: Algorithms, Applications, and Techniques (3rd ed.). USA: Academic Press.

Wilt, N. (2013). The CUDA Handbook: A Comprehensive Guide to GPU Programming. USA: Addison-Wesley Professional.

Halliday, D. & Resnick, R. (2011). Fundamentals of Physics (9th ed., extended). USA: John Wiley & Sons.

Bell, N. & Garland, M. (2009). Implementing Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors. In Proceedings of SC '09. ACM.

Bell, N. & Garland, M. (2012). CUSP: Generic Parallel Algorithms for Sparse Matrix and Graph Computations, version 0.3.0. http://cusp-library.googlecode.com

Davis, T. A. & Hu, Y. The University of Florida Sparse Matrix Collection: Sparse Matrices from a Wide Range of Applications. http://www.cise.ufl.edu/research/sparse/matrices/

Davis, T. Research: Sparse Matrix Algorithms and Software. http://faculty.cse.tamu.edu/davis/research.html

Rauber, T. & Rünger, G. (2010). Parallel Programming for Multicore and Cluster Systems. Springer.

NVIDIA Corporation (2011). CUDA C Programming Guide. http://docs.nvidia.com/cuda/cuda-c-programming-guide/

NVIDIA. "What is GPU Computing? | High Performance Computing | NVIDIA | NVIDIA". http://www.nvidia.com/object/what-is-gpu-computing.html

Gilbert, J. R., Reinhardt, S., & Shah, V. B. (2006). High-performance Graph Algorithms from Parallel Sparse Matrices. In Proc. of the Int'l Workshop on Applied Parallel Computing.

Farzaneh, A., Kheiri, H., & Shahmersi, M. A. (2009). An Efficient Storage Format for Large Sparse Matrices. Commun. Fac. Sci. Univ. Ank. Series A1, 58(2), 1-10.

Hong, S. & Kim, H. (2009). An Analytical Model for a GPU Architecture with Memory-level and Thread-level Parallelism Awareness. In Proceedings of ISCA '09. ACM.

Zlatev, Z. (1991). Computational Methods for General Sparse Matrices. Netherlands: Springer.


Published

2016-07-01

How to Cite

fitriyani, fitriyani. (2016). Sliced Coordinate List Implementation Analysis on Sparse Matrix-Vector Multiplication Using Compute Unified Device Architecture. International Journal on Information and Communication Technology (IJoICT), 2(1), 13–22. https://doi.org/10.21108/IJOICT.2016.21.71

Issue

Vol. 2 No. 1 (2016)

Section

Articles