Sliced Coordinate List Implementation Analysis on Sparse Matrix-Vector Multiplication Using Compute Unified Device Architecture
DOI: https://doi.org/10.21108/IJOICT.2016.21.71
Abstract
Matrices are among the most widely used ways of representing data from real-world problems. Many of these matrices are very large but sparse, so the information they carry is small relative to their size, and processing them within a short time therefore demands heavy computational resources. One solution for processing a sparse matrix efficiently is to store it in a specialized sparse format such as the Sliced Coordinate List (SCOO). The SCOO format has been developed and implemented using the Compute Unified Device Architecture (CUDA). In this research, the performance of the SCOO implementation on a GPU with CUDA is compared with that of another sparse matrix format, the Coordinate List (COO), in terms of memory usage and execution time. The results show that although the SCOO implementation uses 1.000529 times as much memory as the COO format, its serial performance is 3.18 times faster than serial COO. Furthermore, when the SCOO implementation is run in parallel on a GPU with CUDA, it is about 123.8 times faster than parallel COO, or 77 times faster than parallel COO built on CUSP, one of the available libraries for CUDA.
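For context, the sketch below shows what a parallel COO baseline of the kind this paper compares against can look like in CUDA: one thread per nonzero, with atomicAdd resolving collisions when several threads update the same output row (the approach described by Dang and Schmidt, 2013). This is a minimal illustration under stated assumptions, not the paper's SCOO implementation; the function names, the 3x3 example matrix, and the launch parameters are all assumptions made here.

// Minimal COO sparse matrix-vector multiply (y = A*x) in CUDA.
// One thread processes one nonzero; atomicAdd handles the case where
// several nonzeros belong to the same row. Illustrative sketch only.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void spmv_coo(const int *row, const int *col, const float *val,
                         const float *x, float *y, int nnz)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nnz)
        atomicAdd(&y[row[i]], val[i] * x[col[i]]);  // scatter into result row
}

int main(void)
{
    // Example 3x3 matrix with 4 nonzeros, stored as (row, col, value) triples:
    // A = [[2,0,0],[0,3,0],[4,0,5]], multiplied by x = [1,1,1].
    const int nnz = 4, n = 3;
    int   h_row[] = {0, 1, 2, 2};
    int   h_col[] = {0, 1, 0, 2};
    float h_val[] = {2.0f, 3.0f, 4.0f, 5.0f};
    float h_x[]   = {1.0f, 1.0f, 1.0f};
    float h_y[n];

    int *d_row, *d_col; float *d_val, *d_x, *d_y;
    cudaMalloc(&d_row, nnz * sizeof(int));
    cudaMalloc(&d_col, nnz * sizeof(int));
    cudaMalloc(&d_val, nnz * sizeof(float));
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMemcpy(d_row, h_row, nnz * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_col, h_col, nnz * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_val, h_val, nnz * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_y, 0, n * sizeof(float));  // y must start at zero

    spmv_coo<<<(nnz + 255) / 256, 256>>>(d_row, d_col, d_val, d_x, d_y, nnz);
    cudaMemcpy(h_y, d_y, n * sizeof(float), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i)
        printf("y[%d] = %.1f\n", i, h_y[i]);  // expected: 2.0, 3.0, 9.0

    cudaFree(d_row); cudaFree(d_col); cudaFree(d_val);
    cudaFree(d_x); cudaFree(d_y);
    return 0;
}

The SCOO format improves on this baseline by sorting the nonzeros and partitioning them into row slices so that each thread block can accumulate its partial results in fast on-chip memory rather than global memory; the details are given in Dang and Schmidt (2013).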
References
Dang, H. V. & Schmidt, B. (2013). CUDA-enabled Sparse Matrix-Vector Multiplication on GPUs Using Atomic Operations. Parallel Computing, 39(11), 737-750. http://dx.doi.org/10.1016/j.parco.2013.09.005
Bronson, R., Costa, G. B., & Saccoman, J. T. (2014). Linear Algebra: Algorithms, Applications, and Techniques (3rd ed.). USA: Academic Press.
Wilt, N. (2013). The CUDA Handbook: A Comprehensive Guide to GPU Programming. USA: Addison-Wesley Professional.
Halliday, D. & Resnick, R. (2011). Fundamentals of Physics (9th ed., extended). USA: John Wiley & Sons.
Bell, N. & Garland, M. (2009). Implementing Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors. In Proceedings of SC '09. ACM.
Bell, N. & Garland, M. (2012). Cusp: Generic Parallel Algorithms for Sparse Matrix and Graph Computations. Version 0.3.0. http://cusp-library.googlecode.com
Davis, T. A. & Hu, Y. "The University of Florida Sparse Matrix Collection: sparse matrices from a wide range of applications". http://www.cise.ufl.edu/research/sparse/matrices/
Davis, T. "Research: Sparse Matrix Algorithms and Software". http://faculty.cse.tamu.edu/davis/research.html
Rauber, T. & Rünger, G. (2010). Parallel Programming for Multicore and Cluster Systems. Springer.
NVIDIA Corporation (2011). CUDA C Programming Guide. http://docs.nvidia.com/cuda/cuda-c-programming-guide/
NVIDIA. "What is GPU Computing?". http://www.nvidia.com/object/what-is-gpu-computing.html
Gilbert, J. R., Reinhardt, S., & Shah, V. B. (2006). High-Performance Graph Algorithms from Parallel Sparse Matrices. In Proceedings of the International Workshop on Applied Parallel Computing.
Farzaneh, A., Kheiri, H., & Shahmersi, M. A. (2009). An Efficient Storage Format for Large Sparse Matrices. Commun. Fac. Sci. Univ. Ank. Series A1, 58(2), 1-10.
Hong, S. & Kim, H. (2009). An Analytical Model for a GPU Architecture with Memory-Level and Thread-Level Parallelism Awareness. In Proceedings of ISCA '09. ACM.
Zlatev, Z. (1991). Computational Methods for General Sparse Matrices. Netherlands: Springer.
License
Manuscripts submitted to IJoICT must be original works of the author(s), contain no element of plagiarism, and must not have been published or be under consideration for publication in other journals. The author(s) shall agree to assign all copyright of the published article to IJoICT. Requests related to future re-use and re-publication of major or substantial parts of the article must be consulted with the editors of IJoICT.