Sparse Array Operations

Global Arrays provides some capability for creating and manipulating sparse 2D arrays. This includes functionality for creating a 2D array, basic operations for extracting and setting array diagonals, basic linear algebra operations for sparse matrix-vector muliply and matrix-matrix multiply and additional operations for directly accessing the data inside the sparse array. These operations are sufficient to allow users to build basic linear solvers, such as conjugate gradient and BiCGSTAB and to support investigations into sparse data storage and data manipulation.

The sparse array functionality in GA is accessible from both the C and Fortran APIs, although at present date only the C-interface has been extensively tested.

Creating a Sparse Array

The model for creating a sparse array is similar to other packages that create sparse data structures, such as PETSc and Hypre. The sparse array is initially created using a call to NGA_Sprs_array_create(int idim, int jdim, int type), where idim is the row dimension of the sparse array, jdim is the column dimension of the sparse array and type is the GA datatype of the array. This function is also available in the C API in a version that supports 64 bit indexing NGA_Sprs_array_create64(int64_t idim, int64_t jdim, int64_t type) [1].

The function returns an integer handle that can be used to identify the array in subsequent array operations. Sparse arrays in application can typically end up being much larger than dense arrays, so the option to use 64 bit indexing is more important here. If a sparse array is created with the 64 bit interface, then all other operations on the array should also use the 64 bit versions of the operation, if they are available. Some internal data structures containing the locations of the non-zero values of the sparse array elements are sized to 64 bits when the 64 bit interface is used and other operation may fail if the wrong interfaces is used on the array.

Once the sparse array has been created, the non-zero elements can be added to the array with the NGA_Sprs_array_add_element(int s_a, int idx, int jdx, void *val) or NGA_Sprs_array_add_element64(int s_a, int64_t idx, int64_t jdx, void *val) function. The handle s_a identifies the sparse array, idx and jdx are the row and column indices and val is the value of the array element. Any process can add any element to the array, the only restriction is that an array element can only be added once. It is a good idea to make sure that all diagonal elements of the array are added, even if they happen to be zero. Some array operations may fail if some diagonal entries are missing.

Finally, once all elements have been added to the array, it is time to assemble the array into a final form that can be used for computation by calling the NGA_Sprs_array_assemble(int s_a) function. This function sorts individual elements onto processors and further organizes the data within processors so that it is ready for computation. The final data layout is illustrated in Figure 11. The row dimension of the sparse array is divided into roughly equal sized segments and all the elements with a row index falling within a row segment are assigned to the corresponding processor. The column dimension is partitioned in the same way and the row block on each processor is divided into blocks of data corresponding to the column partition. Nominally, the number of blocks of data corresponds to the number of processors, but in many applications, the column blocks may be empty. If the block is empty, then no memory is allocated for that block. Each of the non-zero blocks is stored using a compressed sparse row format (CSR). This data may be accessed directly by the process hosting the data or it can be retrieved from a remote locate and copied into local buffers. The overall data layout of the sparse matrix can be characterized as a sparse array of sparse blocks.

_images/sprs-layout.png — Figure 11: Schematic layout of sparse matrix data distribution. Original matrix is partitioned along the row axis and each partition is assigned to a process. Data within a row block is further partitioned along the column axis into sparse blocks. Final data structure is a sparse array of sparse blocks.

Alternatively, it is possible to create a new sparse array from an existing sparse array using the function NGA_Sprs_array_duplicate(int s_a) which returns a handle to a new sparse array that is a duplicate of the original array s_a. The new array can then be modified using some of the functions described in the next section.

Finally, when you are done with array, it can be removed from the system with the command NGA_Sprs_array_destroy(int s_a). This will free up all resources associated with the array.

Collective Operations on Sparse Arrays

Once a sparse array has been created, it can be used in several collective operations on distributed data. The most important of these is sparse matrix-vector multiplication, where the vector is an ordinary global array of dimension 1. This operation is of the form NGA_Sprs_array_matvec_multiply(int s_a, int g_x, int g_b) and corresponds to the multiplication \(\overline{\overline{A}}\cdot\overline{x} =\overline{b}\). The handles g_x and g_b are ordinary global arrays of dimension 1. Note that the length of g_x and g_b must match the column dimension of the sparse matrix s_a. Matrix-vector multiplication is a major operation used in iterative solvers.

Many algorithms involving sparse matrices manipulate the diagonal, and some operations are included that enable users to extract and modify diagonal values. Note that these operations are very likely to fail if not all diagonal values have been included in the sparse matrix, so it is generatlly a good idea to include diagonal values in the sparse array even if they are zero. The first operation extracts the diagonal values and stores them in a distributed 1 dimensional global array NGA_Sprs_array_get_diag(int s_a, int *g_d) where g_d is an ordinary global array of dimension 1. The diagonals can also be shifted by a constant value using the function NGA_Sprs_array_shift_diag(int s_a, void *shift) where the variable shift is the amount the diagonal should be incremented.

Other operations include left and right multiplication by a diagonal matrix. The diagonal matrix is represented by a 1 dimensional global array containing the diagonal values. The multiplication operations can be completed using the functions NGA_Sprs_array_right_multiply(int s_a, int g_d) and NGA_Sprs_array_left_multiply(int s_a, int g_d). The handle g_d is the global array containing the diagonal elemenents.

The sparse matrix library also supports sparse matrix-sparse matrix multiplication through the function NGA_Sprs_array_matmatm_multiply(int s_a, int s_b). This operation multiplies the sparse matrices s_a and s_b together and returns a handle to a new sparse matrix representing the product. Functions that multiplication of a sparse matrix and a dense matrix have also been included in the suite of sparse matrix operations. These are NGA_Sprs_array_sprsdns_multiply(int s_a, int g_b) and NGA_Sprs_array_dnssprs_multiply(int g_a, int s_b), which implement multiplication of a sparse matrix times a dense matrix and a dense matrix times a sparse matrix, respectively.

Direct Access to Sparse Matrix Data

GA provides basic functionality on sparse matrices that can be used to construct more complex algorithms. In addition, an array of tools provides direct access to the data stored in the sparse array enabling users to both access and modify the sparse array in more complicated ways. The most basic operations for understanding data layout are the

NGA_Sprs_array_row_distribution(int s_a, int iproc, int *rlo, int *rhi)

and

NGA_Sprs_array_column_distribution(int s_a, int iproc, int *clo, int *chi) functions. For the given processor iproc these function return the bounding indices of the corresponding row or column partition. Data for all rows between rlo and rhi are located on processor iproc. On each processor, this data is further organized into CSR formatted subblocks based in the column partition. Note that in many cases, the subblocks may be empty. These distribution functions are also available with 64 bit interfaces.

The NGA_Sprs_array_col_block_list(int s_a, int **idx_x, int *n) function returns a list of the column blocks that contain data on the calling process. In C, the pointer *idx is allocated by this function and must be freed by the calling program. In Fortran, the array idx is assumed to have already been allocated and the variable n on input is the length of idx. On output, n is the number of non-zero blocks. The data inside the column blocks can be accessed directly using

NGA_Sprs_arrayt_access_col_block(int s_a, int icol, int **idx, int **jdx, void **val). This function returns pointers to the column block data in CSR format. The bounding indices for this data can be obtained using the distribution functions. This function is also available in a 64 bit version. If the column block corresponding to iproc is empty, the pointers returned by the function are NULL. The row indices idx array contains the starting location of the column indices and values for all non-zero values in the row. The number of entries in idx is rhi-rlo+2. The extra entry is included so that the number of non-zero entries in any row can be calculated using idx[i+1]-idx[i], even for the last row in the block. The absolute row index of the row indexed by i in idx can be calculated using i+rlo. The array jdx contains the absolute column indices for the entry. The relative column index in the block can be recovered using j=jdx[k]-clo.

Any column block in the sparse array can be recovered using the NGA_Sprs_array_get_block(int s_a, int irow, int icol, int **idx, int **idx, void *val, int *rlo, int *rhi, int *clo, int *chi) function. This function is also available in a 64 bit version. The indices irow and icol are the row and column indices of the desired block. The values of these indices are bounded by the number of processors. The CSR formatted data is returned in the arrays idx, jdx and val. If the block contains no data, these arrays are NULL. These arrays are allocated by the function and must be freed by the calling program. The indices rlo,rhi,clo,chi are the bounding indices for the block in a 64 bit version.