Sparse Array Operations
Global Arrays provides some capability for creating and manipulating sparse 2D arrays. This includes functionality for creating a 2D array, basic operations for extracting and setting array diagonals, basic linear algebra operations for sparse matrix-vector muliply and matrix-matrix multiply and additional operations for directly accessing the data inside the sparse array. These operations are sufficient to allow users to build basic linear solvers, such as conjugate gradient and BiCGSTAB and to support investigations into sparse data storage and data manipulation.
The sparse array functionality in GA is accessible from both the C and Fortran APIs, although at present date only the C-interface has been extensively tested.
Creating a Sparse Array
The model for creating a sparse array is similar to other packages that
create sparse data structures, such as PETSc and Hypre. The sparse array
is initially created using a call to
NGA_Sprs_array_create(int idim, int jdim, int type), where idim
is the row dimension of the sparse array, jdim is the column
dimension of the sparse array and type is the GA datatype of the
array. This function is also available in the C API in a version that
supports 64 bit indexing
NGA_Sprs_array_create64(int64_t idim, int64_t jdim, int64_t type) [1].
The function returns an integer handle that can be used to identify the array in subsequent array operations. Sparse arrays in application can typically end up being much larger than dense arrays, so the option to use 64 bit indexing is more important here. If a sparse array is created with the 64 bit interface, then all other operations on the array should also use the 64 bit versions of the operation, if they are available. Some internal data structures containing the locations of the non-zero values of the sparse array elements are sized to 64 bits when the 64 bit interface is used and other operation may fail if the wrong interfaces is used on the array.
Once the sparse array has been created, the non-zero elements can be
added to the array with the
NGA_Sprs_array_add_element(int s_a, int idx, int jdx, void *val) or
NGA_Sprs_array_add_element64(int s_a, int64_t idx, int64_t jdx, void *val)
function. The handle s_a identifies the sparse array, idx and
jdx are the row and column indices and val is the value of the
array element. Any process can add any element to the array, the only
restriction is that an array element can only be added once. It is a
good idea to make sure that all diagonal elements of the array are
added, even if they happen to be zero. Some array operations may fail if
some diagonal entries are missing.
Finally, once all elements have been added to the array, it is time to
assemble the array into a final form that can be used for computation by
calling the NGA_Sprs_array_assemble(int s_a) function. This function
sorts individual elements onto processors and further organizes the data
within processors so that it is ready for computation. The final data
layout is illustrated in Figure 11. The row dimension
of the sparse array is divided into roughly equal sized segments and all
the elements with a row index falling within a row segment are assigned
to the corresponding processor. The column dimension is partitioned in
the same way and the row block on each processor is divided into blocks
of data corresponding to the column partition. Nominally, the number of
blocks of data corresponds to the number of processors, but in many
applications, the column blocks may be empty. If the block is empty,
then no memory is allocated for that block. Each of the non-zero blocks
is stored using a compressed sparse row format (CSR). This data may be
accessed directly by the process hosting the data or it can be retrieved
from a remote locate and copied into local buffers. The overall data
layout of the sparse matrix can be characterized as a sparse array of
sparse blocks.
Figure 11: Schematic layout of sparse matrix data distribution. Original matrix is partitioned along the row axis and each partition is assigned to a process. Data within a row block is further partitioned along the column axis into sparse blocks. Final data structure is a sparse array of sparse blocks.
Alternatively, it is possible to create a new sparse array from an
existing sparse array using the function
NGA_Sprs_array_duplicate(int s_a) which returns a handle to a new
sparse array that is a duplicate of the original array s_a. The new
array can then be modified using some of the functions described in the
next section.
Finally, when you are done with array, it can be removed from the system
with the command NGA_Sprs_array_destroy(int s_a). This will free up
all resources associated with the array.
Collective Operations on Sparse Arrays
Once a sparse array has been created, it can be used in several
collective operations on distributed data. The most important of these
is sparse matrix-vector multiplication, where the vector is an ordinary
global array of dimension 1. This operation is of the form
NGA_Sprs_array_matvec_multiply(int s_a, int g_x, int g_b) and
corresponds to the multiplication
\(\overline{\overline{A}}\cdot\overline{x}
=\overline{b}\). The handles g_x and g_b are ordinary global
arrays of dimension 1. Note that the length of g_x and g_b must
match the column dimension of the sparse matrix s_a. Matrix-vector
multiplication is a major operation used in iterative solvers.
Many algorithms involving sparse matrices manipulate the diagonal, and
some operations are included that enable users to extract and modify
diagonal values. Note that these operations are very likely to fail if
not all diagonal values have been included in the sparse matrix, so it
is generatlly a good idea to include diagonal values in the sparse array
even if they are zero. The first operation extracts the diagonal values
and stores them in a distributed 1 dimensional global array
NGA_Sprs_array_get_diag(int s_a, int *g_d) where g_d is an
ordinary global array of dimension 1. The diagonals can also be shifted
by a constant value using the function
NGA_Sprs_array_shift_diag(int s_a, void *shift) where the variable
shift is the amount the diagonal should be incremented.
Other operations include left and right multiplication by a diagonal
matrix. The diagonal matrix is represented by a 1 dimensional global
array containing the diagonal values. The multiplication operations can
be completed using the functions
NGA_Sprs_array_right_multiply(int s_a, int g_d) and
NGA_Sprs_array_left_multiply(int s_a, int g_d). The handle g_d
is the global array containing the diagonal elemenents.
The sparse matrix library also supports sparse matrix-sparse matrix
multiplication through the function
NGA_Sprs_array_matmatm_multiply(int s_a, int s_b). This operation
multiplies the sparse matrices s_a and s_b together and returns
a handle to a new sparse matrix representing the product. Functions that
multiplication of a sparse matrix and a dense matrix have also been
included in the suite of sparse matrix operations. These are
NGA_Sprs_array_sprsdns_multiply(int s_a, int g_b) and
NGA_Sprs_array_dnssprs_multiply(int g_a, int s_b), which implement
multiplication of a sparse matrix times a dense matrix and a dense matrix
times a sparse matrix, respectively.
Direct Access to Sparse Matrix Data
GA provides basic functionality on sparse matrices that can be used to construct more complex algorithms. In addition, an array of tools provides direct access to the data stored in the sparse array enabling users to both access and modify the sparse array in more complicated ways. The most basic operations for understanding data layout are the
NGA_Sprs_array_row_distribution(int s_a, int iproc, int *rlo, int *rhi)
and
NGA_Sprs_array_column_distribution(int s_a, int iproc, int *clo, int *chi)
functions. For the given processor iproc these function return the
bounding indices of the corresponding row or column partition. Data for
all rows between rlo and rhi are located on processor iproc.
On each processor, this data is further organized into CSR formatted
subblocks based in the column partition. Note that in many cases, the
subblocks may be empty. These distribution functions are also available
with 64 bit interfaces.
The NGA_Sprs_array_col_block_list(int s_a, int **idx_x, int *n)
function returns a list of the column blocks that contain data on the
calling process. In C, the pointer *idx is allocated by this
function and must be freed by the calling program. In Fortran, the array
idx is assumed to have already been allocated and the variable n
on input is the length of idx. On output, n is the number of
non-zero blocks. The data inside the column blocks can be accessed
directly using
NGA_Sprs_arrayt_access_col_block(int s_a, int icol, int **idx, int **jdx, void **val).
This function returns pointers to the column block data in CSR format.
The bounding indices for this data can be obtained using the
distribution functions. This function is also available in a 64 bit
version. If the column block corresponding to iproc is empty, the
pointers returned by the function are NULL. The row indices idx
array contains the starting location of the column indices and values
for all non-zero values in the row. The number of entries in idx is
rhi-rlo+2. The extra entry is included so that the number of
non-zero entries in any row can be calculated using idx[i+1]-idx[i],
even for the last row in the block. The absolute row index of the row
indexed by i in idx can be calculated using i+rlo. The array
jdx contains the absolute column indices for the entry. The relative
column index in the block can be recovered using j=jdx[k]-clo.
Any column block in the sparse array can be recovered using the
NGA_Sprs_array_get_block(int s_a, int irow, int icol, int **idx, int **idx, void *val, int *rlo, int *rhi, int *clo, int *chi)
function. This function is also available in a 64 bit version. The
indices irow and icol are the row and column indices of the
desired block. The values of these indices are bounded by the number of
processors. The CSR formatted data is returned in the arrays idx,
jdx and val. If the block contains no data, these arrays are
NULL. These arrays are allocated by the function and must be freed
by the calling program. The indices rlo,rhi,clo,chi are the bounding
indices for the block in a 64 bit version.