Sparse Array Operations ======================= Global Arrays provides some capability for creating and manipulating sparse 2D arrays. This includes functionality for creating a 2D array, basic operations for extracting and setting array diagonals, basic linear algebra operations for sparse matrix-vector muliply and matrix-matrix multiply and additional operations for directly accessing the data inside the sparse array. These operations are sufficient to allow users to build basic linear solvers, such as conjugate gradient and BiCGSTAB and to support investigations into sparse data storage and data manipulation. The sparse array functionality in GA is accessible from both the C and Fortran APIs, although at present date only the C-interface has been extensively tested. Creating a Sparse Array ----------------------- The model for creating a sparse array is similar to other packages that create sparse data structures, such as PETSc and Hypre. The sparse array is initially created using a call to ``NGA_Sprs_array_create(int idim, int jdim, int type)``, where ``idim`` is the row dimension of the sparse array, ``jdim`` is the column dimension of the sparse array and ``type`` is the GA datatype of the array. This function is also available in the C API in a version that supports 64 bit indexing ``NGA_Sprs_array_create64(int64_t idim, int64_t jdim, int64_t type)`` [1]_. The function returns an integer handle that can be used to identify the array in subsequent array operations. Sparse arrays in application can typically end up being much larger than dense arrays, so the option to use 64 bit indexing is more important here. If a sparse array is created with the 64 bit interface, then all other operations on the array should also use the 64 bit versions of the operation, if they are available. Some internal data structures containing the locations of the non-zero values of the sparse array elements are sized to 64 bits when the 64 bit interface is used and other operation may fail if the wrong interfaces is used on the array. Once the sparse array has been created, the non-zero elements can be added to the array with the ``NGA_Sprs_array_add_element(int s_a, int idx, int jdx, void *val)`` or ``NGA_Sprs_array_add_element64(int s_a, int64_t idx, int64_t jdx, void *val)`` function. The handle ``s_a`` identifies the sparse array, ``idx`` and ``jdx`` are the row and column indices and ``val`` is the value of the array element. Any process can add any element to the array, the only restriction is that an array element can only be added once. It is a good idea to make sure that all diagonal elements of the array are added, even if they happen to be zero. Some array operations may fail if some diagonal entries are missing. Finally, once all elements have been added to the array, it is time to assemble the array into a final form that can be used for computation by calling the ``NGA_Sprs_array_assemble(int s_a)`` function. This function sorts individual elements onto processors and further organizes the data within processors so that it is ready for computation. The final data layout is illustrated in :numref:`Figure %s `. The row dimension of the sparse array is divided into roughly equal sized segments and all the elements with a row index falling within a row segment are assigned to the corresponding processor. The column dimension is partitioned in the same way and the row block on each processor is divided into blocks of data corresponding to the column partition. Nominally, the number of blocks of data corresponds to the number of processors, but in many applications, the column blocks may be empty. If the block is empty, then no memory is allocated for that block. Each of the non-zero blocks is stored using a compressed sparse row format (CSR). This data may be accessed directly by the process hosting the data or it can be retrieved from a remote locate and copied into local buffers. The overall data layout of the sparse matrix can be characterized as a sparse array of sparse blocks. .. _schematic-layout-sparse: .. figure:: images/sprs-layout.png :width: 90% :align: center Schematic layout of sparse matrix data distribution. Original matrix is partitioned along the row axis and each partition is assigned to a process. Data within a row block is further partitioned along the column axis into sparse blocks. Final data structure is a sparse array of sparse blocks. Alternatively, it is possible to create a new sparse array from an existing sparse array using the function ``NGA_Sprs_array_duplicate(int s_a)`` which returns a handle to a new sparse array that is a duplicate of the original array ``s_a``. The new array can then be modified using some of the functions described in the next section. Finally, when you are done with array, it can be removed from the system with the command ``NGA_Sprs_array_destroy(int s_a)``. This will free up all resources associated with the array. Collective Operations on Sparse Arrays -------------------------------------- Once a sparse array has been created, it can be used in several collective operations on distributed data. The most important of these is sparse matrix-vector multiplication, where the vector is an ordinary global array of dimension 1. This operation is of the form ``NGA_Sprs_array_matvec_multiply(int s_a, int g_x, int g_b)`` and corresponds to the multiplication :math:`\overline{\overline{A}}\cdot\overline{x} =\overline{b}`. The handles ``g_x`` and ``g_b`` are ordinary global arrays of dimension 1. Note that the length of ``g_x`` and ``g_b`` must match the column dimension of the sparse matrix ``s_a``. Matrix-vector multiplication is a major operation used in iterative solvers. Many algorithms involving sparse matrices manipulate the diagonal, and some operations are included that enable users to extract and modify diagonal values. Note that these operations are very likely to fail if not all diagonal values have been included in the sparse matrix, so it is generatlly a good idea to include diagonal values in the sparse array even if they are zero. The first operation extracts the diagonal values and stores them in a distributed 1 dimensional global array ``NGA_Sprs_array_get_diag(int s_a, int *g_d)`` where ``g_d`` is an ordinary global array of dimension 1. The diagonals can also be shifted by a constant value using the function ``NGA_Sprs_array_shift_diag(int s_a, void *shift)`` where the variable ``shift`` is the amount the diagonal should be incremented. Other operations include left and right multiplication by a diagonal matrix. The diagonal matrix is represented by a 1 dimensional global array containing the diagonal values. The multiplication operations can be completed using the functions ``NGA_Sprs_array_right_multiply(int s_a, int g_d)`` and ``NGA_Sprs_array_left_multiply(int s_a, int g_d)``. The handle ``g_d`` is the global array containing the diagonal elemenents. The sparse matrix library also supports sparse matrix-sparse matrix multiplication through the function ``NGA_Sprs_array_matmatm_multiply(int s_a, int s_b)``. This operation multiplies the sparse matrices ``s_a`` and ``s_b`` together and returns a handle to a new sparse matrix representing the product. Functions that multiplication of a sparse matrix and a dense matrix have also been included in the suite of sparse matrix operations. These are ``NGA_Sprs_array_sprsdns_multiply(int s_a, int g_b)`` and ``NGA_Sprs_array_dnssprs_multiply(int g_a, int s_b)``, which implement multiplication of a sparse matrix times a dense matrix and a dense matrix times a sparse matrix, respectively. Direct Access to Sparse Matrix Data ----------------------------------- GA provides basic functionality on sparse matrices that can be used to construct more complex algorithms. In addition, an array of tools provides direct access to the data stored in the sparse array enabling users to both access and modify the sparse array in more complicated ways. The most basic operations for understanding data layout are the ``NGA_Sprs_array_row_distribution(int s_a, int iproc, int *rlo, int *rhi)`` and ``NGA_Sprs_array_column_distribution(int s_a, int iproc, int *clo, int *chi)`` functions. For the given processor ``iproc`` these function return the bounding indices of the corresponding row or column partition. Data for all rows between ``rlo`` and ``rhi`` are located on processor ``iproc``. On each processor, this data is further organized into CSR formatted subblocks based in the column partition. Note that in many cases, the subblocks may be empty. These distribution functions are also available with 64 bit interfaces. The ``NGA_Sprs_array_col_block_list(int s_a, int **idx_x, int *n)`` function returns a list of the column blocks that contain data on the calling process. In C, the pointer ``*idx`` is allocated by this function and must be freed by the calling program. In Fortran, the array ``idx`` is assumed to have already been allocated and the variable ``n`` on input is the length of ``idx``. On output, ``n`` is the number of non-zero blocks. The data inside the column blocks can be accessed directly using ``NGA_Sprs_arrayt_access_col_block(int s_a, int icol, int **idx, int **jdx, void **val)``. This function returns pointers to the column block data in CSR format. The bounding indices for this data can be obtained using the distribution functions. This function is also available in a 64 bit version. If the column block corresponding to ``iproc`` is empty, the pointers returned by the function are ``NULL``. The row indices ``idx`` array contains the starting location of the column indices and values for all non-zero values in the row. The number of entries in ``idx`` is ``rhi-rlo+2``. The extra entry is included so that the number of non-zero entries in any row can be calculated using ``idx[i+1]-idx[i]``, even for the last row in the block. The absolute row index of the row indexed by ``i`` in ``idx`` can be calculated using ``i+rlo``. The array ``jdx`` contains the absolute column indices for the entry. The relative column index in the block can be recovered using ``j=jdx[k]-clo``. Any column block in the sparse array can be recovered using the ``NGA_Sprs_array_get_block(int s_a, int irow, int icol, int **idx, int **idx, void *val, int *rlo, int *rhi, int *clo, int *chi)`` function. This function is also available in a 64 bit version. The indices ``irow`` and ``icol`` are the row and column indices of the desired block. The values of these indices are bounded by the number of processors. The CSR formatted data is returned in the arrays ``idx``, ``jdx`` and ``val``. If the block contains no data, these arrays are ``NULL``. These arrays are allocated by the function and must be freed by the calling program. The indices ``rlo,rhi,clo,chi`` are the bounding indices for the block in a 64 bit version. .. [1] If a sparse array is created using the 64 bit interface, then the GA library should be build with 64 bit index support.