Restricted Arrays
Overview
The restricted arrays functionality is designed to provide users with
further options for how data is distributed among processors by allowing
them to reduce the total number of processors that actually have data
and also by allowing users to remap which data blocks go to which
processors. There are two calls that allow users to create restricted
arrays; both must be used with the new interface for creating arrays.
With this interface, users first create an array handle using the
ga_create_handle call and then apply properties to the handle using
different ga_set calls. The two calls that allow users to create
restricted arrays are ga_set_restricted and
ga_set_restricted_range. The first call is more general; the second
is a convenience call that allows users to create restricted arrays on a
contiguous set of processors.
Both calls allow users to restrict the data in a global array to a
subset of available processors. The ga_set_restricted call has two
arguments: nproc and an array list of size nproc. The value nproc
is the number of processors that are supposed to contain data,
and list is an array of the processor IDs that contain data. For the
array shown in Figure 8, the problem is run
on 36 processors but for nproc=4 and list=[8,9,15,21] only the
processors shown in the figure will have data. The array will be
decomposed assuming that it is distributed amongst only 4 processors so
it will be broken up using either a 2x2, 1x4, or 4x1 decomposition. The
block that would normally be mapped to process 0 in a 4 processor
decomposition will go to process 8, the data that would map to process 1
will go to process 9, etc. This functionality can be used to create
global arrays that have data limited to a small set of processors but
which are visible to all processors.
Figure 8: A global array distributed on 36 processors. If nproc=4 and list = [8,9,15,21] then only the shaded processors will contain data. The array will be decomposed into 4 blocks.
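The remapping described above can be sketched in plain C without calling the GA library. This is only an illustrative model: the helper names restricted_owner and holds_data are hypothetical and are not part of the GA API. It assumes that block i of the nproc-way decomposition is placed on processor list[i], as the text states.

```c
/* Illustrative model only; these helpers are NOT part of the GA API. */

/* Return the processor that holds data block `block` when the array is
 * restricted to the nproc processors in `list`.
 * Returns -1 if `block` is outside the nproc-way decomposition. */
int restricted_owner(const int list[], int nproc, int block)
{
    if (block < 0 || block >= nproc)
        return -1;
    return list[block];
}

/* Return 1 if processor `proc` appears in `list` (it holds data),
 * and 0 otherwise (it can still see the array but stores none of it). */
int holds_data(const int list[], int nproc, int proc)
{
    for (int i = 0; i < nproc; i++) {
        if (list[i] == proc)
            return 1;
    }
    return 0;
}
```

With list = {8, 9, 15, 21} and nproc = 4, restricted_owner returns 8 for block 0 and 21 for block 3, and holds_data reports that process 0 holds no data.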
The restricted array capability can also be used to alter the default distribution of data. By default, data blocks are assigned to processors in column-major order on the processor grid, so a global array created on 16 processors and decomposed into a 4x4 grid of data blocks would have its data mapped as shown in Figure 9. The first column of blocks is assigned to processes 0-3, the second to processes 4-7, etc.
Figure 9: Standard data distribution for a global array created on 16 processors and decomposed into a 4x4 grid of data blocks.
Figure 10 shows an alternative distribution that could be achieved using restricted arrays and setting the list array to [0,1,4,5,2,3,6,7,8,9,12,13,10,11,14,15]. This distribution might be useful for reducing intranode communication for multiprocessor nodes.
Figure 10: An alternative distribution that could be achieved using restricted arrays. An array on 16 processors decomposed into a 4x4 grid of data blocks.
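To see what a given list does to the layout, the column-major default and the remapping can be combined in a short C model. The function names below are hypothetical and are not GA calls. The sketch assumes that a block at (row, col) has default index col*nrows + row, matching the column-major description above, and that the restricted list is then applied to that index.

```c
#include <stdio.h>

/* Illustrative model only; these helpers are NOT part of the GA API. */

/* Default (column-major) index of the block at (row, col) on a block
 * grid with nrows rows: column 0 holds indices 0..nrows-1, and so on. */
int column_major_index(int row, int col, int nrows)
{
    return col * nrows + row;
}

/* Processor that holds block (row, col) after the list remapping. */
int remapped_owner(const int list[], int row, int col, int nrows)
{
    return list[column_major_index(row, col, nrows)];
}

/* Print the owning processor of every block, one grid row per line. */
void print_owner_grid(const int list[], int nrows, int ncols)
{
    for (int row = 0; row < nrows; row++) {
        for (int col = 0; col < ncols; col++)
            printf("%3d", remapped_owner(list, row, col, nrows));
        printf("\n");
    }
}
```

Under these assumptions, the list [0,1,4,5,2,3,6,7,8,9,12,13,10,11,14,15] places blocks (0,0), (1,0), (0,1) and (1,1) on processes 0, 1, 2 and 3, so each 2x2 patch of neighboring blocks sits on four consecutively numbered processes.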
Restricted Arrays Operations
Fortran subroutine: ga_set_restricted(g_a, list, nproc)
C: void GA_Set_restricted(int g_a, int list[], int nproc)
C++: GA::GlobalArray::setRestricted(int list[], int nproc) const
This subroutine restricts data in the global array g_a to only the
nproc processors listed in the array list. The value of nproc
must be less than or equal to the number of available processors. If
this call is used in conjunction with ga_set_irreg_distr, then the
decomposition in the ga_set_irreg_distr call must be done assuming
the number of processors used in the ga_set_restricted call. The
data that would ordinarily get mapped to process 0 in an nproc
distribution will get mapped to the processor in list[0], the data
that would be mapped to process 1 will get mapped to list[1], etc.
This can be used to restructure the data layout in a global array even
if the value of nproc equals the total number of processors
available.
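Before passing a list to this call, it may help to check it against the constraints stated above. The following C sketch is one possible check; the function name restricted_list_is_valid is hypothetical and not a GA routine. The bound on nproc comes from the text. The requirements that every ID lie in [0, N-1] and that no ID repeats are assumptions of this sketch, not documented behavior.

```c
/* Illustrative check only; NOT part of the GA API.
 * Returns 1 if (list, nproc) looks usable on total_procs processors,
 * and 0 otherwise. */
int restricted_list_is_valid(const int list[], int nproc, int total_procs)
{
    /* From the text: nproc must not exceed the available processors. */
    if (nproc < 1 || nproc > total_procs)
        return 0;

    for (int i = 0; i < nproc; i++) {
        /* Assumption: processor IDs run from 0 to total_procs - 1. */
        if (list[i] < 0 || list[i] >= total_procs)
            return 0;
        /* Assumption: each processor may appear at most once. */
        for (int j = 0; j < i; j++) {
            if (list[j] == list[i])
                return 0;
        }
    }
    return 1;
}
```

A permutation of all N processor IDs passes this check, which corresponds to the case where nproc equals the total number of processors and the call is used only to rearrange the layout.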
Fortran subroutine: ga_set_restricted_range(g_a, lo_proc, hi_proc)
C: void GA_Set_restricted_range(int g_a, int lo_proc, int hi_proc)
C++: GA::GlobalArray::setRestrictedRange(int lo_proc, int hi_proc) const
This subroutine restricts data in the global array g_a to the
processors beginning with lo_proc and ending with hi_proc. Both
lo_proc and hi_proc must be less than or equal to the total
number of processors available minus one (i.e., in the range [0,N-1]
where N is the total number of processors) and lo_proc must be less
than or equal to hi_proc. If lo_proc=0 and hi_proc=N-1 then
this command has no effect on the data distribution. This call is
equivalent to using the ga_set_restricted call where
nproc = hi_proc-lo_proc+1 and the array list contains the
processors [lo_proc, hi_proc] in consecutive order.
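The equivalence between the two calls can be written out as a small C helper that builds the list a range corresponds to. The name range_to_list is hypothetical and not part of the GA API. The sketch assumes the caller supplies a list buffer with room for at least hi_proc-lo_proc+1 entries.

```c
/* Illustrative helper only; NOT part of the GA API.
 * Fills `list` with lo_proc, lo_proc+1, ..., hi_proc and returns the
 * matching nproc, or -1 if the range violates the stated constraints.
 * The caller must provide room for hi_proc - lo_proc + 1 entries. */
int range_to_list(int lo_proc, int hi_proc, int total_procs, int list[])
{
    /* Both bounds must lie in [0, N-1], with lo_proc <= hi_proc. */
    if (lo_proc < 0 || hi_proc >= total_procs || lo_proc > hi_proc)
        return -1;

    int nproc = hi_proc - lo_proc + 1;
    for (int i = 0; i < nproc; i++)
        list[i] = lo_proc + i;   /* consecutive processor IDs */
    return nproc;
}
```

For lo_proc = 4 and hi_proc = 7 on 16 processors, this yields nproc = 4 and list = [4,5,6,7], which are the arguments the equivalent ga_set_restricted call would take.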