Warning: This feature is experimental. If variables with support are allocated in shared memory, significant code adaptations may be required so that resizing calls are made collectively (otherwise, deadlocks may occur). These resizings are also slower than in local memory. It is therefore recommended to use shared memory only for variables without support. Using shared memory for variables on particles is not recommended: calls to endUpdate() are rather frequent for particle families, and with shared memory they become collective!

Introduction

By default, Arcane variables allocate their memory with the default allocator: without a GPU, local memory is used; with a GPU, unified memory is used.

A new memory allocator, internal to Arcane, allows memory to be allocated in the shared memory of the computation node. Internally, it relies on the class presented previously, Arcane::MachineShMemWin. As a consequence, the memory of each subdomain is accessed as a separate segment, and these segments are not contiguous.

This mode is compatible with all Arcane variable types except scalar variables without support (e.g., VariableScalarReal) and partial variables.

The main difficulty in using this shared memory mode is ensuring that all calls that reallocate memory are collective, i.e., made by all subdomains of the computation node.

For variables resized by Arcane, the user does not need to worry about these collective calls; Arcane handles them. For example, with mesh variables, Arcane handles resizing if the mesh evolves.

Warning: For variables with support, using shared memory makes the family-level calls that act on these variables collective. This is notably the case for Arcane::IItemFamily::compactItems() and Arcane::IItemFamily::endUpdate().

Conversely, for variables that provide a resize() method (or reshape() for multi-dimensional variables), the user must ensure that all subdomains of the computation node call this method, including subdomains that do not need to resize (they can, for instance, call var.resize(var.size())).

Setting aside these resizing calls, the use of shared memory variables is identical to the use of local memory variables.

To declare a variable in shared memory, simply add the IVariable::PInShMem property when creating it (in AXL files, the corresponding option is in-shmem="true").

Accessing Memory Segments of Other Subdomains

The utility of putting variables in shared memory is to be able to access data from other subdomains without message exchanges.

To access the data of all subdomains of the node, use the MachineShMemWinVariable classes. There is one class per Arcane variable type:

Variable Type (example)	Class to Use
1D Array Variable without support (Arcane::VariableArrayInt32)	Arcane::MachineShMemWinVariableArrayT
Mesh Scalar Variable (Arcane::VariableCellInt32)	Arcane::MachineShMemWinMeshVariableScalarT
2D Array Variable without support (Arcane::VariableArray2Int32)	Arcane::MachineShMemWinVariableArray2T
Mesh 1D Array Variable (Arcane::VariableCellArrayInt32)	Arcane::MachineShMemWinMeshVariableArrayT
Scalar Multi-dimensional Variable (Arcane::MeshMDVariableRefT<Cell, Real, MDDim2>)	Arcane::MachineShMemWinMeshMDVariableT
Vector Multi-dimensional Variable (Arcane::MeshVectorMDVariableRefT<Cell, Real, 7, MDDim2>)	Arcane::MachineShMemWinMeshVectorMDVariableT
Matrix Multi-dimensional Variable (Arcane::MeshMatrixMDVariableRefT<Cell, Real, 2, 5, MDDim1>)	Arcane::MachineShMemWinMeshMatrixMDVariableT

Three methods are common to these classes:

machineRanks(),
barrier(),
updateVariable().

The first two have already been briefly described in the previous section (Usage).

Arcane::MachineShMemWinVariableCommon::machineRanks() allows retrieving the ranks of the computation node's subdomains.

For example, if the returned view contains [0, 2, 4, 6], the computation node hosts these subdomains and their data can be accessed through MachineShMemWin.
Since global ranks are contiguous, ranging from 0 to Arcane::IParallelMng::commSize() - 1, we can also determine which subdomains are not on our computation node. For example, if commSize() = 8, the subdomains requiring inter-node communication are [1, 3, 5, 7].
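The deduction above can be sketched in standard C++. This is a minimal, standalone sketch: the helper ranksOutsideNode is hypothetical (not part of Arcane), std::vector stands in for Arcane views, and global ranks are assumed to be contiguous in [0, commSize()).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Int32 = std::int32_t; // stand-in for Arcane's Int32

// Hypothetical helper (not part of Arcane): returns the global ranks that are
// not hosted on the current computation node.
// Assumes global ranks are contiguous in [0, comm_size) and that machine_ranks
// holds the ranks returned by machineRanks().
std::vector<Int32> ranksOutsideNode(const std::vector<Int32>& machine_ranks, Int32 comm_size)
{
  std::vector<bool> on_node(static_cast<std::size_t>(comm_size), false);
  for (Int32 r : machine_ranks) {
    assert(r >= 0 && r < comm_size);
    on_node[static_cast<std::size_t>(r)] = true;
  }
  std::vector<Int32> outside;
  for (Int32 r = 0; r < comm_size; ++r) {
    if (!on_node[static_cast<std::size_t>(r)])
      outside.push_back(r);
  }
  return outside;
}
```

In an Arcane module, machine_ranks would come from var_sh.machineRanks() and comm_size from parallelMng()->commSize(); with [0, 2, 4, 6] and 8, the helper returns [1, 3, 5, 7].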

Arcane::MachineShMemWinVariableCommon::barrier() allows performing a barrier for all subdomains of the computation node (so, if we take the previous example, a barrier for subdomains [0, 2, 4, 6]).

This is useful when subdomains share information through a memory window: the barrier makes each subdomain wait until every subdomain has written to its segment before the subdomains of the node read this data. Its granularity is finer than that of Arcane::IParallelMng::barrier(), which synchronizes all subdomains.
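The write-then-read pattern that barrier() protects can be modeled with threads. The sketch below is only an analogy under stated assumptions: threads stand in for the subdomains of a node, a shared std::vector stands in for the memory window, and SimpleBarrier and writeThenRead are hypothetical names, not Arcane classes.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Minimal single-use barrier (C++11), standing in for barrier().
class SimpleBarrier
{
 public:
  explicit SimpleBarrier(std::size_t count) : m_count(count) {}

  void wait()
  {
    std::unique_lock<std::mutex> lock(m_mutex);
    if (++m_arrived == m_count) {
      m_cv.notify_all(); // last arrival releases everyone
      return;
    }
    m_cv.wait(lock, [this] { return m_arrived == m_count; });
  }

 private:
  std::mutex m_mutex;
  std::condition_variable m_cv;
  std::size_t m_count;
  std::size_t m_arrived = 0;
};

// Each thread models a subdomain: it writes its rank in its own slot of the
// shared buffer, waits at the barrier, then reads every slot.
// Returns true if every reader saw every written value.
bool writeThenRead(std::size_t nb_subdomain)
{
  std::vector<int> shared(nb_subdomain, -1);
  std::vector<int> ok(nb_subdomain, 0); // one result slot per thread
  SimpleBarrier barrier(nb_subdomain);
  std::vector<std::thread> threads;
  for (std::size_t rank = 0; rank < nb_subdomain; ++rank) {
    threads.emplace_back([&, rank] {
      shared[rank] = static_cast<int>(rank); // write phase
      barrier.wait();                        // all writes happen before any read
      bool all_seen = true;
      for (std::size_t r = 0; r < nb_subdomain; ++r)
        all_seen = all_seen && (shared[r] == static_cast<int>(r));
      ok[rank] = all_seen ? 1 : 0;
    });
  }
  for (auto& t : threads)
    t.join();
  for (int v : ok) {
    if (!v)
      return false;
  }
  return true;
}
```

Without the barrier.wait() call, a thread could read a slot before its owner has written it, which is exactly the situation barrier() prevents between subdomains.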

The real novelty compared to the previous section is the method Arcane::MachineShMemWinMeshVariableArrayT::updateVariable(), available in all these classes.

Internally, as explained in the introduction, we use an allocator that allocates memory in shared memory and we use the Arcane::MachineShMemWin class to access it.
Arcane::MachineShMemWinVariable in turn uses Arcane::MachineShMemWin to access the shared memory of the variables.

The problem is that, in Arcane, the size of an array is not necessarily equal to the size of the memory allocated for it (the allocated capacity may be larger). Consequently, the segment sizes returned by Arcane::MachineShMemWin cannot be used internally to build the views on the variables.

The size of the variable in each subdomain must therefore be obtained in another way: these sizes are shared through an additional memory window.

When the size of a variable changes (because the mesh evolves or because resize() is called on an array variable), these shared sizes must be updated.

Currently, this is the user's responsibility, through a call to updateVariable().

It is also possible to destroy the Arcane::MachineShMemWinVariable object and recreate it after updating the variable.
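The reasoning above (allocated size versus logical size, sizes published in a separate window and refreshed on demand) can be illustrated with a small model. This is a hypothetical sketch in standard C++, not Arcane's implementation: NodeModel, updateSizes() and viewSize() are invented names, updateSizes() plays the role of updateVariable(), and the std::vector capacity plays the role of the allocated memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Int32 = std::int32_t; // stand-in for Arcane's Int32

// Hypothetical model (not Arcane's implementation) of one computation node.
struct NodeModel
{
  // One segment per subdomain; the vector capacity models the allocated memory.
  std::vector<std::vector<Int32>> segments;
  // Models the extra window in which each subdomain publishes its logical size.
  std::vector<std::size_t> shared_sizes;

  explicit NodeModel(std::size_t nb_subdomain)
  : segments(nb_subdomain)
  , shared_sizes(nb_subdomain, 0)
  {}

  // Resizes the segment of one subdomain; the allocation is deliberately
  // larger than the logical size, and the shared sizes are NOT updated.
  void resize(std::size_t rank, std::size_t new_size)
  {
    assert(rank < segments.size());
    segments[rank].reserve(new_size * 2);
    segments[rank].resize(new_size);
  }

  // Plays the role of updateVariable(): every subdomain publishes its size.
  void updateSizes()
  {
    for (std::size_t r = 0; r < segments.size(); ++r)
      shared_sizes[r] = segments[r].size();
  }

  // Size another subdomain would use to build a view on segment `rank`.
  std::size_t viewSize(std::size_t rank) const { return shared_sizes[rank]; }
};
```

Until updateSizes() is called, the other subdomains keep seeing the old size, and the capacity cannot replace it because it is larger than the logical size.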

Examples

Some examples to illustrate the use of these classes:

Example 1

VariableArrayInt32 var(VariableBuildInfo(mesh(), "Test3", IVariable::PInShMem));
MachineShMemWinVariableArrayT var_sh(var);
 
var.resize(2);
var_sh.updateVariable();
 
var[0] = parallelMng()->commRank();
var[1] = parallelMng()->commRank();
 
var_sh.barrier();
 
auto machine_ranks = var_sh.machineRanks();
 
for (Int32 rank : machine_ranks) {
  info() << "Rank " << rank << " -- Value : " << var_sh.view(rank);
}

In this example, each subdomain has an array of two Int32.

Note: The array could be of a different size for each subdomain.

Each subdomain writes its rank into both elements of its array, then displays the array of every subdomain of the node (var_sh.view(rank) returns a view on the two Int32 of the array of subdomain rank).

The call to updateVariable() can easily be avoided by calling var.resize(2) between the creation of the variable and the creation of the MachineShMemWinVariable:

Example 1.1

VariableArrayInt32 var(VariableBuildInfo(mesh(), "Test3", IVariable::PInShMem));
var.resize(2);
MachineShMemWinVariableArrayT var_sh(var);
//...

An alternative to calling updateVariable() is the destruction/recreation of MachineShMemWinVariable:

Example 1.2

VariableArrayInt32 var(VariableBuildInfo(mesh(), "Test3", IVariable::PInShMem));
{
  MachineShMemWinVariableArrayT var_sh(var);
  info() << var_sh.machineRanks();
  //...
}
var.resize(2);
{
  MachineShMemWinVariableArrayT var_sh(var);
  info() << var_sh.machineRanks();
  //...
}

Example 2

VariableCellInt32 var(VariableBuildInfo(mesh(), "Test3", IVariable::PInShMem));
MachineShMemWinMeshVariableScalarT var_sh(var);
 
ENUMERATE_ (Cell, icell, allCells()) {
  var[icell] = parallelMng()->commRank();
}
 
var_sh.barrier();
 
auto machine_ranks = var_sh.machineRanks();
 
for (Int32 rank : machine_ranks) {
  info() << "Rank " << rank << " -- Value : " << var_sh(rank, 0);
}

For mesh variables, the operator Arcane::MachineShMemWinMeshVariableScalarT::operator()() gives access to the value of an Item of a given subdomain from its local_id (in Example 2, var_sh(rank, 0) reads the value of the Item with local_id 0 in subdomain rank).

Warning: A local_id is only meaningful in the subdomain that owns the Item, so it has to be shared in some way by that subdomain. Never use the local_id of one subdomain to access the Items of another subdomain!

If multiple values need to be read from another subdomain, it is strongly recommended to do so by retrieving a view using the method Arcane::MachineShMemWinMeshVariableScalarT::view(). Example:

Example 2.1

for (Int32 rank : machine_ranks) {
  Span<Int32> var_rank_view = var_sh.view(rank);
  for (Int32 local_id_rank = 0; local_id_rank < var_rank_view.size(); ++local_id_rank) {
    info() << "Rank " << rank << " -- LocalId : " << local_id_rank << " -- Value : " << var_rank_view[local_id_rank];
  }
}

Warning: If there have been deletions of Items without compaction, the code in Example 2.1 will display values of deleted Items.

In Example 2, the barrier is important because each subdomain then reads the data of the other subdomains.
The barrier can nevertheless be avoided by creating the MachineShMemWinVariable only after all the writes:

Example 2.2

VariableCellInt32 var(VariableBuildInfo(mesh(), "Test3", IVariable::PInShMem));
 
ENUMERATE_ (Cell, icell, allCells()) {
  var[icell] = parallelMng()->commRank();
}
 
MachineShMemWinMeshVariableScalarT var_sh(var);
//...

Example 3

VariableArray2Int32 var(VariableBuildInfo(mesh(), "Test3", IVariable::PInShMem));
 
var.resize(2, 3);
 
var(0, 0) = parallelMng()->commRank();
var(0, 1) = parallelMng()->commRank();
var(0, 2) = parallelMng()->commRank();
var(1, 0) = parallelMng()->commRank() * 10;
var(1, 1) = parallelMng()->commRank() * 10;
var(1, 2) = parallelMng()->commRank() * 10;
 
MachineShMemWinVariableArray2T var_sh(var);
 
auto machine_ranks = var_sh.machineRanks();
 
for (Int32 rank : machine_ranks) {
  info() << "Rank " << rank << " -- Value0 : " << var_sh.view(rank)[0];
  info() << "Rank " << rank << " -- Value1 : " << var_sh.view(rank)[1];
}

Here, we have a 2D array without support.

Note: As with the 1D array, the 2D array could be of a different size for each subdomain.

The method Arcane::MachineShMemWinVariableArray2T::view() allows retrieving a view (of type Arcane::Span2) on the 2D array of another subdomain of the computation node.

Example 4

VariableCellArrayInt32 var(VariableBuildInfo(mesh(), "Test3", IVariable::PInShMem));
 
constexpr Int32 dim2_var = 2;
 
var.resize(dim2_var);
 
ENUMERATE_ (Cell, icell, allCells()) {
  var(icell, 0) = parallelMng()->commRank();
  var(icell, 1) = parallelMng()->commRank() * 10;
}
 
MachineShMemWinMeshVariableArrayT var_sh(var);
auto machine_ranks = var_sh.machineRanks();
 
for (Int32 rank : machine_ranks) {
 
  Span2<Int32> view2D = var_sh.view(rank);
  for (Int32 local_id_rank = 0; local_id_rank < view2D.dim1Size(); ++local_id_rank) {
    Span<Int32> view1D = view2D[local_id_rank];
    for (Int32 pos = 0; pos < dim2_var; ++pos) {
      info() << "Rank " << rank << " -- LocalId : " << local_id_rank << " -- Position array : " << pos << " -- Value : " << view1D[pos];
    }
  }
}

A mesh 1D array variable is a 2D array but with the first dimension corresponding to the number of Items.

Warning: Unlike 2D array variables without support, the size of the second dimension must be identical in every subdomain.

The method Arcane::MachineShMemWinMeshVariableArrayT::view() returns a view on the 2D array of the variable of another subdomain. The first index is the local_id of an Item of that subdomain, and the second index is the position in the array of this Item.
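Assuming the 2D view stores its rows contiguously (row-major, one row of dim2 elements per Item), element (local_id, pos) sits at offset local_id * dim2 + pos. The standalone sketch below uses the hypothetical helpers flatIndex and fillLikeExample4 to reproduce, on a plain buffer, the layout filled in Example 4; it is an illustration, not Arcane code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Int32 = std::int32_t; // stand-in for Arcane's Int32

// Hypothetical helper: offset of (local_id, pos) in a row-major buffer whose
// rows all have dim2 elements (assumed layout, one row per Item).
std::size_t flatIndex(Int32 local_id, Int32 pos, Int32 dim2)
{
  assert(local_id >= 0 && pos >= 0 && pos < dim2);
  return static_cast<std::size_t>(local_id) * static_cast<std::size_t>(dim2) + static_cast<std::size_t>(pos);
}

// Hypothetical helper: fills a buffer like Example 4 does for one subdomain,
// with rank at position 0 and rank * 10 at position 1 of every Item.
std::vector<Int32> fillLikeExample4(Int32 nb_item, Int32 rank)
{
  const Int32 dim2 = 2;
  std::vector<Int32> buffer(static_cast<std::size_t>(nb_item) * dim2);
  for (Int32 lid = 0; lid < nb_item; ++lid) {
    buffer[flatIndex(lid, 0, dim2)] = rank;
    buffer[flatIndex(lid, 1, dim2)] = rank * 10;
  }
  return buffer;
}
```

Under this assumption, view2D[local_id_rank][pos] in Example 4 reads the same element as buffer[flatIndex(local_id_rank, pos, dim2_var)].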

Example 5

MeshMDVariableRefT<Cell, Real, MDDim2> var(VariableBuildInfo(mesh(), "Test3", IVariable::PInShMem));
var.reshape({ 2, 3 });
 
ENUMERATE_ (Cell, icell, allCells()) {
  var(icell, 0, 0) = parallelMng()->commRank();
  var(icell, 0, 1) = parallelMng()->commRank() * 10;
  var(icell, 0, 2) = parallelMng()->commRank() * 20;
  var(icell, 1, 0) = parallelMng()->commRank() * 30;
  var(icell, 1, 1) = parallelMng()->commRank() * 40;
  var(icell, 1, 2) = parallelMng()->commRank() * 50;
}
 
MachineShMemWinMeshMDVariableT var_sh(var);
auto machine_ranks = var_sh.machineRanks();
for (Int32 rank : machine_ranks) {
  MDSpan<Real, MDDim3> aaa = var_sh.view(rank);
  info() << "Rank " << rank << " -- aaa.extents().extent0() : " << aaa.extent0();
  info() << "Rank " << rank << " -- aaa.extents().extent1() : " << aaa.extent1();
  info() << "Rank " << rank << " -- aaa.extents().extent2() : " << aaa.extent2();
}

With multi-dimensional variables, the method Arcane::MachineShMemWinMeshMDVariableT::view() returns an Arcane::MDSpan with one extra dimension compared to the variable's dimension, the first dimension corresponding to the support.

The operator Arcane::MachineShMemWinMeshMDVariableT::operator()() is also available: since it also takes the local_id of an Item, it returns a multi-dimensional view with the dimension of the variable itself.

As mentioned previously, when accessing the arrays of several Items of a given subdomain, it is better to retrieve a complete view via Arcane::MachineShMemWinMeshMDVariableT::view().

Example 6

{
  MeshVectorMDVariableRefT<Cell, Real, 3, MDDim2> var(VariableBuildInfo(mesh(), "Test3", IVariable::PInShMem));
  var.reshape({ 2, 3 });
 
  MachineShMemWinMeshVectorMDVariableT var_sh(var);
 
  ENUMERATE_ (Cell, icell, allCells()) {
    var(icell, 0, 0) = NumVector<Real, 3>{ static_cast<Real>(parallelMng()->commRank()), 0, 0 };
    var(icell, 0, 1) = NumVector<Real, 3>{ static_cast<Real>(parallelMng()->commRank() * 10), 0, 0 };
    var(icell, 0, 2) = NumVector<Real, 3>{ static_cast<Real>(parallelMng()->commRank() * 20), 0, 0 };
    var(icell, 1, 0) = NumVector<Real, 3>{ static_cast<Real>(parallelMng()->commRank() * 30), 0, 0 };
    var(icell, 1, 1) = NumVector<Real, 3>{ static_cast<Real>(parallelMng()->commRank() * 40), 0, 0 };
    var(icell, 1, 2) = NumVector<Real, 3>{ static_cast<Real>(parallelMng()->commRank() * 50), 0, 0 };
  }
 
  var_sh.barrier();
 
  auto machine_ranks = var_sh.machineRanks();
  for (Int32 rank : machine_ranks) {
    MDSpan<Real, MDDim4> aaa = var_sh.view(rank);
    info() << "Rank " << rank << " -- aaa.extents().extent0() : " << aaa.extent0();
    info() << "Rank " << rank << " -- aaa.extents().extent1() : " << aaa.extent1();
    info() << "Rank " << rank << " -- aaa.extents().extent2() : " << aaa.extent2();
    info() << "Rank " << rank << " -- aaa.extents().extent3() : " << aaa.extent3();
  }
}
{
  MeshMatrixMDVariableRefT<Cell, Real, 2, 4, MDDim1> var(VariableBuildInfo(mesh(), "Test3", IVariable::PInShMem));
  var.reshape({ 3 });
 
  ENUMERATE_ (Cell, icell, allCells()) {
    var(icell, 0) = Arcane::NumMatrix<Real, 2, 4>{ { static_cast<Real>(parallelMng()->commRank()), 0, 0, 0 }, { static_cast<Real>(parallelMng()->commRank()), 0, 0, 0 } };
    var(icell, 1) = Arcane::NumMatrix<Real, 2, 4>{ { static_cast<Real>(parallelMng()->commRank() * 10), 0, 0, 0 }, { static_cast<Real>(parallelMng()->commRank() * 10), 0, 0, 0 } };
    var(icell, 2) = Arcane::NumMatrix<Real, 2, 4>{ { static_cast<Real>(parallelMng()->commRank() * 20), 0, 0, 0 }, { static_cast<Real>(parallelMng()->commRank() * 20), 0, 0, 0 } };
  }
 
  MachineShMemWinMeshMatrixMDVariableT var_sh(var);
 
  auto machine_ranks = var_sh.machineRanks();
  for (Int32 rank : machine_ranks) {
    MDSpan<Real, MDDim4> aaa = var_sh.view(rank);
    info() << "Rank " << rank << " -- aaa.extents().extent0() : " << aaa.extent0();
    info() << "Rank " << rank << " -- aaa.extents().extent1() : " << aaa.extent1();
    info() << "Rank " << rank << " -- aaa.extents().extent2() : " << aaa.extent2();
    info() << "Rank " << rank << " -- aaa.extents().extent3() : " << aaa.extent3();
  }
}

For MD vector and matrix variables, we find the same methods as in the previous example.
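The extents of the views in Examples 5 and 6 can be summarized by a simple rule: one leading extent for the Items of the support, then the extents given to reshape(), then (for vector and matrix variables) the vector size or the matrix row and column counts. The sketch below computes them with a hypothetical helper, viewExtents; placing the vector and matrix extents last is an assumption consistent with the ranks shown in Examples 5 and 6 (MDDim3 and MDDim4), not a documented guarantee.

```cpp
#include <cstdint>
#include <vector>

using Int32 = std::int32_t; // stand-in for Arcane's Int32

// Hypothetical helper: extents of the view returned for one subdomain.
//  nb_item       : number of Items of the support in that subdomain
//  shape         : extents passed to reshape()
//  element_shape : {} for scalar MD variables, {Size} for vector MD variables,
//                  {Row, Column} for matrix MD variables (assumed to come last)
std::vector<Int32> viewExtents(Int32 nb_item, const std::vector<Int32>& shape,
                               const std::vector<Int32>& element_shape)
{
  std::vector<Int32> extents;
  extents.push_back(nb_item); // first dimension: the support
  extents.insert(extents.end(), shape.begin(), shape.end());
  extents.insert(extents.end(), element_shape.begin(), element_shape.end());
  return extents;
}
```

For instance, the vector variable of Example 6 (size 3, reshaped to { 2, 3 }) would give 4 extents, matching MDSpan<Real, MDDim4>.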

Checkpoint

Shared memory variables are compatible with the checkpoint mechanism.

A variable property allows a subdomain not to save its array: IVariable::PDumpNull. This property is not reserved for shared memory variables.

When this property is set on a variable in a given subdomain, an empty array is saved for that subdomain. This is particularly useful on restart for shared memory variables, since the operations that resize them must be collective.

Examples

Example 7

VariableArrayInt32 var(VariableBuildInfo(mesh(), "Test4", (IVariable::PInShMem | IVariable::PPersistant)));
MachineShMemWinVariableArrayT var_sh(var);
ConstArrayView<Int32> machine_ranks = var_sh.machineRanks();
 
if (globalIteration() == 1) {
  var.resize(10);
  Int32 index = 0;
  for (Int32& a : var) {
    a = parallelMng()->commRank() + index++;
  }
  if (machine_ranks[0] != parallelMng()->commRank()) {
    var.setProperty(IVariable::PDumpNull);
  }
}
else if (subDomain()->isContinue()) {
  if (machine_ranks[0] == parallelMng()->commRank()) {
    ARCANE_FATAL_IF(var.size() != 10,
                    "Error _test4() 1 : Array size is invalid -- Expected : 10 -- Found : {0}", var.size());
  }
  else {
    ARCANE_FATAL_IF(!var.empty(),
                    "Error _test4() 1 : Array size is invalid -- Expected : 0 -- Found : {0}", var.size());
  }
  var.resize(10);
}
info() << "Array : " << var;

We call master subdomain of a computation node the subdomain with the smallest rank in the node (since machine_ranks is sorted in ascending order, it is the first element of the array).

In this example, during the first iteration of the time loop, we resize the variable for all subdomains.

Then, we assign the IVariable::PDumpNull property to all non-master subdomains.

Finally, on restart, we check that the arrays of the master subdomains have been restored and that the other arrays are empty, then all subdomains resize their array to 10 elements again (a collective call).
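The rule used in Example 7 can be summarized with hypothetical helpers in standard C++ (not Arcane API). masterRank picks the smallest rank of the node with std::min_element, so it does not rely on machine_ranks being sorted; usesDumpNull tells which subdomains would set IVariable::PDumpNull; expectedSizeAfterRestart gives the array size each subdomain should find on restart if the master saved saved_size elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Int32 = std::int32_t; // stand-in for Arcane's Int32

// Hypothetical helper: the master subdomain is the smallest rank of the node.
Int32 masterRank(const std::vector<Int32>& machine_ranks)
{
  assert(!machine_ranks.empty());
  return *std::min_element(machine_ranks.begin(), machine_ranks.end());
}

// Hypothetical helper: true for the subdomains that would set PDumpNull
// in the scheme of Example 7 (every subdomain except the master).
bool usesDumpNull(const std::vector<Int32>& machine_ranks, Int32 my_rank)
{
  return my_rank != masterRank(machine_ranks);
}

// Hypothetical helper: expected size of the array of my_rank right after a restart.
std::size_t expectedSizeAfterRestart(const std::vector<Int32>& machine_ranks, Int32 my_rank,
                                     std::size_t saved_size)
{
  return usesDumpNull(machine_ranks, my_rank) ? 0 : saved_size;
}
```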

Shared Memory Arrays