domain-partitioning running Delft3D on a linux cluster - D-Flow Flexible Mesh - Delft3D
domain-partitioning running Delft3D on a linux cluster
Hi everybody,
I run Delft3D on a Linux cluster using MPICH2. I have already run several models successfully with different numbers of machines and cores.
Currently I am working on a model that runs with a certain number of nodes, but when I try to increase the number of nodes, the simulation doesn't start and I get the error "Found more neighbours than subdomains in the partitioning".
My guess is that, beyond a certain number of nodes, the partitioning produces some domains that are too small in one direction (M or N). I checked the domain sizes when the model runs with fewer nodes, and some domains are already much bigger than others. When I increase the number of nodes, some partitions become very small while others are extremely big.
My question is: how does the partitioning work in parallel computing, and is there a way to influence it? Or do you have another idea where my problem could be?
Thanks in advance,
Patrick
Adri Mourits, modified 7 Years ago.
RE: domain-partitioning running Delft3D on a linux cluster (Answer)
Hi Patrick,
The partition sizes are calculated such that all partitions have (almost) the same number of active cells. When running a calculation, a tri-diag file is generated for each partition; at the top, it contains the local/global dimensions and the number of active points of that specific partition.
Influencing this is on our "To Do" list. If you want to change the splitting, you have to change the source code yourself. Have a look at the subroutine in "https://svn.oss.deltares.nl/repos/delft3d/trunk/src/engines_gpl/flow2d3d/packages/data/src/parallel_mpi/dfstrip.F90". There, "ipown(ic)" is assigned the number of the partition that point "ic" will belong to (ic is the one-dimensional index of point m,n).
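To make the idea concrete, here is a minimal Python sketch of strip-wise partitioning that balances active cells. It is not the code from dfstrip.F90 (which is Fortran). The function name `strip_partition`, the choice to split along the M-direction, the 1-based partition numbers, and the index formula `ic = n * mmax + m` are all assumptions made for illustration.

```python
def strip_partition(active, nparts):
    """Split a 2D grid into contiguous strips of columns (M-direction)
    so that each strip holds roughly the same number of active cells.

    active : list of rows, active[n][m] is True for an active cell
    nparts : number of partitions
    Returns (ipown, col_owner); ipown[ic] is the 1-based partition number
    of point ic, or 0 for an inactive point (illustrative convention).
    """
    nmax = len(active)
    mmax = len(active[0])

    # Number of active cells in every column.
    col_counts = [sum(1 for n in range(nmax) if active[n][m]) for m in range(mmax)]
    total = sum(col_counts)

    col_owner = []
    part = 1   # current partition number
    seen = 0   # active cells assigned so far
    for m in range(mmax):
        # Move to the next partition once the current ones hold their share.
        while part < nparts and seen >= total * part / nparts:
            part += 1
        col_owner.append(part)
        seen += col_counts[m]

    # Hypothetical one-dimensional index of point (m, n).
    ipown = [0] * (nmax * mmax)
    for n in range(nmax):
        for m in range(mmax):
            if active[n][m]:
                ipown[n * mmax + m] = col_owner[m]
    return ipown, col_owner


# Example: columns 0-3 have one active cell each, columns 4-7 are fully active.
active = [[(m >= 4) or (n == 0) for m in range(8)] for n in range(4)]
ipown, col_owner = strip_partition(active, 2)
print(col_owner)
```

In this example the first partition spans six columns and the second only two, because the active cells are concentrated on one side. Balancing by active-cell count can therefore produce strips of very different widths, and with more partitions some strips may become very narrow.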
I don't know what's going wrong in your model. The last solved error related to the partitioning had to do with some local parameters of type "integer" that were getting values above "MAXINT"; they had to be declared as type "integer(kind=8)". Maybe your model is so big that more parameters have to be declared as "integer(kind=8)".
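The overflow effect can be illustrated with a small Python sketch that mimics 4-byte and 8-byte signed integers via `ctypes`. The grid dimensions below are hypothetical, and which quantity overflowed in the actual Delft3D code is not stated here. In Fortran the result of an integer overflow depends on the compiler; silent wrap-around, as shown, is a common outcome.

```python
import ctypes

MAXINT = 2**31 - 1  # largest value a 4-byte signed integer can hold


def as_int32(value):
    """Value as stored in a 4-byte signed integer (wraps silently)."""
    return ctypes.c_int32(value).value


def as_int64(value):
    """Value as stored in an 8-byte signed integer, cf. integer(kind=8)."""
    return ctypes.c_int64(value).value


# Hypothetical model dimensions, chosen so the product exceeds MAXINT.
mmax, nmax, kmax = 5000, 5000, 100
product = mmax * nmax * kmax

print(product > MAXINT)    # the true value does not fit in 4 bytes
print(as_int32(product))   # wrapped value, no longer meaningful
print(as_int64(product))   # an 8-byte integer holds the true value
```

A wrapped, negative count or index of this kind could plausibly lead to nonsensical partitioning results, which is why very large models are a suspect.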
Regards,
Adri