D-Flow Flexible Mesh

D-Flow Flexible Mesh (D-Flow FM) is the new software engine for hydrodynamic simulations on unstructured grids in 1D-2D-3D. Alongside the familiar curvilinear meshes from Delft3D 4, the unstructured grid can include triangles, pentagons and other polygons, and 1D channel networks, all in a single mesh. It combines proven technology from the hydrodynamic engines of Delft3D 4 and SOBEK 2 and adds flexible administration, resulting in:

  • Easier 1D-2D-3D model coupling, intuitive setup of boundary conditions and meteorological forcings (amongst others).
  • More flexible 2D gridding in delta regions, river junctions, harbours, intertidal flats and more.
  • High performance through smart use of multicore architectures and grid computing clusters.
 
The D-Flow FM team would be delighted if you joined the discussions on generating meshes, specifying boundary conditions, running computations, and other relevant topics. Feel free to share your smart questions and/or brilliant solutions!

 



Example Error in 6.01.00.2755

Dirk Smit, modified 7 Years ago.

Example Error in 6.01.00.2755

Youngling Posts: 21 Join Date: 6/18/13 Recent Posts
Hi again,

I am now testing 6.01.00.2755 without the DELWAQ module.
While running the example files I noticed the following:
MPI process number 000 has host unknown and is running on processor master
MPI process number 001 has host unknown and is running on processor master
MPI process number 002 has host unknown and is running on processor master


What does this mean, and how can I resolve it? It seems that the processes are running only on my master and not on the nodes.
All other test cases seem to be working.

Regards,

Dirk
Adri Mourits, modified 7 Years ago.

RE: Example Error in 6.01.00.2755

Yoda Posts: 1221 Join Date: 1/3/11 Recent Posts
Hi Dirk,

On Linux, a file named "machinefile" is used, see "...\examples\01_standard\run_flow2d3d_parallel.sh". This file must contain the exact names of the machines on which the partitions must be started.

If you know the machine names in advance, you can create the machinefile manually.

If you are using some queueing mechanism, you don't know the machine names in advance. In that case you have to find out how your queueing tool publishes the allocated machines. See "...\examples\01_standard\run_flow2d3d_parallel_sge.sh" for an example when SGE is used.

Regards,

Adri
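
The two ways of obtaining a machinefile described above can be sketched as follows. This is a minimal illustration, not the content of the shipped example scripts: the helper names `write_machinefile` and `machinefile_from_sge` and the host names are hypothetical, and the second helper assumes that SGE's `$PE_HOSTFILE` lists one allocated host per line with the host name in the first column. The actual `run_flow2d3d_parallel_sge.sh` may do this differently.

```shell
# Hypothetical helper: write the given machine names, one per line,
# to a machinefile (for the case where the names are known in advance).
write_machinefile() {
    local outfile="$1"
    shift
    printf '%s\n' "$@" > "$outfile"
}

# Hypothetical helper: build a machinefile from an SGE host file.
# Assumption: the first whitespace-separated column is the host name.
machinefile_from_sge() {
    local hostfile="$1" outfile="$2"
    awk '{ print $1 }' "$hostfile" > "$outfile"
}

# Example usage (placeholder host names):
#   write_machinefile "$(pwd)/machinefile" node01 node02
#   machinefile_from_sge "$PE_HOSTFILE" "$(pwd)/machinefile"
```

Whichever route is used, the names written to the file have to match the machine names exactly, as noted above.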
Dirk Smit, modified 7 Years ago.

RE: Example Error in 6.01.00.2755

Youngling Posts: 21 Join Date: 6/18/13 Recent Posts
Hi Adri,

I have set the machinefile, as you can see in the output.
If I put an mpdtrace into the script, I get the machines it plans to use.

Regards,

Dirk
Adri Mourits, modified 7 Years ago.

RE: Example Error in 6.01.00.2755

Yoda Posts: 1221 Join Date: 1/3/11 Recent Posts
Hi Dirk,

It's difficult to find out what goes wrong on your cluster. Here are some suggestions:

Example script "...\examples\01_standard\run_flow2d3d_parallel.sh" contains the line:
mpd &

You should remove this line when using a cluster. When using a queueing system, the queueing system should take care of starting mpd. When this line is in the script, mpd will be started on the master machine.

The machinefile specifies the available hardware; the command "mpdboot" in your script distributes the processes over the available hardware. I don't know what MPI tool and version you are using, but maybe the manual of that mpdboot command can help you. The mpdboot line in example script "...\examples\01_standard\run_flow2d3d_parallel.sh" reads:
mpdboot -n $NHOSTS -f $(pwd)/machinefile --ncpus=2

If you use exactly this line: print the value of parameter $NHOSTS just before executing mpdboot. Does it contain the expected value?
--ncpus=2 puts 2 partitions on the first node, then 2 partitions on the second node and so on. What happens if you leave this out?

Regards,

Adri
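
The check suggested above (printing `$NHOSTS` just before `mpdboot` and comparing it with what is expected) could look roughly like this. `count_hosts` and `check_nhosts` are hypothetical helper names, and the sketch assumes the machinefile holds one entry per line.

```shell
# Count the distinct, non-blank entries in a machinefile.
count_hosts() {
    grep -v '^[[:space:]]*$' "$1" | sort -u | wc -l | tr -d ' '
}

# Print NHOSTS next to the number of machinefile entries and
# return non-zero when the two disagree.
check_nhosts() {
    local machinefile="$1" nhosts="$2"
    local found
    found=$(count_hosts "$machinefile")
    echo "NHOSTS=$nhosts, distinct entries in $machinefile: $found"
    [ "$nhosts" = "$found" ]
}

# Example usage in the run script, just before the mpdboot line:
#   check_nhosts "$(pwd)/machinefile" "$NHOSTS" || echo "NHOSTS mismatch" >&2
```

A mismatch does not prove anything about `mpdboot` itself; it only shows that the value handed to `-n` is not the one the machinefile suggests.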
Dirk Smit, modified 7 Years ago.

RE: Example Error in 6.01.00.2755

Youngling Posts: 21 Join Date: 6/18/13 Recent Posts
Hi Adri,
Sorry for the delay.
Normally we use Torque as a scheduler.
But since I am troubleshooting right now, the scheduler is my last concern.
The MPICH2 version on our cluster is 1.4.1p1 right now.

The NHOSTS value is correct and removing "--ncpus=2" has no effect.
I still get the "MPI process number 000 has host unknown and is running on processor master.cluster" line.

Regards,

Dirk
Adri Mourits, modified 7 Years ago.

RE: Example Error in 6.01.00.2755

Yoda Posts: 1221 Join Date: 1/3/11 Recent Posts
Hi Dirk,

The calculation itself runs fine in parallel. So it has to do with one of the commands "mpd", "mpdboot" or the machinefile.

Maybe the full names of the machines must be used in the machinefile: not "node03" but "node03.mycompany.com".

Does your system administrator have suggestions?

Regards,

Adri
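
To try the fully qualified names without editing the machinefile by hand, something like the following sketch could be used. `qualify_hosts` is a hypothetical helper, `mycompany.com` is the placeholder domain from the post above, and the sketch assumes entries of the form `host` or `host:ncpus`, one per line.

```shell
# Append ".<domain>" to every machinefile entry whose host part
# does not contain a dot yet; other lines are printed unchanged.
qualify_hosts() {
    local domain="$1" infile="$2"
    awk -F: -v OFS=: -v d="$domain" \
        'NF && $1 !~ /\./ { $1 = $1 "." d } { print }' "$infile"
}

# Example usage:
#   qualify_hosts mycompany.com machinefile > machinefile.fqdn
```

Whether the short or the full name is the right one depends on how the cluster resolves host names, so this is only a quick way to test the suggestion.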
Dirk Smit, modified 7 Years ago.

RE: Example Error in 6.01.00.2755

Youngling Posts: 21 Join Date: 6/18/13 Recent Posts
Hi Adri,
I am the admin responsible for our cluster calculations, so no, I have not ;-)

The machinefile is the same one we use with our other MPICH2 programs, so THAT should not be an issue.
But even with the full hostnames it does not work.

Could it be an error in the compilation? I noticed that there is no path to the MPI library in the Makefile.

Regards,

Dirk
Adri Mourits, modified 7 Years ago.

RE: Example Error in 6.01.00.2755

Yoda Posts: 1221 Join Date: 1/3/11 Recent Posts
Hi Dirk,

Sometimes there are problems related to mpi. I collected some information in the FAQ.

But since your model does run in parallel, I expect that the compilation was fine. The problem you have is that the processes are not distributed correctly.

I have two suggestions left:
  • The MPI version used during compilation differs from the one used at runtime
  • The MPI library was compiled with a different compiler than the one used for the Delft3D source code


Hope that helps.

Regards,

Adri
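
Both suggestions come down to comparing the MPI installation used at build time with the one found at runtime. A rough way to collect that information is sketched below. The helper names are hypothetical, the executable path is a placeholder to be filled in, and `mpich2version` is assumed to be the version tool shipped with MPICH2 1.x installs (it may be absent or named differently elsewhere).

```shell
# Report which MPI tools are first on the PATH of the current shell.
report_mpi_runtime() {
    local tool
    for tool in mpiexec mpirun mpdboot mpich2version; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: $(command -v "$tool")"
        else
            echo "$tool: not found on PATH"
        fi
    done
}

# List the MPI shared libraries an executable is linked against.
# Prints a fallback message for static builds or unreadable files.
report_mpi_linkage() {
    local exe="$1"
    ldd "$exe" 2>/dev/null | grep -i 'libmpi' \
        || echo "no MPI shared library reported for $exe"
}

# Example usage (path is a placeholder):
#   report_mpi_runtime
#   report_mpi_linkage /path/to/delft3d/executable
```

If the library paths printed by `report_mpi_linkage` point to a different installation than the tools printed by `report_mpi_runtime`, the first suggestion above is worth a closer look.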
Dirk Smit, modified 7 Years ago.

RE: Example Error in 6.01.00.2755

Youngling Posts: 21 Join Date: 6/18/13 Recent Posts
Hi Adri,

I noticed that the program wouldn't start a process remotely on the nodes.
So after some more testing I found an error in my run file AND in my MPICH2 compilation (not the same flags).
Now the program is distributed to my nodes as planned, but the error message still shows in my console output:
MPI process number 004 has host unknown and is running on processor node01.cluster

I am getting that message for every process that is started on the nodes.

But since it runs at last in parallel mode on my nodes, I can switch to my other problems (compiling it using intel12, distribution by Torque, ...).

Regards,

Dirk
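
The "is running on processor ..." part of these messages is enough to see how the processes were distributed. A small sketch that tallies them per processor, assuming the console output was saved to a file and the message format is exactly the one quoted in this thread (`summarize_mpi_placement` is a hypothetical name):

```shell
# Read console output on stdin and print "<processor> <count>"
# for every processor named in an "MPI process number" line.
summarize_mpi_placement() {
    awk '/MPI process number [0-9]+ has host .* and is running on processor / {
             count[$NF]++
         }
         END { for (p in count) print p, count[p] }' | sort
}

# Example usage (log file name is a placeholder):
#   summarize_mpi_placement < run_output.log
```

If every process is counted under the master, the distribution step failed; counts spread over the node names indicate that the processes were started remotely.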
Adri Mourits, modified 7 Years ago.

RE: Example Error in 6.01.00.2755

Yoda Posts: 1221 Join Date: 1/3/11 Recent Posts
Hi Dirk,

If you have useful information that could help others avoid the same problem, please post it in this forum.

Thanks.

Adri
Dirk Smit, modified 7 Years ago.

RE: Example Error in 6.01.00.2755

Youngling Posts: 21 Join Date: 6/18/13 Recent Posts
Hi Adri,

It's just that you REALLY have to look at all the path variables.
Everything has to be absolutely correct for it to work.
If there is an error, look at the paths, the compilation flags and the rest, and try again...

Regards,

Dirk
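
One way to review the path variables is to print each entry on its own line and flag directories that do not exist. The sketch below is an illustration only: `list_path_entries` is a hypothetical helper, and which variables matter (for example `PATH` and `LD_LIBRARY_PATH`) depends on the installation.

```shell
# Print every entry of a colon-separated list on its own line,
# marked "ok" if the directory exists and "missing" otherwise.
list_path_entries() {
    printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r entry; do
        [ -n "$entry" ] || continue
        if [ -d "$entry" ]; then
            printf '%-8s%s\n' ok "$entry"
        else
            printf '%-8s%s\n' missing "$entry"
        fi
    done
}

# Example usage:
#   list_path_entries "$PATH"
#   list_path_entries "$LD_LIBRARY_PATH"
```

Running the same check on a compute node as well as on the master can show whether the environment differs between them.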