Problems running domain decomposition in Linux - various run time errors - D-Flow Flexible Mesh - Delft3D
Steven Sandbach, modified 8 Years ago.
Dear Delft3d community,
I have compiled version 5.00.00.1234 of Delft3D on our Linux-based cluster. I have been able to run the code in both serial and parallel modes without problems. However, I recently set up a domain decomposition simulation and have run into a number of problems, all of which seem to occur during runtime (reading and writing). I should also note that I have had no problems running these simulations on a Windows-based platform.
The simulation is of a river-tide-coastal system and uses astronomical water level/Neumann forcing in the seaward domain. The model has four domains and includes wind, salinity and temperature processes; some runs also include waves.
In the original set-up, I used the same wind file (*.wnd) for all domains. This was not a problem for the Windows configuration, but I have seen online comments stating that it could cause a problem for a Linux compilation, so I changed the set-up so that each domain reads its own *.wnd file. I then ran the simulation again and got the same error [see attached for full error: ERROR 1]:
ERROR: *** glibc detected ***
/home/ss/Delft/5.00.00.1234/bin/lnx/flow2d3d/bin/d_hydro.exe: free(): invalid pointer: 0x00002aaac4000078 ***
Looking at the output file, the simulation had run for 110 model minutes, which coincides with the time at which the wind data is read (the second entry is at 110 minutes). I therefore ran a simulation with the wind process turned off. This ran for longer (30 hours) but then crashed with [see attached for full error: ERROR 2]:
ERROR: forrtl: severe (62): syntax error in format, unit -5, file Internal Formatted Read
This error seems to be related to the reading/computation of astronomical factors in asc.f90, but having looked at the code I can't see why. There are some other unknown factors which I can't trace back. Since the error appears to be related to reading the astronomical components, I did two further runs with NodalT=20 and NodalT=200,000 (longer than the test simulation). The run with NodalT=20 crashed after 14 hrs with [see attached for full error: ERROR 3]:
ERROR: forrtl: severe (8): internal consistency check failure, file for_intrp_fmt.c, line 1375
but the NodalT=200,000 run completed without error. I then did some runs with wave and wind turned on, using the standard NodalT, NodalT=20 and NodalT=200,000. Each of these simulations ran for a different length of time:
1. NodalT=std: 110 minutes,
2. NodalT=20: 170 minutes and
3. NodalT=200,000: 890 minutes
before they crashed with the error:
*** ERROR: Communication with Delft3D-FLOW failed
I should also note that the NodalT=20 simulation also crashed with an error similar to ERROR 1.
After writing that message, I found this forum post:
http://oss.deltares.nl/web/opendelft3d/general/-/message_boards/view_message/165248#_19_message_164894
which is similar, but I am using a different version of the code and, in my case, I have no problems running all the example cases. After reading this post, I downloaded d3dpublish.f90 and recompiled the code. I then re-ran some of these simulations:
1. Wave ON, wind on, NodalT not re-defined. The simulation crashed @ 290 mins with:
*** ERROR: Communication with Delft3D-FLOW failed
2. Wave OFF, wind on, NodalT not re-defined. I ran this simulation twice and it crashed with two different errors:
a) *** glibc detected *** @ t= 50 minutes (corresponds to 1st wind file entry)
b) ERROR: child killed: segmentation violation @ t=110 minutes (corresponds to 2nd wind file entry)
It would appear that there is a problem with the handling of read-write operations, and that this possibly has something to do with how each domain waits for information to be communicated between the individual domains and wave.
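As a quick cross-check of the crash times listed above, the sketch below tests whether each wind-on crash falls on a wind record time. Only the first two record times (50 and 110 minutes) are stated above; the assumption that later records keep the same 60-minute spacing, and all names in the code, are hypothetical and only for illustration.

```python
# Crash times in model minutes for the wind-on runs, as reported in this post.
crash_minutes = {
    "separate *.wnd per domain": 110,
    "wave+wind, NodalT standard": 110,
    "wave+wind, NodalT=20": 170,
    "wave+wind, NodalT=200,000": 890,
    "recompiled, wave on": 290,
    "recompiled, wave off, run a": 50,
    "recompiled, wave off, run b": 110,
}

# Stated in the post: first wind record at 50 min, second at 110 min.
FIRST_WIND_RECORD = 50
# ASSUMPTION: later wind records keep the same spacing as the first two.
ASSUMED_INTERVAL = 110 - FIRST_WIND_RECORD


def on_wind_record(t_min, first=FIRST_WIND_RECORD, step=ASSUMED_INTERVAL):
    """Return True if t_min falls exactly on an (assumed) wind record time."""
    return t_min >= first and (t_min - first) % step == 0


# Map each run to whether its crash time coincides with a wind record.
matches = {run: on_wind_record(t) for run, t in crash_minutes.items()}

for run, hit in matches.items():
    print(f"{run:30s} t={crash_minutes[run]:4d} min  on wind record: {hit}")
```

If the spacing assumption holds, every wind-on crash time listed here falls on a wind record time, which would be consistent with the crashes being triggered when wind data is read. The wind-off crashes (30 hours and 14 hrs) are left out because no wind file is read in those runs.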
Has anyone encountered these errors before? Does anyone have any suggestions?
Thanks
Regards
Steve