Excessive memory usage when running FLOW with MPI
Ben Williams, modified 8 Years ago.
Excessive memory usage when running FLOW with MPI
Hi,
I've recently been running D3D FLOW in parallel using MPI.
I've noticed that if I keep a simulation running, the amount of RAM my system is using increases incrementally until Windows 7 falls over. It only takes a few minutes for this to happen. However, when I check the threads in the Task Manager, the flow2d3d threads do not appear to be increasing the amount of memory they use, nor does any other process in the Task Manager seem to be using more memory. When I kill the simulation, the amount of memory the system is using does not decrease, and the total amount of memory the Task Manager reports does not correspond with the "wiggly blue line" that shows how much RAM is actually in use.
I find this occurs with both versions 4.00.07.0892 and 5.00.08.1855. I compiled the code using Visual Studio 2008 with Intel Fortran Composer XE 2011 (i.e. essentially Fortran 11.0), without modifying the code as downloaded via svn.
Does anyone else experience this memory issue, and if so, how did you overcome it? Might it be related to the way the code is compiled? I also noticed that I don't get much of a speed-up when using MPI - maybe 50% when using 8 cores. Is this normal? I thought MPI scaled almost linearly for, say, fewer than 20 cores, depending on how it is implemented...
Thanks,
Ben
Information:
1) My system is a Dell XPS 8100 running 64-bit Windows 7 with 8 GB RAM on a Core i7 processor.
2) To let the system use mpiexec, I opened a command prompt (as administrator), navigated to "C:\Delft3D\w32\flow\bin" and ran smpd.exe -install (see the sketch after the batch file below).
3) The batch file I am using to run the simulations is
@ echo off
rem
rem This script runs Delft3D-FLOW parallel
rem
set argfile=config_flow2d3d.ini
rem Set the directory containing ALL exes/dlls here (mpiexec.exe, delftflow.exe, flow2d3d.dll, mpich-dlls, DelftOnline dlls etc.)
set exedir=C:\Delft3D\w32\flow\bin\
set PATH=%exedir%;%PATH%
rem Run
rem start computation on local cores (2 for dual core; 4 for quad core etc.):
mpiexec -n 4 -localonly deltares_hydro.exe %argfile%
pause
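For completeness, here is a minimal sketch of the smpd registration step from point 2, assuming the default installation path used above; it must be run from a command prompt opened as administrator, and the -status call is just an optional check that the service is registered:
cd C:\Delft3D\w32\flow\bin
rem register the MPICH2 process manager (smpd) as a Windows service
smpd.exe -install
rem optional: check that the smpd service is up and running
smpd.exe -status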
Ben Williams, modified 8 Years ago.
RE: Excessive memory usage when running FLOW with MPI
Update: I tried on a different system (64-bit Windows 7, 2x Xeon E5-2620, 12 cores in total) and did not experience this memory problem - the run seems stable.
However, I'm not getting much of an improvement in running speed - I have a test simulation which runs in 1 hr 30 min on a single core and in 55 minutes when spread across 10 cores. Is this typical for MPI runs? I would have expected at least a 5x speedup, not the roughly 1.6x I'm seeing...
Are there options for 'fine-tuning' MPI runs?
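For a rough sense of scale, assuming Amdahl's law applies and taking those timings at face value:
\[
S_{10} = \frac{T_1}{T_{10}} = \frac{90\ \text{min}}{55\ \text{min}} \approx 1.6,
\qquad
S_N = \frac{1}{s + (1-s)/N}
\;\Rightarrow\;
s \approx \frac{1/S_{10} - 1/10}{1 - 1/10} \approx 0.57,
\]
i.e. with these timings roughly half of the runtime behaves as if it were serial or is lost to communication and memory traffic, which is why adding more cores gains relatively little.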
Thanks,
Ben
Bert Jagers, modified 8 Years ago.
RE: Excessive memory usage when running FLOW with MPI (Answer)
Hi Ben,
Thanks for sharing this information with us. I haven't heard of increasing memory consumption when running the MPI version. I'm not an expert on this, but maybe there is an issue with the specific combination of the MPI library against which Delft3D was linked and the MPI daemon running on the machine on which you observe the memory issue.
The performance of MPI parallelization depends on the details of your model:
* how big is the grid domain (since Delft3D currently only cuts the model along one grid dimension, ideally your model is an elongated rectangle rather than a square)
* which part of the grid domain is filled (ideally all your grid points are active)
* what processes have you switched on
and the type of hardware you are using:
* in case of a cluster: what are the interconnects
* do you have sufficient memory
* what is the bandwidth between memory and processor (Delft3D is quite memory intensive; a multi-core processor has only one channel/cache for all communication between processor and memory -- if the bandwidth and cache are small, the system won't be able to feed the processor enough data to keep it running at maximum speed; I would recommend a big cache, fast memory access, and multiple processors over multiple cores)
Because the performance on our own cluster didn't seem to be close to optimal either, we recently started testing Delft3D with a high-resolution 2D river model (30 km x 1.2 km at 2 m resolution: a grid of 15000 x 600 points). On a French supercomputer this model turned out to scale linearly up to 512 processors. So, with the right hardware and the right model, scaling can be close to optimal. A 3D model on a smaller and more sparsely filled grid (but still a very large model) scales up to 128 processors.
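As a rough illustration of how the strip decomposition works out for that model, assuming the domain is cut along its long (15000-point) dimension into N strips with a halo of h grid rows per interface:
\[
\text{work per strip} \propto \frac{15000}{N} \times 600 \ \text{cells},
\qquad
\text{halo exchange per interface} \propto 600 \times h \ \text{cells},
\]
so the communication-to-computation ratio grows like hN/15000. Even at N = 512 each strip is still about 29 columns wide, so a halo of a few rows remains a small fraction of the strip's own work; a square or sparsely filled domain cut the same way reaches the communication-bound regime much sooner.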
As primary limiting factors we have so far identified:
* degree of filling
* interconnect speed on clusters
* memory bandwidth between memory and processor, and processor cache
Success,
Bert