
Partitioning mesh with dflowfm on Linux - segfault

Bogdan Hlevca, modified 3 Years ago.

Youngling Posts: 13 Join Date: 4/25/16
Hello,

The manual (section 5.2.2) specifies that the MDU file and the mesh can be partitioned from the command line (CLI) with dflowfm. It does not work; see below. While for the MDU file we can use generate_parallel_mdu.sh, for the mesh the only option currently available to me is using DeltaShell.

I prefer doing things from the CLI, especially since DeltaShell has a bug related to partitioning the MDU file: it does not preserve certain physical parameters, such as the ext-file entry for the wind time series.

The error seems to be related to petsc, but I tracked it down and I think it chokes somewhere in the metis library. Below is the output from a release version with no debug information, but when I had petsc in debug mode the debugger stopped after a call to a metis function.

Since there are no formal tools for error reporting, I am doing it here. I have another report for the GUI. I suggest adding a forum entry for bug reports if Trac/Bugzilla etc. are not available yet.

Is there any other way to partition the mesh on Linux?


ERROR output below:
========================================================================================

dflowfm 1.1.171 (Linux) can no longer partition meshes (1.1.148 did, if I remember well):

dflowfm --partition:ndomains=16 th_all_latest_net.nc

[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run  
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR: Signal received!
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Release Version 3.4.0, May, 13, 2013  
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[0]PETSC ERROR: See docs/index.html for manual pages.
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: dflowfm on a arch-linux2-c-opt named high by bogdan Tue May 10 00:01:31 2016
[0]PETSC ERROR: Libraries linked from /opt/delft3d-fm/petsc/3.4.0/lib
[0]PETSC ERROR: Configure run at Mon May  9 19:23:32 2016
[0]PETSC ERROR: Configure options --prefix=/opt/delft3d-fm/petsc/3.4.0 --with-mpi-dir=/opt/delft3d-fm/openmpi --download-f-blas-lapack=1 --FOPTFLAGS="-xHOST -O3 -no-prec-div" --with-debugging=0 --with-shared-libraries=1 --COPTFLAGS="-O3 "
[0]PETSC ERROR: ---------------------------------------------------------------------
Michal Kleczek, modified 3 Years ago.

RE: Partitioning mesh with dflowfm on Linux - segfault

Padawan Posts: 53 Join Date: 10/23/14
Dear Bogdan,

You can report your issues or suggestions on this forum. You can also contact our support desk via support@deltaressystems.nl.

Now, regarding your issue:
1. Did you compile your Linux version yourself?
If so, then in order to use our default partitioning method on Linux you need to compile D-Flow FM with METIS.

2. If you are using the pre-compiled CLI version, did you add the $PROPER_DFLOWFM_PATH/bin and $PROPER_DFLOWFM_PATH/lib directories to your PATH and LD_LIBRARY_PATH environment variables?
Our default partitioning method uses METIS, hence dflowfm uses the METIS library. If you are not interested in this method or do not want to use the METIS library, you can provide your own partitioning polygon (more on that in the section of the User Manual you mentioned, 5.2.2).
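As a concrete sketch of item 2 (the install prefix below is a placeholder; adjust it to wherever your dflowfm is installed), the environment setup and the partition call from the manual look like this:

```shell
# Placeholder install prefix -- point this at your own dflowfm installation.
DFLOWFM_HOME=/opt/delft3d-fm

# Make the executable and its shared libraries (METIS, PETSc, ...) findable.
export PATH="$DFLOWFM_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$DFLOWFM_HOME/lib:$LD_LIBRARY_PATH"

# METIS-based partitioning into 16 subdomains, as in section 5.2.2 of the manual:
dflowfm --partition:ndomains=16 th_all_latest_net.nc
```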

If none of the suggestions above helped to solve your problem, could you please attach your network and model file? I tested partitioning with:
Deltares, D-Flow FM Version 1.1.171.45182, Mar 22 2016, 20:29:57
Compiled with support for:
IntGUI: no
OpenGL: no
OpenMP: yes
MPI   : yes
PETSc : yes
METIS : yes

with a semi-random model and could not reproduce your issue.
To check your version, you can run the following command:
dflowfm --version

Above you can also see that 'my' version was compiled with METIS and PETSc support; therefore, both have to be available in my LD_LIBRARY_PATH for proper program execution.

Kind regards,
Michal
Bogdan Hlevca, modified 3 Years ago.

RE: Partitioning mesh with dflowfm on Linux - segfault

Youngling Posts: 13 Join Date: 4/25/16
Dear Michal,

Yes, I compiled dflowfm myself with METIS 5.1.0. Everything works fine except partitioning. Version 1.1.148 partitioned with no crashes.
As you can see below, I have a slightly newer version than yours.


bogdan@high:~> dflowfm --version
Deltares, D-Flow FM Version 1.1.171.44642, May  9 2016, 21:11:06
Compiled with support for:
IntGUI: no
OpenGL: no
OpenMP: yes
MPI   : yes
PETSc : yes
METIS : yes


Also, LD_LIBRARY_PATH is fine, as ldd finds all the libraries:


bogdan@high:~> ldd /opt/delft3d-fm/bin/dflowfm
        linux-vdso.so.1 (0x00007fff51e96000)
        libmpi_cxx.so.1 => /opt/delft3d-fm/openmpi/lib64/libmpi_cxx.so.1 (0x00007fb2fcb07000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007fb2fc721000)
        libnetcdff.so.6 => /opt/delft3d-fm/netcdf/4.4/lib64/libnetcdff.so.6 (0x00007fb2fc4bd000)
        libnetcdf.so.11 => /opt/delft3d-fm/netcdf/4.4/lib64/libnetcdf.so.11 (0x00007fb2fc1bb000)
        libmpi_usempif08.so.11 => /opt/delft3d-fm/openmpi/lib64/libmpi_usempif08.so.11 (0x00007fb2fbf8b000)
        libmpi_usempi_ignore_tkr.so.6 => /opt/delft3d-fm/openmpi/lib64/libmpi_usempi_ignore_tkr.so.6 (0x00007fb2fbd84000)
        libmpi_mpifh.so.12 => /opt/delft3d-fm/openmpi/lib64/libmpi_mpifh.so.12 (0x00007fb2fbb22000)
        libmpi.so.12 => /opt/delft3d-fm/openmpi/lib64/libmpi.so.12 (0x00007fb2fb841000)
        libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00007fb2fb62f000)
        libopen-rte.so.12 => /opt/delft3d-fm/openmpi/lib64/libopen-rte.so.12 (0x00007fb2fb3b1000)
        libopen-pal.so.13 => /opt/delft3d-fm/openmpi/lib64/libopen-pal.so.13 (0x00007fb2fb0d2000)
        libpciaccess.so.0 => /usr/lib64/libpciaccess.so.0 (0x00007fb2faec8000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fb2facc3000)
        librt.so.1 => /lib64/librt.so.1 (0x00007fb2faabb000)
        libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x00007fb2fa798000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fb2fa496000)
        libutil.so.1 => /lib64/libutil.so.1 (0x00007fb2fa293000)
        libpetsc.so => /opt/delft3d-fm/petsc/3.4.0/lib/libpetsc.so (0x00007fb2f9515000)
        libmetis.so => /opt/delft3d-fm/metis/lib/libmetis.so (0x00007fb2f9287000)
        libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x00007fb2f9064000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fb2f8e4d000)
        libquadmath.so.0 => /usr/lib64/libquadmath.so.0 (0x00007fb2f8c0d000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fb2f89f0000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fb2f8648000)
        /lib64/ld-linux-x86-64.so.2 (0x000055ed53a29000)
        libnl-route-3.so.200 => /usr/lib64/libnl-route-3.so.200 (0x00007fb2f83f2000)
        libnl-3.so.200 => /usr/lib64/libnl-3.so.200 (0x00007fb2f81d4000)                                              
        libX11.so.6 => /usr/lib64/libX11.so.6 (0x00007fb2f7e95000)                                                    
        libxcb.so.1 => /usr/lib64/libxcb.so.1 (0x00007fb2f7c74000)                                                    
        libXau.so.6 => /usr/lib64/libXau.so.6 (0x00007fb2f7a70000)


I attached the MDU file and the network.

Regards,
Bogdan
Michal Kleczek, modified 3 Years ago.

RE: Partitioning mesh with dflowfm on Linux - segfault

Padawan Posts: 53 Join Date: 10/23/14
Hey Bogdan,

I have tested the partitioning of your network on Linux with several versions of our software (newer as well as older, including version 1.1.171.44642).
For me, partitioning of the network goes smoothly and I do not see any problems, certainly none related to PETSc.
I still suspect the selection of libraries your version was compiled with. From the ldd listing you provided above, I see that you compiled D-Flow FM against the OpenMPI library. Is your PETSc also compiled with the same MPI library? In my experience, PETSc is rather sensitive, so a program that uses it should be compiled with the same version of MPI.
Could you perhaps try to recompile your version with MPICH? Previously we tested D-Flow FM with OpenMPI as the default MPI library; however, due to many problems with portability (among others), we now use MPICH 3.1.4.
Could you elaborate a bit more on how you compiled your version? Could you please also let me know what operating system you have?
Our standard settings for a Linux machine are:
system: CentOS 2.6
Compiler: Intel 14.0.3 (but also gcc 4.9.1)
MPI: mpich 3.1.4
Metis: 5.1.0
PETSc: 3.4.0
netcdf: 4.3.2/4.4.0
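As an illustration only, a PETSc configure line against MPICH could mirror the options visible in the error log above, swapping in the MPICH directory (all paths here are hypothetical):

```shell
# Hypothetical paths -- substitute your own MPICH and install prefixes.
# Same options as in the logged PETSc build, but pointing --with-mpi-dir
# at MPICH so that PETSc and dflowfm are built against the same MPI.
./configure \
    --prefix=/opt/delft3d-fm/petsc/3.4.0 \
    --with-mpi-dir=/opt/mpich/3.1.4 \
    --download-f-blas-lapack=1 \
    --with-debugging=0 \
    --with-shared-libraries=1
```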


Kind regards,
Michal
Bogdan Hlevca, modified 3 Years ago.

RE: Partitioning mesh with dflowfm on Linux - segfaults

Youngling Posts: 13 Join Date: 4/25/16
Hi Michal,

I also don't think that PETSc is the problem. I run parallel jobs on our computing cluster very well, with no issues.

The only thing that fails is the partitioning, and although the stack trace shows that PETSc failed, it actually chokes on a METIS check (found under the debugger when compiled with debugging info).

Yes, I compiled everything with the same OpenMP, including PETSc.
The reason for using and compiling OpenMPI is that the mpi.mod (Fortran) module must be compiled with the same Fortran compiler. Indeed, compatibility is a problem with the Fortran module but not with OpenMPI itself. Because I work both on the computing cluster and on my workstation, I wanted to use exactly the same environment/libraries. The ones provided by the cluster environment were giving me all sorts of problems, so I compiled all the dependencies by hand.
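A sketch of that build, with an assumed install prefix, so that the mpi.mod shipped by the wrappers matches the gfortran used for dflowfm:

```shell
# Build OpenMPI with an explicit compiler set (prefix is an assumption),
# so the generated mpi.mod matches the gfortran used to build dflowfm.
./configure CC=gcc CXX=g++ FC=gfortran --prefix=/opt/delft3d-fm/openmpi
make -j4 && make install

# Sanity check: the Fortran wrapper should report the same gfortran.
/opt/delft3d-fm/openmpi/bin/mpif90 --showme
```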

Everything works fine except the partitioning, which I can do with the GUI but would have preferred to do from the Linux command line.

I use:
GCC 5.0
OpenMPI 5.10
Metis 5.1.0
netcdf 4.4.0 (4.2 also works)
PETSc 3.4.0 (3.4.5 also works)


My workstation OS is OpenSuSE 42.1.

The computing cluster uses CentOS release 6.4 (Final).


The segfault during partitioning happened on OpenSuSE, but while writing this I tested on the CentOS machines and got exactly the same error.

Since the differences between my setup and yours are GCC vs. Intel and OpenMPI vs. MPICH, I believe that one of these, if not both, may have something to do with the problem.

It would be interesting to see whether you can reproduce the problem using OpenMPI and GCC. I can only try MPICH, as I don't have access to the Intel compilers.

Kind Regards,
Bogdan
Michal Kleczek, modified 3 Years ago.

RE: Partitioning mesh with dflowfm on Linux - segfaults

Padawan Posts: 53 Join Date: 10/23/14
Dear Bogdan,

I do apologize if I was not clear previously, but we do test our software with the GNU compiler as well. We are aware that not all clients have access to the rather expensive Intel compiler. To be sure, I have once again recompiled our 1.1.3 Linux version with the following libraries:
gcc 4.9.1
netcdf 4.3.2/4.4.0
openmpi 1.8.3 (compiled with the above-mentioned gcc)
metis 5.1.0
petsc 3.4.0 (compiled with the above-mentioned openmpi and gcc)

I have also tested with GCC 4.9.2 and 4.9.3, but I have not tested our software with GCC version 5.
I also have to admit I am a bit puzzled by the OpenMPI version you mentioned; as far as I am aware, they use version numbers of the form x.x.x. If I am not mistaken, the most recent version is 1.1.10.
I am a bit afraid that you are mixing up OpenMP with OpenMPI; I apologise if I am making wrong assumptions here. Could you please verify which version of the MPI library you are using on your system? We do require an MPI library, and while we currently use MPICH by default, other MPI libraries should not cause additional problems. Perhaps your previous post only contained a typo; better to be sure before we come to wrong conclusions.

Coming back to my test with version 1.1.3: I was able to partition your network with dflowfm compiled with the above-mentioned libraries. Therefore, for me, partitioning of your network file (th_all_15m-180m-orth_xyz_latest_net.nc) using a version compiled with GNU and OpenMPI was successful.
Note that partitioning of the MDU was causing a segmentation violation (leading to the PETSc error) if D-Flow FM was compiled with GNU; this is already fixed and will be included in the next release. This only affected the combination of GNU and partitioning of the MDU.

Kind regards,
Michal
Bogdan Hlevca, modified 3 Years ago.

RE: Partitioning mesh with dflowfm on Linux - segfaults

Youngling Posts: 13 Join Date: 4/25/16
Dear Michal,

Thanks for checking the builds. I apologize for my mistake in documenting the OpenMPI version. Please find the corrected versions below:

gcc 5.3.1
OpenMPI 1.10.2 (compiled with the above-mentioned gcc)
netcdf 4.3.2/4.4.0 (compiled with the above-mentioned gcc)
metis 5.1.0 (compiled with the above-mentioned gcc)
petsc 3.4.0 (compiled with the above-mentioned openmpi and gcc). I tried 3.4.5 and see no difference.


It is possible that these slight differences between our versions, in particular GCC and OpenMPI, cause these crashes.

I am looking forward to the next release that fixes the MDU partitioning segfault, and perhaps the mesh partitioning segfault as well. Perhaps with my version of the GNU compiler, dflowfm exhibits the segfault for mesh partitioning as well as for MDU partitioning (which you are experiencing).

Best regards,
Bogdan