error (72) floating overflow

jmueller · Post by **jmueller** » Mon Mar 23, 2020 9:38 pm

I'm trying to run a simulation in the parallel version of MICRESS, but it doesn't make it past the first timestep. I don't know what this error means or what it refers to, so I'm not sure how to address it.

Thanks for any help,
Josh

The following is the output and error upon starting the simulation:

Remaining license time: permanent

==================================================

Time t = 0.0000000 s
CPU-time: 3 s
Current phase-field solver time step = 1.00E+18 s
Average conc. of comp. 1 = 7.1900000 wt%
Average conc. of comp. 2 = 0.2400000 wt%
Temperature at the bottom = 298.00 K
Temperature gradient = 0.00000 K/cm
Fraction of phase 0: 0.00000
Fraction of phase 1: 1.00000
Fraction of phase 2: 0.00000
Updating of diffusion data from database...
forrtl: error (72): floating overflow
Image PC Routine Line Source
libtq-linux-x86_6 00002AD1F44D8499 Unknown Unknown Unknown
libtq-linux-x86_6 00002AD1F44D6D6E Unknown Unknown Unknown
libtq-linux-x86_6 00002AD1F449EEAF Unknown Unknown Unknown
libtq-linux-x86_6 00002AD1F444BD8F Unknown Unknown Unknown
libtq-linux-x86_6 00002AD1F445088D Unknown Unknown Unknown
libpthread.so.0 00002AD2127E55D0 Unknown Unknown Unknown
MICRESS_par_TQ 0000000000AF95A1 Unknown Unknown Unknown
MICRESS_par_TQ 000000000084CDD2 Unknown Unknown Unknown
MICRESS_par_TQ 000000000040C0B6 Unknown Unknown Unknown
libc.so.6 00002AD212D163D5 Unknown Unknown Unknown
MICRESS_par_TQ 000000000040BFA9 Unknown Unknown Unknown
Aborted (core dumped)

Bernd · Post by **Bernd** » Mon Mar 23, 2020 10:57 pm

Dear Josh,

This is a hard crash which gives us no direct hint, apart from the fact that TQ subroutines are involved (libtq). However, it seems that the crash occurs either during or after the updating of diffusion data. You can find out which by checking whether the .diff output has been written completely and correctly, or by not using diffusion data from the database and trying again.
If it is due to reading diffusion data from the database, either your input of diffusion data in the driving file is incorrect in some unusual way, or the .ges5 file is corrupt or incompatible. In the latter case, recreating the .ges5 file (appending compatible diffusion data) may be sufficient, preferably with a more recent version of Thermo-Calc.
Finally, it could be that your computer simply does not have enough memory. If you cannot find the culprit, send us your latest complete set of input files again, and we will have a look with our tools.

Best wishes
Bernd

jmueller · Post by **jmueller** » Wed Mar 25, 2020 4:20 pm

Thank you, Bernd for the advice. However, I've tried troubleshooting, and I have not been able to avoid the error.

I checked the .diff output; the diffusion data had been updated for the two time-steps. However, the value of the activation energy is zero for all elements/gradients. This was still the case after I updated the GES file.

Regarding the machine memory, I don't believe this is an issue. I've tried running the simulation using up to 180 GB and still get the error.

I'll email my input files as well as the .log and .diff output files.

Thanks for the help.

Josh

Bernd · Post by **Bernd** » Wed Mar 25, 2020 9:41 pm

Dear Josh,

I cannot reproduce your problem, even when trying different sub-versions, with and without parallelization. Does your crash happen only with the parallel version?

I would advise you, if possible, to switch to the newest version 7 of MICRESS. Otherwise, if the serial version works, you should use that. Parallelization is useless anyway until you have optimized your application in serial. Also, it seems your application spends a lot of time in TQ, which cannot be run in parallel at present...

Bernd

jmueller · Post by **jmueller** » Fri Mar 27, 2020 6:32 pm

Thank you for looking into this, Bernd.

I don't have any issues when I run this simulation on my desktop either, only when I use the parallel version.

This particular simulation isn't exactly the concern as far as reducing simulation time via parallelization, but I have other simulations planned that I'm certain will be very computationally taxing due to diffusion calculations. Right now I'm just trying to get everything in order and running smoothly on one of our high-performance machines so that later I can submit my other simulations that I believe will benefit from parallelization.

I have some other questions, though, regarding the current error I'm running into. Is there an optimal version of Thermo-Calc that I should be using for MICRESS 6.4? Also, could it be an issue that I'm creating GES files with TC on a Microsoft OS, but then trying to use those GES files with the Linux version of MICRESS 6.4? When I run TC and MICRESS on my desktop, both are running on a Microsoft OS; however, the parallel version of MICRESS I am running is on a Linux OS.

Thanks,
Josh

ralph · Post by **ralph** » Mon Mar 30, 2020 9:59 am

I do not know of any problem caused by changing the OS between creation and usage of GES5 files.
To rule out this possible source of error in general, testing would be best.

MICRESS 6.4 uses a TQ library from Thermo-Calc 2017a.
As far as I know, Thermo-Calc has not broken backwards compatibility since 2017, i.e. the current version, TC2020a, should be compatible.
As above, a test would tell more.

Ralph

jmueller · Post by **jmueller** » Tue Mar 31, 2020 9:59 pm

We've tried some more troubleshooting; here's where we're at:
The input file seems to run fine on our Windows machines (serial version of MICRESS 6.400) regardless of where/on what OS the GES file was made.
The input file always causes this same error on our Linux HPC machine (parallel version of MICRESS 6.402) regardless of where/on what OS the GES file was made.

Could there somehow be an issue with the Linux version of MICRESS we have?
Is there anything different in 6.402 that could be causing the error?

We are going to install the Linux version on a different machine as well and see if we get the same error.

Thanks for the help,
Josh

mrobbert · Post by **mrobbert** » Wed Apr 01, 2020 2:50 am

I am the SysAdmin for the system where Josh is trying to run, so I am responsible for the installation of the software. I have done some more testing since I last updated Josh, and I have found that the problem appears to have something to do with TQ coupling. I started trying some of the example files that ship with the install: I can get any that don't use TQ coupling to work, but all the ones with TQ coupling core dump. For instance, T010_Gamma_Alpha.dri works fine, but T011_Gamma_Alpha_TQ.dri core dumps with the error:

forrtl: error (73): floating divide by zero

I have tried versions 6.400 and 6.402. The cluster we're trying to use is CentOS 7 so I copied it to another cluster that is running CentOS 6 and got the same result. Is it possible that I've missed something in the install that could cause this to happen? Might there be some other OS level setting that we've implemented that could cause this?

Thanks,
Mike
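
The kind of check Mike describes (running each example and noting which ones core dump) can be scripted. The following is a minimal sketch in Python, not part of MICRESS. The helper names `classify` and `run_examples` are hypothetical, and it assumes that the executable accepts the driving file as its only command-line argument, which should be verified against your installation.

```python
import signal
import subprocess


def classify(returncode):
    """Translate a process return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as return code -N.
        return "killed by " + signal.Signals(-returncode).name
    return "exit code %d" % returncode


def run_examples(command, driving_files):
    """Run `command` once per driving file and collect the status of each run.

    `command` is a list such as ["./MICRESS_par_TQ"]; passing the driving
    file as a trailing argument is an assumption made for this sketch.
    """
    results = {}
    for dri in driving_files:
        proc = subprocess.run(
            command + [dri],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        results[dri] = classify(proc.returncode)
    return results


if __name__ == "__main__":
    examples = ["T010_Gamma_Alpha.dri", "T011_Gamma_Alpha_TQ.dri"]
    for name, status in run_examples(["./MICRESS_par_TQ"], examples).items():
        print(name, "->", status)
```

A run that ends with "Aborted (core dumped)" in the shell would show up here as killed by SIGABRT, which makes it easy to see whether only the TQ-coupled examples fail.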

ralph · Post by **ralph** » Wed Apr 01, 2020 11:12 am

Hi Josh, Mike,

The floating-point exception occurs in both versions, Windows and Linux, while the TQ library copes with the requested calculations and thermodynamic data. Not all exceptions are caught in this library. Usually, this does not make the results worse, because results from such erroneous calculations will be ignored anyway.

The problem is that MICRESS 6.402 is compiled differently on Windows and Linux. On Linux, the exceptions are not disabled, so they abort the program.
At this point, you need an update of your MICRESS version for Linux, which the commercial support can provide.
Check your email.

Best,
Ralph
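
To illustrate the difference between trapped and untrapped floating-point exceptions described above, here is a minimal Python sketch. It is only an analogy built on the standard-library `decimal` module, not MICRESS or TQ code; the function name `run_step` and the numbers used are assumptions made for the example. With traps enabled, the first overflow aborts the step. With traps disabled, the calculation continues with an infinite result and sets a flag that the caller can inspect before discarding the value.

```python
from decimal import Context, Decimal, DivisionByZero, Overflow


def run_step(trap_exceptions):
    """Run two deliberately bad calculations and report what happened."""
    traps = [Overflow, DivisionByZero] if trap_exceptions else []
    # Small exponent range so that 1e60 * 1e60 overflows.
    ctx = Context(prec=10, Emax=99, Emin=-99, traps=traps)
    try:
        overflowed = ctx.multiply(Decimal("1e60"), Decimal("1e60"))
        divided = ctx.divide(Decimal(1), Decimal(0))
    except (Overflow, DivisionByZero) as exc:
        # Trapped: the exception interrupts the calculation immediately.
        return "aborted: " + type(exc).__name__
    # Untrapped: results are infinite, and the flags record what went wrong.
    raised = [sig.__name__ for sig in (Overflow, DivisionByZero) if ctx.flags[sig]]
    return "continued: results %s, %s; flags %s" % (overflowed, divided, raised)


if __name__ == "__main__":
    print(run_step(trap_exceptions=True))
    print(run_step(trap_exceptions=False))
```

Calling `run_step(True)` stops at the overflow, while `run_step(False)` carries on and reports the raised flags, which mirrors the behaviour described for the Linux and Windows builds respectively.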

jmueller · Post by **jmueller** » Mon Apr 06, 2020 12:49 am

Ralph,

Thanks for sending the updated version. It appears to run the input file without errors; however, there is a discrepancy between the simulation when run on the parallel version of MICRESS 6.406 on Linux and on serial MICRESS 6.4 on Windows. On Windows, the simulation appears to generate nucleation events for all the nucleation inputs I specify, but on Linux it does not. It doesn't give an error; it just doesn't generate any nucleation events for one of the nucleation types that I specify in the input.

Josh
