
Fedora 14 and Gambit 2.4.6 crash on startup


lucionas
23rd December 2010, 02:45 AM
Hi all, never had this problem before. However, I've recently installed Fedora 14 on my laptop. Unfortunately, when I try to launch the Fluent preprocessor Gambit 2.4.6, it immediately crashes with the following message:

Received exception: Assertion failed in /usr/local/fluent/build/gambit/dev/gambit_lnamd64/include/BPPTREE_CC.hpp, line 288: status == 0

I'm absolutely clueless as to what might be causing this (all dependencies are resolved, at least all I know of, and all updates are applied; the problem occurred both before and after updating). As I have already mentioned, there were no particular difficulties (aside from the libXm.so.3 ones...) in using it on earlier Fedora releases.
On the other hand Fluent works... well, fluently ;)

If anyone has any idea how to approach this, I'd much appreciate it.



Merry Xmas to everyone celebrating it, and happy other holidays to everyone else.

lucionas
24th December 2010, 03:15 PM
Hi all again, it seems no one has any idea after all. However, I have managed to launch Gambit, somewhat successfully, by running the command on a file, i.e.:

~/Fluent.Inc/bin/gambit ~[home-folder]/somefile.dbs

It worked, and I could do pretty much anything with it. However, whenever I tried to save the changes, export a mesh, or open a new project, it crashed with SIGSEGV. The terminal gives a message (before I click OK in the Fatal Error window) about a DbFree error, i.e.

ERROR: DbFree 4294847504.0 does not exist

This puzzles me even further. I've already run chmod 777 on gambit (my first thought was that it might be due to some privilege issue or something), but now I am absolutely clueless about what might be the cause.

irinat
23rd January 2012, 11:57 AM
Well, 2012 is here, but that bug is still here too. I dug a bit, and my guess is that the bug is related to the memcpy change in glibc: on some processors memcpy now copies backwards, which can break programs that call it on overlapping buffers. That new glibc version was introduced in F14 (and then spread to other distributions).
I'm a Debian user, so here is the solution for Debian testing:
$ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libc/memcpy-preload.so /opt/Fluent.Inc/bin/gambit

I don't know if any Fedora glibc packages have a similar wrapper for memcpy, so here is its source:

/* Copyright (C) 2011, Aurelien Jarno <aurelien woof-woof aurel32 dot net>

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
02111-1307 USA. */

#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <syslog.h>
#include <sys/time.h>

void *memcpy(void *dst, const void *src, size_t n)
{
#ifndef NOLOG
    uintptr_t usrc, udst;

    /* Convert to unsigned as arithmetic on pointer is undefined */
    udst = (uintptr_t) dst;
    usrc = (uintptr_t) src;

    /* Check if source and destination overlap */
    if (((udst < usrc) && ((udst + n) > usrc)) ||
        ((usrc < udst) && ((usrc + n) > udst))) {

        static time_t lastlog = -1;
        struct timeval tv;

        /* gettimeofday() is not expensive for the conditions we target in
         * the Debian package (kernel >= 2.6.26, x86-64 architecture), this
         * might need to be changed if this wrapper is later needed with
         * different conditions */
        gettimeofday(&tv, NULL);

        /* Don't spam syslog, limit to (roughly) one log entry per second. */
        if (tv.tv_sec > lastlog) {
            lastlog = tv.tv_sec;
            syslog(LOG_WARNING | LOG_USER,
                   "source and destination overlap in memcpy() at ip %p",
                   __builtin_return_address(0));
        }
    }
#endif

    /* Call memmove() instead of memcpy() */
    return memmove(dst, src, n);
}

Compile it with:
gcc -D_GNU_SOURCE -DNOLOG -O2 -Wall -fPIC -shared -o memcpy-preload.so memcpy-preload.c
then place it somewhere and change LD_PRELOAD accordingly. Remove -DNOLOG if you really want tons of messages in your syslog.

That trick solved the problem for me, it seems.
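
For anyone wondering why swapping memcpy for memmove helps, here is a minimal sketch, separate from the Debian wrapper above. memcpy is only defined for buffers that do not overlap, while memmove is defined for overlapping ones too, so code that relied on the old copy direction keeps working once every call goes through memmove. The names regions_overlap and shift_right are made up for this illustration and do not come from glibc or Gambit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if [a, a+n) and [b, b+n) share a byte.
 * Unlike the wrapper's check, identical pointers count as overlapping. */
static int regions_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t ua = (uintptr_t) a;
    uintptr_t ub = (uintptr_t) b;

    if (n == 0)
        return 0;
    return (ua < ub) ? (ub - ua < n) : (ua - ub < n);
}

/* Hypothetical helper: move the first len bytes of buf to buf + by.
 * The caller must make sure buf has room for len + by bytes.
 * Source and destination overlap whenever by < len, so memmove is the
 * only well-defined choice; memcpy here would be undefined behaviour. */
static void shift_right(char *buf, size_t len, size_t by)
{
    assert(buf != NULL);
    memmove(buf + by, buf, len);
}
```

Calling shift_right on a buffer holding "abcdef" with len 6 and by 2 is exactly the kind of overlapping copy the wrapper logs about when built without -DNOLOG.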

adamv
24th January 2012, 01:29 AM
Hi irinat,

Thanks for clarifying the source of the issue! Although I'm not a novice user, I'm by no means an expert on Linux binaries... I just tried duplicating your suggestion on my multilib x64 Gentoo install + 32bit Gambit with no luck. I definitely need to look around on Google some more to try and figure out the cause, but perhaps you could point me in the right direction?

Here's what I'm seeing when I try to preload:

Starting /usr/local/Fluent.Inc/gambit2.3.16/lnx86/gambit ...
ERROR: ld.so: object '/home/adam/temp/memcpy-preload.so' from LD_PRELOAD cannot be preloaded: ignored.
Gambit build SP2006033020.

...

-Adam

irinat
24th January 2012, 10:30 AM
Here's what I'm seeing when I try to preload:

Starting /usr/local/Fluent.Inc/gambit2.3.16/lnx86/gambit ...
ERROR: ld.so: object '/home/adam/temp/memcpy-preload.so' from LD_PRELOAD cannot be preloaded: ignored.
Gambit build SP2006033020.


There are two bugs I've seen with Gambit. One is in version 2.3.16, which freezes your whole desktop when you open a dropdown menu with the right mouse button. You know, when you want to create a cylinder but box is currently selected: you press the right mouse button, the menu appears, but whoops! No program reacts to further keyboard or mouse input. The only way to unfreeze is to switch to a text console and kill the gambit process. I don't know how to solve this issue. The only clue I have is that I noticed the bug after the X.org update to 1.11. Gambit 2.3.16 is statically linked with the OpenMotif libraries, so it's impossible to use the OpenMotif patch which you have probably found on the internet.

The second bug is related to 2.4.6. When one tries to start that version, it opens the GUI and immediately displays an error, something related to the gambit database file. That issue can be solved with the preload trick (see post above).

One more thing I should mention: the library you preload must be the same architecture as the program. Your version of Gambit is 32-bit (lnx86), so to see whether this works for you, add "-m32" to the gcc parameters. That gives you a 32-bit .so.
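
If you are not sure which architecture a binary or a .so is, the fifth byte of the ELF header says so. Below is a rough sketch of reading it by hand; elf_bits is a name invented for this example, and it only looks at the ELF magic and the class byte, nothing else.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 32 or 64 for an ELF file, 0 if the file
 * is not ELF (or has an unknown class), -1 if it cannot be opened. */
static int elf_bits(const char *path)
{
    unsigned char ident[5];
    size_t got;
    FILE *f;

    assert(path != NULL);

    f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    got = fread(ident, 1, sizeof ident, f);
    fclose(f);

    /* Bytes 0..3 are the magic number: 0x7f 'E' 'L' 'F' */
    if (got != sizeof ident || memcmp(ident, "\177ELF", 4) != 0)
        return 0;

    /* Byte 4 is EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64 */
    if (ident[4] == 1)
        return 32;
    if (ident[4] == 2)
        return 64;
    return 0;
}
```

Running it on both the gambit binary and memcpy-preload.so should give the same number; if it does not, ld.so will refuse the preload with the "cannot be preloaded: ignored" message quoted above.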

adamv
25th January 2012, 04:35 AM
Thanks! -m32 did the trick, and I was able to get x86 Gambit to turn on using the LD_PRELOAD method. Unfortunately, I could draw geometry but I have not been able to mesh using the GUI using mouse commands or an old journal file (it throws SIGSEGV). I have, however, been able to generate meshes with the same journal file from the console without the GUI. I assumed if I got a copy of Gambit amd64 my life would be easier, and the licensing people at my school were able to find a copy. Unfortunately, it suffered a similar fate... and I had to find additional libraries to get it to go, specifically:

-> Installed openmotif and created a symbolic link from libXm.so.4.0.3 to libXm.so.3
-> Also tried a copy of libXm.so.3.0.3 from an old openSUSE RPM
-> Installed Gentoo's emul-linux-x86-compat
-> Also tried a copy of libstdc++.so.5.0.7 from an old openSUSE RPM

The x86 version of Gambit already had these libraries.

Using LD_DEBUG=lib or LD_DEBUG=all with LD_DEBUG_OUTPUT=somefile I haven't been able to find any specific cause for the SIGSEGV, although I'm sure it's in there (LD_DEBUG=all produces A LOT of data and LD_DEBUG=lib doesn't provide any detail).

Here's a snippet of the error when running the journal file that works on the console but not the GUI:

==> GAMBIT.26337/jou (tail) <==
25 transition 1 trows 0 wedge uniform
blayer attach "wallBoundaryLayer" face "face_fluidVolume" "face_fluidVolume" \
"face_fluidVolume" "face_fluidVolume" edge "edge_symmetry" "edge_startRoad" \
"edge_roadBoundaryLayer" "edge_endRoad" add
sfunction create sourceedges "edge.12" "edge.11" "edge.14" "edge.13" \
startsize 1 growthrate 1.1 sizelimit 20 attachfaces "face_fluidVolume" fixed
sfunction create sourceedges "edge_roadBoundaryLayer" startsize 2 growthrate \
1.18 sizelimit 20 attachfaces "face_fluidVolume" fixed
/SIG[ 1] occurred in the next command!
sfunction bgrid attachedges "edge.12" "edge.11" "edge.14" "edge.13"

==> GAMBIT.26337/trn (tail) <==
Size function:sfunc.2 attached to face_fluidVolume.
Command> edge mesh $edge_airfoilA1 $edge_airfoilA2 $edge_airfoilB1 $edge_airfoilB2
CPU time used to initialize fixed size function sfunc.1 (sec.) = 0.00
CPU time used to initialize fixed size function sfunc.2 (sec.) = 0.00
Start to generate background grid for face_fluidVolume
The start date and time = Tue Jan 24 22:51:49 2012
ERROR: Please retain a copy of the GAMBIT.26337/jou, GAMBIT.26337/trn, and
GAMBIT.26337/*.dbs files, any imported geometry and any relevant
errors or warnings you see above in this window and contact
support at your local Fluent office or distributor.

... and, for reference, below is the quick wrapper shell script I've been using to kick off amd64 Gambit. It's worth mentioning that LD_PRELOAD tries to replace memcpy in every script and program that leads up to the actual Gambit binary. It will throw the "cannot be preloaded: ignored" error for any amd64/x86 mismatch.

==> /usr/local/bin/gambit <==
export FLUENT_ARCH=lnamd64
export PATH=/usr/local/Fluent.Inc/bin:${PATH}
export LD_PRELOAD=memcpy-preload-64.so
export LD_LIBRARY_PATH=/usr/local/lib64
/usr/local/Fluent.Inc/bin/gambit

I've sure learned quite a bit about Linux binary debugging while getting this to partially work, but I'm afraid I'm not sure what's the next step... do you have any other suggestions? I imagine I need to look very closely at each of the ~20 files that LD_DEBUG=all provided.

adamv
25th January 2012, 04:42 AM
Here's the full journal file, for reference.

adamv
25th January 2012, 10:01 AM
I ran Gambit within gdb, and attached is the program state when it throws the exception running the asdf.txt journal file in GUI mode.

export FLUENT_ARCH=lnamd64
export FLUENT_INC=/usr/local/Fluent.Inc
export LD_PRELOAD=memcpy-preload-64.so
export LD_LIBRARY_PATH=/usr/local/lib64:/usr/local/Fluent.Inc/gambit2.4.6/lnamd64:/usr/local/Fluent.Inc/gambit2.4.6/acis/bin/linux_amd_64_so:/usr/local/Fluent.Inc/gambit2.4.6/spatial_index/lnamd64
gdb /usr/local/Fluent.Inc/gambit2.4.6/lnamd64/./gambit

adamv
25th January 2012, 10:14 AM
... and here's the other thread:

(gdb) thread 2
[Switching to thread 2 (Thread 0x7fffe736d700 (LWP 18253))]
#0 0x00007fffeb68947d in read () from /lib64/libc.so.6
(gdb) info registers
rax 0xfffffffffffffe00 -512
rbx 0xd64940 14043456
rcx 0xffffffffffffffff -1
rdx 0x1000 4096
rsi 0x7fffed8f1000 140737178963968
rdi 0xb 11
rbp 0x7ffe 0x7ffe
rsp 0x7fffe7364d70 0x7fffe7364d70
r8 0x1 1
r9 0x0 0
r10 0x7fffe7364bc0 140737072483264
r11 0x293 659
r12 0xa 10
r13 0x0 0
r14 0x7fffed8f1000 140737178963968
r15 0x7fffe7364e30 140737072483888
rip 0x7fffeb68947d 0x7fffeb68947d <read+45>
eflags 0x293 [ CF AF SF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
(gdb) info frame
Stack level 0, frame at 0x7fffe7364d80:
rip = 0x7fffeb68947d in read; saved rip 0x7fffeb62efe8
called by frame at 0x7fffe7364da0
Arglist at 0x7fffe7364d68, args:
Locals at 0x7fffe7364d68, Previous frame's sp is 0x7fffe7364d80
Saved registers:
rip at 0x7fffe7364d78
(gdb) frame
#0 0x00007fffeb68947d in read () from /lib64/libc.so.6
(gdb) list
1 /usr/src/packages/BUILD/glibc-2.3/cc/csu/crti.S: No such file or directory.
in /usr/src/packages/BUILD/glibc-2.3/cc/csu/crti.S
(gdb) where
#0 0x00007fffeb68947d in read () from /lib64/libc.so.6
#1 0x00007fffeb62efe8 in _IO_file_underflow () from /lib64/libc.so.6
#2 0x00007fffeb63004e in _IO_default_uflow () from /lib64/libc.so.6
#3 0x00007fffeb62414a in _IO_getline_info () from /lib64/libc.so.6
#4 0x00007fffeb622ea1 in fgets () from /lib64/libc.so.6
#5 0x00007fffe7372d0d in ?? () from /usr/local/Fluent.Inc/license/lnamd64/liblic90.so
#6 0x00007fffec0f3c6c in start_thread () from /lib64/libpthread.so.0
#7 0x00007fffeb6964bd in clone () from /lib64/libc.so.6
(gdb)

irinat
25th January 2012, 12:09 PM
I tried to run your journal in the GUI and found no error. It generated a mesh file and I can browse the mesh (I removed the 'end force' command from the journal).

That may sound weird, but do you have helvB12.pcf.gz or helvB12-ISO8859-1.pcf.gz on your system? I found that the progress display procedure loads a font by the string '-adobe-helvetica-bold-r-normal-*-*-120-*-*-*-*-*' but does not check the returned value. That can lead to a NULL pointer dereference if there is no helvB12 font. When I deleted that font from my system, I got a SIGSEGV similar to yours.

jpollard
25th January 2012, 12:56 PM
This may have nothing to do with the problem, but try putting the full path to the memcpy library instead of:

export LD_PRELOAD=memcpy-preload-64.so

put

export LD_PRELOAD=<fullpath>/memcpy-preload-64.so

adamv
25th January 2012, 11:16 PM
emerge font-adobe-100dpi
emerge font-adobe-75dpi
emerge font-misc-misc (not sure if this one was needed, but it couldn't hurt)

restart X and...

SUCCESS! The fonts+memcpy made it happen :-) And here I thought the missing fonts messages were harmless because the program still turned on and showed text just fine... Thank you so much! It might have taken a while, but I sure learned quite a bit from this whole process.

I might add that it's amazing how much faster the Linux amd64 version crunches through the mesh. The GUI is slower (as you'd expect), but using the same journal file / same computer / on the terminal:

Windoze: 15~20 sec IIRC (w/o Exceed)
lnx86: 14 sec
lnamd64: 6.4 sec !

For reference, these numbers are on an i7 860 using the Linux time command. Sorry about the "end force" at the end of that journal file...

-Adam
