Thread: shared library symbols at address 0x00000000


Replies: 4 - Last Post: Nov 20, 2006 10:58 AM by: rie
mmman

Posts: 229
From: CZ

Registered: 3/21/06
shared library symbols at address 0x00000000
Posted: Nov 17, 2006 8:18 AM


Hi all,

[ please CC me on replies ]

I'm fixing bugs in Nexenta part time, and for the second time I have
hit a bug where a library libA.so has been linked against another
shared library libB.so and some symbols were incorrectly resolved to
absolute address 0x0. Note that I'm talking about symbols
representing regular functions like pthread_create, dlopen, ...

Some recent examples of this bug can be found in the bug reports:

http://www.gnusolaris.org/cgi-bin/trac.cgi/ticket/409
http://www.gnusolaris.org/cgi-bin/trac.cgi/ticket/347

This is most probably caused by a bug in GNU ld which, given a certain
incorrect set of command-line switches, resolves the symbols incorrectly.
The application then crashes when the resolved symbol is first used (a
plain old segfault while jumping to address 0x0).

While investigating this, I could see that on Solaris, some symbols in
some libraries are deliberately put at the address 0x0, and since this
happens in libraries like libc.so, libpthread.so, I don't believe it is
a bug.

I'm just curious why this happens, what these symbols mean, and what
they are used for. It seems that GNU ld is picking them up in situations
where it shouldn't, and I would like to build a test case in which ld
reliably exhibits this bug.

thanx,
Martin

P.S. an excerpt of
$ nm -D libc.so | grep '00000000 A'
...
00000000 A dladdr
00000000 A dladdr1
00000000 A dlclose
00000000 A dldump
00000000 A dlerror
00000000 A dlinfo
00000000 A dlmopen
00000000 A dlopen
00000000 A dlsym
00000000 A frexp
00000000 A isnan
00000000 A isnand
00000000 A isnanf
00000000 A ldexp
00000000 A logb
00000000 A modf
00000000 A modff
...
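
A minimal sketch of how one might pick such symbols out of nm output for a test case, assuming the three-column form shown in the excerpt above. The helper name zero_abs_symbols and the sample text are made up for illustration:

```python
import re

# One line of `nm -D` output in the form shown above:
# <hex value> <type letter> <name>
NM_LINE = re.compile(r"^([0-9a-fA-F]+)\s+(\S)\s+(\S+)$")


def zero_abs_symbols(nm_output):
    """Return names of symbols reported as absolute ('A') with value 0."""
    names = []
    for line in nm_output.splitlines():
        m = NM_LINE.match(line.strip())
        if m is None:
            continue  # skip "..." and anything that is not a symbol line
        value, kind, name = m.groups()
        if kind == "A" and int(value, 16) == 0:
            names.append(name)
    return names


# Made-up sample in the same format as the excerpt.
sample = """\
00000000 A dladdr
00000000 A dlopen
0003fed0 W memcpy
00000000 A modf
"""
print(zero_abs_symbols(sample))
```

Feeding it the output of nm -D for a given library gives the list of candidate symbols to reference from a small test program.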

--
http://martinman.net
_______________________________________________
opensolaris-discuss mailing list
opensolaris-discuss at opensolaris dot org



casper

Posts: 3,607
From: NL

Registered: 3/9/05
Re: [osol-code] shared library symbols at address 0x00000000
Posted: Nov 17, 2006 9:13 AM   in response to: mmman



>I'm part time fixing some bugs in Nexenta, and I have for a second time
>hit the bug, where library libA.so has been linked against some other
>shared library libB.so and some symbols were incorrectly resolved to be
>at absolute address 0x0. Note that I'm talking about symbols
>representing regular functions like pthread_create, dlopen, ...

>While investigating this, I could see that on Solaris, some symbols in
>some libraries are deliberately put at the address 0x0, and since this
>happens in libraries like libc.so, libpthread.so, I don't believe it is
>a bug.

Correct.

>I'm just curious why this happens, what these symbols mean, and what are
>they used for. Seems that GNU ld is picking them up in situations where
>it shouldn't be, and I would like to reproduce a test case where ld can
>deliberately exhibit this bug.


These symbols are "filter" symbols; they live in different libraries.

libc is a "filter" of libdl.so; this means it also exports a view on
the symbols found in libdl.so.

Casper



rie

Posts: 307
From: US

Registered: 3/9/05
Re: shared library symbols at address 0x00000000
Posted: Nov 17, 2006 9:22 AM   in response to: mmman


Martin Man wrote:

> I'm just curious why this happens, what these symbols mean, and what are
> they used for. Seems that GNU ld is picking them up in situations where
> it shouldn't be, and I would like to reproduce a test case where ld can
> deliberately exhibit this bug.
> ...
> P.S. an excerpt of
> $ nm -D libc.so | grep '00000000 A'
> ...
> 00000000 A dladdr
> 00000000 A dladdr1

These are filters. For a long time we have produced shared objects that
act as filters on other shared objects. The whole object has acted as
a filter:

    oxpoly 490. elfdump -d /usr/lib/libdl.so.1 | fgrep FILTER
        [1]  FILTER  0xe6   /usr/lib/ld.so.1
    oxpoly 491. elfdump -d /usr/lib/libsys.so.1 | fgrep FILTER
        [1]  FILTER  0x758  /usr/lib/libc.so.1

In Solaris 10, we added per-symbol filtering, a mechanism where
individual symbols can be identified as filters. In fact, this
was backported to Solaris 9 9/04. See:

http://docs.sun.com/app/docs/doc/817-1984/6mhm7pl1q?a=view

The filtering is triggered because we maintain an auxiliary array
of information for the symbol table - the SHT_SUNW_syminfo,
.SUNW_syminfo section. You can dump this with:

    oxpoly 493. elfdump -y /lib/libc.so.1 | grep dladdr
        [1176]  F  [1] /usr/lib/ld.so.1  dladdr1
        [1291]  F  [1] /usr/lib/ld.so.1  dladdr
        [1641]  F  [1] /usr/lib/ld.so.1  _dladdr1
        [1913]  F  [1] /usr/lib/ld.so.1  _dladdr
                ^
                SYMINFO_FLG_FILTER

When ld(1) resolves a reference from an object to a filter symbol, it
simply creates the appropriate reference. For a function reference, this
means creating a procedure linkage table (.plt) entry:

    oxpoly 498. elfdump -sN.dynsym /lib/libc.so.1 | grep dlopen
        [2328]  0x00000000 0x00000000  FUNC GLOB  D    5 ABS  dlopen
    oxpoly 499. cc -o main main.c
    oxpoly 500. elfdump -r main | fgrep dlopen
        R_SPARC_JMP_SLOT  0x20ca4  0  .rela.plt  dlopen

When the runtime linker binds the process, it redirects the binding
to the filtee. In this case, the call is resolved to ld.so.1 itself.
Because of this redirection, there is no need for any code to back the
filter symbol definition - hence it is defined as ABS.
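
The two steps described above (a by-name reference at link-edit time, a redirect to the filtee at runtime) can be sketched as a toy model. This is only an illustration, not how ld(1) or ld.so.1 are implemented. The class, the function names, and the address are all hypothetical, and the sketch does not try to model what a real runtime linker does when the filtee lacks the symbol:

```python
class SharedObject:
    """Toy stand-in for a loaded shared object (hypothetical, not a real API)."""

    def __init__(self, name, code=None, filters=None):
        self.name = name
        self.code = code or {}        # symbol -> address of real backing code
        self.filters = filters or {}  # symbol -> filtee object (filter symbol)


def link_edit(symbol):
    """Link-edit time: record a by-name PLT relocation, never an address."""
    return {"type": "JMP_SLOT", "symbol": symbol}


def runtime_bind(reloc, search_order):
    """Run time: find a definition, following a filter symbol to its filtee."""
    symbol = reloc["symbol"]
    for obj in search_order:
        # A filter symbol redirects the lookup to the filtee; the filter's
        # own (ABS, value 0) definition is never used as a call target.
        target = obj.filters.get(symbol, obj)
        if symbol in target.code:
            return target.name, target.code[symbol]
    raise LookupError("unresolved symbol: " + symbol)


ld_so = SharedObject("ld.so.1", code={"dlopen": 0x7F001000})  # made-up address
libc = SharedObject("libc.so.1", filters={"dlopen": ld_so})   # no backing code

reloc = link_edit("dlopen")
name, addr = runtime_bind(reloc, [libc])
print(name, hex(addr))
```

The point of the model is that the value 0 stored in the filter's symbol table entry never reaches the caller: the relocation carries only the name, and the address comes from the filtee.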

Another form of filtering is auxiliary filtering. This redirects the
binding at runtime if a "better" implementation exists, and otherwise
falls back to the original library:

    oxpoly 503. elfdump -y /lib/libc.so.1 | grep memcpy
        [93]  A  [2] /platform/$PLATFORM/lib/libc_psr.so.1  memcpy

As this function has backing code, the symbol defines the associated
code:

    oxpoly 504. elfdump -sN.dynsym /lib/libc.so.1 | fgrep memcpy
        [93]   0x0003fed0 0x000001b0  FUNC WEAK  D   38 .text  memcpy
        [583]  0x0003fed0 0x000001b0  FUNC GLOB  D   36 .text  _memcpy
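
The fallback behaviour of an auxiliary filter can be sketched the same way. Again this is a toy model under stated assumptions: bind_auxiliary is a hypothetical name, and the filtee address is made up:

```python
def bind_auxiliary(symbol, own_code, filtee_code):
    """Resolve a symbol that is tagged as an auxiliary filter.

    own_code    -- symbol -> value within the filter itself (the fallback)
    filtee_code -- symbol -> value within the filtee, or None if no filtee
                   is available in this runtime environment
    """
    if filtee_code is not None and symbol in filtee_code:
        return "filtee", filtee_code[symbol]   # the "better" implementation
    return "filter", own_code[symbol]          # fall back to the backing code


libc_code = {"memcpy": 0x0003FED0}      # symbol value from the elfdump output
libc_psr_code = {"memcpy": 0x00001000}  # made-up value for the filtee

print(bind_auxiliary("memcpy", libc_code, libc_psr_code))
print(bind_auxiliary("memcpy", libc_code, None))
```

This is why an auxiliary filter symbol needs real backing code while a standard filter symbol does not: the fallback branch has to return something callable.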


With the introduction of per-symbol filters, we were able to simplify
and refine many object filtering mechanisms. For example, the dl*
family could be defined in libc (you don't need to link with -ldl
anymore).


Send me mail if you need more information.


--

Rod.



mmman

Posts: 229
From: CZ

Registered: 3/21/06
Re: shared library symbols at address 0x00000000
Posted: Nov 20, 2006 2:11 AM   in response to: rie


Hi Rod,

Thank you for a very detailed explanation.

As I understand it now, on Linux systems, the same effect can be
achieved by linking a library libA.so against another library libB.so,
in which case, the resulting binary can be linked only against libA.so
and will automatically resolve symbols from libB.so.

On Solaris, the symbols linked to libA.so from libB.so are moreover
marked as filter symbols.

I now have to investigate how well GNU ld(1) and GNU nm(1) support
filter symbols, and eventually fix it. It seems that neither of them
is aware of filter symbols or the concepts behind them.

Has any of you hit the same bug or seen similar problems?

thanx,
Martin


--
http://martinman.net



rie

Posts: 307
From: US

Registered: 3/9/05
Re: shared library symbols at address 0x00000000
Posted: Nov 20, 2006 10:58 AM   in response to: mmman


Martin Man wrote:

> As I understand it now, on Linux systems, the same effect can be
> achieved by linking a library libA.so against another library libB.so,
> in which case, the resulting binary can be linked only against libA.so
> and will automatically resolve symbols from libB.so.

This is different. Filters are an abstraction in which a dependency
established at link-edit time is redirected to an alternative
implementation at runtime.

The scenario you have outlined is a trick played with dependencies,
which the Solaris ld(1) will frown upon :-). If a binary requires
interfaces within libB.so, it should have its own dependency on
libB.so. Assuming some other object will make libB.so appear in
the address space is risky.

> On Solaris, the symbols linked to libA.so from libB.so are moreover
> marked as filter symbols.

The focal point seems to be our interpretation of ABS. On Solaris, an
ABS symbol index defines how the symbol should be interpreted *within*
the object that contains the symbol. A binary that references this
symbol should establish its own reference model based on the symbol's
type - FUNC (referring object creates a .plt) or DATA (referring object
creates a GOT reference), etc.

It looks like the GNU linker is propagating the destination symbol
index (ABS) to the referring object.

At runtime, the referring object should bind to the definition as normal.
What the defining implementation does (act as a filter, dlopen() something,
or define an absolute offset) is up to the implementation - and can change
from one runtime environment to another.
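
To make the distinction concrete, here is a small sketch contrasting the two behaviours. It is only a model of the description above, not code from either linker. The function names are hypothetical, the second function is a guess at the suspected misbehaviour rather than a confirmed diagnosis, and the numeric constants are the standard ELF values:

```python
STT_OBJECT = 1      # ELF symbol type: data object (shown as DATA by elfdump)
STT_FUNC = 2        # ELF symbol type: function
SHN_ABS = 0xFFF1    # ELF section index meaning "absolute value"


def expected_reference(sym):
    """Choose the reference model from the symbol's type only."""
    if sym["type"] == STT_FUNC:
        return ("plt", sym["name"])
    if sym["type"] == STT_OBJECT:
        return ("got", sym["name"])
    return ("other", sym["name"])


def suspected_buggy_reference(sym):
    """Hypothetical model of the problem: ABS leaks into the referrer."""
    if sym["shndx"] == SHN_ABS:
        return ("absolute", sym["value"])  # the call target becomes 0x0
    return expected_reference(sym)


# The dlopen entry from libc's .dynsym, as shown by elfdump earlier.
dlopen = {"name": "dlopen", "type": STT_FUNC, "shndx": SHN_ABS, "value": 0}

print(expected_reference(dlopen))
print(suspected_buggy_reference(dlopen))
```

In the first case the referring object keeps a by-name reference that the runtime linker can redirect. In the second the value 0 is baked in at link-edit time, which would match the jump to address 0x0 reported at the start of the thread.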

--

Rod.





