OpenSolaris

You are not signed in. Sign in or register.

Common Mistakes and How to Fix Them

Most of the practices listed here are incorrect regardless of the compiler you use; gcc just happens to be very good at catching them and warning you. In fact gcc's warnings are more like lint's than a normal compiler's. And of course gcc can catch many errors which lint does not, or has been configured within ON to disregard.


So you've seen all the tonic-gcc bugs, and you're wondering what's the deal. Well, here it is: Tonic needs to be able to build with gcc, and our code currently upsets it. Mostly this is just making the code cleaner but gcc also catches a few bugs.

gcc is pretty good about telling you what's wrong, but here are the main types of problems we ran into:

  • Implicit ints

Old code sometimes treats int as optional (or left it off function declarations long before void existed). Such as:

f(){}
c.c:15: warning: return type defaults to 'int'

Or do:

register i;
    
c.c:16: warning: type defaults to 'int' in declaration of 'i'

These are easy, just add the int in the right place (and delete register, let the compiler optimize, if it even pays attention to that keyword).

  • pointer<->int assignments that aren't the same size

There are a lot of places that assign pointers to ints (or the other way around) where they both aren't the same size. This can be bad as gcc seems to treat pointers as signed and might sign-extend the pointer:

{mikes:reliant:466} cat c.c
#include <stdio.h>
#include <sys/types.h>
#include <stdlib.h>
int
main()
{
        uint64t        i;
        uint64t        *p;
        p = &i;
        i = (uint64t)p;
        (void) printf("p = %p\n", (void *)p);
        (void) printf("i = %llx\n", i);
}
{mikes:yavin:467} cc -o c c.c
{mikes:yavin:468} ./c
p = ffbfedb8
i = ffbfedb8
{mikes:yavin:469} gcc -o c c.c
c.c: In function 'main':
c.c:13: warning: cast from pointer to integer of different size
{mikes:yavin:470} ./c
p = ffbfeda8
i = ffffffffffbfeda8

This can be bad if the integer is expected to be a valid pointer still. Also note that lint doesn't catch this either:

{mikes:yavin:488} lint -s c.c
{mikes:yavin:489} lint -s -errchk=%all c.c
{mikes:yavin:490}

Though perhaps there is an option, or this is just because it matches the compiler's behavior and cc doesn't do this.

The fix for this is to have a uintptrt cast in between. Or to rely on the compiler promoting a uintptrt to uint64t anyway and just cast the pointer to uintptrt in the test program:

{mikes:yavin:537} cc -o c c.c
{mikes:yavin:538} ./c
p = ffbfedb8
i = ffbfedb8
{mikes:yavin:539} gcc -o c c.c
{mikes:yavin:540} ./c
p = ffbfeda8
i = ffbfeda8
  • Functions that don't return something when they should

gcc is pretty good about catching functions that fall off the end without returning something, or mix & match return; and return (something), which is just wrong. Sometimes this is fallout from #1 above, where the function is really void but it's old and defaults to int because there was no void. Other times it is just a bug. And other times it's that the code really doesn't hit the end but gcc doesn't know that. Here is a simple example:

{mikes:yavin:611} cat c.c
#include <stdio.h>
#include <sys/types.h>
#include <stdlib.h>
static void
usage(void)
{
        (void) printf("you used me wrong\n");
        exit(1);
        /* NOTREACHED */
}
int
main()
{
        usage();
        /* NOTREACHED */
}
{mike_s:yavin:612} gcc -Wall -o c c.c
c.c: In function 'main':
c.c:19: warning: control reaches end of non-void function

gcc doesn't understand NOTREACHED, that's a lint directive (and actually lint doesn't seem to understand it well either since it still complains main has no return statement). But you can tell gcc a function doesn't return with the __NORETURN attribute attached to a prototype (see <sys/ccompile.h>). Add this after the includes:

static void usage(void) __NORETURN;
    

And gcc is happy:

{mikes:yavin:619} gcc -Wall -o c c.c {mikes:yavin:620}

Sometimes the right fix can be to add a 'default' case to a switch statement that doesn't have one. It is the last thing in the function, and all cases return, but the coder knew that he would never call the function with an argument the switch statement wouldn't handle. So no default was added and a NOTREACHED was put after the switch. That's (personally) a bad thing to do because someone else could come along and call it wrong, so it would seem better to have a default (failure) case or perhaps an assert added.

Sadly sometimes the last thing a function calls is something that takes an argument that makes it not return, but gcc can't be told that. Like a generic error printing routine that takes a 'fatal error call exit' flag. I have just added bogus returns in that case but perhaps someone can find a better way.

  • Undefined order of operations

gcc has caught a few bugs where the order of operations is undefined. For example, I have seen some function calls like this:

f(i, i++);

this causes gcc to reply:

c.c:16: warning: operation on 'i' may be undefined

because the order function arguments are evaluated is unspecified by the C standard. It may be left to right, like the programmer expected, or it may be right to left, or neither. This can also happen when an argument with side effects is passed to a macro that evaluates it twice. Or in assignments like this:

i = i++;
c.c:17: warning: operation on 'i' may be undefined
  • Main functions defined as returning void

main doesn't return void, it returns int, and gcc complains. So don't do that :) And don't forget that it may then want main to return something (or lint may) so if you have an exit at the end that can just be a return.

  • Extra tokens after preprocessor directives

This is usually for #else and #endif directives. Don't do this:

#ifdef DEBUG
…
#else DEBUG
…
#endif DEBUG

do this:

#ifdef DEBUG
…
#else /* DEBUG */
…
#endif /* DEBUG */
  • Writing into string constants.

gcc puts puts string constants in read-only memory because, well, they are constant. Unfortunately some code writes into string constants because Studio doesn't do that by default (see -xstrconst). Sadly this is not something found by compilation but only by testing, and we know of a few:

  • 6264767 cfgadm expects to write into string constants
  • 6264775 lpadmin expects to write into string constants
  • 6272155 svcs writes into string constants, can crash as a result

There is a flag that can be used to have gcc make string constants non-constant (-fwritable-strings), and we do (sadly) use it. But note what it says:

{mikes:yavin:683} /usr/sfw/bin/gcc -Wall -fwritable-strings -o c c.c
cc1: note: -fwritable-strings is deprecated; see documentation for details

And in gcc 4.0 guess what:

{mikes:yavin:684} gcc -Wall -fwritable-strings -o c c.c
cc1: error: unrecognized command line option "-fwritable-strings"

Code that wants to write to constants should just be fixed. Using -xstrconst in the CFLAGS may also help catch these in testing sooner. You can use the gcc flag -Wwrite-strings to help you find these bugs, but unfortunately it's too noisy to be useful all the time; any nontrivial code is likely to trigger a few of these.

  • Incorrect format strings

Format strings need to correspond to the types being passed, otherwise gcc complains. Common problems are:

  • Using %p corresponding to an argument which is not a pointer.
  • Using %x or %lx corresponding to a pointer argument.
  • Use of incorrect or unnecessary size specifiers.

Here are some usage guidelines:

  • Use %d or %x for int, int32t, and, in the kernel, smaller _signed integer types.
  • Use %u or %x for unsigned int, uint32t, and, in the kernel, smaller _unsigned integer types.
  • Use the "l" modifier for long, unsigned long, and (u)intptrt types. Additionally, use the "l" modifier in the kernel for sizet, offt, ssizet, and other types which are equivalent to long or unsigned long.
  • Use %p for any pointer or caddr_t type. Do not cast pointers to any other type for the purpose of display (especially types which may be less than 64 bits long) unless you are absolutely sure it is both correct and necessary. For example, kernel text addresses on SPARC are guaranteed to fit into 32-bit integers, but such a cast is unlikely to be needed if it is only for display/debugging purposes, and in fact it may even hide bugs. Note that you may need to case to void * to make lint happy; this is fine.

Special note on long long and the "ll" modifier: the SPARC kernel is no longer built 32-bit; therefore you may use "ll" only for those variables which are explicitly declared long long or a type equivalent to long long (the most notable examples are hrtimet variables). However, code using int64t or uint64_t and compiled for both 32-bit and 64-bit must instead use the macros in <sys/int_fmtio.h>.

Wrong:

uint64t    variable;
void        *ptr;
cmnerr(CENOTE, "oh dear: %llx (%x)", variable, (uintt)ptr);

Right:

#include <sys/intfmtio.h>
uint64t    variable;
void        *ptr
cmnerr(CENOTE, "oh dear: %" PRIx64 " (%p)", variable, ptr);

It is good practice to use these any time a variable's size is explicit in its type (uint8t, int16t, etc).

Note on cmnerr "%b" format: gcc warns if you include a field width with the "%b" specifier. This is a bug in gcc that will be fixed in the near future. To work around it, use the -Wno-format flag (with the compiler wrapper, -gcc=-Wno-format).

  • Useless comparisons

Comparisons which are always true or always false will cause gcc to emit warnings (and, in many cases, lint to emit warnings as well). Common mistakes are comparing unsigned variables with 0 and comparing variables with constants larger than that variable could contain. Example of wrong code:

#define    MAXLITTLE    1024
#define    MAXMEDIUM    0x10000
uint8t    little;
uint16t    medium;
int8t    slittle;
…
if (little < 0 || little > MAXLITTLE)
        return;
if (medium < 0 || medium >= MAXMEDIUM)
        return;
if (slittle >= MAXLITTLE)
        return;

Questions:

  • Why doesn't the tonic team fix these?

There are a few reasons. One is that the resources aren't there. Another is that it would seem better for the owners of the code to fix it, if there are any owners around. See 6272018, where the RE determined we could just delete a file instead of fixing it. Or the owners might know that they need to coordinate changes with other folks. There is also the faint hope that more testing would be performed on the result just to be sure there aren't any optimization-related bugs or perhaps cases where the code writes into string constants.

The bugs were mostly filed in an attempt to identify the minimum change required, and get the change to the appropriate people, who hopefully could then work in parallel with others.

  • Why don't we just turn off these warnings?

Well, because it seems better to fix the code rather than slap the compiler. After all, it's not just gcc that may be upset about it, the Studio compiler could easily decide to do the same thing. And in fact does in many cases if C99 is turned on (we turn it off because there were far, far more warnings in ON with it on. But it would be really nice to have it on as it requires the use of prototypes and prototypes catch problems). Another reason is that turning off warnings rather than fixing them is one reason the code so upsets gcc – see all the lint warnings we disable and again the fact C99 is disabled. Taking the easy road before is not paying off now.