TECH DEVIANCY

last updated, May 17, 2011

C Pointer Qualifiers


My cheat sheet in progress



Quick Background

Lately, I'm having to change and get with the decade. Here are some quick points I'm trying to pay more attention to.


Pointers to Constants vs Constant Pointers

Well, C++ might have at least one good idea, since that is where this came from. :) Newer versions of GCC and Visual Studio (2005+) support these features. C89 added const along with volatile.

Returning Pointers to Constants

Consider the following example. Here, function() returns a pointer to a constant structure. That means that from a compilation and static code analysis point of view, the contents of the structure (or simply the value, if it were a native type) should only be read from. The qualifier is also great for documentation, because it tells us something we shouldn't do. The syntax is to put the const keyword before the type being pointed to.

#include <stdio.h>
#include <stdlib.h>

struct a
{
 int x;
};

const struct a * function(void)
{
 struct a *ptr;

 if((ptr = (struct a *) malloc(sizeof(struct a))) == NULL)
 {
  exit(1);
 }

 ptr->x = 0;
 return ptr;
}

int main(void)
{
 int y;
 const struct a *ptr;

 ptr = function();

 y = ptr->x;

 return 0;
}

So what happens if we try to write to that location anyway? We get slapped with a gcc error.

int main(void)
{
 const struct a *ptr;

 ptr = function();

 ptr->x = 1;

 return 0;
}
error: assignment of read-only location ‘*ptr’

What if we just save the value to a non-constant local version? We get slapped with a gcc warning.

int main(void)
{
 struct a *ptr;

 ptr = function();

 ptr->x = 1;

 return 0;
}
warning: assignment discards qualifiers from pointer target type

What if we "know what we are doing", and just type cast? Well, that works. Just remember that if you really are going to write to that value, the documentation (the prototype) told you not to, and that if you don't want to write to the value, you are no longer letting the compiler help you with its own error checking and possible optimizations. Such an action is somewhat like the mirror image of the inside of function() itself. That is, there the constant qualification only comes into the picture after we are "ready", during the implicit conversion from non-constant to constant by the return statement.

int main(void)
{
 struct a *ptr;

 ptr = (struct a *) function();

 ptr->x = 1;

 return 0;
}

This illustrates our first rule about constants: the constant property only lasts for a single typed variable at compile time. That is, other than a possible optimization in generated code, nothing about the runtime is affected. Once a variable is copied / assigned to another, the properties (and "constantness") of those two variables are independent. Additionally, a type cast can temporarily adjust those properties.
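As a rough sketch of that first rule, the same object can be reached through a pointer to constant and through a regular pointer at the same time. The name two_views() is hypothetical and not part of the examples above. It also assumes that obj itself is not declared const; writing through a cast pointer to an object that really was defined const is undefined behavior.

```c
struct a
{
 int x;
};

/* Hypothetical helper: modify an object through a writable view and
   read the result back through a read-only view of the same object. */
int two_views(void)
{
 struct a obj = { 1 };            /* the object itself is NOT declared const */
 const struct a *ro = &obj;       /* implicit conversion adds const to this view */
 struct a *rw = (struct a *) ro;  /* explicit cast removes it again */

 rw->x = 2;                       /* fine, because obj really is writable */
 /* ro->x = 3; */                 /* would not compile: read-only view */

 return ro->x;                    /* the read-only view sees the new value */
}
```

The const on ro restricts only what can be done through ro. It says nothing about obj or about any other pointer to it.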

Parameters of Pointers to Constant Structures

Besides again being an opportunity for the compiler to check for errors and optimize the body of a function with such a parameter, it documents to the API user that certain things will never happen. In this example we can safely say that a memory leak is not created, because the function prototype promises not to overwrite our reference to it. (Of course it still could by intentional action, such as a type cast.)

struct b
{
 const struct a *nested_ptr;
};

void do_something(const struct b *ptr)
{
 const struct a *x = ptr->nested_ptr;
}

int main(void)
{
 struct b b_obj;

 b_obj.nested_ptr = function();

 /* I "know" that the above allocation won't be "lost" by an overwrite during this function call */
 do_something(&b_obj);

 return 0;
}

Constant Pointers

A constant pointer, on the other hand, is a "regular" C constant; that is, the address it holds can't be changed. So if you can't change the value, then how does it get set in the first place? More accurately, constants can't be changed after initialization. So the only way to give them a value is at declaration time, which includes when they are function parameters. Note also that declarations like the one below are useless as far as I can tell, since x can never be set to anything. GCC doesn't complain about that in particular though (aside from the -Wall option producing the normal related warnings).

int main(void)
{
 const int x;

 return 0;
}

This is actually an important point. So much so, I'll call it our next rule about constants: syntactically valid, yet logically useless variable declarations are allowed. When you are trying to fix errors with some of the more complicated qualifiers, you just have to know that the compiler may not warn you that a variable declaration will never be useful, and that such a declaration is likely the cause.

Anyway, to mark a pointer as constant, the const keyword goes after the '*'. Such a pointer as a parameter in a function prototype doesn't really do anything for the API user as far as documentation. It does, however, help the compiler's error checking and optimization efforts inside the function body. For a simple example, the above do_something() prototype can be rewritten as follows below. Nothing has changed at all, except that if the function implementation were to overwrite ptr, the compiler would generate an error. The compiler would also generate an error if, say, ptr->nested_ptr were to be written to, since the parameter is still a pointer to a constant structure. Now the parameter is a constant pointer to a constant structure. As already stated, the "initialization" of the function's local variable ptr is, of course, done at the time the function is called, by the value passed to it, and can not be reset.

void do_something(const struct b * const ptr);
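Here is a minimal sketch of a body for a prototype of that shape. The function name read_nested() and its return value are made up for illustration. The two commented-out lines are the writes the compiler would reject.

```c
struct a
{
 int x;
};

struct b
{
 const struct a *nested_ptr;
};

/* Hypothetical function: constant pointer to a constant structure. */
int read_nested(const struct b * const ptr)
{
 /* ptr = 0; */              /* rejected: ptr itself is constant */
 /* ptr->nested_ptr = 0; */  /* rejected: *ptr is a constant structure */

 return ptr->nested_ptr->x;  /* reading through both levels is fine */
}
```

From the caller's side nothing changes. Passing &b_obj works exactly as it did with the earlier do_something() prototype.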

More on Initializing Constants

A constant structure can be initialized (and "frozen" by its "constantness") like this.

int main(void)
{
 const struct a obj = { 5 };

 return obj.x;
}

A pointer to constant structure can be initialized in a couple of ways like this.

int main(void)
{
 const struct a obj = { 5 };
 const struct a *ptr_a = &obj;
 const struct a *ptr_b = function();

 return ptr_a->x;
}

A constant pointer to constant structure can be initialized in a couple of ways like this.

int main(void)
{
 const struct a obj = { 5 };
 const struct a * const ptr_a = &obj;
 const struct a * const ptr_b = function();

 return ptr_a->x;
}

Returning Constant Pointers

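Here's one I still don't fully understand. See if it makes sense to you. I see that this prototype would specify a function (it works too) that would return a constant pointer. My question is, what does that "protect" you from? That returned value is never writable, as it is, at most, just copied to a local variable. GCC's -Wignored-qualifiers warning (enabled by -Wextra) reports that type qualifiers on a function return type are ignored, which suggests the answer is "nothing".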

const struct a * const function(void);

Filling in Constant Pointers, and Pointers to Pointers

A common reason for functions to take pointers to pointers as parameters is so that a pointer value may be "filled in" or updated. Let's start with a quick illustration before things get nuts.

void fill_in(const struct a **location)
{
 *location = function();
}

int main(void)
{
 const struct a *ptr;

 fill_in(&ptr);

 return 0;
}

So here, the only thing different from Mom and Pop, meat and potatoes C, is that we are still using a constant structure. To reiterate, that means again that neither main() nor fill_in() may write to the contents of the "a" structure in use.

Our last rule for pointer constants is: each level of pointer dereferencing has an independent constant qualification property. Consider the next code sample.

void fill_in(const struct a ** const location)
{
 *location = function();
}

The const keyword was added before the parameter name, which makes the function-scoped local variable, location, not modifiable within the body of the function fill_in(). So how can we assign a value to it? Nothing is being assigned to it. The pointer that it points to is being assigned the address returned by function(). There are actually three levels that might be constant or non-constant here.

  • the structure in question is constant, meaning its contents can not be written to
  • the pointer to such a structure, pointed to by location, at *location is not constant, meaning it can be written to
  • the variable location itself, is constant, meaning it can not be written to
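The three levels can be tried out with a sketch like the following. The static object shared is a hypothetical stand-in for whatever function() would return, and read_through() is a made-up companion that marks all three levels constant.

```c
struct a
{
 int x;
};

/* Hypothetical stand-in for the structure function() would hand back. */
static const struct a shared = { 42 };

/* structure: constant, *location: writable, location: constant */
void fill_in(const struct a ** const location)
{
 *location = &shared;        /* allowed: the middle level is not constant */
 /* (*location)->x = 1; */   /* rejected: the structure is constant */
 /* location = 0; */         /* rejected: location itself is constant */
}

/* all three levels constant: this function can only read */
int read_through(const struct a * const * const location)
{
 return (*location)->x;
}
```

A caller declares const struct a *ptr; and passes &ptr to either function. The compiler adds the extra const on the middle level implicitly when calling read_through().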

Suppose that we wanted that middle item to be constant as well. In such a case we would again be documenting to the API user that the passed value would not be manipulated. This time let us show the error we would create by rewriting the fill_in() function so that it violates the behavior its prototype advertises to its callers.

void fill_in(const struct a * const * const location)
{
 *location = function();
}
error: assignment of read-only location ‘*location’

The syntax here is that a const keyword can go at each level of dereferencing in the variable declaration. Another variation would be the correct function below, which advertises, and abides by, its claim to not manipulate the structure contents or the pointer to the structure, yet allows the function body to manipulate its own local variable that holds the passed parameter.

void make_use_of(const struct a * const *location)
{
 const struct a * const ptr_a = *location;
 const struct a *ptr_b = *location;

 ptr_b = NULL;
 location = NULL;
}
  • the "a" structure in question is constant, meaning its contents can not be written to
  • the pointer to such a structure, pointed to by location, at *location is also constant, meaning it can not be written to
  • the local variable location itself, is not constant, meaning it can be written to
  • the local variable ptr_a, is constant, meaning it can not be written to
  • the local variable ptr_b, is not constant, meaning it can be written to

Remember our three rules.

  • the constant property only lasts for a single typed variable at compile time
  • syntactically valid, yet logically useless variable declarations are allowed
  • each level of pointer dereferencing has an independent constant qualification property

Restrict

The restrict keyword is a newer qualifier from C99. There are a number of good explanations for its use on the web. When a pointer has the restrict qualifier, it means that the location it points to is not also pointed to by another pointer (in that scope), or overlapped by another pointed-to region, collectively referred to as aliasing. When it is not used, compilers can not take certain shortcuts they otherwise could if they knew certain operations would not be performed on overlapping memory.

A much discussed example is the string.h function memcpy(), which is one of the most used C functions around. It simply copies an arbitrary section of memory to another. Long ago it might have been prototyped like this:

char *memcpy(char *destination, char *source, int length);

Well, first the void pointer was standardized, so that replaced the char *. Also, address spaces (and pointer sizes) were not always the same word size as int, so the size_t type was used for the byte count. The newer meanings of const came into being, and that allowed the source parameter to be qualified as read only. Finally, the restrict qualifier in the prototype allows for more optimized implementations of memcpy(). Thus the prototype in most environments today is more like this:

void *memcpy(void * restrict destination, const void * restrict source, size_t length);

What that means is that the source and destination regions would be non-overlapping. Actually that just made things more formal as that was always in the description of the function. If you wanted to move a region over a little by copying it to an overlapping section, then one was to use the memmove() function. The trouble came from the coincidence that on some platforms, some implementations, called in certain ways, memcpy() would behave the same as memmove(), and some programmers used it for that reason. As the newer libraries started to make use of the version with restrict, some problems made it all the way to users.
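As a small sketch of the overlapping case, here is a hypothetical helper (shift_right_by_one() is not from any library) that moves a region over by one byte. Source and destination overlap, so memmove() is the correct call; memcpy() would break its non-overlap contract here.

```c
#include <string.h>

/* Hypothetical helper: copy the first count bytes of buf to buf + 1.
   The caller must make sure buf has room for count + 1 bytes. */
void shift_right_by_one(char *buf, size_t count)
{
 /* regions [buf, buf+count) and [buf+1, buf+1+count) overlap */
 memmove(buf + 1, buf, count);
}
```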

Longer story short is that restrict actually should be used in almost every case. What I want to point out is that in practice __restrict is likely used instead. In the case where there is some compiler support for C99 it typically must be toggled on, and doing so may break other things. In such cases, or in the case of no attempt at C99, often the keyword __restrict is provided instead. __restrict support can also be found in compiler specific extensions of other languages such as C++. My two cents would be to use __restrict in code, and restrict in API documentation.
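One possible alternative to hard-coding __restrict is to hide the choice behind a macro. This is only a sketch: MY_RESTRICT and add_into() are made-up names, and the preprocessor tests assume that GCC-compatible compilers and Visual Studio accept __restrict, as described above.

```c
#include <stddef.h>

/* Pick whichever spelling the compiler understands (hypothetical macro). */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define MY_RESTRICT restrict
#elif defined(__GNUC__) || defined(_MSC_VER)
#define MY_RESTRICT __restrict
#else
#define MY_RESTRICT
#endif

/* The caller promises that dst and src do not overlap. */
void add_into(int * MY_RESTRICT dst, const int * MY_RESTRICT src, size_t n)
{
 size_t i;

 for(i = 0; i < n; i++)
  dst[i] += src[i];
}
```

The qualifier is a promise made by the caller, not something the compiler verifies. Calling add_into() with overlapping arrays is undefined behavior.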


UTF-8 vs. unsigned char / signed char / char

By now, of course, you are using UTF-8 everywhere, right? That means that NUL-terminated C strings should be able to use all 8 bits in each byte. For a while there I didn't understand why that didn't mean that strings should be defined as unsigned char arrays to designate compatibility with UTF-8. I even tried doing that, but the compiler warnings drove me nuts when calling stdio. As it turns out, unlike, say, int, which implies signed int, plain char is a third, distinct type, and whether it behaves as signed or unsigned is left up to the implementation (GCC's default depends on the target, and -fsigned-char / -funsigned-char override it). The point being that either way a char holds all 8 bits of each byte, so plain char arrays already work for UTF-8; the signedness only matters when you compare or do arithmetic on individual byte values.
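When individual byte values do matter, one approach is to keep the string as plain char and convert each byte to unsigned char at the point of inspection. The sketch below uses a hypothetical function, utf8_code_points(), that counts code points by skipping continuation bytes (those of the form 10xxxxxx). It assumes the input is valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper: count code points in a valid, NUL-terminated
   UTF-8 string held in a plain char array. */
size_t utf8_code_points(const char *s)
{
 size_t count = 0;

 for(; *s != '\0'; s++)
 {
  /* convert first, so the test behaves the same whether char is signed or not */
  unsigned char byte = (unsigned char) *s;

  if((byte & 0xC0) != 0x80)  /* not a continuation byte */
   count++;
 }

 return count;
}
```

The string can still be passed straight to stdio functions without any casts or warnings.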


More Examples

For illustration, take this function from the SP Dataset API, along with the function body below.

In the case of building an sp dataset that models rows of nodes matched with columns of types, this function starts a new row. It takes a pointer to a record structure pointer, root_record, and creates a new record structure. If this is the first such "row" (indicated by *root_record being NULL), then *root_record will be updated to point to the new structure, as it is now the root record. If this is another "row" in the "table", then next and previous members are updated to reflect that this is the last row. Note also that the row list is circular so that the previous list item on the first row is the last row, and the next list item of the last row is the first row. A pointer to the new record structure is also returned, or NULL on error.

struct sp_dataset_record *
sp_dataset_node_by_type_new_node(char * const __restrict last_error, unsigned int const error_size,
                                 struct sp_dataset_record ** const __restrict root_record)
{
 struct sp_dataset_record *new_record;

 if(root_record == NULL)
  return NULL;

 if((new_record = g_slice_alloc0(sizeof(struct sp_dataset_record))) == NULL)
 {
  if(last_error != NULL)
  {
   snprintf(last_error, error_size, "g_slice_alloc0() failed");
  }
  return NULL;
 }

 if(*root_record == NULL)
 {
  *root_record = new_record;
  return new_record;
 }

 if((*root_record)->next == NULL)
 {
  new_record->prev = *root_record;
 } else {
  new_record->prev = (*root_record)->prev;
 }

 (*root_record)->prev = new_record;

 new_record->prev->next = new_record;

 return new_record;
}
  • The structure the return value points to is not constant. That is because sp_dataset_record structures have many functions that modify them.
  • The address for the last_error buffer is not modifiable inside the function. No other variable within scope of the function also points to the same error buffer.
  • error_size is not modifiable inside the function.
  • Within the function the address pointed to by root_record is not changed.
  • The pointer that root_record points to is non-constant because it will be set in the case of a new dataset being created.
  • The structure pointed to by the pointer pointed to by root_record is also non-constant, since adding node records to the dataset will potentially modify it.
  • The pointer root_record points to, *root_record, can not be marked as __restrict as it has nested pointers that likely alias it. However, the address root_record itself holds is just that of the caller's local variable holding that value.
  • The local variable, new_record, is neither a constant pointer nor a pointer to a constant structure, since it will hold a newly created value that will have its members set to initial values.

For use with node by type datasets. For a given node record, node_record, attempt to find a record used for types of domain type point_domain_details.

struct sp_dataset_record *
sp_dataset_node_by_type_check_for_type_on_node(
             char * const __restrict last_error, unsigned int const error_size, 
             const struct sp_dataset_record * const node_record,
             const struct sp_domain_type_render_info * const point_domain_details)
{
 const struct sp_dataset_record *node;
 
 if(node_record == NULL)
  return NULL;

 node = node_record->child;
 while(node != NULL)
 {
  if(node->point_domain_details == point_domain_details)
   return (struct sp_dataset_record *) node;

  node = node->next;
 }

 return NULL;
}
  • The structure the return value points to is not constant. That is because sp_dataset_record structures in general are considered to be modifiable objects. Returning one as constant would yield errors and encourage overriding type casts. The structure in question would have had to have been handed to the function anyway, and would probably have been in a non-constant state to start with if it came from another API function.
  • The address for the last_error buffer is not modifiable inside the function. No other variable within scope of the function also points to the same error buffer.
  • error_size is not modifiable inside the function.
  • node_record can not be marked with __restrict because it could be aliased by node_record->child->parrent
  • Inside the function, the address node_record points to is not modifiable
  • The structure for node_record is marked as constant. This actually gives a "loophole" for taking a constant sp_dataset_record structure and deriving a non-constant one without an explicit type cast at the call site. This is unlikely to be an issue since these structures are non-constant in general. What this does do that is worthwhile is tell the API user and the compiler that this particular function is not one that should be modifying such an object.
  • Not only should the type render info object not be modified, but the address should hold for the duration of the function in the variable point_domain_details. Deeply nested pointers of node_record may alias this address though.
  • The local variable, node, should not be used to modify sp_dataset_record structure contents, but it will point to different addresses during the function call. It too can not be marked with __restrict, since some of the values it will hold will point to objects that themselves alias each other.

© 2011 C. Thomas Stover

cts at techdeviancy.com
