Reading Assignment: All of Programming Chapter 10 Uses of Pointers
Strings
A string is a sequence of characters terminated by the null terminator character, '\0'. This character has numerical value 0, but its character literal is written with a backslash, since you cannot type that character normally.
String Literals
If we wanted to store a string literal in a variable, we might write:
const char *str = "Hello World\n";
The figure above illustrates the effect of such a statement and the layout of the string in memory. The characters appear in the order of the string and are followed by the null terminator character. Note that we do not need to write this character in the literal; the compiler adds it for us.
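We declare str as const char * because the characters of a string literal live in a read-only part of the static data section. With const, the compiler rejects attempts to modify the string through str. If you omit const and modify the characters anyway, the program will typically crash.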
The compiler puts string literals into a read-only region of memory because a literal may get reused and thus should not be changed. Consider the following code:
char *str1 = "Hello";
char *str2 = "Hello";
Both occurrences of the literal Hello evaluate to pointers to the location where the characters of that string are stored. The compiler is free to put the two identical string literals in one location, meaning str1 and str2 would point at the same memory. If modifying this memory were allowed, then after changing the first character through str1 to J, printing str2 would print Jello, which would be confusing. In a worse case, modifying string literals could cause a wide range of problems, from strange behavior to security vulnerabilities. Note that even if the literal appears in only one place in the program, it may get reused multiple times (for example, inside a loop). In such a case, our expectation as programmers is that the literal will always be what we wrote and has not been changed by previous code.
Mutable Strings
When we want to modify a string, we need the string to reside in writeable memory, such as the frame of a function. To make space for a string in a function’s frame, we declare an array of chars with sufficient space to hold all of its characters plus its null terminator. One way is like this:
char str[] = "Hello world\n";
which behaves exactly as if we wrote:
char str[] = {'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\n', '\0'};
The figure below illustrates the difference between declaring str as const char * str versus char str[]:
We can declare the array str with an explicit size, but we must be careful: if we do not include enough space for the null terminator, the compiler will not complain, but the program may crash or produce incorrect results later. Running the program under Valgrind is strongly recommended.
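A minimal sketch of the array form, assuming two hypothetical helper names (hello_array_size and make_jello) that do not appear in the text. It shows that the array reserves one extra byte for the null terminator, and that the array's characters can be modified, unlike those of a literal.

```c
#include <stddef.h>
#include <string.h>

/* Size in bytes of an array initialized from "Hello":
   five letters plus the null terminator. */
size_t hello_array_size(void) {
  char str[] = "Hello";
  return sizeof(str);
}

/* Builds "Jello" in the caller's buffer, which must hold at least 6 chars. */
void make_jello(char *out) {
  char str[] = "Hello";          /* writable copy in this function's frame */
  str[0] = 'J';                  /* legal: the array is not a literal */
  memcpy(out, str, sizeof(str)); /* copies the '\0' as well */
}
```

Calling make_jello with a 6-character buffer leaves the string Jello in it, while the literal "Hello" itself is never touched.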
String Equality
The == operator compares two variables, but applied to strings it only checks whether str1 and str2 are arrows pointing at the same place. What we usually want instead is to check whether the two strings contain the same sequence of characters.
There is already a function to do this, called strcmp, declared in string.h in the C library. It returns 0 if the strings are equal, a negative number if the first string comes before the second in lexicographic order, and a positive number if it comes after. The comparison is case sensitive, but there is another function, strcasecmp, which performs a case-insensitive comparison.
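The difference between the two kinds of comparison can be sketched as follows. The helper names same_place and same_text are assumptions made for illustration; only strcmp comes from the C library.

```c
#include <string.h>

/* 1 if both pointers refer to the same location, else 0. */
int same_place(const char *str1, const char *str2) {
  return str1 == str2;
}

/* 1 if both strings contain the same sequence of characters, else 0. */
int same_text(const char *str1, const char *str2) {
  return strcmp(str1, str2) == 0;
}
```

Two separate arrays that both hold apple are equal according to same_text but not according to same_place, since they occupy different memory.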
String Copying
The assignment statement str1 = str2; does not copy the string; it makes str1 point at exactly the same place as str2. To copy the characters themselves, the C library provides strncpy, which copies a string from one location to another and takes a parameter n giving the maximum number of characters it is allowed to copy. If the length of the source string is greater than or equal to n, then the destination is not null terminated, a situation which the programmer must typically rectify before using the string for any significant purpose.
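There is a similar function, called strcpy, which is more dangerous: it takes no length limit and keeps copying until it reaches the source's null terminator. If the destination is too small, it will simply overwrite whatever follows the destination in memory, creating a variety of problems.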
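One common way to rectify the missing terminator is sketched below. The wrapper name copy_bounded and its parameter names are assumptions, not part of the C library; it limits strncpy to one less than the destination's capacity and then writes the terminator explicitly.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dest (capacity dest_size bytes), truncating if needed,
   and always null terminates when dest_size > 0. */
void copy_bounded(char *dest, size_t dest_size, const char *src) {
  if (dest_size == 0) {
    return;                          /* no room even for the terminator */
  }
  strncpy(dest, src, dest_size - 1); /* leave the last byte alone */
  dest[dest_size - 1] = '\0';        /* terminate even if src was too long */
}
```

With a 4-character destination and the source Hello, the result is the properly terminated string Hel rather than an unterminated array.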
Another function, strdup, allocates space for a copy of the string and copies it into that space. However, to understand how strdup works, we need to discuss dynamic allocation, which is covered in Chapter 12.
Converting Strings to ints
A string cannot be converted to an integer implicitly, nor by casting. Consider the following code fragment:
const char *str = "12345";
int x = str;
Such an attempt produces a compiler diagnostic such as:
initialization makes integer from pointer without a cast
Conversion from a string to an integer is a common task, so the C library provides atoi, which interprets the sequence of characters as a decimal number. If there is no valid number at the start of the string, it returns 0. A slightly more complex function is strtol, which lets you specify the base and also pass in the address of a char *, which it will fill in with a pointer to the first character after the number. That is, if you give it the string 123xyz, it will set this pointer to point at x.
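The strtol behavior described above can be sketched as follows. The wrapper name parse_decimal_prefix and the parameter rest are hypothetical; the call to strtol uses base 10 and its standard end-pointer argument.

```c
#include <stdlib.h>

/* Parses a decimal number at the start of str and stores a pointer to the
   first character after the number in *rest. */
long parse_decimal_prefix(const char *str, const char **rest) {
  char *end;
  long value = strtol(str, &end, 10);  /* 10 selects base ten */
  *rest = end;
  return value;
}
```

Given the string 123xyz, the function returns 123 and leaves rest pointing at the x. If no number is present, strtol returns 0 and the end pointer equals the start of the string, which lets a caller tell "no number" apart from a genuine 0, something atoi cannot do.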
Standard Library Functions
You can read about all of these functions in their man pages. You can also consult man string for a list of all of the string-related functions and their respective prototypes, to help you find what you are looking for if you do not know its name.
Useful Resource:
- Chapter 35 Section 2
Multidimensional Arrays
We can expand on arrays with multidimensional arrays.
Declaration
We declare multidimensional arrays with multiple sets of square brackets, each indicating the size of the corresponding dimension. For example, we might declare a 2-dimensional array of doubles that is 4 elements by 3 elements like this:
double myMatrix[4][3];
Indexing Multidimensional Arrays
With the two-dimensional array myMatrix declared above, myMatrix[2] is an entire row: an array of 3 doubles, which in most expressions is treated as a double * pointing at the row's first element. To obtain a single value, we index twice, as in myMatrix[2][1], which is a value of type double.
A Subtlety of Pointers
According to Section 8.7, myMatrix[i] is equivalent to *(myMatrix + i), where i is implicitly multiplied by sizeof(myMatrix[0]). Each element of myMatrix is an array of 3 doubles, so sizeof(myMatrix[0]) is 3 * sizeof(double), which matches our expectations for how indexing reaches a particular element.
Recall that the name of an array is its address. Therefore, when we think about what *(myMatrix + 2) refers to, we must remember that it is an entire array of 3 elements. Since the way we represent that array is a pointer, that value has to be the pointer we use to represent the array; it cannot evaluate to the entire array by being all of its values at once. We can therefore reconcile the math above by realizing that the compiler must take the address of the array at the end of the computation, and recalling that & and * are inverse operators and therefore cancel each other.
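The arithmetic above can be checked with a small sketch. The helper names row_bytes, element_offset_bytes, and get_element are assumptions introduced for illustration.

```c
#include <stddef.h>

double myMatrix[4][3];

/* Size in bytes of one row, i.e. the step used when computing myMatrix + i. */
size_t row_bytes(void) {
  return sizeof(myMatrix[0]);
}

/* Byte distance from the start of the matrix to element [i][j]. */
size_t element_offset_bytes(size_t i, size_t j) {
  return (size_t)((char *)&myMatrix[i][j] - (char *)myMatrix);
}

/* Reads element [i][j] using only pointer arithmetic and dereferences. */
double get_element(size_t i, size_t j) {
  return *(*(myMatrix + i) + j);
}
```

For element [2][1], the offset is two full rows plus one double, that is (2 * 3 + 1) * sizeof(double) bytes, and get_element(2, 1) reads the same storage as myMatrix[2][1].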
Multidimensional Array Initializers
double myMatrix[4][3] = { {1.0, 2.5, 3.2}, {1.1, 2.2, 3.3}, {1.2, 2.3, 3.4}, {1.3, 2.4, 3.5} };
You can elide the size of the first dimension, but you may not elide the size of any other dimension.
A multidimensional array is not limited to two dimensions. For more dimensions, you can write additional []s specifying the size of each additional dimension, like: int x[3][4][5].
All of the same rules apply to higher-dimensional arrays.
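A short sketch of eliding the first dimension, assuming the hypothetical names elided and elided_row_count. The compiler counts the rows from the initializer, and the count can be recovered with sizeof.

```c
#include <stddef.h>

/* The first dimension is elided; the initializer supplies four rows. */
double elided[][3] = { {1.0, 2.5, 3.2}, {1.1, 2.2, 3.3},
                       {1.2, 2.3, 3.4}, {1.3, 2.4, 3.5} };

/* Number of rows: total size divided by the size of one row. */
size_t elided_row_count(void) {
  return sizeof(elided) / sizeof(elided[0]);
}
```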
Array of Pointers (to Arrays)
We can also represent multidimensional data with arrays that hold pointers to other arrays. For example,
double row0[3], row1[3], row2[3], row3[3];
double *myMatrix[4] = {row0, row1, row2, row3};
Here, myMatrix is an array of 4 pointers, which explicitly point at the arrays that are the rows of the matrix.
In this array-of-pointers representation, the pointers to the rows are explicitly stored in memory. Accordingly, evaluating myMatrix[i] actually involves reading a value from memory, not just computing an offset. The difference has performance implications.
Explicitly storing the pointers to the rows of the matrix allows us to do more. First, we can have rows with different sizes. Second, myMatrix[i] is an lvalue, so we can change where the pointers point if we so desire. Third, we can have two rows point at the exact same array.
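The three capabilities listed above can be sketched as follows. All names here (rowA, rowB, jagged, jaggedLen, sum_row) are hypothetical, and the separate length array is one possible way to remember each row's size, since the pointers themselves do not record it.

```c
#include <stddef.h>

double rowA[3] = {1.0, 2.0, 3.0};
double rowB[2] = {4.0, 5.0};       /* a shorter row */

/* Rows 0 and 2 point at the same array; row 1 has a different length. */
double *jagged[3] = {rowA, rowB, rowA};
size_t jaggedLen[3] = {3, 2, 3};   /* lengths must be tracked separately */

/* Sums row r by following the pointer stored in jagged[r]. */
double sum_row(size_t r) {
  double total = 0.0;
  for (size_t j = 0; j < jaggedLen[r]; j++) {
    total += jagged[r][j];
  }
  return total;
}
```

Because rows 0 and 2 share storage, a write through jagged[2] is visible through jagged[0]. Because jagged[1] is an lvalue, it can later be reassigned to point at a different row, as long as its recorded length is updated too.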
Incompatibility of Representation
A multidimensional array and an array of pointers are different types and cannot be converted from one to the other. If you try to explicitly convert (cast) between them, you will get results ranging from nonsensical answers to your program crashing.
Video 10.2.6 illustrates what can go wrong when a programmer naively casts between the two without understanding the implications.
Note: Video 10.2.6 is really helpful for understanding this casting error.
Arrays of Strings
Consider the following two statements, each of which declares a multidimensional array of chars and initializes it with a braced array of string literals:
char strs[3][4] = {"abc", "def", "ghi"};
The two declarations differ in the size of the second dimension: in the first it is 4, while in the second it is 3. The first statement includes space for the null terminator, while the second does not.
The figure above shows the difference between the two declarations.
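A sketch of the two layouts side by side. The name of the second array, chrs, and both helper functions are assumptions. Note that initializing a 3-character row from a 3-letter literal is legal in C (some compilers warn about it) but leaves no null terminator.

```c
#include <stddef.h>
#include <string.h>

char strs[3][4] = {"abc", "def", "ghi"};  /* each row ends with '\0' */
char chrs[3][3] = {"abc", "def", "ghi"};  /* no room for '\0' in any row */

/* Safe: every row of strs is a properly terminated string. */
size_t strs_row_length(size_t i) {
  return strlen(strs[i]);
}

/* Rows of chrs are not strings, so compare exactly 3 bytes instead of
   calling strlen or strcmp on them. */
int chrs_row_equals(size_t i, const char *three_chars) {
  return memcmp(chrs[i], three_chars, 3) == 0;
}
```

The first array occupies 12 bytes and the second 9. Passing a row of chrs to a function that expects a string would read past the end of that row.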
We cannot omit the size of the second dimension, as C allows us to omit only the first dimension of a multidimensional array. Alternatively, we might use the array-of-pointers representation for an array of strings. We might declare and initialize words as follows:
const char * words[] = {"A", "cat", "likes", "sleeping."};
Observe that here, we declare words as an array of const char *s. We should include the const, as we have indicated that words should be initialized to pointers to string literals, which are in read-only memory. It is common to end an array of strings with a NULL pointer, such as this:
const char * words[] = {"A", "cat", "likes", "sleeping.", NULL};
This convention is common, as it allows one to write loops which iterate over the array without knowing a priori how many elements are in the array.
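Such a loop might look like the following sketch, where the function name count_words is an assumption. It walks the array until it reaches the NULL pointer and never needs to be told the length.

```c
#include <stddef.h>

/* Counts the entries that come before the NULL pointer ending the array. */
size_t count_words(const char *const *words) {
  size_t n = 0;
  while (words[n] != NULL) {
    n++;
  }
  return n;
}
```

For the array {"A", "cat", "likes", "sleeping.", NULL}, the function returns 4.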
Function Pointers
It can be quite useful to have a pointer to the first instruction in a function, which we typically just think of as a pointer to the function itself and call a function pointer.
The name of any function is a pointer to that function. When we refer to a function pointer, we typically mean a variable or parameter that points at a function. However, the fact that a function’s name is a pointer to it is useful for initializing such variables and parameters.
void doToAll(int *data, int n, int (*f)(int));
The parameter declaration above is odd: int (*f)(int), whose type is “a pointer to a function which takes an int as a parameter and returns an int”. The syntax does make sense: the return type comes first, followed by the name, followed by the parameters in parentheses. Here, we only need to specify the parameter types; we do not name them. There are times when both the parentheses and the * can be omitted; however, it is generally best to be consistent.
We can also use typedef with function pointers. The syntax is again more similar to function declarations than to the other forms of typedef. We might rewrite our previous example to use typedef, so that it is easier to read:
typedef int (*int_function_t) (int);
Once we have this function defined, we can use it by passing in a pointer to any function of the appropriate type. Since the name of a function is a pointer to it, we can write the name of the function we want to use as the value to pass in for that argument:
int inc(int x) { return x + 1; }
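A minimal sketch of how these pieces fit together. The body of doToAll is an assumption (it applies f to each element and stores the result back), as is the second function, square.

```c
/* Replaces every element of data with the result of applying f to it
   (assumed behavior for this sketch). */
void doToAll(int *data, int n, int (*f)(int)) {
  for (int i = 0; i < n; i++) {
    data[i] = f(data[i]);
  }
}

int inc(int x) { return x + 1; }
int square(int x) { return x * x; }  /* hypothetical second function */
```

Calling doToAll(array, 3, inc) on an array holding 1, 2, 3 leaves it holding 2, 3, 4; passing square instead of inc reuses the same loop with a different operation.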
Another example of using function pointers as parameters is a generic sorting function. The C library provides such a function:
void qsort(void *base, size_t nmemb, size_t size, int (*compar)(const void *, const void *));
The final parameter is the one we are most interested in for this discussion: compar is a pointer to a function which takes two const void *s and returns an int. Here, the const void *s point at the two elements to be compared. A couple of examples may be helpful to understand qsort.
int compareInts(const void *n1vp, const void *n2vp);
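The following sketch shows two comparison functions used with qsort. The body of compareInts and the names compareStrings, sortIntArray, and sortStringArray are assumptions. The integer comparison uses (a > b) - (a < b) rather than subtraction, a deliberate choice here to avoid overflow for large values.

```c
#include <stdlib.h>
#include <string.h>

/* Orders two ints; returns a negative, zero, or positive value. */
int compareInts(const void *n1vp, const void *n2vp) {
  const int *n1ptr = n1vp;  /* the void pointers really point at ints */
  const int *n2ptr = n2vp;
  return (*n1ptr > *n2ptr) - (*n1ptr < *n2ptr);
}

/* Orders two strings; each argument points at a const char * element. */
int compareStrings(const void *s1vp, const void *s2vp) {
  const char *const *s1ptr = s1vp;
  const char *const *s2ptr = s2vp;
  return strcmp(*s1ptr, *s2ptr);
}

void sortIntArray(int *array, size_t nelements) {
  qsort(array, nelements, sizeof(int), compareInts);
}

void sortStringArray(const char **array, size_t nelements) {
  qsort(array, nelements, sizeof(const char *), compareStrings);
}
```

Note the extra level of indirection in compareStrings: qsort passes pointers to the elements, and the elements of a string array are themselves pointers.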
Security Hazards
Improper use of strings and related functions frequently leads to security vulnerabilities in software, which give a malicious user an opportunity to abuse the software and compromise the functionality of the system in some way. One common form is a buffer overflow, in which the code makes it possible to write a string into an array which is too small for it.
An example of a buffer overflow is illustrated in Video 10.4. The gets function is used to read a string, and it places no limit on how many characters it reads. A malicious user can craft an input string which causes the attacker’s own code to be executed by the program.
Another string error which can lead to security vulnerabilities is the format string attack. Recall that printf takes a format string, and then an appropriate number of other arguments for the values to convert. A format string vulnerability arises whenever there is a possibility that the user may affect the format string in such a way that they can introduce extra format conversions. Imagine that there were a readAString() function which reads a string from the user. Consider the following vulnerable code, which attempts to read a string then print it back:
char *input = readAString();
printf(input);
If the input contains % signs, the program can behave in unexpected ways. If the user's input contains %-conversions, printf will take whatever data sits where the corresponding arguments should be, format it as directed, and print it.
Format string vulnerabilities fall into a larger category of security flaws in which a program uses unsanitized inputs. If we wanted to let the user control the format string, we could do so safely if we took care to sanitize the string first, iterating over it and modifying % signs to remove their special meaning.
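Two defensive approaches are sketched below, with hypothetical function names print_safely and escape_percents. The first never lets user text act as the format string at all. The second is one possible sanitizer, which doubles each % so that printf treats it as a literal percent sign.

```c
#include <stddef.h>
#include <stdio.h>

/* Safe echo: the user's text is data for %s, never the format string. */
int print_safely(const char *input) {
  return printf("%s", input);
}

/* Copies in to out, doubling each '%'. Returns 0 on success, or -1 if
   out (capacity out_size bytes) is too small; on failure the contents
   of out are not a valid string. */
int escape_percents(const char *in, char *out, size_t out_size) {
  size_t used = 0;
  for (; *in != '\0'; in++) {
    size_t need = (*in == '%') ? 2 : 1;
    if (used + need + 1 > out_size) {  /* +1 keeps room for '\0' */
      return -1;
    }
    if (*in == '%') {
      out[used++] = '%';               /* extra '%' escapes the next one */
    }
    out[used++] = *in;
  }
  if (out_size == 0) {
    return -1;
  }
  out[used] = '\0';
  return 0;
}
```

In most programs the first approach is the simpler choice; sanitizing is only needed when the user's text really must be passed as the format argument.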