[Notes: Chapter 11 Character strings and string functions ]
This chapter discusses the nature of strings, how to declare and initialize strings, how to get them into and out of programs, and how to manipulate strings.
Representing Strings and String I/O
-
A character string is a char array terminated with a null character (\0)
-
The puts() function, like printf(), belongs to the stdio.h family of input/output functions.
It only displays strings and, unlike printf(), automatically appends a newline to the string it displays.
Defining strings with a program
The principal ways to define a string are using string constants, using char arrays, and using char pointers.
character string literals (string constants)
A string literal, also termed a string constant, is anything enclosed in double quotation marks.
The enclosed characters, plus a terminating \0 character automatically provided by the compiler, are stored in memory as a character string.
Eg:
" I am a symbolic string constant" (display)
" I am a symbolic string constant\0" (store in memory)
-
C concatenates string literals if they are separated by nothing or by whitespace.
Eg:
char greeting[50] = "Hello, and"" how are" " you"" today!";
is equivalent to this:
char greeting[50] = "Hello, and how are you today!";
If you want to use a double quotation mark within a string, precede the quotation mark with a backslash, as follows:
printf("\"Run, Spot, run!\" exclaimed Dick.\n");
This produces the following output:
"Run, Spot, run!" exclaimed Dick.
Character string constants are placed in the static storage class, which means that if you use a string constant in a function, the string is stored just once and lasts for the duration of the program, even if the function is called several times. The entire quoted phrase acts as a pointer to where the string is stored, similar to the name of an array.
Eg:
#include <stdio.h>
int main(void)
{
printf("%s, %p, %c\n", "We", "are", *"space farers");
return 0;
}
The output is:
We, 0x00000f61, s
Character string arrays and initialization
- When you define a character string array, you must let the compiler know how much space is needed.
Eg:
const char m1[40] = "Limit yourself to one line's worth.";
This form of initialization is short for the standard array initialization form:
const char m1[40] = { 'L', 'i', 'm', 'i', 't', ' ', 'y', 'o', 'u', 'r', 's', 'e', 'l',
'f', ' ', 't', 'o', ' ', 'o', 'n', 'e', ' ',
'l', 'i', 'n', 'e', '\'', 's', ' ', 'w', 'o', 'r',
't', 'h', '.', '\0'
};
Note the closing null character. Without it, you have a character array, but not a string.
Eg:
const char m2[] = "If you can't think of anything, fake it.";
Letting the compiler compute the size of the array works only if you initialize the array.
-
You can also use pointer notation to set up a string.
const char *pt1 = "something is pointing at me.";
const char ar1[] = "something is pointing at me.";
Both pt1 and ar1 are addresses of strings.
Array versus pointer
-
Difference between an array and a pointer form
The array form ( ar1[] ) causes an array of 29 elements (one for each character plus one for the terminating '\0') to be allocated in the computer memory. Each element is initialized to the corresponding character of the string literal.
Typically, the quoted string is stored in a data segment that is part of the executable file; when the program is loaded into memory, so is that string. The quoted string is said to be in static memory.
But the memory for the array is allocated only after the program begins running. At that time, the quoted string is copied into the array.
Note that, at this time, there are two copies of the string: one is the string literal in static memory, and one is the string stored in the ar1 array.
From then on, the compiler treats the name ar1 as a synonym for the address of the first array element, &ar1[0]. So ar1 is an address constant:
ar1+1 is valid, but ++ar1 is not allowed.
The pointer form ( *pt1 ) also causes 29 elements in static storage to be set aside for the string.
In addition, once the program begins execution, it sets aside one more storage location for the
pointer variable pt1 and stores the address of the string in the pointer variable.
Eg:
// addresses.c -- addresses of strings
#define MSG "I'm special."
#include <stdio.h>
int main()
{
char ar[] = MSG;
const char *pt = MSG;
printf("address of \"I'm special\": %p \n", "I'm special");
printf(" address ar: %p\n", ar);
printf(" address pt: %p\n", pt);
printf(" address of MSG: %p\n", MSG);
printf("address of \"I'm special\": %p \n", "I'm special");
return 0;
}
Here’s the output from one system:
address of "I'm special": 0x100000f0c
address ar: 0x7fff5fbff8c7
address pt: 0x100000ee0
address of MSG: 0x100000ee0
address of "I'm special": 0x100000f0c
What does this show?
- pt and MSG are the same address, while ar is a different address, just as promised.
- Although the string literal "I'm special." occurs twice in the printf() statements, the compiler chose to use one storage location, but not the same address as MSG. The compiler has the freedom to store a literal that’s used more than once in one or more locations. Another compiler might choose to represent all three occurrences of "I'm special." with a single storage location.
- The part of memory used for static data is different from that used for dynamic memory, the memory used for ar.
Array and pointer differences
char heart[] = "I love Tillie!";
const char *head = "I love Millie!";
-
The chief difference is that the array name heart is a constant, but the pointer head is a variable.
Eg: head = heart; is valid, but heart = head; is an illegal construction.
Let’s go back to a pointer initialization that doesn’t use the const modifier:
char * p1 = "Klingon";
Can you use the pointer to change this string?
p1[0] = 'F'; // allowed??
The C standard says the effect of modifying a string literal is undefined. Eg, the following statements could all refer to a single memory location of the string "Klingon".
char * p1 = "Klingon";
p1[0] = 'F'; // ok?
printf("Klingon");
printf(": Beware the %ss!\n", "Klingon");
That is, the compiler can replace each instance of "Klingon" with the same address. If the compiler uses this single-copy representation and allows changing p1[0] to 'F', that would affect all uses of the string, so statements printing the string literal "Klingon" would actually display "Flingon":
Flingon: Beware the Flingons!
Therefore, the recommended practice for initializing a pointer to a string literal is to use the const modifier:
const char * pl = "Klingon"; // recommended usage
Initializing a non-const array with a string literal, however, poses no such problems, because the array gets a copy of the original string.
In short, don’t use a pointer to a string literal if you plan to alter the string.
Arrays of character strings
It is often convenient to have an array of character strings. Then you can use a subscript to access several different strings.
Eg:
(1). an array of pointers to strings
const char *mytalents[LIM] = {
"Adding numbers swiftly",
"Multiplying accurately", "Stashing data",
"Following instructions to the letter",
"Understanding the C language"
};
(2). an array of char arrays
char yourtalents[LIM][SLEN] = {
"Walking in a straight line",
"Sleeping", "Watching television",
"Mailing letters", "Reading email"
};
-
In some ways, mytalents and yourtalents are much alike. Each represents five strings. When used with one index, as in mytalents[0] and yourtalents[0] , the result is a single string.
And, just as mytalents[1][2] is 'l', yourtalents[1][2] is 'e'. Both are initialized in the same fashion.
But there are differences, too. The mytalents array is an array of five pointers, taking up 40 bytes on our system. But yourtalents is an array of five arrays, each of 40 char values, occupying 200 bytes on our system. So mytalents is a different type from yourtalents , even though mytalents[0] and yourtalents[0] both are strings. The pointers in mytalents point to the locations of the string literals used for initialization, which are stored in static memory.
The arrays in yourtalents, however, contain copies of the string literals, so each string is stored twice. Furthermore, the allocation of memory in the arrays is inefficient, for each element of yourtalents has to be the same size, and that size has to be at least large enough to hold the longest string.
One way of visualizing this difference is to think of yourtalents as a rectangular two-dimensional array, with each row being of the same length, 40 bytes, in this case. Next, think of mytalents as a ragged array, one in which the row length varies.
char fruit1[3][7] = {"Apple", "Pear", "Orange"}; // rectangular array
A p p l e \0 \0
P e a r \0 \0 \0
O r a n g e \0
const char *fruit2[3] = {"Apple", "Pear", "Orange"}; // ragged array
A p p l e \0
P e a r \0
O r a n g e \0
The upshot is that, if you want to use an array to represent a bunch of strings to be displayed, an array of pointers is more efficient than an array of character arrays. There is, however, a catch. Because the pointers in mytalents point to string literals, these strings shouldn’t be altered. The contents of yourtalents , however, can be changed. So if you want to alter strings or set aside space for string input, don’t use pointers to string literals.
Pointers and strings
Most C operations for strings actually work with pointers.
Eg:
const char * mesg = "Don't be a fool!";
const char * copy;
copy = mesg;
- mesg and copy are two pointers stored in different locations. ( &mesg != &copy )
- mesg and copy point to the same string. ( copy == mesg )
- Copying one address is more efficient than copying the whole string.
String input
If you want to read a string into a program, you must first set aside space to store the string and then use an input function to fetch the string.
Creating space
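The first order of business is setting up a place to put the string after it is read. The following does not do that: name is an uninitialized pointer, so it might point anywhere, and the input could overwrite data or code.
char *name; // no storage is set aside; name is an uninitialized pointer
scanf("%s", name);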
The simplest course is to include an explicit array size in the declaration:
char name[81];
The unfortunate gets() function
-
Recall that, when reading a string, scanf() and the %s specifier read just a single word.
The gets() function reads an entire line up through the newline character, discards the newline character, and stores the remaining characters, adding a null character to create a C string.
It’s often paired with puts(), which displays a string, adding a newline.
The problem is that gets() doesn’t check to see if the input line actually fits into the array: it only knows where the array begins, not how many elements it has.
If the input string is too long, you get buffer overflow.
So the unsafe behavior of gets() poses a security risk.
The alternatives to gets()
- The traditional alternative to gets() is fgets().
The fgets() function meets the possible overflow problem by taking a second argument that limits the number of characters to be read. This function is designed for file input, which makes it a little more awkward to use. Here is how fgets() differs from gets() :
■ It takes a second argument indicating the maximum number of characters to read. If this argument has the value n , fgets() reads up to n-1 characters or through the newline character, whichever comes first.
■ If fgets() reads the newline, it stores it in the string, unlike gets() , which discards it.
■ It takes a third argument indicating which file to read. To read from the keyboard, use stdin (for standard input ) as the argument; this identifier is defined in stdio.h .
Because the fgets() function includes the newline as part of the string, it’s often paired with fputs(), which works like puts(), except that it doesn’t automatically append a newline.
fputs() takes a second argument to indicate which file to write to.
For the computer monitor, we can use stdout (for standard output) as an argument.
Eg:
/* fgets1.c -- using fgets() and fputs() */
#include <stdio.h>
#define STLEN 14
int main(void)
{
char words[STLEN];
puts("Enter a string, please.");
fgets(words, STLEN, stdin);
printf("Your string twice (puts(), then fputs()):\n");
puts(words);
fputs(words, stdout);
puts("Enter another string, please.");
fgets(words, STLEN, stdin);
printf("Your string twice (puts(), then fputs()):\n");
puts(words);
fputs(words, stdout);
puts("Done.");
return 0;
}
Here’s a sample run:
Enter a string, please.
**apple pie**
Your string twice (puts(), then fputs()):
apple pie
apple pie
Enter another string, please.
**strawberry shortcake**
Your string twice (puts(), then fputs()):
strawberry sh
strawberry shDone.
- The first input, apple pie, is short enough that fgets() reads the whole input line and stores apple pie\n\0 in the array. (words = apple pie\n\0)
- The second input line, strawberry shortcake, exceeds the size limit, so fgets() reads the first 13 characters and stores strawberry sh\0 in the array. (words = strawberry sh\0)
- The fgets() function returns a pointer to char. If all goes well, it just returns the same address that was passed to it as the first argument. If the function encounters end-of-file, however, it returns a special pointer called the null pointer.
- The fact that fgets() stores the newline presents a problem and an opportunity. The problem is that you might not want the newline as part of the string you store. The opportunity is the presence or absence of a newline character in the stored string can be used to tell whether the whole line was read. If it wasn’t, then you can decide what to do with the rest of the line.
First, how can you get rid of a newline? One way is to search the stored string for a newline and to replace it with a null character:
int i = 0;
while (words[i] != '\n') // assuming \n in words
i++;
words[i] = '\0';
Second, what if there are still characters left in the input line? One reasonable choice if the whole line doesn’t fit into the destination array is to discard the part that doesn’t fit:
while (getchar() != '\n') // read but don't store
continue; // input including \n
- The gets_s() function
C11’s optional gets_s() function, like fgets(), uses an argument to limit the number of characters read. The three main differences from fgets() are these:
■ gets_s() just reads from the standard input, so it doesn’t need a third argument.
■ If gets_s() does read a newline, it discards it rather than storing it.
■ If gets_s() reads the maximum number of characters and fails to read a newline, it takes several steps. It sets the first character of the destination array to the null character. It reads and discards subsequent input until a newline or end-of-file is encountered. It returns the null pointer. It invokes an implementation-dependent "handler" function (or else one you’ve selected), which may cause the program to exit or abort.
The scanf function
Recall that the scanf() function returns an integer value that equals the number of items successfully read or returns EOF if it encounters the end of file.
String output
Now let’s move from string input to string output. Again, we will use library functions. C has three standard library functions for printing strings: puts() , fputs() , and printf() .
The puts() function
puts() automatically appends a newline when it displays a string.
The fputs() function
The fputs() function is the file-oriented version of puts() . The main differences are these:
■ The fputs() function takes a second argument indicating the file to which to write. You
can use stdout (for standard output ), which is defined in stdio.h , as an argument to
output to your display.
■ Unlike puts() , fputs() does not automatically append a newline to the output.
Note that gets() discards a newline on input, but puts() adds a newline on output. On the other hand, fgets() stores the newline on input, and fputs() doesn’t add a newline on output.
Recall that gets() returns the null pointer if it encounters end-of-file. Because the null pointer evaluates as zero, or false, a read loop that tests the return value of gets() terminates at that point.
The printf() function
The do-it-yourself option
You can prepare your own versions, building on getchar() and putchar() .
String functions
The C library supplies several string-handling functions; ANSI C uses the string.h header file to provide the prototypes.
The strlen() function
The strlen() function, as you already know, finds the length of a string.
The strcat() function
The strcat() (for string concatenation ) function takes two strings for arguments. A copy of the second string is tacked onto the end of the first, and this combined version becomes the new first string.
The second string is not altered. The strcat() function is type char * (that is, a pointer-to- char ).
It returns the value of its first argument—the address of the first character of
the string to which the second string is appended.
The strncat() Function
The strcat() function does not check to see whether the second string will fit in the first array.
Alternatively, you can use strncat(), which takes a third argument indicating the maximum number of characters to add.
The strcmp() function
The strcmp() (for string comparison) function does for strings what relational operators do for numbers.
In particular, it returns 0 if its two string arguments are the same and nonzero otherwise.
strcmp() compares strings, not arrays.
The strncmp() function
strncmp() compares the strings until they differ or until it has compared a number of characters specified by a third argument.
The strcpy() function and strncpy () function
Use strcpy() to copy a string, for example from a temporary array to a permanent destination. The strcpy() function is the string equivalent of the assignment operator.
The strcpy() function shares a problem with strcat() —neither checks to see whether the source string actually fits in the target string. The safer way to copy strings is to use strncpy() .
It takes a third argument, which is the maximum number of characters to copy. Listing 11.27 is a rewrite of Listing 11.25 , using strncpy() instead of strcpy() .
The sprintf() function
The sprintf() function is declared in stdio.h instead of string.h . It works like printf() , but it writes to a string instead of writing to a display.
Other string functions
The ANSI C library has more than 20 string-handling functions.
Command-line arguments in integrated environments
-
Executable file called repeat
C> repeat Resistance is futile
int main(int argc, char *argv[])
C compilers allow main() to have no arguments or else to have two arguments.
With two arguments, the first argument is the number of strings in the command line.
By tradition, this int argument is called argc for argument count. The system uses spaces to tell when one string ends and the next begins. Therefore, the repeat example has four strings, including the command name, and the fuss example has three. The program stores the command-line strings in memory and stores the address of each string in an array of pointers. The address of this array is stored in the second argument. By convention, this pointer to pointers is called argv, for argument values. When possible (some operating systems don’t allow this), argv[0] is assigned the name of the program itself. Then argv[1] is assigned the first following string, and so on. For our example, we have the following relationships:
argv[0] points to repeat (for most systems)
argv[1] points to Resistance
argv[2] points to is
argv[3] points to futile
int main(int argc, char *argv[]) is equivalent to
int main(int argc, char **argv)
Command-line arguments with the macintosh
A string example: sorting strings
Command-line arguments
String-to-number conversions
If the number is an integer, you can use the atoi() function (for alphanumeric to integer ).
It takes a string as an argument and returns the corresponding integer value.
Key concepts
Many programs deal with text data. A program may ask you to enter your name, a list of corporations, an address, the botanical name for a type of fern, the cast of a musical, or…well, because we interact with the world using words, there’s really no end to examples using text. And strings are the means a C program uses to handle text.
A C string —whether it be identified by a character array, a pointer, or a string literal—is stored as a series of bytes containing character codes, and the sequence is terminated by the null character.
C recognizes the usefulness of strings by providing a library of functions for manipulating them, searching them, and analyzing them. In particular, keep in mind that you should use strcmp() instead of relational operators when comparing strings, and you should use strcpy() or strncpy() instead of the assignment operator to assign a string to a character array.
Summary
A C string is a series of chars terminated by the null character, '\0'. A string can be stored in a character array. A string can also be represented with a string constant, in which the characters, aside from the null character, are enclosed in double quotation marks. The compiler supplies the null character. Therefore, "joy" is stored as the four characters j, o, y, and \0. The length of a string, as measured by strlen(), doesn’t count the null character.
String constants, also known as string literals , can be used to initialize character arrays. The array size should be at least one greater than the string length to accommodate the terminating null character. String constants can also be used to initialize pointers of type pointer-to- char .
Functions use pointers to the first character of a string to identify on which string to act. Typically, the corresponding actual argument is an array name, a pointer variable, or a quoted string. In each case, the address of the first character is passed. In general, it is not necessary to pass the length of the string, because the function can use the terminating null character to locate the end of a string.
The fgets() function fetches a line of input, and the puts() and fputs() functions display a line of output. They are part of the stdio.h family of functions, as once was the now disgraced and abandoned function gets().
The C library includes several string-handling functions. Under ANSI C, these functions are declared in the string.h file. The library also has several character-processing functions; they are declared in the ctype.h file.
You can give a program access to command-line arguments by providing the proper two formal variables to the main() function. The first argument, traditionally called argc, is an int and is assigned the count of command-line words. The second argument, traditionally called argv, is a pointer to an array of pointers to char. Each pointer-to-char points to one of the command-line argument strings, with argv[0] pointing to the command name, argv[1] pointing to the first command-line argument, and so on.
The atoi() , atol() , and atof() functions convert string representations of numbers to type int , long , and double forms, respectively. The strtol() , strtoul() , and strtod() functions convert string representations of numbers to type long , unsigned long , and double forms, respectively.
Reference:
[1] C Primer Plus, 6th edition. https://www.pearson.com/us/higher-education/program/Prata-C-Primer-Plus-6th-Edition/PGM4399.html?tab=overview
[2] https://github.com/leejianping/C-primer-plus-6th/new/master/chapter 13
[3] https://blog.youkuaiyun.com/Lee567/article/details/92382776