String Input in C

Latest recommended article published on 2023-07-05 22:51:09

明暖橙

Licensed under CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/merry1996/article/details/84193856

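This article looks at the safety of string input in C. It compares gets() with fgets(), explains why gets() is dangerous and how fgets() avoids buffer overflow, and demonstrates the correct use of fgets() and fputs() with examples.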

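To read a string into a program, you must first set aside space to store it, and then use an input function to fetch it.

1. Allocating space

The first order of business is to allocate space for the string that will be read in.

Suppose you write the following code: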

char *name;
scanf("%s",name);

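This may well compile (most likely with a warning), but when the input is read it may overwrite data or code in the program and cause the program to abort. scanf() copies the input to the address given by its argument, and here that argument is an uninitialized pointer, so name may point anywhere.

The simplest fix is to specify the array size explicitly in the declaration: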

char name[81];

name is now the address of an allocated block of 81 bytes.

Once memory has been allocated for the string, you can read the string in.
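
Even with an array in place, %s on its own does not know the array's size. The sketch below shows one way to bound the read with a field width. It uses sscanf() on an in-memory string so that it is easy to try out; scanf() accepts the same conversion. The helper name parse_name and the macro NAME_SIZE are made up for this illustration.

```c
#include <stdio.h>

#define NAME_SIZE 81   /* room for 80 characters plus the terminating '\0' */

/* Hypothetical helper: copies the first whitespace-delimited word of
   `input` into `name`, which must have room for NAME_SIZE bytes.
   Returns 1 if a word was stored, 0 otherwise. */
int parse_name(const char *input, char name[NAME_SIZE])
{
    /* The field width 80 is NAME_SIZE - 1: sscanf() stores at most
       80 characters and then appends '\0', so it cannot write past
       the end of the array. */
    return sscanf(input, "%80s", name) == 1;
}
```

The width has to be written as a literal in the format string, so it must be kept in step with the array size by hand.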

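The C library provides several functions for reading strings: scanf(), gets(), and fgets().

2. The unfortunate gets() function

When reading strings, scanf() with the %s conversion specifier reads only a single word. Programs, however, often need to read a whole line of input rather than one word. Many years ago, gets() was the function for that job. It is simple to use: it reads an entire line up to the newline character, discards the newline, stores the remaining characters, and appends a null character so that the result is a C string. It is often paired with puts(), which displays a string and appends a newline.

The program getsputs.c demonstrates both functions.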

/* getsputs.c -- using gets() and puts() */
#include <stdio.h>
#include <stdlib.h>
#define STLEN 81
int main()
{
    char words[STLEN];
    puts("enter a string, please.");
    gets(words);
    printf("your string twice:\n");
    printf("%s\n",words);
    puts(words);
    puts("Done.");
    return 0;
}

Here is the result of compiling the program with GCC on Ubuntu 16.04 and running it:

mary@mary-virtual-machine:~/Desktop/Code/getsputs$ gcc getsputs.c -o getsputs
getsputs.c: In function ‘main’:
getsputs.c:8:5: warning: implicit declaration of function ‘gets’ [-Wimplicit-function-declaration]
     gets(words);
     ^
/tmp/ccyiuvjH.o: In function `main':
getsputs.c:(.text+0x2e): warning: the `gets' function is dangerous and should not be used.
mary@mary-virtual-machine:~/Desktop/Code/getsputs$ ./getsputs 
enter a string, please.
hello
your string twice:
hello
hello
Done.

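The build output includes the message "warning: the `gets' function is dangerous and should not be used." The /tmp/ccyiuvjH.o prefix shows that this particular message comes from the link step; the compiler itself reports only the implicit declaration.

What is going on? The problem is that gets() takes words as its only argument, so it cannot check whether the input line fits in the array. An array name is converted to the address of its first element, so gets() knows only where the array begins, not how many elements it has.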
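
The loss of size information can be seen directly with sizeof. The following is a minimal sketch; the function name size_seen_by_callee is hypothetical and exists only for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper: reports what sizeof yields inside a function
   that received the caller's array as an argument. */
size_t size_seen_by_callee(char *buf)
{
    /* buf is a pointer here, so this is the size of a pointer,
       not the size of the caller's array. */
    return sizeof buf;
}
```

In a caller that declares char words[81], sizeof words is 81, but size_seen_by_callee(words) returns sizeof(char *). A function that must respect the array's bounds therefore needs the size passed in as a separate argument, which is exactly what gets() lacks.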

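If the input is too long, the result is a buffer overflow: the extra characters spill past the intended target. If they land in unused memory, nothing goes wrong immediately; if they overwrite other data in the program, the program may abort; other outcomes are possible as well.

To make an overflow easy to trigger, we set STLEN to 5 in the program. The output is then: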

mary@mary-virtual-machine:~/Desktop/Code/getsputs$ ./getsputs 
enter a string, please.
i think i'll be just fine.
your string twice:
i think i'll be just fine.
i think i'll be just fine.
Done.
*** stack smashing detected ***: ./getsputs terminated
Aborted (core dumped)

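"stack smashing detected" means that a stack buffer overflow was detected at run time. Wikipedia explains: "Stack buffer overflow bugs are caused when a program writes more data to a buffer located on the stack than what is actually allocated for that buffer". Because of this risk, gets() was removed from the C standard in C11.

3. Alternatives to gets()

The fgets() function (and fputs())

fgets() solves the overflow problem by taking a second argument that limits the number of characters read. The function was designed for file input, which makes it a little less convenient for keyboard input.

fgets() differs from gets() in the following ways:

The second argument of fgets() gives the maximum number of characters to read. If its value is n, fgets() reads up to n-1 characters or through the first newline character, whichever comes first.
If fgets() reads a newline, it stores it in the string. gets(), by contrast, discards the newline.
The third argument of fgets() specifies the file to read from. To read from the keyboard, pass stdin.

Because fgets() leaves the newline at the end of the string (assuming the input line fits), it is usually paired with fputs(), which does not append a newline.

The program fgets1.c demonstrates fgets() and fputs().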

/* fgets1.c -- using fgets() and fputs() */
#include <stdio.h>
#include <stdlib.h>
#define STLEN 14
int main()
{
    char words[STLEN];
    puts("enter a string, please.");
    fgets(words, STLEN, stdin);
    printf("your string twice:(puts(),then fputs()):\n");
    puts(words);
    fputs(words, stdout);
    puts("enter another string,please.");
    fgets(words, STLEN, stdin);
    printf("your string twice:(puts(),then fputs()):\n");
    puts(words);
    fputs(words, stdout);
    puts("Done.");
    return 0;
}

Here is a sample run:

enter a string, please.
Apple pie
your string twice:(puts(),then fputs()):
Apple pie

Apple pie
enter another string,please.
strawberry shortcake
your string twice:(puts(),then fputs()):
strawberry sh
strawberry shDone.

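The first input line, Apple pie, is shorter than the limit, so fgets() reads the whole line including the newline and stores Apple pie\n\0 in the array. When puts() displays the string it appends another newline, which is why a blank line follows Apple pie. fputs() does not append a newline, so its output is not followed by a blank line.

The second input line, strawberry shortcake, exceeds the size limit, so fgets() reads only the first 13 characters and stores strawberry sh\0 in the array. puts() still appends a newline, but fputs() does not, which is why the output shows strawberry shDone. with no line break between sh and Done.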
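
When the stored newline is not wanted, it can be removed after the call. The sketch below is one common way to do that, not something taken from the programs above; the helper name read_line is hypothetical. It relies on strcspn(), which returns the index of the first '\n' in the string, or the string's length if there is none.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line (or the first size-1 characters
   of it) with fgets() and removes the stored newline, if any.
   Returns buf, or NULL if fgets() could not read anything. */
char *read_line(char *buf, int size, FILE *stream)
{
    if (fgets(buf, size, stream) == NULL)
        return NULL;
    /* Overwrite the newline with '\0'. If no newline was stored,
       this rewrites the existing terminator and changes nothing. */
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

With a 14-byte array, the two inputs from the sample run would be stored as Apple pie and strawberry sh, neither with a trailing newline, so puts() would print each without a following blank line.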

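fgets() returns a pointer to char. If all goes well, the returned address is the same as the first argument. If the function reaches end-of-file before reading any characters, or if a read error occurs, it returns a special pointer: the null pointer. A null pointer is guaranteed not to point to valid data, so it can be used to signal these special cases.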
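
A typical way to use the return value is as the loop condition. The following is a minimal sketch under that assumption; the helper name copy_until_null and the macro BUF_SIZE are made up for this illustration.

```c
#include <stdio.h>

#define BUF_SIZE 10

/* Hypothetical helper: copies `in` to `out` until fgets() returns NULL.
   Returns 0 if the loop ended at end-of-file, or -1 if it ended
   because of a read error. */
int copy_until_null(FILE *in, FILE *out)
{
    char buf[BUF_SIZE];

    /* fgets() returns buf on success and NULL when nothing was read. */
    while (fgets(buf, BUF_SIZE, in) != NULL)
        fputs(buf, out);

    /* NULL alone does not say why reading stopped; ferror() does. */
    return ferror(in) ? -1 : 0;
}
```

Calling copy_until_null(stdin, stdout) would echo keyboard input until end-of-file is signalled (Ctrl+D on most Unix terminals).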

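The program fgets2.c shows a simple loop that reads and echoes what the user types until fgets() reaches end-of-file or reads an empty line (that is, a line whose first character is a newline).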

/* fgets2.c -- using fgets() and fputs() */
#include <stdio.h>
#include <stdlib.h>
#define STLEN 10
int main()
{
    char words[STLEN];
    char *p;
    puts("enter a string(empty line to quit):");
    p = fgets(words, STLEN, stdin);
    printf("p=%p,words[0]=%d\n", (void *)p, words[0]);  /* %p expects a void * */
    puts("enter another string(empty line to quit):");
    fputs(words, stdout);
    while(fgets(words, STLEN, stdin) != NULL && words[0] != '\n')
    {
        puts("\tin while loop");
        fputs(words, stdout);

    }
    puts("Done");
    return 0;
}

Here is a sample run:

enter a string(empty line to quit):

p=0060FF02,words[0]=10
enter another string(empty line to quit):

by the way, the gets() function
        in while loop
by the wa       in while loop
y, the ge       in while loop
ts() func       in while loop
tion
also returns a null pointer if it
        in while loop
also retu       in while loop
rns a nul       in while loop
l pointer       in while loop
 if it
encounters end-of-file.
        in while loop
encounter       in while loop
s end-of-       in while loop
file.

Done

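At the first prompt I simply pressed Enter, so the program prints words[0]=10; 10 is the ASCII value of the newline character \n.

As you can see, I added puts("\tin while loop") to the code to reveal how fgets() really behaves. In the output, the input by the way, the gets() function is echoed as four separate strings: "by the wa", "y, the ge", "ts() func", and "tion". Why?

Each fgets() call in this program reads at most STLEN - 1 characters (9 in this example). The first call therefore reads only "by the wa" and stores it as by the wa\0; fputs() then prints that string without a line break.

The while loop then starts its next iteration. fgets() continues with the remaining input, reads "y, the ge", and stores it as y, the ge\0; fputs() prints this second chunk. The cycle of reading and printing repeats until the final "tion\n" is read. fgets() stores it as tion\n\0, fputs() prints it, and the \n in the string moves the cursor to the start of the next line.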
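
If a program wants at most one chunk per input line rather than the whole line in pieces, it can check whether a newline was stored and, if not, throw away the rest of the line. The following is a sketch of that idea rather than part of fgets2.c; the helper name read_truncated_line is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: stores at most size-1 characters of the next
   line in buf, without the newline, and discards whatever is left of
   an overlong line so that the next call starts on the next line.
   Returns 1 if the whole line fit, 0 if it was truncated,
   and -1 on end-of-file or read error. */
int read_truncated_line(char *buf, int size, FILE *stream)
{
    if (fgets(buf, size, stream) == NULL)
        return -1;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';      /* complete line: drop the newline */
        return 1;
    }

    /* No newline was stored, so the line may continue. Skip ahead to
       the end of the line and note whether anything was thrown away. */
    int ch;
    int discarded = 0;
    while ((ch = getc(stream)) != '\n' && ch != EOF)
        discarded = 1;
    return discarded ? 0 : 1;
}
```

With a 10-byte buffer, the input by the way, the gets() function would yield by the wa with a return value of 0, and the next call would begin with the following line instead of y, the ge.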

If we comment out puts("\tin while loop"):

    while(fgets(words, STLEN, stdin) != NULL && words[0] != '\n')
    {
        //puts("\tin while loop");
        fputs(words, stdout);

    }

the program produces the following output:

enter a string(empty line to quit):

p=0060FF02,words[0]=10
enter another string(empty line to quit):

by the way, the gets() function
by the way, the gets() function
also returns a null pointer if it
also returns a null pointer if it
encounters end_of_file.
encounters end_of_file.

Done

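Each input line is now echoed as a single line, even though fgets() still reads it in chunks of 9 characters, because the chunks are printed back to back with no line break between them. The echo appears only after the full line has been typed because the system uses buffered I/O: input is held in a temporary storage area (the buffer) until the user presses Enter. Pressing Enter adds a newline to the input and makes the whole line available to fgets(), which then reads from the buffer while fputs() writes the data out.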