Programming Practicum, Assignment 1: Building a Word Index

License: CC 4.0 BY-SA

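This post describes how to process an English article and produce a vocabulary list in dictionary order. It covers extracting words from text that may be unformatted, sorting them, and allowing for an article that may be incomplete and for the limit on the number of words. Two samples illustrate the input and output formats.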

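【Problem description】

Open an English article (stored in an existing file, in.txt) and generate a vocabulary list for it, saved to another file, out.txt. The words in the list must be stored in ascending dictionary order. Each word consists only of consecutive letters, is entirely lowercase, and appears only once.
Assumptions:
1. The article may not have been typeset: its formatting may be messy, and the text may be incomplete.
2. The article contains at most 1000 words, and each word is at most 50 letters long.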

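【Input format】

The file in.txt containing the English article is located in the current directory.

【Output format】

Write the generated vocabulary list, in ascending dictionary order, to the file out.txt in the current directory, one word per line.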

【Sample input 1】

Suppose the file in.txt contains:

There are two versions of the international standards for C.
The first version was ratified in 1989 by the American National
Standards Institue (ANSI) C standard committee.It is often
referred as ANSI C or C89. The secand C standard was completed
in 1999. This standard is commonly referred to as C99. C99 is a
milestone in C’s evolution into a viable programming language
for numerical and scientific computing.

【Sample output 1】
The vocabulary list in out.txt should be:

a
american
and
ansi
are
as
by
c
committee
commonly
completed
computing
evolution
first
for
in
institue
international
into
is
it
language
milestone
national
numerical
of
often
or
programming
ratified
referred
s
scientific
secand
standard
standards
the
there
this
to
two
version
versions
viable
was

【Sample input 2】

Suppose the file in.txt contains:

There are two versions of the international standards for

【Sample output 2】
The vocabulary list in out.txt should be:

are
for
international
of
standards
the
there
two
versions

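【Sample explanation】

Every word made up of letters that appears in in.txt (a single letter on its own also counts as a word) is converted to lowercase and written to out.txt in dictionary order, one word per line, with repeated words written only once.
Note: in sample 2 the content of in.txt is incomplete; there is no character at all after the word "for".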

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>

#define MAX_WORDS 1000
#define MAX_LEN 50

/* qsort comparator: each element is a fixed-size char array. */
static int cmp_words(const void *a, const void *b) {
    return strcmp((const char *)a, (const char *)b);
}

int main(void) {
    static char words[MAX_WORDS][MAX_LEN + 1];  /* +1 for the terminating '\0' */
    char word[MAX_LEN + 1];
    int c, i, len = 0, k = 0;
    FILE *f, *g;

    f = fopen("in.txt", "r");
    if (f == NULL) {
        return 1;
    }
    g = fopen("out.txt", "w");
    if (g == NULL) {
        fclose(f);
        return 1;
    }

    /* Read one character at a time; a word is a maximal run of letters.
       EOF is treated like any other separator, so a last word with
       nothing after it (sample 2) is still recorded. */
    do {
        c = fgetc(f);
        if (c != EOF && isalpha(c)) {
            if (len < MAX_LEN) {
                word[len++] = (char)tolower(c);
            }
        } else if (len > 0) {
            word[len] = '\0';
            len = 0;
            /* Store the word only if it has not been seen before. */
            for (i = 0; i < k && strcmp(words[i], word) != 0; i++) {
            }
            if (i == k && k < MAX_WORDS) {
                strcpy(words[k++], word);
            }
        }
    } while (c != EOF);

    /* Sort into dictionary order and write one word per line. */
    qsort(words, k, sizeof words[0], cmp_words);
    for (i = 0; i < k; i++) {
        fprintf(g, "%s\n", words[i]);
    }
    fclose(f);
    fclose(g);
    return 0;
}