Compound Words

This post shows how to find all two-word compound words in a dictionary, i.e. words formed by concatenating exactly two other words in the dictionary. Given a list of words in alphabetical order, it outputs every compound word, keeping the sorted order.

You are to find all the two-word compound words in a dictionary. A two-word compound word is a word in the dictionary that is the concatenation of exactly two other words in the dictionary.

Input

Standard input consists of a number of lowercase words, one per line, in alphabetical order. There will be no more than 120,000 words.

Output

Your output should contain all the compound words, one per line, in alphabetical order.

Sample Input

a
alien
born
less
lien
never
nevertheless
new
newborn
the
zebra

 

Sample Output

alien
newborn


AC code:
#include"iostream"
#include"set"
using namespace std;
int main()
{
    set<string> dic;
    string s;
    while(cin>>s) dic.insert(s);
    set<string> ::iterator it;
    for(it=dic.begin();it!=dic.end();it++)
    {
     string str,sub1,sub2;
     str=*it;
     for(int i=0;i<str.size()-1;i++)
     {
         sub1=str.substr(0,i+1);
         sub2=str.substr(i+1,str.size()-i-1);
         if(dic.find(sub1)!=dic.end()&&dic.find(sub2)!=dic.end())
         {
            cout<<str<<endl;
            break;     //一定要即时break掉,以免重复
         }
     }


    }
    return 0;
}
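
An alternative sketch (my own, not from the original post): because the input is already in alphabetical order, the words can be kept in a vector to preserve the required output order, while an unordered_set gives average O(1) membership tests instead of the O(log n) lookups of std::set. The names words and dict below are illustrative; the split-and-check logic is the same as above.

#include <iostream>
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

int main()
{
    vector<string> words;        // keeps the (already sorted) input order
    unordered_set<string> dict;  // fast membership tests
    string s;
    while (cin >> s)
    {
        words.push_back(s);
        dict.insert(s);
    }
    for (const string& w : words)
    {
        // Try every split point: prefix w[0, i), suffix w[i, end).
        for (size_t i = 1; i < w.size(); i++)
        {
            if (dict.count(w.substr(0, i)) && dict.count(w.substr(i)))
            {
                cout << w << '\n';
                break;  // print each compound word only once
            }
        }
    }
    return 0;
}

With at most 120,000 words, each word is tested at only length-minus-one split points, which is well within typical time limits for this problem.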

Reposted from: https://www.cnblogs.com/zsyacm666666/p/4659949.html
