472. Concatenated Words

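This post describes an algorithm that finds, in a given list of words, every word that can be formed from at least two shorter words in the same list. It uses depth-first search (DFS) together with a hash map to cut down on repeated computation.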

Given a list of words (without duplicates), please write a program that returns all concatenated words in the given list of words.

A concatenated word is defined as a string that is comprised entirely of at least two shorter words in the given array.

Example:

Input: ["cat","cats","catsdogcats","dog","dogcatsdog","hippopotamuses","rat","ratcatdogcat"]

Output: ["catsdogcats","dogcatsdog","ratcatdogcat"]

Explanation: "catsdogcats" can be concatenated by "cats", "dog" and "cats"; 
"dogcatsdog" can be concatenated by "dog", "cats" and "dog";
"ratcatdogcat" can be concatenated by "rat", "cat", "dog" and "cat".

Note:

  1. The number of elements in the given array will not exceed 10,000.
  2. The sum of the lengths of the elements in the given array will not exceed 600,000.
  3. All input strings contain only lowercase letters.
  4. The order of the returned elements does not matter.



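Given a set of words, find every word that can be formed from two or more words in the set. This is similar to problem 140 (Word Break II), except that here the words being tested are themselves members of the set. The approach is again DFS, with a map used to avoid repeated work. Take {"a","aa","aaa","aaaa"} as an example: once "aa" is known to consist of two copies of "a", the map records the value 2 for it. When "aaa" is processed, the search first matches the prefix "a" and then runs DFS on the remainder "aa". Because "aa" is already recorded with the value 2, the DFS returns 2 immediately, and the total becomes 3. Any value greater than 1 means the word is a valid answer, so the exact count does not matter and the search returns as soon as it finds one. This avoids a great deal of repeated computation.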
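To make the counting idea concrete, here is a minimal standalone sketch of the memoized recursion on the {"a","aa","aaa","aaaa"} example. The helper name `countPieces` and its parameters `dict` and `cache` are assumptions made for illustration and do not appear in the original solution. Unlike the solution in this post, the sketch also caches failed splits.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical helper (not from the original post).
// Returns the number of dictionary words that s splits into,
// 0 for the empty string, or -1 if s cannot be split at all.
// cache holds the result for every string that has already been resolved.
int countPieces(const std::string& s,
                const std::unordered_set<std::string>& dict,
                std::unordered_map<std::string, int>& cache) {
    if (s.empty()) return 0;
    auto hit = cache.find(s);
    if (hit != cache.end()) return hit->second;  // reuse an earlier result

    int best = -1;
    for (std::size_t i = 1; i <= s.size(); ++i) {
        // The prefix s[0..i) must itself be a dictionary word.
        if (!dict.count(s.substr(0, i))) continue;
        int rest = countPieces(s.substr(i), dict, cache);
        if (rest == -1) continue;                // remainder cannot be split
        best = 1 + rest;
        if (best > 1) break;                     // two or more pieces is enough
    }
    cache[s] = best;
    return best;
}
```

With the example set, calling `countPieces` on "a", "aa" and "aaa" in turn gives 1, 2 and 3: the call for "aaa" matches the prefix "a" and then reads the cached value 2 for "aa" instead of searching again.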


Code:

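#include <algorithm>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>
using namespace std;

class Solution {
public:
	vector<string> findAllConcatenatedWordsInADict(vector<string>& words)
	{
		vector<string> res;
		if(words.empty()) return res;
		// Every dictionary word counts as one piece on its own.
		for(const auto& word : words) memo[word] = 1;
		for(const auto& word : words)
		{
			// A count above 1 means the word splits into at least two dictionary words.
			if(dfs(word) > 1) res.push_back(word);
		}
		return res;
	}
private:
	unordered_map<string, int> memo;     // word -> piece count (1 until proven concatenated)
	unordered_set<string> resolved;      // words already proven to be concatenated
	// Returns the piece count for s, 0 for the empty string, or -1 if s cannot be split.
	// Only successful splits are cached; failed suffixes are searched again.
	int dfs(const string& s)
	{
		int ret = -1;
		if(s.empty()) return 0;
		if(resolved.count(s)) return memo[s];
		for(size_t i = 1; i <= s.size(); ++i)
		{
			// Split s into a prefix l and a remainder r.
			string l = s.substr(0, i), r = s.substr(i);
			if(memo.find(l) != memo.end())
			{
				int d = dfs(r);
				if(d == -1) continue;
				ret = max(ret, memo[l] + d);
				if(ret > 1)
				{
					// Two or more pieces found: record the result and stop searching.
					memo[s] = ret;
					resolved.insert(s);
					return ret;
				}
			}
		}
		return ret;
	}
};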
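As a cross-check on the DFS solution, the same question can also be answered with a bottom-up dynamic-programming pass over each word. This is a different technique from the one used in this post, sketched under the assumption that a simple O(L^3) check per word of length L is acceptable. The names `isConcatenated` and `findConcatenated` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: true if word can be built from at least two entries of dict.
// pieces[i] is the largest number of dictionary words that exactly cover
// word[0..i), or -1 if that prefix cannot be covered.
bool isConcatenated(const std::string& word,
                    const std::unordered_set<std::string>& dict) {
    const std::size_t n = word.size();
    if (n == 0) return false;
    std::vector<int> pieces(n + 1, -1);
    pieces[0] = 0;  // the empty prefix needs zero words
    for (std::size_t end = 1; end <= n; ++end) {
        for (std::size_t start = 0; start < end; ++start) {
            if (pieces[start] == -1) continue;  // prefix before start is not coverable
            if (dict.count(word.substr(start, end - start))) {
                pieces[end] = std::max(pieces[end], pieces[start] + 1);
            }
        }
    }
    return pieces[n] >= 2;
}

// Hypothetical driver: returns the concatenated words in input order.
std::vector<std::string> findConcatenated(const std::vector<std::string>& words) {
    std::unordered_set<std::string> dict(words.begin(), words.end());
    std::vector<std::string> result;
    for (const auto& w : words) {
        if (isConcatenated(w, dict)) result.push_back(w);
    }
    return result;
}
```

Running `findConcatenated` on the example input from the problem statement yields "catsdogcats", "dogcatsdog" and "ratcatdogcat", which matches the expected output and can be compared against the result of the DFS version.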

