472. Concatenated Words

博客介绍了一种处理字符串的方法。给定数组元素数量不超10000,元素长度总和不超600000,输入仅含小写字母。采用递归方法,类似word break,先确认左半部分可查,再递归拆解右半部分,查找前删掉单词本身。

472. Concatenated Words


Given a list of words (without duplicates), please write a program that returns all concatenated words in the given list of words.
A concatenated word is defined as a string that is comprised entirely of at least two shorter words in the given array.

Example:

Input: ["cat","cats","catsdogcats","dog","dogcatsdog","hippopotamuses","rat","ratcatdogcat"]

Output: ["catsdogcats","dogcatsdog","ratcatdogcat"]

Explanation: "catsdogcats" can be concatenated by "cats", "dog" and "cats"; 
 "dogcatsdog" can be concatenated by "dog", "cats" and "dog"; 
"ratcatdogcat" can be concatenated by "rat", "cat", "dog" and "cat".

Note:

  1. The number of elements of the given array will not exceed 10,000
  2. The length sum of elements in the given array will not exceed 600,000.
  3. All the input string will only include lower case letters.
  4. The returned elements order does not matter.

方法1: recursion

思路:

和word break一样,先取左边确认可查,然后递归拆解右半部分。这里需要在每次查找前把单词本身删掉,也不需要查找单词自身(左右空子串的情况都不用考虑),

class Solution {
public:
    vector<string> findAllConcatenatedWordsInADict(vector<string>& words) {
        unordered_set<string> dict(words.begin(), words.end());
        vector<string> result;
        for (auto word: words) {
            dict.erase(dict.find(word));
            if (findAllHelper(word, dict)) {
                result.push_back(word);
            }
            dict.insert(word);
        }
        return result;
    }
    
    bool findAllHelper(string s, unordered_set<string> & dict) {
        if (dict.count(s)) return true;
        
        for (int i = 1; i < s.size(); i++) {
            string left = s.substr(0, i);
            string right = s.substr(i);
            if (dict.count(left) && findAllHelper(right, dict)) {
                return true;
            }
        }
        return false;
    }
};
import pandas as pd import numpy as np from sklearn.preprocessing import MinMaxScaler, OneHotEncoder import tensorflow from tensorflow.keras.preprocessing.text import Tokenizer from tensorflow.keras.preprocessing.sequence import pad_sequences from tensorflow.keras.models import Model from tensorflow.keras.layers import Input, Dense, Embedding, LSTM, Concatenate, Dropout, BatchNormalization from tensorflow.keras.optimizers import Adam from sklearn.model_selection import train_test_split from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau # 1.数据预处理与特征工程 # 加载数据集 df = pd.read_csv("training_data.csv") # 数值特征标准化 num_features = ['position', 'quality'] scaler = MinMaxScaler() df[num_features] = scaler.fit_transform(df[num_features]) # 序列特征编码 tokenizer = Tokenizer(char_level=True, num_words=4) # 仅A,C,G,T四种碱基 tokenizer.fit_on_texts(df['context']) sequences = tokenizer.texts_to_sequences(df['context']) max_length = max(len(seq) for seq in sequences) padded_sequences = pad_sequences(sequences, maxlen=max_length, padding='post') # 标签提取 labels = df['label'].values # 2.双输入混合模型架构 # 序列输入分支 sequence_input = Input(shape=(max_length,), name='sequence_input') embedding = Embedding(input_dim=5, output_dim=8, input_length=max_length)(sequence_input) # 5=4碱基+填充 lstm_out = LSTM(32, return_sequences=False)(embedding) # 数值特征输入分支 numeric_input = Input(shape=(len(num_features),), name='numeric_input') dense_numeric = Dense(16, activation='relu')(numeric_input) bn_numeric = BatchNormalization()(dense_numeric) # 合并分支 concatenated = Concatenate()([lstm_out, bn_numeric]) dense1 = Dense(64, activation='relu')(concatenated) dropout1 = Dropout(0.3)(dense1) dense2 = Dense(32, activation='relu')(dropout1) output = Dense(1, activation='sigmoid')(dense2) # 构建模型 model = Model(inputs=[sequence_input, numeric_input], outputs=output) model.compile( optimizer=Adam(learning_rate=0.001), loss='binary_crossentropy', metrics=['accuracy', 'AUC'] ) model.summary() # 3.模型训练与评估 # 划分训练集和测试集 X_seq_train, X_seq_test, X_num_train, X_num_test, y_train, y_test = train_test_split( padded_sequences, df[num_features].values, labels, test_size=0.2, stratify=labels ) # 回调函数 callbacks = [ EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True), ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=5, min_lr=1e-6) ] # 训练模型 history = model.fit( [X_seq_train, X_num_train], y_train, validation_data=([X_seq_test, X_num_test], y_test), epochs=100, batch_size=64, callbacks=callbacks, class_weight={0: 1, 1: 2} # 处理类别不平衡 ) # 评估模型 test_loss, test_acc, test_auc = model.evaluate( [X_seq_test, X_num_test], y_test ) print(f"测试准确率: {test_acc:.4f}, AUC: {test_auc:.4f}") 请优化该代码 tensorflow.keras.preprocessing.text tensorflow.keras.preprocessing.sequence tensorflow.keras.models tensorflow.keras.layers tensorflow.keras.optimizers tensorflow.keras.callbacks 无法导入
最新发布
07-19
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值