472. Concatenated Words

本文介绍了一个程序,该程序接收一个无重复单词的列表,并返回所有由至少两个较短单词组成的组合词。通过深度优先搜索(DFS)算法实现,有效利用剪枝优化性能。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

Given a list of words (without duplicates), please write a program that returns all concatenated words in the given list of words.

A concatenated word is defined as a string that is comprised entirely of at least two shorter words in the given array.

Example:

Input: ["cat","cats","catsdogcats","dog","dogcatsdog","hippopotamuses","rat","ratcatdogcat"]

Output: ["catsdogcats","dogcatsdog","ratcatdogcat"]

Explanation: "catsdogcats" can be concatenated by "cats", "dog" and "cats"; 
 "dogcatsdog" can be concatenated by "dog", "cats" and "dog"; 
"ratcatdogcat" can be concatenated by "rat", "cat", "dog" and "cat".

 

Note:

  1. The number of elements of the given array will not exceed 10,000
  2. The length sum of elements in the given array will not exceed 600,000.
  3. All the input string will only include lower case letters.
  4. The returned elements order does not matter.

DFS求解,中间加入剪枝,程序如下所示:

class Solution {
    Set<String> set = new HashSet<>();
    List<String> list = new ArrayList<>();
    Map<String, Integer> map = new HashMap<>();
    private int cnt = 0;
    public List<String> findAllConcatenatedWordsInADict(String[] words) {    
        for (String val : words){
            set.add(val);
        }
        for (String val : words){
            cnt = 0;
            dfs(val, val.length(), 0, 0);
            if (cnt >= 2){
                list.add(val);
                map.put(val, cnt);
            }
        }
        return list;
    }
    public void dfs(String s, int sLen, int begin, int cur){
        if (begin > sLen){
            return ;
        }
        if (begin == sLen ){
            cnt = Math.max(cur, cnt);
            return;
        }
        if (cnt >= 2){
            return;
        }
        for (int i = begin; i < sLen; ++ i){
            if (i+1 <= sLen){
                String st = s.substring(begin, i+1);
                if (set.contains(st)){
                    String str = s.substring(i+1, sLen);
                    if (map.containsKey(str)){
                        cnt = cur + map.get(str);
                        return;
                    }
                    dfs(s, sLen, i + 1, cur+1);
                }
            }
        }
    }
}

 

 

import pandas as pd import numpy as np from sklearn.preprocessing import MinMaxScaler, OneHotEncoder import tensorflow from tensorflow.keras.preprocessing.text import Tokenizer from tensorflow.keras.preprocessing.sequence import pad_sequences from tensorflow.keras.models import Model from tensorflow.keras.layers import Input, Dense, Embedding, LSTM, Concatenate, Dropout, BatchNormalization from tensorflow.keras.optimizers import Adam from sklearn.model_selection import train_test_split from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau # 1.数据预处理与特征工程 # 加载数据集 df = pd.read_csv("training_data.csv") # 数值特征标准化 num_features = ['position', 'quality'] scaler = MinMaxScaler() df[num_features] = scaler.fit_transform(df[num_features]) # 序列特征编码 tokenizer = Tokenizer(char_level=True, num_words=4) # 仅A,C,G,T四种碱基 tokenizer.fit_on_texts(df['context']) sequences = tokenizer.texts_to_sequences(df['context']) max_length = max(len(seq) for seq in sequences) padded_sequences = pad_sequences(sequences, maxlen=max_length, padding='post') # 标签提取 labels = df['label'].values # 2.双输入混合模型架构 # 序列输入分支 sequence_input = Input(shape=(max_length,), name='sequence_input') embedding = Embedding(input_dim=5, output_dim=8, input_length=max_length)(sequence_input) # 5=4碱基+填充 lstm_out = LSTM(32, return_sequences=False)(embedding) # 数值特征输入分支 numeric_input = Input(shape=(len(num_features),), name='numeric_input') dense_numeric = Dense(16, activation='relu')(numeric_input) bn_numeric = BatchNormalization()(dense_numeric) # 合并分支 concatenated = Concatenate()([lstm_out, bn_numeric]) dense1 = Dense(64, activation='relu')(concatenated) dropout1 = Dropout(0.3)(dense1) dense2 = Dense(32, activation='relu')(dropout1) output = Dense(1, activation='sigmoid')(dense2) # 构建模型 model = Model(inputs=[sequence_input, numeric_input], outputs=output) model.compile( optimizer=Adam(learning_rate=0.001), loss='binary_crossentropy', metrics=['accuracy', 'AUC'] ) model.summary() # 3.模型训练与评估 # 划分训练集和测试集 X_seq_train, X_seq_test, X_num_train, X_num_test, y_train, y_test = train_test_split( padded_sequences, df[num_features].values, labels, test_size=0.2, stratify=labels ) # 回调函数 callbacks = [ EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True), ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=5, min_lr=1e-6) ] # 训练模型 history = model.fit( [X_seq_train, X_num_train], y_train, validation_data=([X_seq_test, X_num_test], y_test), epochs=100, batch_size=64, callbacks=callbacks, class_weight={0: 1, 1: 2} # 处理类别不平衡 ) # 评估模型 test_loss, test_acc, test_auc = model.evaluate( [X_seq_test, X_num_test], y_test ) print(f"测试准确率: {test_acc:.4f}, AUC: {test_auc:.4f}") 请优化该代码 tensorflow.keras.preprocessing.text tensorflow.keras.preprocessing.sequence tensorflow.keras.models tensorflow.keras.layers tensorflow.keras.optimizers tensorflow.keras.callbacks 无法导入
最新发布
07-19
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值