keras text classification

本文详细介绍文本分类任务的各种方法和技术,包括卷积神经网络(CNN)、注意力机制的循环神经网络(Attentional RNN)及层级注意力网络(Hierarchical Attention Network),并提供基于Keras的实际案例与调参技巧。同时探讨了多标签分类的处理方式。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

Text Classification, Part I - Convolutional Networks

15  指定GPU并且限制GPU用量

import os
import tensorflow as tf
import keras.backend.tensorflow_backend as KTF

# 指定第一块GPU可用 
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

config = tf.ConfigProto()  
config.gpu_options.allow_growth=True   #不全部占满显存, 按需分配
sess = tf.Session(config=config)

KTF.set_session(sess)


16  Neural network for multi label classification with large number of classes outputs only zero

If you use the tensorflow backend in Keras, you can use the loss function like this (Keras 2.1.1):

import tensorflow as tf
import keras.backend.tensorflow_backend as tfb

POS_WEIGHT = 10  # multiplier for positive targets, needs to be tuned

def weighted_binary_crossentropy(target, output):
    """
    Weighted binary crossentropy between an output tensor 
    and a target tensor. POS_WEIGHT is used as a multiplier 
    for the positive targets.

    Combination of the following functions:
    * keras.losses.binary_crossentropy
    * keras.backend.tensorflow_backend.binary_crossentropy
    * tf.nn.weighted_cross_entropy_with_logits
    """
    # transform back to logits
    _epsilon = tfb._to_tensor(tfb.epsilon(), output.dtype.base_dtype)
    output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
    output = tf.log(output / (1 - output))
    # compute weighted loss
    loss = tf.nn.weighted_cross_entropy_with_logits(targets=target,
                                                    logits=output,
                                                    pos_weight=POS_WEIGHT)
    return tf.reduce_mean(loss, axis=-1)
Then in your model:

model.compile(loss=weighted_binary_crossentropy, ...)


17 How to train a multi-label Classifier



2009年新书,非扫描 Contents List of Figures xiii List of Tables xix Introduction xxi About the Editors xxvii Contributor List xxix 1 Analysis of Text Patterns Using Kernel Methods 1 Marco Turchi, Alessia Mammone, and Nello Cristianini 1.1 Introduction . . . . . . . . . . . . . . . 1 1.2 General Overview on Kernel Methods . . . . . . . 1 1.2.1 Finding Patterns in Feature Space . . . . . . . . . . . 5 1.2.2 Formal Properties of Kernel Functions . . . . . . . . . 8 1.2.3 Operations on Kernel Functions . . . . . . . . . . . . 10 1.3 Kernels for Text . . . . . . . . . . . . . . . . . . . . . . . . . 11 1.3.1 Vector SpaceModel . . . . . . . . . . . . . . . . . . . 11 1.3.2 Semantic Kernels . . . . . . . . . . . . . . . . . . . . . 13 1.3.3 String Kernels . . . . . . . . . . . . . . . . . . . . . . 17 1.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 1.5 Conclusion and Further Reading . . . . . . . . . . . . . . . . 22 2 Detection of Bias in Media Outlets with Statistical Learning Methods 27 Blaz Fortuna, Carolina Galleguillos, and Nello Cristianini 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 2.2 Overview of the Experiments . . . . . . . . . . . . . . . . . . 29 2.3 Data Collection and Preparation . . . . . . . . . . . . . . . . 30 2.3.1 Article Extraction from HTML Pages . . . . . . . . . 31 2.3.2 Data Preparation . . . . . . . . . . . . . . . . . . . . . 31 2.3.3 Detection of Matching News Items . . . . . . . . . . . 32 2.4 News Outlet Identification . . . . . . . . . . . . . . . . . . . . 35 2.5 Topic-Wise Comparison of Term Bias . . . . . . . . . . . . . 38 2.6 News OutletsMap . . . . . . . . . . . . . . . . . . . . . . . . 40 2.6.1 Distance Based on Lexical Choices . . . . . . . . . . . 42 vii © 2009 by Taylor and Francis Group, LLC viii 2.6.2 Distance Based on Choice of Topics . . . . . . . . . . 43 2.7 RelatedWork . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 2.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 2.9 Appendix A: Support Vector Machines . . . . . . . . . . . . . 48 2.10 Appendix B: Bag of Words and Vector Space Models . . . . . 48 2.11 Appendix C: Kernel Canonical Correlation Analysis . . . . . 49 2.12 Appendix D: Multidimensional Scaling . . . . . . . . . . . . . 50 3 Collective Classification for Text Classification 51 Galileo Namata, Prithviraj Sen, Mustafa Bilgic, and Lise Getoor 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.2 Collective Classification: Notation and Problem Definition . . 53 3.3 Approximate Inference Algorithms for Approaches Based on Local Conditional Classifiers . . . . . . . . . . . . . . . . . . . 53 3.3.1 Iterative Classification . . . . . . . . . . . . . . . . . . 54 3.3.2 Gibbs Sampling . . . . . . . . . . . . . . . . . . . . . . 55 3.3.3 Local Classifiers and Further Optimizations . . . . . . 55 3.4 Approximate Inference Algorithms for Approaches Based on Global Formulations . . . . . . . . . . . . . . . . . . . . . . . 56 3.4.1 Loopy Belief Propagation . . . . . . . . . . . . . . . . 58 3.4.2 Relaxation Labeling via Mean-Field Approach . . . . 59 3.5 Learning the Classifiers . . . . . . . . . . . . . . . . . . . . . 60 3.6 Experimental Comparison . . . . . . . . . . . . . . . . . . . . 60 3.6.1 Features Used . . . . . . . . . . . . . . . . . . . . . . . 60 3.6.2 Real-World Datasets . . . . . . . . . . . . . . . . . . . 60 3.6.3 Practical Issues . . . . . . . . . . . . . . . . . . . . . . 63 3.7 RelatedWork . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 3.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 3.9 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 66 4 Topic Models 71 David M. Blei and John D. Lafferty 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 4.2 Latent Dirichlet Allocation . . . . . . . . . . . . . . . . . . . 72 4.2.1 Statistical Assumptions . . . . . . . . . . . . . . . . . 73 4.2.2 Exploring a Corpus with the Posterior Distribution . . 75 4.3 Posterior Inference for LDA . . . . . . . . . . . . . . . . . . . 76 4.3.1 Mean Field Variational Inference . . . . . . . . . . . . 78 4.3.2 Practical Considerations . . . . . . . . . . . . . . . . . 81 4.4 Dynamic Topic Models and Correlated Topic Models . . . . . 82 4.4.1 The Correlated Topic Model . . . . . . . . . . . . . . 82 4.4.2 The Dynamic Topic Model . . . . . . . . . . . . . . . 84 4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 © 2009 by Taylor and Francis Group, LLC ix 5 Nonnegative Matrix and Tensor Factorization for Discussion Tracking 95 Brett W. Bader, Michael W. Berry, and Amy N. Langville 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 5.1.1 Extracting Discussions . . . . . . . . . . . . . . . . . . 96 5.1.2 RelatedWork . . . . . . . . . . . . . . . . . . . . . . . 96 5.2 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 5.3 Tensor Decompositions and Algorithms . . . . . . . . . . . . 98 5.3.1 PARAFAC-ALS . . . . . . . . . . . . . . . . . . . . . 100 5.3.2 Nonnegative Tensor Factorization . . . . . . . . . . . . 100 5.4 Enron Subset . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 5.4.1 TermWeighting Techniques . . . . . . . . . . . . . . . 103 5.5 Observations and Results . . . . . . . . . . . . . . . . . . . . 105 5.5.1 Nonnegative Tensor Decomposition . . . . . . . . . . . 105 5.5.2 Analysis of Three-Way Tensor . . . . . . . . . . . . . 106 5.5.3 Analysis of Four-Way Tensor . . . . . . . . . . . . . . 108 5.6 Visualizing Results of the NMF Clustering . . . . . . . . . . . 111 5.7 FutureWork . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 6 Text Clustering with Mixture of von Mises-Fisher Distributions 121 Arindam Banerjee, Inderjit Dhillon, Joydeep Ghosh, and Suvrit Sra 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 6.2 RelatedWork . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 6.3 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 6.3.1 The von Mises-Fisher (vMF) Distribution . . . . . . . 124 6.3.2 Maximum Likelihood Estimates . . . . . . . . . . . . . 125 6.4 EMon aMixture of vMFs (moVMF) . . . . . . . . . . . . . . 126 6.5 Handling High-Dimensional Text Datasets . . . . . . . . . . . 127 6.5.1 Approximating κ . . . . . . . . . . . . . . . . . . . . . 128 6.5.2 Experimental Study of the Approximation . . . . . . . 130 6.6 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 6.7 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . 134 6.7.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . 135 6.7.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . 138 6.7.3 Simulated Datasets . . . . . . . . . . . . . . . . . . . . 138 6.7.4 Classic3 Family of Datasets . . . . . . . . . . . . . . . 140 6.7.5 Yahoo News Dataset . . . . . . . . . . . . . . . . . . . 143 6.7.6 20 Newsgroup Family of Datasets . . . . . . . . . . . . 143 6.7.7 Slashdot Datasets . . . . . . . . . . . . . . . . . . . . 145 6.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 6.9 Conclusions and Future Work . . . . . . . . . . . . . . . . . . 148 © 2009 by Taylor and Francis Group, LLC x 7 Constrained Partitional Clustering of Text Data: An Overview 155 Sugato Basu and Ian Davidson 7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 7.2 Uses of Constraints . . . . . . . . . . . . . . . . . . . . . . . . 157 7.2.1 Constraint-Based Methods . . . . . . . . . . . . . . . 157 7.2.2 Distance-BasedMethods . . . . . . . . . . . . . . . . . 158 7.3 Text Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . 159 7.3.1 Pre-Processing . . . . . . . . . . . . . . . . . . . . . . 161 7.3.2 DistanceMeasures . . . . . . . . . . . . . . . . . . . . 162 7.4 Partitional Clustering with Constraints . . . . . . . . . . . . 163 7.4.1 COP-KMeans . . . . . . . . . . . . . . . . . . . . . . . 163 7.4.2 Algorithms with Penalties – PKM, CVQE . . . . . . . 164 7.4.3 LCVQE: An Extension to CVQE . . . . . . . . . . . . 167 7.4.4 Probabilistic Penalty – PKM . . . . . . . . . . . . . . 167 7.5 Learning Distance Function with Constraints . . . . . . . . . 168 7.5.1 Generalized Mahalanobis Distance Learning . . . . . . 168 7.5.2 Kernel Distance Functions Using AdaBoost . . . . . . 169 7.6 Satisfying Constraints and Learning Distance Functions . . . 170 7.6.1 Hidden Markov Random Field (HMRF) Model . . . . 170 7.6.2 EMAlgorithm . . . . . . . . . . . . . . . . . . . . . . 173 7.6.3 Improvements to HMRF-KMeans . . . . . . . . . . . 173 7.7 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174 7.7.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . 174 7.7.2 Clustering Evaluation . . . . . . . . . . . . . . . . . . 175 7.7.3 Methodology . . . . . . . . . . . . . . . . . . . . . . . 176 7.7.4 Comparison of Distance Functions . . . . . . . . . . . 176 7.7.5 Experimental Results . . . . . . . . . . . . . . . . . . 177 7.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 8 Adaptive Information Filtering 185 Yi Zhang 8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185 8.2 Standard EvaluationMeasures . . . . . . . . . . . . . . . . . 188 8.3 Standard Retrieval Models and Filtering Approaches . . . . . 190 8.3.1 Existing Retrieval Models . . . . . . . . . . . . . . . . 190 8.3.2 Existing Adaptive Filtering Approaches . . . . . . . . 192 8.4 CollaborativeAdaptive Filtering . . . . . . . . . . . . . . . . 194 8.5 Novelty and Redundancy Detection . . . . . . . . . . . . . . . 196 8.5.1 Set Difference . . . . . . . . . . . . . . . . . . . . . . . 199 8.5.2 Geometric Distance . . . . . . . . . . . . . . . . . . . 199 8.5.3 Distributional Similarity . . . . . . . . . . . . . . . . . 200 8.5.4 Summary of Novelty Detection . . . . . . . . . . . . . 201 8.6 Other Adaptive Filtering Topics . . . . . . . . . . . . . . . . 201 8.6.1 Beyond Bag ofWords . . . . . . . . . . . . . . . . . . 202 © 2009 by Taylor and Francis Group, LLC xi 8.6.2 Using Implicit Feedback . . . . . . . . . . . . . . . . . 202 8.6.3 Exploration and Exploitation Trade Off . . . . . . . . 203 8.6.4 Evaluation beyond Topical Relevance . . . . . . . . . 203 8.7 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 204 9 Utility-Based Information Distillation 213 Yiming Yang and Abhimanyu Lad 9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213 9.1.1 Related Work in Adaptive Filtering (AF) . . . . . . . 213 9.1.2 Related Work in Topic Detection and Tracking (TDT) 214 9.1.3 Limitations of Current Solutions . . . . . . . . . . . . 215 9.2 A Sample Task . . . . . . . . . . . . . . . . . . . . . . . . . . 216 9.3 Technical Cores . . . . . . . . . . . . . . . . . . . . . . . . . . 218 9.3.1 Adaptive Filtering Component . . . . . . . . . . . . . 218 9.3.2 Passage Retrieval Component . . . . . . . . . . . . . . 219 9.3.3 Novelty Detection Component . . . . . . . . . . . . . 220 9.3.4 Anti-Redundant Ranking Component . . . . . . . . . 220 9.4 EvaluationMethodology . . . . . . . . . . . . . . . . . . . . . 221 9.4.1 Answer Keys . . . . . . . . . . . . . . . . . . . . . . . 221 9.4.2 Evaluating the Utility of a Sequence of Ranked Lists . 223 9.5 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225 9.6 Experiments and Results . . . . . . . . . . . . . . . . . . . . . 226 9.6.1 Baselines . . . . . . . . . . . . . . . . . . . . . . . . . 226 9.6.2 Experimental Setup . . . . . . . . . . . . . . . . . . . 226 9.6.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . 227 9.7 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . 229 9.8 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 229 10 Text Search-Enhanced with Types and Entities 233 Soumen Chakrabarti, Sujatha Das, Vijay Krishnan, and Kriti Puniyani 10.1 Entity-Aware Search Architecture . . . . . . . . . . . . . . . . 233 10.1.1 Guessing Answer Types . . . . . . . . . . . . . . . . . 234 10.1.2 Scoring Snippets . . . . . . . . . . . . . . . . . . . . . 235 10.1.3 Efficient Indexing and Query Processing . . . . . . . . 236 10.1.4 Comparison with Prior Work . . . . . . . . . . . . . . 236 10.2 Understanding the Question . . . . . . . . . . . . . . . . . . . 236 10.2.1 Answer Type Clues in Questions . . . . . . . . . . . . 239 10.2.2 Sequential Labeling of Type Clue Spans . . . . . . . . 240 10.2.3 From Type Clue Spans to Answer Types . . . . . . . . 245 10.2.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . 247 10.3 Scoring Potential Answer Snippets . . . . . . . . . . . . . . . 251 10.3.1 A ProximityModel . . . . . . . . . . . . . . . . . . . . 253 10.3.2 Learning the Proximity Scoring Function . . . . . . . 255 10.3.3 Experiments . . . . . . . . . . . . . . . . . . . . . . . 257 10.4 Indexing and Query Processing . . . . . . . . . . . . . . . . . 260 © 2009 by Taylor and Francis Group, LLC xii 10.4.1 Probability of a Query Atype . . . . . . . . . . . . . . 262 10.4.2 Pre-Generalize and Post-Filter . . . . . . . . . . . . . 262 10.4.3 Atype Subset Index Space Model . . . . . . . . . . . . 265 10.4.4 Query Time BloatModel . . . . . . . . . . . . . . . . 266 10.4.5 Choosing an Atype Subset . . . . . . . . . . . . . . . . 269 10.4.6 Experiments . . . . . . . . . . . . . . . . . . . . . . . 271 10.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272 10.5.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 272 10.5.2 Ongoing and Future Work . . . . . . . . . . . . . . . . 273 © 2009
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值