- Blog posts (33)
- Resources (1)
- Q&A (1)

[Original] Index of Previous Posts
Logistic Regression, L1, L2 regularization, Gradient/Coordinate descent (detailed) · MLE v.s. MAP ~ L1, L2 Math Derivation (detailed) · XGBoost math Derivation: an accessible, detailed derivation · Introduction to Convex Optimization Basic Concept...
2020-04-14 00:44:41
1733
[Original] Decoupling Representation and Classifier for Long-Tailed Recognition: Methods for Long-Tailed Image Classification
Contents: Introduction · Recent Directions · Sampling Strategies · Methods of Learning Classifiers · Classifier Re-training (cRT) · Nearest Class Mean classifier (NCM) · $\tau$-normalized classifier ($\tau$-normalized) · Experiments · Datasets · Evaluation Protocol · Results · Sampling matter
2021-01-08 11:48:24
1467
[Original] Relation Extraction: A Survey
Contents: Index of previous posts · Information Extraction v.s. Relation Extraction · Existing Works of RE · Pattern-based Methods · Statistical Relation Extraction Models · Neural Relation Extraction Methods · Future Directions · Utilizing More Data · Methods to Denoise DS Data · Open Problem for Utili
2021-01-03 12:32:42
2099
[Original] Connecting the Dots: Document-level Neural Relation Extraction with Edge-oriented Graphs (Relation Extraction Paper Summary)
Contents: Index of previous posts · Relation Extraction (RE) · document-level RE · Intuition · Contribution · Overview of Proposed Model · Proposed Model · Sentence Encoding Layer · Graph construction Layer · Node Construction · Edge Construction · Inference Layer · First Step · Second Step · Classification Layer · Resul
2020-12-31 08:47:50
1231
[Original] Cross-Lingual Learning paper summary
Contents: Index of previous posts · Cross-lingual learning · Cross-lingual resources · Multilingual distributional representations · Evaluation of multilingual distributional representations · Parallel corpus · Word Alignments · Machine Translation · Universal features (out of fashion) · Biling
2020-12-28 14:14:15
2152
[Original] BERT and RoBERTa: Key Points
Contents: Index of previous posts · BERT Recap · Overview · BERT Specifics · There are two steps to the BERT framework: pre-training and fine-tuning · Input Output Representations · Tasks · results · Ablation studies · Effect of Pre-training Tasks · Effect of Model Sizes · Replication study of BERT p
2020-09-18 12:11:09
1954
1
[Original] What Does BERT Look At? An Analysis of BERT’s Attention (Paper Summary)
Contents: Index of previous posts · Before we start · Surface-Level Patterns in Attention · Probing Individual Attention Heads · Probing Attention Head Combinations · Clustering Attention Heads. Before we start: In this post, I mainly focus on the conclusions the authors reach
2020-09-14 09:38:27
2292
[Original] The More You Know: Using Knowledge Graphs for Image Classification (Paper Summary)
Contents: Index of previous posts · Overview · Intuition · Previous Work · Major Contribution · Graph Search Neural Network (GSNN) · GSNN Explanation · Three networks · Diagram visualization · Advantage · Incorporate the graph network into an image pipeline · Dataset · Conclusion. Overview: This p
2020-09-02 08:54:10
2869
2
[Original] Graph Convolutional Neural Network - Spectral Convolution, Explained in Detail
Fourier Transform: Virtually everything in the world can be described via a waveform - a function of time, space or some other variable. For instance, sound waves, the price of a stock, etc. The Fourier Transform gives us a unique and powerful way of viewin
2020-08-24 08:49:54
2793
4
[Original] Graph Convolutional Neural Network - Spatial Convolution, Explained in Detail
Convolutional graph neural networks (ConvGNNs): Convolutional graph neural networks (ConvGNNs) generalize the operation of convolution from grid data to graph data. The main idea is to generate a node $v$'s representation by aggregating its own features $x_v$
2020-08-20 08:40:58
5665
1
[Original] Introduction to Graph Neural Network (GNN), Explained in Detail
Contents: Index of previous posts · Note · Background and Intuition · Intro to Graph Neural Networks · GNNs Framework · Definition · Recurrent graph neural networks (RecGNNs) · Introduction to RecGNNs · Banach's Fixed Point Theorem · RecGNNs v.s. RNNs · Limitation of RecGNNs · Gated Graph Neural Networks (GGN
2020-08-17 08:37:40
3795
[Original] Kaggle: Jigsaw Multilingual Toxic Comment Classification Top Solutions (Summary of Gold-Medal Approaches)
Before we start: Two of my previous posts might be helpful in getting a general understanding of the top solutions of this competition. Please feel free to check them out. Knowledge Distillation clearly explained · Common Multilingual Language Modeling method
2020-08-11 08:45:49
2552
[Original] Common Multilingual Models Explained in Detail (M-Bert, LASER, MultiFiT, XLM)
Contents: Index of previous posts · Ways of tokenization · Word-based tokenization · Character-based tokenization · Subword tokenization · Existing approaches for cross-lingual NLP · Out-of-vocabulary (OOV) problem in mono/multi-lingual settings · M-BERT (Multi-lingual BERT) · WHY MULTILINGUAL BERT W
2020-08-08 07:45:48
7815
[Original] Knowledge Distillation, Explained in Detail
Contents: Index of previous posts · Shortcoming of normal neural networks · Generalization of Information · Knowledge Distillation · A few Definitions · General idea of knowledge distillation · Teacher and Student · Temperature & Entropy · Training the Distil Model. Currently, esp
2020-08-05 06:59:54
2716
[Original] Kaggle: Tweet Sentiment Extraction Method Summary, Part 2/2: Gold-Medal Approaches
Before we start: I took part in two NLP competitions in June, Tweet Sentiment Extraction and Jigsaw Multilingual Toxic Comment Classification, and I’m happy to be a Kaggle Expert from now on. Tweet Sentiment Extraction. Goal: The objective in this competitio
2020-07-01 12:07:22
1233
9
[Original] Kaggle: Tweet Sentiment Extraction Method Summary, Part 1/2: Common Methods
Contents: Index of previous posts · Note · Before we start · Tweet Sentiment Extraction · What is the MAGIC? · Common Methods · Label Smoothing · Implementation of Label Smoothing · In tensorflow · In pytorch · Multi-sample dropout · Implementation · Stochastic Weight Averaging (SWA) · Different learning rate setti
2020-07-01 12:06:29
2220
2
[Original] RNN, LSTM: An Illustrated Guide
Contents: Index of previous posts · Sequence Data · Why not use a standard neural network for sequence tasks · RNN · Different types of RNNs · Loss function of RNN · Backpropagation through time · Vanishing gradients with RNNs · Advantages and Drawbacks of RNN · LSTM · Types of gates · formulas and illustrati
2020-06-04 11:44:02
1628
[Original] Intro to Deep Learning & Backpropagation: Deep Learning Models and a Detailed Derivation of Backpropagation
Contents: Deep Neural Network · Index of previous posts · Forward Propagation · Loss functions of neural network · Back-propagation · compute $\frac{\partial \ell}{\partial f(x)}$ · compute $\frac{\partial \ell}{\partial a^{(L+1)}(x)}$ · compute $\frac{\partial \ell}{\partial h^{(k)}(x)}$
2020-05-26 04:29:34
659
[Original] Log-Linear Model & CRF: Conditional Random Fields Explained in Detail
Contents: Index of previous posts · Log-Linear model · Conditional Random Fields (CRF) · Formal definition of CRF · Log-linear model to linear-CRF · Inference problem for CRF · Learning problem for CRF · Learning problem for general Log-Linear model · Learning problem for CRF · Compute $Z(\bar x, w)$
2020-05-19 13:15:11
855
[Original] GMM & K-means: Gaussian Mixture Models and K-means Clustering Explained in Detail
Contents: Index of previous posts · Gaussian mixture model (GMM) · Interpretation from geometry · Interpretation from mixture model · GMM Derivation · set up · Solve by MLE · Solve by EM Algorithm · K-means. Gaussian mixture model (GMM): A Gaussian mixture model is a probabilistic mode
2020-05-16 08:18:16
1302
[Original] Probabilistic Graphical Model (PGM): The PGM Framework Explained in Detail
Index of previous posts · Probabilistic Graphical Model (PGM). Definition: A probabilistic graphical model is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. In general, a PGM obeys the following rules: Sum Rul
2020-05-11 02:50:37
3147
2
[Original] Hidden Markov Model (HMM): Detailed Derivation and Analysis
Index of previous posts · Before reading this post, you should be familiar with the EM Algorithm and have a decent amount of knowledge of convex optimization. If not, check out my previous posts: EM Algorithm · convex optimiz...
2020-05-03 03:32:13
1678
1
[Original] EM (Expectation–Maximization) Algorithm: Analysis and Derivation
Index of previous posts · Jensen’s inequality. Theorem: Let $f$ be a convex function, and let $X$ be a random variable. Then: $E[f(X)] \geq f(E[X])$. Moreover, if $f$ is strictly con...
2020-04-24 05:38:18
1201
3
[Original] In Depth: Skip-gram, Detailed Derivation and Analysis
Index of previous posts · Comparison between CBOW and Skip-gram: The major difference is that skip-gram is better for infrequent words than CBOW in word2vec. For simplicity, suppose there is a sentence “$w_1 w_2 w_3 w_4$...
2020-04-17 11:57:36
2229
[Original] Distributed representation, Hyperbolic Space, Gaussian/Graph Embedding: A Detailed Introduction
Index of previous posts · Overview of various word representation and Embedding methods · Local Representation v.s. Distributed Representation: One-hot encoding is a local representation and is good for local generalizati...
2020-04-17 11:43:54
1380
[Original] Overview of NLP Basics + Spell Correction with Noisy Channel
NLP = NLU + NLG. NLU: Natural Language Understanding. NLG: Natural Language Generation. NLG may be viewed as the opposite of NLU: whereas in NLU, the system needs to disambiguate the input sentence to ...
2020-04-10 12:32:07
1700
[Original] SVM / Dual SVM math derivation, non-linear SVM, kernel function (detailed)
Linear SVM. Idea: We want to find a hyper-plane $w^\top x + b = 0$ that maximizes the margin. Set up: We first show that the vector $w$ is orthogonal to this hyper-plane. Let $x_1$, $x_2$...
2020-03-28 00:51:02
1383
[Original] Convex Optimization: Primal Problem to Dual problem clearly explained (detailed)
Consider an optimization problem in the standard form (we call this a primal problem): We denote the optimal value of this as $p^\star$. We don’t assume the problem is convex. The Lagrange dual fun...
2020-03-27 23:57:28
1636
[Original] Introduction to Convex Optimization Basic Concepts (detailed)
Optimization problem: All optimization problems can be written as: Optimization Categories · convex v.s. non-convex: Deep Neural Network is non-convex · continuous v.s. discrete: Most are continuous vari...
2020-03-27 23:15:27
558
[Original] XGBoost math Derivation: An Accessible, Detailed Derivation
Bagging v.s. Boosting. Bagging: Leverages unstable base learners that are weak because of overfitting. Boosting: Leverages stable base learners that are weak because of underfitting. XGBoost · Learning ...
2020-03-27 14:19:53
701
[Original] MLE and MAP Compared, and the Math Derivation from MAP to L1, L2 norm (detailed)
MLE v.s. MAP. MLE: learn parameters from data. MAP: add a prior (experience) into the model; more reliable if data is limited. As we have more and more data, the prior becomes less useful. As data inc...
2020-03-27 13:14:04
508
[Original] Logistic Regression, L1, L2 regularization, Gradient/Coordinate descent (detailed)
Generative model v.s. Discriminative model. Examples: Generative model: Naive Bayes, HMM, VAE, GAN. Discriminative model: Logistic Regression, CRF. Objective function: Generative model: max p (x,y...
2020-03-27 12:39:18
983
node2vec.pdf
2020-05-05
How is parallelization implemented in XGBoost?
2020-04-05