Implementing SOM in Python: function updates

License: CC 4.0 BY-SA

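This article describes the self-organizing map (SOM) network implemented by the MiniSom library in Python, covering class initialization, weight initialization methods, training functions, activation response analysis, and the winner map. It also provides practical additions such as PCA weight initialization, random weight initialization, and label mapping, which are useful for dimensionality reduction and classification tasks.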

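For a homework assignment, I made a few changes to my earlier code (Implementing SOM in Python: code comments and application examples), adding several simple functions and features. This post also lists what each function does.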

First, create a MiniSom object:

som = MiniSom(x,y,input_len,sigma=1.0,learning_rate=0.5,neighborhood_function='gaussian')
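"""
Parameters (only a few commonly used ones are listed here):
x: x dimension of the competitive layer, int
y: y dimension of the competitive layer, int
input_len: dimension of the input-layer vectors, int
sigma: initial radius (spread) of the winner's neighborhood
neighborhood_function: how the winner's neighborhood is computed; common choices are "gaussian", "bubble", "mexican_hat", 'triangle', and "winner_take_all". The "winner_take_all" mode is one I added; it is equivalent to "bubble" with sigma=1.

"""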
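
To make the "winner_take_all" remark concrete, here is a minimal NumPy sketch. It assumes the bubble neighborhood is a strict-inequality window of half-width sigma around the winner on each grid axis; the helper names `bubble` and `winner_take_all` are illustrative and are not taken from the modified class.

```python
import numpy as np


def bubble(grid_x, grid_y, center, sigma):
    """0/1 mask of neurons strictly closer than sigma to center on both axes.

    Assumption: the window uses strict inequalities on each axis.
    """
    neigx = np.arange(grid_x)
    neigy = np.arange(grid_y)
    ax = np.logical_and(neigx > center[0] - sigma, neigx < center[0] + sigma)
    ay = np.logical_and(neigy > center[1] - sigma, neigy < center[1] + sigma)
    return np.outer(ax, ay) * 1.0


def winner_take_all(grid_x, grid_y, center):
    """Hypothetical helper: only the winning neuron is updated."""
    mask = np.zeros((grid_x, grid_y))
    mask[center] = 1.0
    return mask


mask_bubble = bubble(4, 3, (2, 1), sigma=1)
mask_wta = winner_take_all(4, 3, (2, 1))
mask_wider = bubble(4, 3, (2, 1), sigma=2)

print(np.array_equal(mask_bubble, mask_wta))  # the two masks coincide when sigma=1
print(int(mask_wider.sum()))                  # a larger sigma covers more neurons
```

Under this assumption, sigma=1 leaves only the winner inside the window, which is why the two modes match.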

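The commonly used functions are described below.

Initialization functions:

If none of the initialization functions below is called, the network's built-in initialization sets the weights to decimals in [-1, 1].

som.random_weights_init(data): initializes the network weights by randomly sampling from the dataset.

som.random_weights_init_random01(): (newly added) initializes the weights with decimals in (0, 1).

som.pca_weights_init(data): initializes the network weights from the principal components obtained by running PCA on the dataset; this sometimes helps convergence.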
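
The three initialization strategies can be sketched in plain NumPy as follows. This is an illustration under my own assumptions, not the class's code: the function names are hypothetical, and the PCA variant simply spans the grid along the two leading eigenvectors of the data covariance with coefficients in [-1, 1].

```python
import numpy as np


def init_from_samples(data, x, y, rng):
    """Each neuron copies one randomly chosen sample from data."""
    idx = rng.randint(len(data), size=(x, y))
    return data[idx].astype(float)


def init_uniform01(x, y, input_len, rng):
    """Each weight is drawn uniformly from [0, 1)."""
    return rng.rand(x, y, input_len)


def init_pca(data, x, y):
    """Spread the grid over the plane of the two leading principal components."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))
    order = np.argsort(-eigvals)  # largest eigenvalue first
    pc1, pc2 = eigvecs[:, order[0]], eigvecs[:, order[1]]
    weights = np.zeros((x, y, data.shape[1]))
    for i, c1 in enumerate(np.linspace(-1, 1, x)):
        for j, c2 in enumerate(np.linspace(-1, 1, y)):
            weights[i, j] = c1 * pc1 + c2 * pc2
    return weights


rng = np.random.RandomState(0)
data = rng.rand(50, 3)  # toy dataset: 50 samples, 3 features

w_samples = init_from_samples(data, 4, 5, rng)
w_uniform = init_uniform01(4, 5, 3, rng)
w_pca = init_pca(data, 4, 5)
print(w_samples.shape, w_uniform.shape, w_pca.shape)
```

Because the PCA variant gives neighboring neurons neighboring weights from the start, the map begins already ordered, which is one plausible reason it can converge faster.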

Training functions:

som.train(data, num_iteration, random_order=False, verbose=False): trains the network on the dataset.
    random_order=True : samples in data are fed to the SOM in shuffled order
    random_order=False : samples are fed to the SOM in the order they appear in data

som.train_random(data, num_iteration, verbose=False): equivalent to som.train(data, num_iteration, random_order=True, verbose=False)

som.train_batch(data, num_iteration, verbose=False): equivalent to som.train(data, num_iteration, random_order=False, verbose=False)
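
The difference between the two orderings comes down to how the sample indices are generated. The sketch below follows the logic of `_build_iteration_indexes` in the source listing, but the wrapper `iteration_indexes` and its `seed` parameter are my own illustrative names.

```python
import numpy as np


def iteration_indexes(data_len, num_iteration, random_order=False, seed=None):
    """Index of the sample presented to the SOM at each iteration."""
    # Cycle through the dataset as many times as needed.
    indexes = np.arange(num_iteration) % data_len
    if random_order:
        # Shuffle the whole schedule in place (assumed seedable for repeatability).
        np.random.RandomState(seed).shuffle(indexes)
    return indexes


sequential = iteration_indexes(3, 7)
shuffled = iteration_indexes(3, 7, random_order=True, seed=0)

print(sequential.tolist())        # [0, 1, 2, 0, 1, 2, 0]
print(sorted(shuffled.tolist()))  # same indices, only the order differs
```

Note that num_iteration counts individual sample presentations, not passes over the dataset, so with 3 samples and 7 iterations the first sample is seen three times.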

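Functions for applying and analyzing the network:

som.get_weights(): returns a tensor holding the weights of all neurons, shape=[x, y, input_len]; weight[i, j, :] is the weight vector of the neuron at position (i, j).

som.winner(x): returns the coordinates of the competitive-layer neuron activated by sample x.

som.activation_response(data): returns a matrix counting how many times each neuron was activated.

som.win_map(data, return_indices=False): returns a dictionary showing which samples each neuron collected. The keys are the coordinates of the competitive-layer neurons and the values are the dataset samples collected by that neuron; with return_indices=True the values are the indices of those samples in data.

som.labels_map(data, labels): returns a two-level dictionary giving, for each neuron, the label classes it collected and the number of samples under each label.

som.get_data_winner_map(data): (newly added) returns a 0/1 matrix recording which neuron each sample in data activates. It is intended for the "winner_take_all" mode and can be used in a CPN network.

som.activate(x): returns a matrix of the distances from input sample x to every neuron.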
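
To show what these analysis functions return, the following standalone sketch reimplements them on a toy 2x2 map with NumPy. It is an approximation under stated assumptions: Euclidean distance is used, the free functions are illustrative stand-ins for the methods, and the one-row-per-sample, row-major layout of `data_winner_map` is my guess at a reasonable 0/1 matrix, since the layout used by get_data_winner_map is not spelled out above.

```python
import numpy as np
from collections import Counter, defaultdict


def activate(weights, sample):
    """Euclidean distance from sample to every neuron, shape (x, y)."""
    return np.linalg.norm(weights - sample, axis=-1)


def winner(weights, sample):
    """Grid coordinates of the neuron closest to sample."""
    dist = activate(weights, sample)
    return tuple(int(k) for k in np.unravel_index(dist.argmin(), dist.shape))


def activation_response(weights, data):
    """How many samples each neuron wins."""
    counts = np.zeros(weights.shape[:2])
    for sample in data:
        counts[winner(weights, sample)] += 1
    return counts


def win_map(weights, data, return_indices=False):
    """Neuron coordinates -> samples (or sample indices) it collected."""
    mapping = defaultdict(list)
    for i, sample in enumerate(data):
        mapping[winner(weights, sample)].append(i if return_indices else sample)
    return mapping


def labels_map(weights, data, labels):
    """Neuron coordinates -> Counter of the labels it collected."""
    mapping = defaultdict(Counter)
    for sample, label in zip(data, labels):
        mapping[winner(weights, sample)][label] += 1
    return mapping


def data_winner_map(weights, data):
    """Assumed layout: one row per sample, one column per neuron (row-major)."""
    x, y = weights.shape[:2]
    onehot = np.zeros((len(data), x * y))
    for i, sample in enumerate(data):
        wx, wy = winner(weights, sample)
        onehot[i, wx * y + wy] = 1
    return onehot


# Toy 2x2 map whose neuron (i, j) has weight vector [i, j].
weights = np.array([[[0., 0.], [0., 1.]],
                    [[1., 0.], [1., 1.]]])
data = np.array([[0.1, 0.0], [0.9, 1.0], [1.0, 0.9], [0.0, 0.2]])
labels = ['a', 'b', 'b', 'a']

print(winner(weights, data[0]))                         # (0, 0)
print(activation_response(weights, data).tolist())      # [[2.0, 0.0], [0.0, 2.0]]
print(dict(win_map(weights, data, return_indices=True)))
print(dict(labels_map(weights, data, labels)))
print(data_winner_map(weights, data).tolist())
```

In this toy case the two 'a' samples land on neuron (0, 0) and the two 'b' samples on neuron (1, 1), so the label counts per neuron could serve directly as a simple classifier.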

The source code is as follows:

from math import sqrt
import numpy as np
from numpy import (array, unravel_index, nditer, linalg, random, subtract, max,
                   power, exp, pi, zeros, ones, arange, outer, meshgrid, dot,
                   logical_and, mean, std, cov, argsort, linspace, transpose,
                   einsum, prod, nan, sqrt, hstack, diff, argmin, multiply)
from numpy import sum as npsum
from numpy.linalg import norm
from collections import defaultdict, Counter
from warnings import warn
from sys import stdout
from time import time
from datetime import timedelta
import pickle
import os

# for unit tests
from numpy.testing import assert_almost_equal, assert_array_almost_equal
from numpy.testing import assert_array_equal
import unittest

"""
    Minimalistic implementation of the Self Organizing Maps (SOM).
"""


def _build_iteration_indexes(data_len, num_iterations,
                             verbose=False, random_generator=None):
    """Returns an iterable with the indexes of the samples
    to pick at each iteration of the training.
    If random_generator is not None, it must be an instance
    of numpy.random.RandomState and it will be used
    to randomize the order of the samples."""
    iterations = arange(num_iterations) % data_len
    if random_generator:
        random_generator.shuffle(iterations)
    if verbose:
        return _wrap_index__in_verbose(iterations)
    else:
        return iterations


def _wrap_index__in_verbose(iterations):
    """Yields the values in iterations printing the status on the stdout."""
    m = len(iterations)
    digits = len(str(m))
    progress = '\r [ {s:{d}} / {m} ] {s:3.0f}% - ? it/s'
    progress = progress.format(m=m, d=digits, s=0)
    stdout.write(progress)
    beginning = time()
    stdout.write(progress)
    for i, it in enumerate(iterations):
        yield it
        sec_left = ((m-i+1) * (time() - beginning)) / (i+1)
        time_left = str(timedelta(seconds=sec_left))[:7]
        progress = '\r [ {i:{d}} / {m} ]'.format(i=i+1, d=digits, m=m)
        progress += ' {p:3.0f}%'.format(p=100*(i+1)/m)
        progress += ' - {time_left} left '.format(time_left=time_left)
        stdout.write(progress)


def fast_norm(x):
    """
    Quickly computes the 2-norm of a vector.
    Returns norm-2 of a 1-D numpy array.
    * faster than linalg.norm in case of 1-D arrays (numpy 1.9.2rc1).
    """
    return sqrt(dot(x, x.T))


def asymptotic_decay(learning_rate, t, max_iter):
    """Decay function of the learning process.
    Parameters
    ----------
    learning_rate : float
        current learning rate.
    t : int
        current iteration.
    max_iter : int
        maximum number of iterations for the training.
    """
    return learning_rate / (1+t/(max_iter/2))


class MiniSom(object):
    def __init__(self, x, y, input_len, sigma=1.0, learning_rate=0.5,
                 decay_function=asymptotic_decay,
                 neighborhood_function='gaussian', topology='rectangular',
                 activation_distance='euclidean', random_seed=None):
        """Initializes a Self Organizing Maps.
        A rule of thumb to set the size of the grid for a dimensionality
        reduction task is that it should contain 5*sqrt(N) neurons
        where N is the number of samples in the dataset to analyze.
        E.g. if your dataset has 150 samples, 5*sqrt(150) = 61.23
        hence a map 8-by-8 should perform well.
        Parameters
        ----------
        x : int
            x dimension of the SOM.
        y : int
            y dimension of the SOM.
        input_len : int
            Number of the elements of the vectors in input.
        sigma : float, optional (default=1.0)
            Spread of the neighborhood function, needs to be adequate
            to the dimensions of the map.
            (at the iteration t we have sigma(t) = sigma / (1 + t/T)
            where T is #num_iteration/2)
        learning_rate : initial learning rate
            (at the iteration t we have
            learning_rate(t) = learning_rate / (1 + t/T)
            where T is #num_iteration/2)
        decay_function : function (default=asymptotic_decay)
            Function that reduces learning_rate and sigma at each iteration
            the default function is:
                        learning_rate / (1+t/(max_iterations/2))
            A custom decay function will need to take in input
            three parameters in the following order:
            1. learning rate
            2. current iteration
            3. maximum number of iterations allowed
            Note that if a lambda function is used to define the decay
            MiniSom will not be picklable anymore.
        neighborhood_function : string, optional (default='gaussian')
            Function that weights the neighborhood of a position in the map.
            Possible values: 'gaussian', 'mexican_hat', 'bubble', 'triangle'
        topology : string, optional (default='rectangular')
            Topology of the