You may notice that we specified index zero when we imported the variable t. This is because t is a vector (one-dimensional) rather than an array (multi-dimensional).
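A minimal sketch of why (assuming the same loadmat pattern used later in these notes): scipy.io.loadmat returns two-dimensional arrays, so a MATLAB row vector arrives with shape (1, N), and index zero extracts the one-dimensional vector.
from scipy.io import loadmat           # Import the .mat file reader.
data = loadmat('matfiles/EEG-1.mat')   # loadmat returns a dictionary of 2-D arrays,
print(data['t'].shape)                 # ... e.g., a MATLAB row vector has shape (1, N),
t = data['t'][0]                       # ... so index zero extracts the 1-D vector of times.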
- Display the EEG data for all trials as an image:
imshow(EEGa, # Image the data from condition A.
cmap='BuPu', # ... set the colormap (optional)
extent=[t[0], t[-1], 1, ntrials], # ... set axis limits (t[-1] represents the last element of t)
aspect='auto', # ... set aspect ratio
origin='lower') # ... put origin in lower left corner
xlabel('Time [s]') # Label the axes
ylabel('Trial #')
colorbar() # Show voltage to color mapping
vlines(0.25, 1, 1000, 'k', lw=2) # Indicate stimulus onset with line
savefig('imgs/2-4')
show()
2. The central limit theorem
95% of the values drawn from a normal distribution lie within approximately two standard deviations of the mean.
mn = EEGa.mean(0)          # Compute the ERP (mean across trials),
sd = EEGa.std(0)           # ... the standard deviation across trials,
sdmn = sd / sqrt(ntrials)  # ... and the standard deviation of the mean.
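A minimal sketch of the resulting confidence interval plot (assuming the pylab-style imports used throughout these notes):
plot(t, mn, 'k', lw=3)        # Plot the ERP of condition A,
plot(t, mn + 2 * sdmn, 'k:')  # ... the upper 95% confidence bound,
plot(t, mn - 2 * sdmn, 'k:')  # ... and the lower bound.
xlabel('Time [s]')            # Label the axes.
ylabel('Voltage [$\mu V$]')
show()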
- Define a helper function for labeling plots
# Create a function to label plots
def labelPlot(title_string="Title"):
    '''
    A function that labels the x-axis as 'Time [s]' and
    the y-axis as 'Voltage [$\mu V$]'.
    Arguments:
        title_string: string variable to be used as
            the plot title (default: 'Title')
    '''
    xlabel('Time [s]')           # x-axis is time
    ylabel('Voltage [$\mu V$]')  # y-axis is voltage
    title(title_string)          # use the input here
    autoscale(tight=True)        # no white-space in plot
Call it:
labelPlot('ERP of conditions A and B')
- Compare the difference between the two conditions; the standard deviation of the difference must be computed with a new formula.
To facilitate further inspection of the data, we compute the difference between the ERPs in the two conditions. In the differenced signal, large deviations between the two conditions will appear as large differences from zero. To determine whether a deviation is significantly different from zero, we need to determine the confidence interval for the differenced ERP. This requires that we propagate the standard deviation of the mean for both ERPs to the new differenced ERP. The propagated standard deviation of the mean at a fixed moment in time is computed as

$$\sigma_{mn} = \sqrt{\frac{\sigma_A^2}{K} + \frac{\sigma_B^2}{K}},$$

where $\sigma_A$ is the standard deviation of the data from condition A, $\sigma_B$ is the standard deviation of the data from condition B, and $K$ is the number of trials.
In Python we compute the differenced ERP and standard deviation of the mean of the difference as follows:
mnA = EEGa.mean(0) # ERP of condition A
sdmnA = EEGa.std(0) / sqrt(ntrials) # ... and standard dev of mean
mnB = EEGb.mean(0) # ERP of condition B
sdmnB = EEGb.std(0) / sqrt(ntrials) # ... and standard dev of mean
mnD = mnA - mnB # Differenced ERP
sdmnD = sqrt(sdmnA ** 2 + sdmnB ** 2) # ... and its standard dev
plot(t, mnD, 'k', lw=3) # plot the differenced ERP
plot(t, mnD + 2 * sdmnD, 'k:') # ... the upper CI
plot(t, mnD - 2 * sdmnD, 'k:') # ... and the lower CI
plot([0, 1], [0, 0], 'b') # ... and a horizontal line at 0
labelPlot('Differenced ERP') # label the plot
show()
The ERPs of the two conditions plotted together in one figure,
and their difference plotted in another:
- Use the bootstrap to find the 2.5% and 97.5% points, from which we draw the confidence intervals.
step 1. Sample with replacement 1,000 trials of the EEG data from condition A.
step 2. Average these 1,000 trials to create a resampled ERP.
step 3. Repeat these two steps 3,000 times to create a distribution of ERPs.
step 4. For each time point, identify the values greater than 2.5% and less than 97.5% of all 3,000 values. This range determines the 95% confidence interval for the ERP for that time point.
step 1:
# Draw 1000 integers with replacement from [0, 1000)
i = randint(0, ntrials, size=ntrials)
EEG0 = EEGa[i] # Create the resampled EEG.
step 2:
ERP0 = EEG0.mean(0) # Create the resampled ERP
step 3:
i = randint(ntrials, size=ntrials)  # Draw integers,
EEG1 = EEGa[i]                      # ... create resampled EEG,
ERP1 = EEG1.mean(0)                 # ... create resampled ERP.
i = randint(ntrials, size=ntrials)  # Draw integers,
EEG2 = EEGa[i]                      # ... create resampled EEG,
ERP2 = EEG2.mean(0)                 # ... create resampled ERP.
i = randint(ntrials, size=ntrials)  # Draw integers,
EEG3 = EEGa[i]                      # ... create resampled EEG,
ERP3 = EEG3.mean(0)                 # ... create resampled ERP.
def bootstrapERP(EEGdata, size=None):  # Steps 1-2
    """Calculate the bootstrap ERP from EEG data (array type)."""
    ntrials = len(EEGdata)             # Get the number of trials.
    if size is None:                   # Unless the size is specified,
        size = ntrials                 # ... choose ntrials,
    i = randint(ntrials, size=size)    # ... draw random trials,
    EEG0 = EEGdata[i]                  # ... create the resampled EEG,
    return EEG0.mean(0)                # ... and return the resampled ERP.
# Step 3: Repeat 3000 times
ERP0 = [bootstrapERP(EEGa) for _ in range(3000)]
ERP0 = array(ERP0) # ... and convert the result to an array
In Python it is common to see for-loops written in the form
`y = [f(x) for x in some_set]`
This returns a list, which is why we converted the result to an array in the code above. We could also have written the loop in the traditional form:
`ERP0 = zeros((3000, EEGa.shape[1]))`
`for k in range(3000):`
`    ERP0[k, :] = bootstrapERP(EEGa)`
step 4:
# With the resampled values sorted in this way, we then find the resampled ERP value at index 0.025×3000=75 and 0.975×3000=2925
ERP0.sort(axis=0) # Sort each column of the resampled ERP
N = len(ERP0) # Define the number of samples
ciL = ERP0[int(0.025*N)] # Determine the lower CI
ciU = ERP0[int(0.975*N)] # ... and the upper CI
mnA = EEGa.mean(0) # Determine the ERP for condition A
plot(t, mnA, 'k', lw=3) # ... and plot it
plot(t, ciL, 'k:') # ... and plot the lower CI
plot(t, ciU, 'k:') # ... and the upper CI
hlines(0, 0, 1, 'b') # plot a horizontal line at 0
# ... and label the axes
labelPlot('ERP of condition A with bootstrap confidence intervals') # We define this function above!
The figure above shows only the mean and confidence intervals for condition A. Next, we use the bootstrap method to test for a difference between EEGa and EEGb:
step 1: Merge the 1,000 trials each of EEG data from conditions A and B to form a combined distribution of 2,000 trials.
step 2: Sample with replacement 1,000 trials of EEG data from the combined distribution, and compute the resampled ERP.
step 3: Repeat step 2 and compute a second resampled ERP.
step 4: Compute the statistic, the maximum absolute value of the difference between the two resampled ERPs.
step 5: Repeat steps 2-4, 3,000 times to create a distribution of statistic values.
step 6: Compare the observed statistic to this distribution of statistic values. If the observed statistic is greater than 95% of the bootstrapped values, then reject the null hypothesis that the two conditions are the same.
First, compute the observed statistic:
mnA = mean(EEGa, 0)                 # Determine ERP for condition A,
mnB = mean(EEGb, 0)                 # ... and for condition B,
mnD = mnA - mnB                     # ... compute the differenced ERP,
stat = max(abs(mnD))                # ... and the observed statistic.
print('stat = {:.4f}'.format(stat))
Then perform steps 1-5:
EEG = vstack((EEGa, EEGb)) # Step 1. Merge EEG data from all trials
seed(123) # For reproducibility
def bootstrapStat(EEG): # Steps 2-4.
mnA = bootstrapERP(EEG, size=ntrials) # Create resampled ERPa. The function 'bootstrapERP' is defined above!
mnB = bootstrapERP(EEG, size=ntrials) # Create resampled ERPb
mnD = mnA - mnB # Compute differenced ERP
return max(abs(mnD)) # Return the statistic
# Resample 3,000 times
statD = [bootstrapStat(EEG) for _ in range(3000)]
The observed statistic, the maximum absolute difference between the ERPs of EEGa and EEGb, is 0.292. Values this large rarely appear in the bootstrap distribution, so we conclude that conditions A and B differ.
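Step 6 compares the observed statistic to the bootstrap distribution; a minimal sketch of that comparison:
statD = array(statD)                # Convert the list of resampled statistics to an array,
p = sum(statD > stat) / len(statD)  # ... and compute the proportion exceeding the observed statistic.
print('p = {:.4f}'.format(p))       # A small proportion argues against the null hypothesis.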
03.ipynb
Autocovariance
If we know the data tend to oscillate near 60 Hz, then given the value of the EEG data now, we can accurately predict the value of the EEG data 1/60 s in the future (i.e., one cycle of the 60 Hz activity); it should be similar. One technique to assess this dependent structure in the data is the autocovariance.
The autocovariance is defined as

$$r_{xx}[L] = \frac{1}{N}\sum_{n=1}^{N-L}(x_{n+L} - \bar{x})(x_n - \bar{x}).$$

In words, the autocovariance multiplies the data $x$ at index $n+L$ by the data $x$ at index $n$, and sums these products over all indices $n$. Notice that, in both terms, the mean value $\bar{x}$ is subtracted from $x$ before computing the product, and we divide the resulting sum by the total number of data points in $x$.
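As a sketch of this computation directly in NumPy (assuming a one-dimensional signal x; correlate performs the multiply-and-sum over n, and we divide by N as in the formula above):
x0 = x - x.mean()                   # Subtract the mean from the data,
N = len(x)                          # ... get the number of data points,
ac = correlate(x0, x0, 'full') / N  # ... multiply, sum over n, and divide by N.
lags = arange(-N + 1, N)            # The corresponding lags L run from -(N-1) to N-1.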
Physically, the decrease in autocovariance with lag is consistent with the notion that the data become less similar as time progresses.
- We may therefore think of the Fourier transform as comparing the data 𝑥 to sinusoids oscillating at frequency 𝑓𝑗. When the data and the sinusoid at frequency 𝑓𝑗 align, the summation in the Fourier transform is large and the result 𝑋𝑗 is a large number. When the data and the sinusoid at frequency 𝑓𝑗 do not align, the summation in the Fourier transform is small and 𝑋𝑗 is a tiny number.
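A sketch of this comparison for a single frequency (assuming a signal x sampled at times t; the choice fj = 10 is arbitrary):
fj = 10                            # Choose a frequency to compare against [Hz],
sinusoid = exp(-2j * pi * fj * t)  # ... build the complex sinusoid at frequency fj,
Xj = sum(x * sinusoid)             # ... and sum the products: |Xj| is large when x and the sinusoid align.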
Relation of the Spectrum to Multiple Linear Regression
The idea of linear regression is to express a response variable at time 𝑛 (call it 𝑥𝑛) in terms of predictor variables (call them 𝑧1𝑛, 𝑧2𝑛, …, 𝑧𝑝𝑛 for 𝑝 predictor variables) as

$$x_n = \beta_0 + \beta_1 z_{1n} + \beta_2 z_{2n} + \dots + \beta_p z_{pn} + \epsilon_n,$$

where $\epsilon_n$ is a random error term.
As an example, we model the 60 Hz component:
import statsmodels.formula.api as smf  # Import the formula interface,
from pandas import DataFrame as df     # ... and the DataFrame constructor.
predictors = df(data={ # Create a dataframe with the predictors
'sin': sin(2 * pi * 60 * t), # ... including the sine function
'cos': cos(2 * pi * 60 * t), # ... and the cosine function
'EEG': EEG
})
# Fit the model to the data
model = smf.ols('EEG ~ sin + cos', data=predictors).fit()
print(model.params)
EEG_60Hz_modeled = model.predict() # Get the model prediction
# Remove the model prediction from the EEG data
EEG_cleaned = EEG - EEG_60Hz_modeled
plot(t, EEG_cleaned) # ... and plot the result
xlabel('Time [s]')
ylabel('EEG Cleaned [$\mu$V]')
show()
The signal before the 60 Hz interference was removed:
The Power Spectrum (Part 2)
Spectral Analysis: The Rectangular Taper and Zero Padding
The effect of the choice of taper
The spectrum consists of features at many frequencies to represent the rapid increase and decrease of the rectangular taper (i.e., the rapid transition from 0 to 1, and then from 1 to 0). The intuition for this is that many sinusoids, aligned in a particular way, are required to represent a sharp transition in a time series. As an example, consider the following square wave function:
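(The square wave figure is not reproduced here.) As a sketch, we can build a square wave and inspect its Fourier transform; representing the sharp transitions requires power at many frequencies:
dt = 0.001                                   # Sampling interval [s],
t_sq = arange(0, 1, dt)                      # ... one second of time,
square = where(t_sq % 0.2 < 0.1, 1.0, -1.0)  # ... and a square wave with period 0.2 s.
Xf = rfft(square - square.mean())            # Fourier transform of the square wave:
plot(abs(Xf))                                # ... its magnitude is spread across many frequencies.
show()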
Zero padding.
An interesting issue to consider is how appending zeros impacts the spectrum of the rectangular taper.
We know that increasing the signal length (𝑇) improves the frequency resolution. We therefore expect that adding more points to the signal (even noninformative points, such as zeros) will increase the number of points along the frequency axis. However, appending zeros to a time series is not equivalent to observing more data (and thereby increasing 𝑇). By appending zeros, we of course do not gain any additional information about the signal. Therefore appending zeros to a signal cannot improve the frequency resolution. Instead, the impact of appending zeros is to increase the number of points along the frequency axis in the spectrum. This can be useful in visualizing the spectrum; for example, by appending more and more zeros to the rectangular taper, we produce a less jagged spectrum.
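A minimal sketch of zero padding with the FFT (assuming a signal x): the n argument to rfft appends zeros, adding points along the frequency axis without changing the frequency resolution 1/T:
Xf = rfft(x)                    # Spectrum using only the observed points,
Xfpad = rfft(x, n=10 * len(x))  # ... and with zeros appended to 10 times the length.
print(len(Xf), len(Xfpad))      # More frequency-axis points, same underlying resolution.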
Beyond the Rectangular Taper—The Hanning Taper
Beyond the Hanning Taper—The Multitaper Method
Confidence Intervals of the Spectrum
pmtm() function
05 Autocovariance and Cross-covariance
We conclude that the coherence ( 𝜅𝑥𝑦,𝑗 ) is a scaled version of the frequency domain representation of the statistical model coefficients ( 𝛾𝑗 ) for predicting 𝑦 from 𝑥 .
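To compute the coherence in practice, one option (an assumption of these notes, not the book's own implementation) is scipy.signal.coherence, which returns the magnitude-squared coherence of two signals x and y:
from scipy.signal import coherence
f, Cxy = coherence(x, y, fs=1 / dt)  # Magnitude-squared coherence of x and y,
plot(f, Cxy)                         # ... plotted versus frequency.
xlabel('Frequency [Hz]')
ylabel('Coherence')
show()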
Filters
Do not use the naive rectangular filter or the Hanning filter on your data.
The Finite Impulse Response (FIR) Filter: An Example.
The most common category of filter applied in neuroscience applications is the finite impulse response (FIR) filter. The name for this approach is actually quite informative. In the previous section, we defined the impulse response; it represents the response of the filter to a signal composed of only a single impulse. “Finite impulse response” indicates that the impulse response consists of only a finite number of nonzero terms.
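We can check this directly: the response of an FIR filter to a single impulse is the finite sequence of filter coefficients itself. A minimal sketch (the order and cutoff below are arbitrary examples):
from scipy.signal import firwin, lfilter
b = firwin(100, 0.3)               # An example lowpass FIR filter,
impulse = zeros(1000)              # ... a signal of zeros
impulse[500] = 1                   # ... with a single impulse,
response = lfilter(b, 1, impulse)  # ... and the filter's response: the 100
                                   # coefficients of b, and zeros elsewhere.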
We focus specifically on the application of a lowpass FIR filter to the EEG data. We use the firwin() command in from the SciPy Signal module to design this filter and then apply it using the convolution function we defined earlier.
To summarize, the design and application of a lowpass FIR filter can be performed in two simple steps:
b = firwin(n, Wn) # Design the lowpass filter,
xnew_lfilt = lfilter(b, 1, x) # ... and apply it to the signal x
data = loadmat('matfiles/EEG-1.mat') # Load the EEG data.
eeg = data['EEG']
t = data['t'][0]
dt = t[1] - t[0] # Define the sampling interval.
fNQ = 1 / dt / 2 # Determine the Nyquist frequency.
K = len(eeg) # Determine no. of trials.
n = 100 # Define the filter order
Wn = 30 / fNQ # ... and specify the cutoff frequency,
b = firwin(n, Wn) # ... build lowpass filter.
eeg_lo = array([filtfilt(b, 1, eeg[k]) for k in range(K)]) # Zero-phase filter each trial
mn = eeg_lo.mean(0) # Compute mean of filtered EEG across trials (ERP)
sd = eeg_lo.std(0) # Compute std of filtered EEG data across trials.
sdmn = sd / sqrt(K)           # Compute the std of the mean.
plot(t, mn)                   # Plot the ERP of the filtered data
plot(t, mn + 2 * sdmn, 'r:')  # ... and the confidence intervals,
plot(t, mn - 2 * sdmn, 'r:')
xlabel('Time [s]')            # ... and label the axes.
ylabel('Voltage [mV]')
title('Evoked response')
savefig('imgs/6-14a')
show()
def spectrum(x, t, hann=True):
    '''
    Compute the power spectrum for a single trial of data (x)
    with corresponding sample times (t).
    '''
    dt = t[1] - t[0]                          # Sampling interval,
    T = t[-1]                                 # ... and total observation time.
    xh = hanning(len(x)) * (x - x.mean()) if hann else x - x.mean()  # Optionally apply a Hanning taper.
    xf = rfft(xh)[:-1]                        # Fourier transform (dropping the Nyquist term),
    Sxx = 2 * dt ** 2 / T * (xf * xf.conj())  # ... compute the spectrum,
    return Sxx.real                           # ... and return its real part.
N = len(t) # Define the number of time points per trial.
faxis = fftfreq(N, dt)[:N // 2] # Define the positive frequency axis,
Sxx = [spectrum(trial, t, False) for trial in eeg_lo] # Compute the spectrum for each trial.
Sxxmn = array(Sxx).mean(0) # Convert the result into an array and compute the mean.
semilogx(faxis, 10 * log10(Sxxmn)) # Plot the result in decibels vs frequency,
xlim([1 / (N * dt), 100]) # ... in a limited frequency range (lower bound: the frequency resolution 1/T),
ylim([-60, 0]) # ... and a limited power range,
xlabel('Frequency [Hz]') # ... with axes labeled.
ylabel('Power [dB]')
title('Trial averaged spectrum')
savefig('imgs/6-14b')
show()
Phase-Amplitude Coupling (PAC)
PAC is one form of cross-frequency coupling (CFC).
To assess whether different frequency rhythms interact in the LFP recording, we implement a measure to calculate the CFC. The idea of CFC analysis, in our case, is to determine whether a relation exists between the phase of a low-frequency signal and the envelope or amplitude of a high-frequency signal. In general, computing CFC involves three steps. Each step contains important questions and encompasses entire fields of study. Our goal in this section is to move quickly forward and produce a procedure we can employ, investigate, and criticize. Continued study of CFC - and the associated nuances of each step - is an active area of ongoing research.
CFC Analysis Steps
1. Filter the data into high- and low-frequency bands.
2. Extract the amplitude and phase from the filtered signals.
3. Determine if the phase and amplitude are related.
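A minimal sketch of these three steps (assuming an LFP signal LFP, its Nyquist frequency fNQ, and scipy.signal; the bands and filter order are illustrative, not prescribed):
from scipy.signal import firwin, filtfilt, hilbert
Wn_lo = [5 / fNQ, 7 / fNQ]                   # Example low-frequency band (5-7 Hz),
Wn_hi = [80 / fNQ, 120 / fNQ]                # ... and high-frequency band (80-120 Hz).
b_lo = firwin(1000, Wn_lo, pass_zero=False)  # Step 1: design bandpass filters
b_hi = firwin(1000, Wn_hi, pass_zero=False)  # ... for the two bands,
Vlo = filtfilt(b_lo, 1, LFP)                 # ... and filter the data.
Vhi = filtfilt(b_hi, 1, LFP)
phi = angle(hilbert(Vlo))                    # Step 2: extract the low-frequency phase,
amp = abs(hilbert(Vhi))                      # ... and the high-frequency amplitude envelope.
# Step 3: e.g., average amp within bins of phi to assess whether they are related.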
Spiking
def autocorr(x, lags):
    xcorr = correlate(x - x.mean(), x - x.mean(), 'full')  # Compute the autocorrelation,
    xcorr = xcorr[xcorr.size // 2:] / xcorr.max()          # ... convert to correlation coefficients,
    return xcorr[:lags + 1]                                # ... and return only the requested lags.
time_bins = arange(0, 30, 0.001) # Define the time bins
IncrementsLow1, _ = histogram(SpikesLow, time_bins) # ... compute the histogram to create increment process
ACFLow = autocorr(IncrementsLow1, 100) # ... and the autocorrelation
plot(ACFLow, '.') # Plot autocorrelation vs lags,
N1 = len(IncrementsLow1) # ... compute the sample size
sig = 2 / sqrt(N1) # ... and the significance level
plot([0, 100], [sig, sig], 'r:') # ... and plot the upper and lower significance lines
plot([0, 100], [-sig, -sig], 'r:')
xlim([0, 100]) # ... set x-limits
ylim([-.1, .1]) # ... and y-limits
xlabel('Time [ms]') # ... with axes labeled
ylabel('Autocorrelation')
show()
IncrementsHigh1, _ = histogram(SpikesHigh, time_bins) # Compute the histogram to create increment process
ACFHigh = autocorr(IncrementsHigh1, 100) # ... and the autocorrelation.
plot(ACFHigh, '.') # Plot the autocorrelation,
sig = 2 / sqrt(len(IncrementsHigh1)) # ... compute and plot the significance level,
plot([0, 100], [sig, sig], 'r:')
plot([0, 100], [-sig, -sig], 'r:')
xlim([0, 100]) # ... and set the plot limits,
ylim([-.1, .1])
xlabel('Time [ms]') # ... with axes labeled.
ylabel('Autocorrelation')
show()
N2 = len(IncrementsHigh1)
ACFDiff = ACFHigh - ACFLow # Compute differences of autocorrelations
plot(ACFDiff, '.') # ... and plot them
sd = sqrt(1 / N1 + 1 / N2)               # ... with significance bounds.
plot([0, 100], [2 * sd, 2 * sd], 'r:')
plot([0, 100], [-2 * sd, -2 * sd], 'r:')
xlim([0, 100]) # Set the plot limits and label the axes.
ylim([-.1, .1])
xlabel('Time [ms]')
ylabel('Autocorrelation difference')
show()
GLMs
spikeindex = where(spiketrain!=0)[0] # Get the spike indices.
plot(t, X) # Plot the position,
plot(t[spikeindex], X[spikeindex], '.') # ... and the spikes.
xlabel('Time [sec]') # Label the axes.
ylabel('Position [cm]')
show()
# Fit Model 3 to the spike train data (omitting last input).
predictors['X2'] = X**2 # Add column for X^2
# GLM model with Poisson family and (default) log link function
model3 = sm.GLM(spiketrain, predictors, family=Poisson())
model3_results = model3.fit() # Fit model to our data
b3 = model3_results.params # Get the predicted coefficient vector
print('b3:\n', b3)
bar(bins, spikehist / occupancy, width=8) # Plot results as bars.
plot(bins, exp(b3[0] + b3[1] * bins + b3[2] * bins**2) * 1000, # Plot model.
'k', label='Model')
xlabel('Position [cm]') # Label the axes.
ylabel('Occupancy norm. hist. [spikes/s]')
legend()
show()
Comparing and Evaluating Models
Method 1: Comparing AIC Values.
LL2 = model2.loglike(b2)
AIC2 = -2 * LL2 + 2 * 2
print('AIC2: ', AIC2)
print('model2_results.aic: ', model2_results.aic)
Method 2: Chi-Square Test for Nested Models.
dev2 = model2_results.deviance    # Deviance of Model 2,
dev3 = model3_results.deviance    # ... and of Model 3.
p = 1 - chi2.cdf(dev2 - dev3, 1)  # Compare Models 2 and 3, nested GLMs.
print('p:', p)
Method 3: Confidence Intervals for Individual Model Parameters.
CI2 = array([b2 - 2 * model2_results.bse, b2 + 2 * model2_results.bse]) # Compute 95% CI for parameters of Model 2.
print('CI2:\n', CI2)
eCI2 = exp(CI2) # Exponentiate the CI to express it on the rate (rather than log-rate) scale,
print(eCI2)     # ... and display it.
Method 4: KS Test for Model Goodness-of-Fit.
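The original notes include no code here. A sketch based on the time-rescaling theorem (if the model is correct, the rescaled interspike intervals are Exponential(1)); lambda3, the model's predicted rate per 1 ms bin, is an assumption:
lambda3 = model3_results.predict()      # Model rate per 1 ms bin (assumed available),
spikeindex = where(spiketrain != 0)[0]  # ... and the spike times as bin indices.
Z = array([sum(lambda3[i:j]) for i, j in zip(spikeindex[:-1], spikeindex[1:])])  # Rescaled ISIs.
ecdf = arange(1, len(Z) + 1) / len(Z)   # Empirical CDF of the sorted rescaled ISIs,
mcdf = 1 - exp(-sort(Z))                # ... and the model (Exponential(1)) CDF.
ks = max(abs(ecdf - mcdf))              # KS statistic: compare to 1.36/sqrt(len(Z)),
print('KS statistic: {:.4f}'.format(ks))  # ... the approximate 95% significance level.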
Plot Raster
This requires train to be a 0-1 matrix whose rows are trials and whose columns are time bins.
imshow(train, aspect='auto', cmap='gray_r') # Image the spike raster.
show()
Ltrials = where(direction==0) #Identify left trials,
Rtrials = where(direction==1) #... and right trials.
imshow(train[Ltrials], aspect = 'auto', cmap='gray_r', extent = (-1000,1000,50,0)) # Image left trials,
xlabel('Time (ms)')
ylabel('Trial')
title('Left trials')
show()
imshow(train[Rtrials], aspect = 'auto', cmap='gray_r', extent = (-1000,1000,50,0)) # ... and right trials.
xlabel('Time (ms)')
ylabel('Trial')
title('Right trials')
show()
PSTH (Peristimulus Time Histogram)
spiketrials, spiketimes = where(train)            # Find the trial and time-bin indices of every spike.
hist = histogram(spiketimes, 2000)[0] / 50 / 1e-3 # Count spikes per 1 ms bin; divide by 50 trials and the bin width [s].
bar(t, hist, 2)
xlabel('Time (ms)')
ylabel('Spike Rate (spikes/s)')
show()
PSTH10 = histogram(spiketimes, 200)[0] #Compute histogram using 10 ms bins
bar(t[arange(0, 2000, 10)], PSTH10/50/10*1000,width=10) #... and plot it.
show()
i_plan = where(t < 0)[0] #Indices for planning.
i_move = where(t >= 0)[0] #Indices for movement.
#Compute the average spike rate,
PlanRate = mean(train[:, i_plan]) / 1e-3 #...during planning,
MoveRate = mean(train[:, i_move]) / 1e-3 #...during movement.
Ltrials = where(direction==0) #Find left trials,
Rtrials = where(direction==1) #... and right trials,
LRate = mean(train[Ltrials, :]) / 1e-3 #... and compute rates.
RRate = mean(train[Rtrials, :]) / 1e-3
PlanL = sum(train[Ltrials][:,i_plan], axis = 1) # Firing rate L, planning.
PlanR = sum(train[Rtrials][:,i_plan], axis = 1) # Firing rate R, planning.
MoveL = sum(train[Ltrials][:,i_move], axis = 1) # Firing rate L, movement.
MoveR = sum(train[Rtrials][:,i_move], axis = 1) # Firing rate R, movement.
figure(figsize=(5, 3))
boxplot([PlanL, PlanR, MoveL, MoveR], sym='+',
labels = ['Plan Left', 'Plan Right', 'Move Left', 'Move Right'])
ylabel('Spike Rate (spikes/s)')
show()
#In the planning period,
spiketrialsPlan, spiketimesPlan = where(train[:,i_plan] > 0) #Find spikes
PlanISIs = diff(spiketimesPlan) #... compute ISIs,
PlanISIs = PlanISIs[where(PlanISIs > 0)] #... drop spurious ones.
planhist = histogram(PlanISIs, 250)[0]
subplot(121)
bar(linspace(0, 250, 250), planhist, width = 2) #Plot ISIs,
xlabel('Interspike Interval (ms)') #... label axes.
ylabel('Count')
title('Planning')
#In the movement period,
spiketrialsMove, spiketimesMove = where(train[:, i_move] > 0)#Find spikes
MoveISIs = diff(spiketimesMove) #... compute ISIs,
MoveISIs = MoveISIs[where(MoveISIs>0)] #... drop spurious ones.
movehist = histogram(MoveISIs, 250)[0]
subplot(122)
bar(linspace(0, 250, 250), movehist, width = 2) #Plot ISIs,
xlabel('Interspike Interval (ms)') #... label axes.
ylabel('Count')
title('Movement')
show()
A formula is a character string that describes the model. It will look something like responses ~ predictors (read “responses as a function of predictors”). Note that this is simply a different style of programming; the underlying mathematics is the same.
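For example, using the objects defined earlier in these notes, the formula interface and an explicit design matrix fit the same model:
import statsmodels.api as sm
import statsmodels.formula.api as smf
fit1 = smf.ols('EEG ~ sin + cos', data=predictors).fit()          # Formula interface,
fit2 = sm.OLS(predictors['EEG'],                                  # ... versus explicit response
              sm.add_constant(predictors[['sin', 'cos']])).fit()  # ... and design matrix (with intercept).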