You may notice that we specified index zero when we imported the variable t. This is because t is a vector (one-dimensional) rather than an array (multi-dimensional).
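A minimal sketch of why (assuming the same loadmat pattern used later in these notes): scipy.io.loadmat returns two-dimensional arrays, so a MATLAB row vector arrives with shape (1, N), and index zero extracts the one-dimensional vector.
from scipy.io import loadmat           # Import the .mat file reader.
data = loadmat('matfiles/EEG-1.mat')   # loadmat returns a dictionary of 2-D arrays,
print(data['t'].shape)                 # ... e.g., a MATLAB row vector has shape (1, N),
t = data['t'][0]                       # ... so index zero extracts the 1-D vector of times.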
- Display the EEG data for all trials as an image:
imshow(EEGa, # Image the data from condition A.
cmap='BuPu', # ... set the colormap (optional)
extent=[t[0], t[-1], 1, ntrials], # ... set axis limits (t[-1] represents the last element of t)
aspect='auto', # ... set aspect ratio
origin='lower') # ... put origin in lower left corner
xlabel('Time [s]') # Label the axes
ylabel('Trial #')
colorbar() # Show voltage to color mapping
vlines(0.25, 1, 1000, 'k', lw=2) # Indicate stimulus onset with line
savefig('imgs/2-4')
show()
2. The central limit theorem
95% of the values drawn from a normal distribution lie within approximately two standard deviations of the mean.
mn = EEGa.mean(0)          # Compute the ERP (mean across trials),
sd = EEGa.std(0)           # ... the standard deviation across trials,
sdmn = sd / sqrt(ntrials)  # ... and the standard deviation of the mean.
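A minimal sketch of the resulting confidence interval plot (assuming the pylab-style imports used throughout these notes):
plot(t, mn, 'k', lw=3)        # Plot the ERP of condition A,
plot(t, mn + 2 * sdmn, 'k:')  # ... the upper 95% confidence bound,
plot(t, mn - 2 * sdmn, 'k:')  # ... and the lower bound.
xlabel('Time [s]')            # Label the axes.
ylabel('Voltage [$\mu V$]')
show()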
- Define a helper function for labeling plots
# Create a function to label plots
def labelPlot(title_string="Title"):
    '''
    A function that labels the x-axis as 'Time [s]' and
    the y-axis as 'Voltage [$\mu V$]'.
    Arguments:
        title_string: string variable to be used as
            the plot title (default: 'Title')
    '''
    xlabel('Time [s]')           # x-axis is time
    ylabel('Voltage [$\mu V$]')  # y-axis is voltage
    title(title_string)          # use the input here
    autoscale(tight=True)        # no white-space in plot
Call it:
labelPlot('ERP of conditions A and B')
- Compare the difference between the two conditions; the standard deviation of the difference must be computed with a new formula.
To facilitate further inspection of the data, we compute the difference between the ERPs in the two conditions. In the differenced signal, large deviations between the two conditions will appear as large differences from zero. To determine whether a deviation is significantly different from zero, we need to determine the confidence interval for the differenced ERP. This requires that we propagate the standard deviation of the mean for both ERPs to the new differenced ERP. The propagated standard deviation of the mean at a fixed moment in time is computed as

$$\sigma_{mn} = \sqrt{\frac{\sigma_A^2}{K} + \frac{\sigma_B^2}{K}},$$

where $\sigma_A$ is the standard deviation of the data from condition A, $\sigma_B$ is the standard deviation of the data from condition B, and $K$ is the number of trials.
In Python we compute the differenced ERP and standard deviation of the mean of the difference as follows:
mnA = EEGa.mean(0) # ERP of condition A
sdmnA = EEGa.std(0) / sqrt(ntrials) # ... and standard dev of mean
mnB = EEGb.mean(0) # ERP of condition B
sdmnB = EEGb.std(0) / sqrt(ntrials) # ... and standard dev of mean
mnD = mnA - mnB # Differenced ERP
sdmnD = sqrt(sdmnA ** 2 + sdmnB ** 2) # ... and its standard dev
plot(t, mnD, 'k', lw=3) # plot the differenced ERP
plot(t, mnD + 2 * sdmnD, 'k:') # ... the upper CI
plot(t, mnD - 2 * sdmnD, 'k:') # ... and the lower CI
plot([0, 1], [0, 0], 'b') # ... and a horizontal line at 0
labelPlot('Differenced ERP') # label the plot
show()
The ERPs of the two conditions plotted together in one figure,
and their difference plotted in another:
- Use the bootstrap to find the 2.5% and 97.5% points, from which we draw the confidence intervals.
step 1. Sample with replacement 1,000 trials of the EEG data from condition A.
step 2. Average these 1,000 trials to create a resampled ERP.
step 3. Repeat these two steps 3,000 times to create a distribution of ERPs.
step 4. For each time point, identify the values greater than 2.5% and less than 97.5% of all 3,000 values. This range determines the 95% confidence interval for the ERP for that time point.
step 1:
# Draw 1000 integers with replacement from [0, 1000)
i = randint(0, ntrials, size=ntrials)
EEG0 = EEGa[i] # Create the resampled EEG.
step 2:
ERP0 = EEG0.mean(0) # Create the resampled ERP
step 3:
i = randint(ntrials, size=ntrials)  # Draw integers,
EEG1 = EEGa[i]                      # ... create resampled EEG,
ERP1 = EEG1.mean(0)                 # ... create resampled ERP.
i = randint(ntrials, size=ntrials)  # Draw integers,
EEG2 = EEGa[i]                      # ... create resampled EEG,
ERP2 = EEG2.mean(0)                 # ... create resampled ERP.
i = randint(ntrials, size=ntrials)  # Draw integers,
EEG3 = EEGa[i]                      # ... create resampled EEG,
ERP3 = EEG3.mean(0)                 # ... create resampled ERP.
def bootstrapERP(EEGdata, size=None):  # Steps 1-2
    """Calculate the bootstrap ERP from EEG data (array type)."""
    ntrials = len(EEGdata)             # Get the number of trials.
    if size is None:                   # Unless the size is specified,
        size = ntrials                 # ... choose ntrials,
    i = randint(ntrials, size=size)    # ... draw random trials,
    EEG0 = EEGdata[i]                  # ... create the resampled EEG,
    return EEG0.mean(0)                # ... and return the resampled ERP.
# Step 3: Repeat 3000 times
ERP0 = [bootstrapERP(EEGa) for _ in range(3000)]
ERP0 = array(ERP0) # ... and convert the result to an array
In Python it is common to see for-loops written in the form
`y = [f(x) for x in some_set]`
This returns a list, which is why we converted the result to an array in the code above. We could also have written the loop in the traditional form:
`ERP0 = zeros((3000, EEGa.shape[1]))`
`for k in range(3000):`
`    ERP0[k, :] = bootstrapERP(EEGa)`
step 4:
# With the resampled values sorted in this way, we then find the resampled ERP value at index 0.025×3000=75 and 0.975×3000=2925
ERP0.sort(axis=0) # Sort each column of the resampled ERP
N = len(ERP0) # Define the number of samples
ciL = ERP0[int(0.025*N)] # Determine the lower CI
ciU = ERP0[int(0.975*N)] # ... and the upper CI
mnA = EEGa.mean(0) # Determine the ERP for condition A
plot(t, mnA, 'k', lw=3) # ... and plot it
plot(t, ciL, 'k:') # ... and plot the lower CI
plot(t, ciU, 'k:') # ... and the upper CI
hlines(0, 0, 1, 'b') # plot a horizontal line at 0
# ... and label the axes
labelPlot('ERP of condition A with bootstrap confidence intervals') # We define this function above!
The figure above shows only the mean and confidence intervals for condition A. Next, we use the bootstrap method to test for a difference between EEGa and EEGb:
step 1: Merge the 1,000 trials each of EEG data from conditions A and B to form a combined distribution of 2,000 trials.
step 2: Sample with replacement 1,000 trials of EEG data from the combined distribution, and compute the resampled ERP.
step 3: Repeat step 2 and compute a second resampled ERP.
step 4: Compute the statistic, the maximum absolute value of the difference between the two resampled ERPs.
step 5: Repeat steps 2-4, 3,000 times to create a distribution of statistic values.
step 6: Compare the observed statistic to this distribution of statistic values. If the observed statistic is greater than 95% of the bootstrapped values, then reject the null hypothesis that the two conditions are the same.
First, compute the observed statistic:
mnA = mean(EEGa, 0)                 # Determine ERP for condition A,
mnB = mean(EEGb, 0)                 # ... and for condition B,
mnD = mnA - mnB                     # ... compute the differenced ERP,
stat = max(abs(mnD))                # ... and the observed statistic.
print('stat = {:.4f}'.format(stat))
Then perform steps 1-5:
EEG = vstack((EEGa, EEGb)) # Step 1. Merge EEG data from all trials
seed(123) # For reproducibility
def bootstrapStat(EEG): # Steps 2-4.
mnA = bootstrapERP(EEG, size=ntrials) # Create resampled ERPa. The function 'bootstrapERP' is defined above!
mnB = bootstrapERP(EEG, size=ntrials) # Create resampled ERPb
mnD = mnA - mnB # Compute differenced ERP
return max(abs(mnD)) # Return the statistic
# Resample 3,000 times
statD = [bootstrapStat(EEG) for _ in range(3000)]
The observed statistic, the maximum absolute difference between the ERPs of EEGa and EEGb, is 0.292. Values this large rarely appear in the bootstrap distribution, so we conclude that conditions A and B differ.
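Step 6 compares the observed statistic to the bootstrap distribution; a minimal sketch of that comparison:
statD = array(statD)                # Convert the list of resampled statistics to an array,
p = sum(statD > stat) / len(statD)  # ... and compute the proportion exceeding the observed statistic.
print('p = {:.4f}'.format(p))       # A small proportion argues against the null hypothesis.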
03.ipynb
Autocovariance
If we know the data tend to oscillate near 60 Hz, then given the value of the EEG data now, we can accurately predict the value of the EEG data 1/60 s in the future (i.e., one cycle of the 60 Hz activity); it should be similar. One technique to assess this dependent structure in the data is the autocovariance.
The autocovariance is defined as

$$r_{xx}[L] = \frac{1}{N}\sum_{n=1}^{N-L}(x_{n+L} - \bar{x})(x_n - \bar{x}).$$

In words, the autocovariance multiplies the data $x$ at index $n+L$ by the data $x$ at index $n$, and sums these products over all indices $n$. Notice that, in both terms, the mean value $\bar{x}$ is subtracted from $x$ before computing the product, and we divide the resulting sum by the total number of data points in $x$.
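As a sketch of this computation directly in NumPy (assuming a one-dimensional signal x; correlate performs the multiply-and-sum over n, and we divide by N as in the formula above):
x0 = x - x.mean()                   # Subtract the mean from the data,
N = len(x)                          # ... get the number of data points,
ac = correlate(x0, x0, 'full') / N  # ... multiply, sum over n, and divide by N.
lags = arange(-N + 1, N)            # The corresponding lags L run from -(N-1) to N-1.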
Physically, the decrease in autocovariance with lag is consistent with the notion that the data become less similar as time progresses.
- We may therefore think of the Fourier transform as comparing the data 𝑥 to sinusoids oscillating at frequency 𝑓𝑗. When the data and the sinusoid at frequency 𝑓𝑗 align, the summation in the Fourier transform is large and the result 𝑋𝑗 is a large number. When the data and the sinusoid at frequency 𝑓𝑗 do not align, the summation in the Fourier transform is small and 𝑋𝑗 is a tiny number.
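A sketch of this comparison for a single frequency (assuming a signal x sampled at times t; the choice fj = 10 is arbitrary):
fj = 10                            # Choose a frequency to compare against [Hz],
sinusoid = exp(-2j * pi * fj * t)  # ... build the complex sinusoid at frequency fj,
Xj = sum(x * sinusoid)             # ... and sum the products: |Xj| is large when x and the sinusoid align.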
Relation of the Spectrum to Multiple Linear Regression
The idea of linear regression is to express a response variable at time 𝑛 (call it 𝑥𝑛) in terms of predictor variables (call them 𝑧1𝑛, 𝑧2𝑛, …, 𝑧𝑝𝑛 for 𝑝 predictor variables) as

$$x_n = \beta_0 + \beta_1 z_{1n} + \beta_2 z_{2n} + \dots + \beta_p z_{pn} + \epsilon_n,$$

where $\epsilon_n$ is a random error term.
As an example, we model the 60 Hz component:
import statsmodels.formula.api as smf  # Import the formula interface,
from pandas import DataFrame as df     # ... and the DataFrame constructor.
predictors = df(data={ # Create a dataframe with the predictors
'sin': sin(2 * pi * 60 * t), # ... including the sine function
'cos': cos(2 * pi * 60 * t), # ... and the cosine function
'EEG': EEG
})
# Fit the model to the data
model = smf.ols('EEG ~ sin + cos', data=predictors).fit()
print(model.params)
EEG_60Hz_modeled = model.predict() # Get the model prediction
# Remove the model prediction from the EEG data
EEG_cleaned = EEG - EEG_60Hz_modeled
plot(t, EEG_cleaned) # ... and plot the result
xlabel('Time [s]')
ylabel('EEG Cleaned [$\mu$V]')
show()
The signal before the 60 Hz interference was removed:
The Power Spectrum (Part 2)
Spectral Analysis: The Rectangular Taper and Zero Padding
The effect of the choice of taper
The spectrum consists of features at many frequencies to represent the rapid increase and decrease of the rectangular taper (i.e., the rapid transition from 0 to 1, and then from 1 to 0). The intuition for this is that many sinusoids, aligned in a particular way, are required to represent a sharp transition in a time series. As an example, consider the following square wave function:
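(The square wave figure is not reproduced here.) As a sketch, we can build a square wave and inspect its Fourier transform; representing the sharp transitions requires power at many frequencies:
dt = 0.001                                   # Sampling interval [s],
t_sq = arange(0, 1, dt)                      # ... one second of time,
square = where(t_sq % 0.2 < 0.1, 1.0, -1.0)  # ... and a square wave with period 0.2 s.
Xf = rfft(square - square.mean())            # Fourier transform of the square wave:
plot(abs(Xf))                                # ... its magnitude is spread across many frequencies.
show()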
Zero padding.
An interesting issue to consider is how appending zeros impacts the spectrum of the rectangular taper.
We know that increasing the signal length (𝑇) improves the frequency resolution. We therefore expect that adding more points to the signal (even noninformative points, such as zeros) will increase the number of points along the frequency axis. However, appending zeros to a time series is not equivalent to observing more data (and thereby increasing 𝑇). By appending zeros, we of course do not gain any additional information about the signal. Therefore appending zeros to a signal cannot improve the frequency resolution. Instead, the impact of appending zeros is to increase the number of points along the frequency axis in the spectrum. This can be useful in visualizing the spectrum; for example, by appending more and more zeros to the rectangular taper, we produce a less jagged spectrum.
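A minimal sketch of zero padding with the FFT (assuming a signal x): the n argument to rfft appends zeros, adding points along the frequency axis without changing the frequency resolution 1/T:
Xf = rfft(x)                    # Spectrum using only the observed points,
Xfpad = rfft(x, n=10 * len(x))  # ... and with zeros appended to 10 times the length.
print(len(Xf), len(Xfpad))      # More frequency-axis points, same underlying resolution.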
Beyond the Rectangular Taper—The Hanning Taper
Beyond the Hanning Taper—The Multitaper Method
Confidence Intervals of the Spectrum
pmtm() function
05 Autocovariance and Cross-covariance
We conclude that the coherence ( 𝜅𝑥𝑦,𝑗 ) is a scaled version of the frequency domain representation of the statistical model coefficients ( 𝛾𝑗 ) for predicting 𝑦 from 𝑥 .
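To compute the coherence in practice, one option (an assumption of these notes, not the book's own implementation) is scipy.signal.coherence, which returns the magnitude-squared coherence of two signals x and y:
from scipy.signal import coherence
f, Cxy = coherence(x, y, fs=1 / dt)  # Magnitude-squared coherence of x and y,
plot(f, Cxy)                         # ... plotted versus frequency.
xlabel('Frequency [Hz]')
ylabel('Coherence')
show()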
Filters
Do not use the naive rectangular filter or the Hanning filter on your data.
The Finite Impulse Response (FIR) Filter: An Example.
The most common category of filter applied in neuroscience applications is the finite impulse response (FIR) filter. The name for this approach is actually quite informative. In the previous section, we defined the impulse response; it represents the response of the filter to a signal composed of only a single impulse. “Finite impulse response” indicates that the impulse response consists of only a finite number of nonzero terms.
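We can check this directly: the response of an FIR filter to a single impulse is the finite sequence of filter coefficients itself. A minimal sketch (the order and cutoff below are arbitrary examples):
from scipy.signal import firwin, lfilter
b = firwin(100, 0.3)               # An example lowpass FIR filter,
impulse = zeros(1000)              # ... a signal of zeros
impulse[500] = 1                   # ... with a single impulse,
response = lfilter(b, 1, impulse)  # ... and the filter's response: the 100
                                   # coefficients of b, and zeros elsewhere.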
We focus specifically on the application of a lowpass FIR filter to the EEG data. We use the firwin() command in from the SciPy Signal module to design this filter and then apply it using the convolution function we defined earlier.
To summarize, the design and application of a lowpass FIR filter can be performed in two simple steps:
b = firwin(n, Wn) # Design the lowpass filter,
xnew_lfilt = lfilter(b, 1, x) # ... and apply it to the signal x
data = loadmat('matfiles/EEG-1.mat') # Load the EEG data.
eeg = data['EEG']
t = data['t'][0]
dt = t[1] - t[0] # Define the sampling interval.
fNQ = 1 / dt / 2 # Determine the Nyquist frequency.
K = len(eeg) # Determine no. of trials.
n = 100 # Define the filter order
Wn = 30 / fNQ # ... and specify the cutoff frequency,
b = firwin(n, Wn) # ... build lowpass filter.
eeg_lo = array([filtfilt(b, 1, eeg[k]) for k in range(K)]) # Zero-phase filter each trial
mn = eeg_lo.mean(0) # Compute mean of filtered EEG across trials (ERP)
sd = eeg_lo.std(0) # Compute std of filtered EEG data across trials.
sdmn = sd / sqrt(K)           # Compute the std of the mean.
plot(t, mn)                   # Plot the ERP of the filtered data
plot(t, mn + 2 * sdmn, 'r:')  # ... and the confidence intervals,
plot(t, mn - 2 * sdmn, 'r:')
xlabel('Time [s]')            # ... and label the axes.
ylabel('Voltage [mV]')
title('Evoked response')
savefig('imgs/6-14a')
show()
def spectrum(x, t, hann=True):
    '''
    Compute the power spectrum for a single trial of data (x)
    with corresponding sample times (t).
    '''
    dt = t[1] - t[0]                          # Sampling interval,
    T = t[-1]                                 # ... and total observation time.
    xh = hanning(len(x)) * (x - x.mean()) if hann else x - x.mean()  # Optionally apply a Hanning taper.
    xf = rfft(xh)[:-1]                        # Fourier transform (dropping the Nyquist term),
    Sxx = 2 * dt ** 2 / T * (xf * xf.conj())  # ... compute the spectrum,
    return Sxx.real                           # ... and return its real part.
N = len(t) # Define the number of time points per trial.
faxis = fftfreq(N, dt)[:N // 2] # Define the positive frequency axis,
Sxx = [spectrum(trial, t, False) for trial in eeg_lo] # Compute the spectrum for each trial.
Sxxmn = array(Sxx).mean(0) # Convert the result into an array and compute the mean.
semilogx(faxis, 10 * log10(Sxxmn)) # Plot the result in decibels vs frequency,
xlim([1 / (N * dt), 100]) # ... in a limited frequency range (lower bound: the frequency resolution 1/T),
ylim([-60, 0]) # ... and a limited power range,
xlabel('Frequency [Hz]') # ... with axes labeled.
ylabel('Power [dB]')
title('Trial averaged spectrum')
savefig('imgs/6-14b')
show()
Phase-Amplitude Coupling (PAC)
PAC is one form of cross-frequency coupling (CFC).
To assess whether different frequency rhythms interact in the LFP recording, we implement a measure to calculate the CFC. The idea of CFC analysis, in our case, is to determine whether a relation exists between the phase of a low-frequency signal and the envelope or amplitude of a high-frequency signal. In general, computing CFC involves three steps. Each step contains important questions and encompasses entire fields of study. Our goal in this section is to move quickly forward and produce a procedure we can employ, investigate, and criticize. Continued study of CFC - and the associated nuances of each step - is an active area of ongoing research.
CFC Analysis Steps
1. Filter the data into high- and low-frequency bands.
2. Extract the amplitude and phase from the filtered signals.
3. Determine if the phase and amplitude are related.
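A minimal sketch of these three steps (assuming an LFP signal LFP, its Nyquist frequency fNQ, and scipy.signal; the bands and filter order are illustrative, not prescribed):
from scipy.signal import firwin, filtfilt, hilbert
Wn_lo = [5 / fNQ, 7 / fNQ]                   # Example low-frequency band (5-7 Hz),
Wn_hi = [80 / fNQ, 120 / fNQ]                # ... and high-frequency band (80-120 Hz).
b_lo = firwin(1000, Wn_lo, pass_zero=False)  # Step 1: design bandpass filters
b_hi = firwin(1000, Wn_hi, pass_zero=False)  # ... for the two bands,
Vlo = filtfilt(b_lo, 1, LFP)                 # ... and filter the data.
Vhi = filtfilt(b_hi, 1, LFP)
phi = angle(hilbert(Vlo))                    # Step 2: extract the low-frequency phase,
amp = abs(hilbert(Vhi))                      # ... and the high-frequency amplitude envelope.
# Step 3: e.g., average amp within bins of phi to assess whether they are related.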
Spiking
def autocorr(x, lags):
    xcorr = correlate(x - x.mean(), x - x.mean(), 'full')  # Compute the autocorrelation,
    xcorr = xcorr[xcorr.size // 2:] / xcorr.max()          # ... convert to correlation coefficients,
    return xcorr[:lags + 1]                                # ... and return only the requested lags.
time_bins = arange(0, 30, 0.001) # Define the time bins
IncrementsLow1, _ = histogram(SpikesLow, time_bins) # ... compute the histogram to create increment process
ACFLow = autocorr(IncrementsLow1, 100) # ... and the autocorrelation
plot(ACFLow, '.') # Plot autocorrelation vs lags,
N1 = len(IncrementsLow1) # ... compute the sample size
sig = 2 / sqrt(N1) # ... and the significance level
plot([0, 100], [sig, sig], 'r:') # ... and plot the upper and lower significance lines
plot([0, 100], [-sig, -sig], 'r:')
xlim([0, 100]) # ... set x-limits
ylim([-.1, .1]) # ... and y-limits
xlabel('Time [ms]') # ... with axes labeled
ylabel('Autocorrelation')
show()
IncrementsHigh1, _ = histogram(SpikesHigh, time_bins) # Compute the histogram to create increment process
ACFHigh = autocorr(IncrementsHigh1, 100) # ... and the autocorrelation.
plot(ACFHigh, '.') # Plot the autocorrelation,
sig = 2 / sqrt(len(IncrementsHigh1)) # ... compute and plot the significance level,
plot([0, 100], [sig, sig], 'r:')
plot([0, 100], [-sig, -sig], 'r:')
xlim([0, 100]) # ... and set the plot limits,
ylim([-.1, .1])
xlabel('Time [ms]') # ... with axes labeled.
ylabel('Autocorrelation')
show()
N2 = len(IncrementsHigh1)
ACFDiff = ACFHigh - ACFLow # Compute differences of autocorrelations
plot(ACFDiff, '.') # ... and plot them
sd = sqrt(1 / N1 + 1 / N2)               # ... with significance bounds.
plot([0, 100], [2 * sd, 2 * sd], 'r:')
plot([0, 100], [-2 * sd, -2 * sd], 'r:')
xlim([0, 100]) # Set the plot limits and label the axes.
ylim([-.1, .1])
xlabel('Time [ms]')
ylabel('Autocorrelation difference')
show()
GLMs
spikeindex = where(spiketrain!=0)[0] # Get the spike indices.
plot(t, X) # Plot the position,
plot(t[spikeindex], X[spikeindex], '.') # ... and the spikes.
xlabel('Time [sec]') # Label the axes.
ylabel('Position [cm]')
show()
# Fit Model 3 to the spike train data (omitting last input).
predictors['X2'] = X**2 # Add column for X^2
# GLM model with Poisson family and (default) log link function
model3 = sm.GLM(spiketrain, predictors, family=Poisson())
model3_results = model3.fit() # Fit model to our data
b3 = model3_results.params # Get the predicted coefficient vector
print('b3:\n', b3)
bar(bins, spikehist / occupancy, width=8) # Plot results as bars.
plot(bins, exp(b3[0] + b3[1] * bins + b3[2] * bins**2) * 1000, # Plot model.
'k', label='Model')
xlabel('Position [cm]') # Label the axes.
ylabel('Occupancy norm. hist. [spikes/s]')
legend()
show()
Comparing and Evaluating Models
Method 1: Comparing AIC Values.
LL2 = model2.loglike(b2)
AIC2 = -2 * LL2 + 2 * 2
print('AIC2: ', AIC2)
print('model2_results.aic: ', model2_results.aic)
Method 2: Chi-Square Test for Nested Models.
dev2 = model2_results.deviance    # Deviance of Model 2,
dev3 = model3_results.deviance    # ... and of Model 3.
p = 1 - chi2.cdf(dev2 - dev3, 1)  # Compare Models 2 and 3, nested GLMs.
print('p:', p)
Method 3: Confidence Intervals for Individual Model Parameters.
CI2 = array([b2 - 2 * model2_results.bse, b2 + 2 * model2_results.bse]) # Compute 95% CI for parameters of Model 2.
print('CI2:\n', CI2)
eCI2 = exp(CI2) # Exponentiate the CI to express it on the rate (rather than log-rate) scale,
print(eCI2)     # ... and display it.
Method 4: KS Test for Model Goodness-of-Fit.
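The original notes include no code here. A sketch based on the time-rescaling theorem (if the model is correct, the rescaled interspike intervals are Exponential(1)); lambda3, the model's predicted rate per 1 ms bin, is an assumption:
lambda3 = model3_results.predict()      # Model rate per 1 ms bin (assumed available),
spikeindex = where(spiketrain != 0)[0]  # ... and the spike times as bin indices.
Z = array([sum(lambda3[i:j]) for i, j in zip(spikeindex[:-1], spikeindex[1:])])  # Rescaled ISIs.
ecdf = arange(1, len(Z) + 1) / len(Z)   # Empirical CDF of the sorted rescaled ISIs,
mcdf = 1 - exp(-sort(Z))                # ... and the model (Exponential(1)) CDF.
ks = max(abs(ecdf - mcdf))              # KS statistic: compare to 1.36/sqrt(len(Z)),
print('KS statistic: {:.4f}'.format(ks))  # ... the approximate 95% significance level.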
Plot Raster
This requires train to be a 0-1 matrix whose rows are trials and whose columns are time bins.
imshow(train, aspect='auto', cmap='gray_r') # Image the spike raster.
show()
Ltrials = where(direction==0) #Identify left trials,
Rtrials = where(direction==1) #... and right trials.
imshow(train[Ltrials], aspect = 'auto', cmap='gray_r', extent = (-1000,1000,50,0)) # Image left trials,
xlabel('Time (ms)')
ylabel('Trial')
title('Left trials')
show()
imshow(train[Rtrials], aspect = 'auto', cmap='gray_r', extent = (-1000,1000,50,0)) # ... and right trials.
xlabel('Time (ms)')
ylabel('Trial')
title('Right trials')
show()
PSTH (Peristimulus Time Histogram)
spiketrials, spiketimes = where(train)            # Find the trial and time-bin indices of every spike.
hist = histogram(spiketimes, 2000)[0] / 50 / 1e-3 # Count spikes per 1 ms bin; divide by 50 trials and the bin width [s].
bar(t, hist, 2)
xlabel('Time (ms)')
ylabel('Spike Rate (spikes/s)')
show()
PSTH10 = histogram(spiketimes, 200)[0] #Compute histogram using 10 ms bins
bar(t[arange(0, 2000, 10)], PSTH10/50/10*1000,width=10) #... and plot it.
show()
i_plan = where(t < 0)[0] #Indices for planning.
i_move = where(t >= 0)[0] #Indices for movement.
#Compute the average spike rate,
PlanRate = mean(train[:, i_plan]) / 1e-3 #...during planning,
MoveRate = mean(train[:, i_move]) / 1e-3 #...during movement.
Ltrials = where(direction==0) #Find left trials,
Rtrials = where(direction==1) #... and right trials,
LRate = mean(train[Ltrials, :]) / 1e-3 #... and compute rates.
RRate = mean(train[Rtrials, :]) / 1e-3
PlanL = sum(train[Ltrials][:,i_plan], axis = 1) # Firing rate L, planning.
PlanR = sum(train[Rtrials][:,i_plan], axis = 1) # Firing rate R, planning.
MoveL = sum(train[Ltrials][:,i_move], axis = 1) # Firing rate L, movement.
MoveR = sum(train[Rtrials][:,i_move], axis = 1) # Firing rate R, movement.
figure(figsize=(5, 3))
boxplot([PlanL, PlanR, MoveL, MoveR], sym='+',
labels = ['Plan Left', 'Plan Right', 'Move Left', 'Move Right'])
ylabel('Spike Rate (spikes/s)')
show()
#In the planning period,
spiketrialsPlan, spiketimesPlan = where(train[:,i_plan] > 0) #Find spikes
PlanISIs = diff(spiketimesPlan) #... compute ISIs,
PlanISIs = PlanISIs[where(PlanISIs > 0)] #... drop spurious ones.
planhist = histogram(PlanISIs, 250)[0]
subplot(121)
bar(linspace(0, 250, 250), planhist, width = 2) #Plot ISIs,
xlabel('Interspike Interval (ms)') #... label axes.
ylabel('Count')
title('Planning')
#In the movement period,
spiketrialsMove, spiketimesMove = where(train[:, i_move] > 0)#Find spikes
MoveISIs = diff(spiketimesMove) #... compute ISIs,
MoveISIs = MoveISIs[where(MoveISIs>0)] #... drop spurious ones.
movehist = histogram(MoveISIs, 250)[0]
subplot(122)
bar(linspace(0, 250, 250), movehist, width = 2) #Plot ISIs,
xlabel('Interspike Interval (ms)') #... label axes.
ylabel('Count')
title('Movement')
show()
A formula is a character string that describes the model. It will look something like responses ~ predictors (read “responses as a function of predictors”). Note that this is simply a different style of programming; the underlying mathematics is the same.
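For example, using the objects defined earlier in these notes, the formula interface and an explicit design matrix fit the same model:
import statsmodels.api as sm
import statsmodels.formula.api as smf
fit1 = smf.ols('EEG ~ sin + cos', data=predictors).fit()          # Formula interface,
fit2 = sm.OLS(predictors['EEG'],                                  # ... versus explicit response
              sm.add_constant(predictors[['sin', 'cos']])).fit()  # ... and design matrix (with intercept).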