\documentclass{seminar} \usepackage{colordvi} \begin{document}
Information in the optic nerve
\begin{itemize} \item given a visual stimulus, what patterns of activity will be induced in the optic nerve? \item \textbf{given a pattern of activity in the optic nerve, what does that pattern mean?}
\begin{itemize} \item Neural code is unknown \item So, what can we do without it, or to learn it? \begin{itemize} \item Attempt to learn to reconstruct signal from spike train \item Constrain the neural code by estimating the amount of information in it \end{itemize} \end{itemize} \end{itemize}
Stimulus reconstruction from spike train
Given a pair (stimulus signal, spike train signal), learn to guess the stimulus signal when given the spike train.
- A.I. supervised learning methods
- Electrical engineering methods
Note: doing optimal reconstruction is a different problem from predicting the system's response to stimulus
- Even if you know the I/O (encoding) function, that is, $f: visual\ stimuli \to spike\ trains$, you can't necessarily just invert that to get the \emph{optimal} reconstruction
- $P(stimulus | response) = \frac{P(response | stimulus) P(stimulus)}{P(response)}$
- $P(stimulus)$: the a priori probability distribution of incoming stimuli affects what you should guess
- $P(response | stimulus)$: the distribution of noise matters
Note: doing optimal reconstruction is a different problem from predicting the system's response to stimulus cont'd
- It's possible for a system to have a nonlinear response to stimuli, yet for the optimal reconstruction function to be linear!
- This happens in fly H1 cells
- Looking at reconstruction is biologically relevant; the brain has to decode
Signal estimation
- Given a signal $S_2$, estimate another signal $S_1$
- $S_1 = $ visual stimulus, $S_2 = $ spike train
- Thinking in terms of continuous-time signals lets us deal gracefully with the problem of interpolating between the times when there is a spike (compare to thinking about I/O functions)
- Consider a single neuron (although the procedure can be generalized to multiple neurons)
Signal estimation cont'd
- We'll focus on the continuous-time Wiener filter (also known as Kolmogorov-Wiener filter)
- The Wiener filter is the MMSE linear estimator
- Another choice would have been a Kalman filter, a generalization of the Wiener filter that handles non-stationary processes better
Signal estimation cont'd
The filter
Here's the equation that we'll use to predict the stimulus, given a spike train (it's a convolution of the spike train with some kernel K):
\begin{equation} stimulus(t) = \int K(\tau)\ spike\ train(t - \tau) d \tau \end{equation}
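As a concrete (illustrative) sketch, one way to carry out this convolution numerically, assuming the spike train has been binned onto a time grid of width $dt$ and $K$ has been sampled on the same grid with $\tau = 0$ at the middle index:
\begin{verbatim}
import numpy as np

# Illustrative sketch, not from the talk: estimate the stimulus by convolving a
# binned spike train (value 1/dt in bins containing a spike) with a sampled
# kernel K whose tau = 0 sample sits at the middle index.
def reconstruct_stimulus(spike_train, K, dt):
    # mode="same" keeps the estimate on the spike train's time axis;
    # the factor dt approximates the continuous-time integral over tau.
    return np.convolve(spike_train, K, mode="same") * dt
\end{verbatim}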
Signal estimation cont'd
The spike train
Model spike train as a sum of delta functions at the spikes (which occur at times $t_i$, $i = 1\ldots N$):
\begin{equation*} spike\ train(t) = \sum_{i=1}^N \delta(t - t_i) \end{equation*}
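For numerical work one needs a binned stand-in for this sum of delta functions; a possible (illustrative) helper, assuming spike times in seconds on $[0, duration)$:
\begin{verbatim}
import numpy as np

# Illustrative sketch: bins containing a spike get the value 1/dt, so that each
# spike integrates to 1, approximating a delta function on the grid.
def bin_spike_train(spike_times, dt, duration):
    n_bins = int(np.ceil(duration / dt))
    train = np.zeros(n_bins)
    idx = (np.asarray(spike_times) / dt).astype(int)
    np.add.at(train, idx, 1.0 / dt)   # handles several spikes in one bin
    return train
\end{verbatim}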
Signal estimation cont'd
The kernel
\begin{equation*} K(t) = \int \frac{dw}{2 \pi} e^{-i w t} \frac{E\left[ \tilde{s}(w) \sum_j e^{-i w t_j} \right]}{E\left[ \left| \sum_{j} e^{i w t_j} \right|^2 \right]} \end{equation*}
- $\tilde{s}(w)$ is the Fourier transform of the (training) stimulus.
- Note: this is actually the \emph{acausal} Wiener filter. To get a causal one, we'll truncate it (and shift it, to account for delays). This is a hack, so it voids our theoretical guarantees:
- $K_2(\tau) = \Theta(\tau) K(\tau - delay)$, where $\Theta(\tau) = 1$ if $\tau > 0$, and 0 otherwise (see the sketch below).
- No time for derivation in 10-minute slide presentation; ask if you're interested, though
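For completeness, here is a rough numerical sketch of estimating this kernel from training data and then applying the causal truncation above. The expectations $E[\cdot]$ are approximated by averages over repeated trials; the array names, shapes, and normalizations are illustrative assumptions, not part of the talk. The estimated kernel can be plugged into the convolution sketch from the earlier slide.
\begin{verbatim}
import numpy as np

# Illustrative sketch of the (acausal) Wiener kernel estimate. Assumptions:
# `stimuli` and `spike_trains` have shape (n_trials, n_bins), binned with
# spacing dt, and trial averages stand in for the expectations E[.].
def wiener_kernel(stimuli, spike_trains, dt):
    S = np.fft.fft(stimuli, axis=1)           # stimulus spectra, one row per trial
    R = np.fft.fft(spike_trains, axis=1)      # spike-train spectra
    cross = np.mean(S * np.conj(R), axis=0)   # ~ E[ s~(w) sum_j exp(-i w t_j) ]
    power = np.mean(np.abs(R) ** 2, axis=0)   # ~ E[ |sum_j exp(i w t_j)|^2 ]
    K = np.real(np.fft.ifft(cross / power))   # back to the time domain
    K = np.fft.fftshift(K) / dt               # center tau = 0; discrete -> continuous units
    tau = (np.arange(K.size) - K.size // 2) * dt
    return tau, K

# Causal, delayed version: K2(tau) = Theta(tau) * K(tau - delay)
def causal_kernel(tau, K, delay, dt):
    K2 = np.roll(K, int(round(delay / dt)))   # K(tau - delay); edge wrap-around ignored here
    K2[tau <= 0] = 0.0                        # Theta(tau)
    return K2
\end{verbatim}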
Information rate
- Without knowing what the code is, information rate estimates can help us constrain our search for the code
- For now, consider only a single neuron, and neglect temporal correlations
Collecting data
- Two data sets:
- \emph{ensemble data}: Present many different stimuli, and record evoked spike trains
- \emph{conditional data}: Present one stimulus over and over again, and record evoked spike trains
\includegraphics[scale=.3]{twoDataSets.eps}
\emph{red = ensemble data, green = conditional data}
Discretize data
Now each recording is equivalent to a \emph{string of symbols} over some alphabet $A$
\includegraphics[scale=.2]{symbolStream.eps}
Information rate cont'd
Overview of procedure
- Estimate total entropy rate ($\hat{h}$) from ensemble data
- Estimate noise entropy rate ($\hat{h}_{noise} = \hat{h}(R | S = s)$) from conditional data
- $estimated\ information\ rate = \hat{h} - \hat{h}_{noise}$
\bigskip
Estimating entropy rates will be very tricky when we account for temporal correlations, later!!!
Information rate cont'd
Estimating an entropy rate
- Neglect temporal correlations $\Rightarrow$ pretend that the per-symbol entropy equals the entropy rate ($H = h$) \hfill
- Make a histogram of the frequency of occurrence of each symbol in the data ($\hat{p}_i$ for symbol $i$)
- $\hat{h} = \hat{H} = -\sum_{i \in A} \hat{p}_i \log_2 \hat{p}_i$ (sketched below)
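A small illustrative sketch of this plug-in estimate, assuming the discretized data arrive as a sequence of symbols (e.g. spike counts per bin):
\begin{verbatim}
import numpy as np
from collections import Counter

# Illustrative plug-in estimate: histogram the symbols, normalize, and compute
# the empirical entropy in bits per symbol.
def entropy_rate_hat(symbols):
    counts = np.array(list(Counter(symbols).values()), dtype=float)
    p = counts / counts.sum()            # empirical probabilities p_i
    return -np.sum(p * np.log2(p))

# Information rate estimate (previous slide): total minus noise entropy rate, e.g.
#   info_rate_hat = entropy_rate_hat(ensemble_symbols) - entropy_rate_hat(conditional_symbols)
\end{verbatim}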
Information rate cont'd
- One channel (an axon in the optic nerve) transmitting two signals: the noise and the neural code.
- $\therefore$ $h = h_{noise} + information\ rate$
Information rate cont'd
Temporal correlations
- But we neglected temporal correlations. So, the spike train will seem less predictable than it actually is (i.e. the entropy rates will be overestimated).
- So, instead of considering individual symbols, consider \emph{blocks} of $N$ symbols
- Treat each block as if it were one symbol in a larger alphabet
\includegraphics[scale=.3]{wordHist.eps}
- $h_N = \frac{H(blocks\ of\ length\ N)}{N}$ \hfill
- The larger $N$ is, the longer-range temporal correlations you can account for (a sketch of this block estimate follows below)
- If we had infinite data, then $\lim_{N \to \infty} h_N = h$
\includegraphics[scale=.33]{wordLengthInfiniteData.eps}
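A possible sketch of the block estimate $h_N$, here using non-overlapping blocks (overlapping blocks are another common choice; the details are an illustrative assumption):
\begin{verbatim}
import numpy as np
from collections import Counter

# Illustrative sketch: treat each block of N consecutive symbols as one symbol
# in a larger alphabet, estimate its entropy, and divide by N.
def block_entropy_rate_hat(symbols, N):
    blocks = [tuple(symbols[i:i + N])
              for i in range(0, len(symbols) - N + 1, N)]   # non-overlapping blocks
    counts = np.array(list(Counter(blocks).values()), dtype=float)
    p = counts / counts.sum()                 # empirical block probabilities
    return -np.sum(p * np.log2(p)) / N        # bits per symbol
\end{verbatim}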
Information rate cont'd
The danger of undersampling
- Imagine that there are 10000 symbols which occur with equal probability, and we take 3 data points. At best, 3 of the $\hat{p}_i$s will be nonzero, and the rest will be zero. Notice that the true distribution is very unpredictable, but the empirical distribution looks very predictable (``we know it'll always be one of these three'')
- So, when we don't have enough data, entropy rates will be underestimated
- Increasing the size of blocks makes us vulnerable to undersampling (in the extreme, each contiguous time segment yields only a single block)
Information rate cont'd
Conflicting biases!
- Neglecting temporal correlations (i.e. $N$ too small) leads to \emph{over}estimate of entropy rates
- Undersampling (i.e. $N$ too large) leads to \emph{under}estimate of entropy rates
- We have neither an upper nor a lower bound with this procedure :(
Information rate cont'd
If we're lucky
If we're lucky, there will be a plateau where $N$ is ``just right''
\includegraphics[scale=.3]{competingBias3.eps}
Information rate cont'd
"Direct method"
In the past, people have tried to approach the plateau from the left (i.e. from an overestimate of the entropy rate) and used their intuition to decide whether they are near the plateau. They then fit the entropy rate vs. $N$ curve with a 2nd-degree polynomial, and extrapolate to the asymptote.
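One concrete way to set up that extrapolation (an illustrative assumption, not necessarily the exact fit used historically) is to fit the estimated rates against $1/N$ and read off the value at $1/N = 0$:
\begin{verbatim}
import numpy as np

# Illustrative sketch: fit h_N against 1/N with a 2nd-degree polynomial and take
# the constant term as the extrapolated (N -> infinity) entropy rate.
def extrapolate_entropy_rate(Ns, h_hats):
    coeffs = np.polyfit(1.0 / np.asarray(Ns, dtype=float), h_hats, deg=2)
    return coeffs[-1]                     # polynomial evaluated at 1/N = 0
\end{verbatim}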
Information rate cont'd
Plateau not guaranteed
But it's possible for there to be no plateau, or for it to be hard to detect.
\includegraphics[scale=.2]{noPlateau.eps}
Information rate cont'd
What to do?
More advanced methods are under development to automatically choose values of $N$, search for plateaus, etc. Some of these methods end up with no free parameters that a human must guess. However, there's always a price: these methods make assumptions about the underlying process, or at the least they assume a prior probability distribution from which the underlying process was drawn.
\end{document}