\documentclass{seminar} \usepackage{colordvi} \begin{document}
Information in the optic nerve
\begin{itemize} \item given a visual stimulus, what patterns of activity will be induced in the optic nerve? \item \textbf{given a pattern of activity in the optic nerve, what does that pattern mean?}
\begin{itemize} \item Neural code is unknown \item So, what can we do without it, or to learn it? \begin{itemize} \item Attempt to learn to reconstruct signal from spike train \item Constrain the neural code by estimating the amount of information in it \end{itemize} \end{itemize} \end{itemize}
Stimulus reconstruction from spike train
Given a pair (stimulus signal, spike train signal), learn to guess the stimulus signal when given the spike train.
- A.I. supervised learning methods
- Electrical engineering methods
Note: doing optimal reconstruction is a different problem from predicting the system's response to stimulus
- Even if you know the I/O (encoding) function, that is, $f: visual\ stimuli \to spike\ trains$, you can't necessarily just invert that to get the \emph{optimal} reconstruction
- $P(stimulus | response) = \frac{P(response | stimulus) P(stimulus)}{P(response)}$
- $P(stimulus)$: the a priori probability distribution of incoming stimuli affects what you should guess
- $P(response | stimulus)$: the distribution of noise matters
Note: doing optimal reconstruction is a different problem from predicting the system's response to stimulus cont'd
- It's possible for a system to have a nonlinear response to stimuli, yet for the optimal reconstruction function to be linear!
- This happens in fly H1 cells
- Looking at reconstruction is biologically relevant; the brain has to decode
Signal estimation
- Given a signal $S_2$, estimate another signal $S_1$
- $S_1 = $ visual stimulus, $S_2 = $ spike train
- Thinking in terms of continuous-time signals lets us deal gracefully with the problem of interpolating between the times when there is a spike (compare to thinking about I/O functions)
- Consider a single neuron (although the procedure can be generalized to multiple neurons)
Signal estimation cont'd
- We'll focus on the continuous-time Wiener filter (also known as Kolmogorov-Wiener filter)
- The Wiener filter is the MMSE linear estimator
- Another choice would have been a Kalman filter, a generalization of the Wiener filter that handles non-stationary processes better
Signal estimation cont'd
The filter
Here's the equation that we'll use to predict the stimulus, given a spike train (it's a convolution of the spike train with some kernel K):
\begin{equation} stimulus(t) = \int K(\tau)\ spike\ train(t - \tau) d \tau \end{equation}
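As a concrete (illustrative) sketch, one way to carry out this convolution numerically, assuming the spike train has been binned onto a time grid of width $dt$ and $K$ has been sampled on the same grid with $\tau = 0$ at the middle index:
\begin{verbatim}
import numpy as np

# Illustrative sketch, not from the talk: estimate the stimulus by convolving a
# binned spike train (value 1/dt in bins containing a spike) with a sampled
# kernel K whose tau = 0 sample sits at the middle index.
def reconstruct_stimulus(spike_train, K, dt):
    # mode="same" keeps the estimate on the spike train's time axis;
    # the factor dt approximates the continuous-time integral over tau.
    return np.convolve(spike_train, K, mode="same") * dt
\end{verbatim}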
Signal estimation cont'd
The spike train
Model spike train as a sum of delta functions at the spikes (which occur at times $t_i$, $i = 1\ldots N$):
\begin{equation*} spike\ train(t) = \sum_{i=1}^N \delta(t - t_i) \end{equation*}
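For numerical work one needs a binned stand-in for this sum of delta functions; a possible (illustrative) helper, assuming spike times in seconds on $[0, duration)$:
\begin{verbatim}
import numpy as np

# Illustrative sketch: bins containing a spike get the value 1/dt, so that each
# spike integrates to 1, approximating a delta function on the grid.
def bin_spike_train(spike_times, dt, duration):
    n_bins = int(np.ceil(duration / dt))
    train = np.zeros(n_bins)
    idx = (np.asarray(spike_times) / dt).astype(int)
    np.add.at(train, idx, 1.0 / dt)   # handles several spikes in one bin
    return train
\end{verbatim}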
Signal estimation cont'd
The kernel
\begin{equation*} K(t) = \int \frac{dw}{2 \pi} e^{-i w t} \frac{E\left[ \tilde{s}(w) \sum_j e^{-i w t_j} \right]}{E\left[ \left| \sum_{j} e^{i w t_j} \right|^2 \right]} \end{equation*}
- $\tilde{s}(w)$ is the Fourier transform of the (training) stimulus.
- Note: this is actually the \emph{acausal} Wiener filter. To get a causal one, we'll truncate it (and shift it, to account for delays). This is a hack, so it voids our theoretical guarantees:
- $K_2(\tau) = \Theta(\tau) K(\tau - delay)$, where $\Theta(\tau) = 1$ if $\tau > 0$, and 0 otherwise (see the sketch below).
- No time for derivation in 10-minute slide presentation; ask if you're interested, though
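For completeness, here is a rough numerical sketch of estimating this kernel from training data and then applying the causal truncation above. The expectations $E[\cdot]$ are approximated by averages over repeated trials; the array names, shapes, and normalizations are illustrative assumptions, not part of the talk. The estimated kernel can be plugged into the convolution sketch from the earlier slide.
\begin{verbatim}
import numpy as np

# Illustrative sketch of the (acausal) Wiener kernel estimate. Assumptions:
# `stimuli` and `spike_trains` have shape (n_trials, n_bins), binned with
# spacing dt, and trial averages stand in for the expectations E[.].
def wiener_kernel(stimuli, spike_trains, dt):
    S = np.fft.fft(stimuli, axis=1)           # stimulus spectra, one row per trial
    R = np.fft.fft(spike_trains, axis=1)      # spike-train spectra
    cross = np.mean(S * np.conj(R), axis=0)   # ~ E[ s~(w) sum_j exp(-i w t_j) ]
    power = np.mean(np.abs(R) ** 2, axis=0)   # ~ E[ |sum_j exp(i w t_j)|^2 ]
    K = np.real(np.fft.ifft(cross / power))   # back to the time domain
    K = np.fft.fftshift(K) / dt               # center tau = 0; discrete -> continuous units
    tau = (np.arange(K.size) - K.size // 2) * dt
    return tau, K

# Causal, delayed version: K2(tau) = Theta(tau) * K(tau - delay)
def causal_kernel(tau, K, delay, dt):
    K2 = np.roll(K, int(round(delay / dt)))   # K(tau - delay); edge wrap-around ignored here
    K2[tau <= 0] = 0.0                        # Theta(tau)
    return K2
\end{verbatim}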
Information rate
- Without knowing what the code is, information rate estimates can help us constrain our search for the code
- For now, consider only a single neuron, and neglect temporal correlations
Collecting data
- Two data sets:
- \emph{ensemble data}: Present many different stimuli, and record evoked spike trains
- \emph{conditional data}: Present one stimulus over and over again, and record evoked spike trains
\includegraphics[scale=.3]{twoDataSets.eps}
\emph{red = ensemble data, green = conditional data}
Discretize data
Now each recording is equivalent to a \emph{string of symbols} over some alphabet $A$
\includegraphics[scale=.2]{symbolStream.eps}
Information rate cont'd
Overview of procedure
- Estimate total entropy rate ($\hat{h}$) from ensemble data
- Estimate noise entropy rate ($\hat{h}_{noise} = \hat{h}(R | S = s)$) from conditional data
- $estimated\ information\ rate = \hat{h} - \hat{h}_{noise}$
\bigskip
Estimating entropy rates will be very tricky when we account for temporal correlations, later!!!
Information rate cont'd
Estimating an entropy rate
- Neglect temporal correlations $\Rightarrow$ pretend that the per-symbol entropy equals the entropy rate ($H = h$) \hfill
- Make a histogram of the frequency of occurrence of each symbol in the data ($\hat{p}_i$ for symbol $i$)
- $\hat{h} = \hat{H} = -\sum_{i \in A} \hat{p}_i \log_2 \hat{p}_i$ (sketched below)
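A small illustrative sketch of this plug-in estimate, assuming the discretized data arrive as a sequence of symbols (e.g. spike counts per bin):
\begin{verbatim}
import numpy as np
from collections import Counter

# Illustrative plug-in estimate: histogram the symbols, normalize, and compute
# the empirical entropy in bits per symbol.
def entropy_rate_hat(symbols):
    counts = np.array(list(Counter(symbols).values()), dtype=float)
    p = counts / counts.sum()            # empirical probabilities p_i
    return -np.sum(p * np.log2(p))

# Information rate estimate (previous slide): total minus noise entropy rate, e.g.
#   info_rate_hat = entropy_rate_hat(ensemble_symbols) - entropy_rate_hat(conditional_symbols)
\end{verbatim}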
Information rate cont'd
- One channel (an axon in the optic nerve) transmitting two signals: the noise and the neural code.
- $\therefore$ $h = h_{noise} + information\ rate$
Information rate cont'd
Temporal correlations
- But we neglected temporal correlations. So, the spike train will seem less predictable than it actually is (i.e. the entropy rates will be overestimated).
- So, instead of considering individual symbols, consider \emph{blocks} of $N$ symbols
- Treat each block as if it were one symbol in a larger alphabet
\includegraphics[scale=.3]{wordHist.eps}
- $h_N = \frac{H(blocks\ of\ length\ N)}{N}$ \hfill
- The larger $N$ is, the longer-range temporal correlations you can account for (a sketch of this block estimate follows below)
- If we had infinite data, then $\lim_{N \to \infty} h_N = h$
\includegraphics[scale=.33]{wordLengthInfiniteData.eps}
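A possible sketch of the block estimate $h_N$, here using non-overlapping blocks (overlapping blocks are another common choice; the details are an illustrative assumption):
\begin{verbatim}
import numpy as np
from collections import Counter

# Illustrative sketch: treat each block of N consecutive symbols as one symbol
# in a larger alphabet, estimate its entropy, and divide by N.
def block_entropy_rate_hat(symbols, N):
    blocks = [tuple(symbols[i:i + N])
              for i in range(0, len(symbols) - N + 1, N)]   # non-overlapping blocks
    counts = np.array(list(Counter(blocks).values()), dtype=float)
    p = counts / counts.sum()                 # empirical block probabilities
    return -np.sum(p * np.log2(p)) / N        # bits per symbol
\end{verbatim}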
Information rate cont'd
The danger of undersampling
- Imagine that there are 10000 symbols which occur with equal probability, and we take 3 data points. At best, 3 of the $\hat{p}_i$s will be nonzero, and the rest will be zero. Notice that the true distribution is very unpredictable, but the empirical distribution looks very predictable (``we know it'll always be one of these three'')
- So, when we don't have enough data, entropy rates will be underestimated
- Increasing the size of blocks makes us vulnerable to undersampling (in the extreme, each contiguous time segment yields only a single block)
Information rate cont'd
Conflicting biases!
- Neglecting temporal correlations (i.e. $N$ too small) leads to \emph{over}estimate of entropy rates
- Undersampling (i.e. $N$ too large) leads to \emph{under}estimate of entropy rates
- We have neither an upper nor a lower bound with this procedure :(
Information rate cont'd
If we're lucky
If we're lucky, there will be a plateau where $N$ is ``just right''
\includegraphics[scale=.3]{competingBias3.eps}
Information rate cont'd
"Direct method"
In the past, people have tried to approach the plateau from the left (i.e. from an overestimate of the entropy rate) and used their intuition to decide whether they are near the plateau. They then fit the entropy rate vs. $N$ curve with a 2nd-degree polynomial, and extrapolate to the asymptote.
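One concrete way to set up that extrapolation (an illustrative assumption, not necessarily the exact fit used historically) is to fit the estimated rates against $1/N$ and read off the value at $1/N = 0$:
\begin{verbatim}
import numpy as np

# Illustrative sketch: fit h_N against 1/N with a 2nd-degree polynomial and take
# the constant term as the extrapolated (N -> infinity) entropy rate.
def extrapolate_entropy_rate(Ns, h_hats):
    coeffs = np.polyfit(1.0 / np.asarray(Ns, dtype=float), h_hats, deg=2)
    return coeffs[-1]                     # polynomial evaluated at 1/N = 0
\end{verbatim}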
Information rate cont'd
Plateau not guaranteed
But it's possible for there to be no plateau, or for it to be hard to detect.
\includegraphics[scale=.2]{noPlateau.eps}
Information rate cont'd
What to do?
More advanced methods are under development to automatically choose values of $N$, search for plateaus, etc. Some of these methods end up with no free parameters that a human must guess. However, there's always a price: these methods make assumptions about the underlying process, or at the least they assume a prior probability distribution from which the underlying process was drawn.
\end{document}