
\documentclass{article}


	\oddsidemargin  -0.6in
	\evensidemargin -0.6in
	\textwidth  7.7in

\topmargin -.4in \textheight 10.3in

\begin{document}

Bayle Shanks

An interesting question in neuroscience is, what is the neural code for vision in the optic nerve? That is, given a visual stimulus, what patterns of activity will be induced in the optic nerve; and, given a pattern of activity in the optic nerve, what does that pattern mean? Here, we'll focus on the latter question.

If we knew the neural code, we could perhaps say what a given pattern of optic nerve activity meant\footnote{at least, we could make a guess; either ambiguity in the code or noise could still prevent us from reconstructing the stimulus correctly}. But the code is unknown; today, "no one would venture to describe the visual scene given only a recording of the optic nerve's spike trains" (Meister and Berry, 1999). So, I'll focus on what we can do without knowing the code; these tools will help us learn about the code.

Stimulus reconstruction

Given a pattern of optic nerve activation, what was the stimulus which caused it? Even if you knew the function $f$ which the retina applied to the stimulus in order to produce spike trains in the optic nerve, this would not be as simple as inverting that function. In order to answer this in a stochastic environment, you would look at the probability distribution $P(stimulus \mid response)$, using Bayes' equation:

\begin{equation} P(stimulus \mid response) = \frac{P(response \mid stimulus)\ P(stimulus)}{P(response)} \end{equation}

In addition to $f$, you would have to factor in the distribution of noise in order to produce $P(response \mid stimulus)$. Note that you also need to include the distribution of stimuli, $P(stimulus)$.
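To make the Bayesian step concrete, here is a minimal numerical sketch of my own (not a standard retinal model): it assumes a small discrete stimulus set with a made-up prior and likelihood table, and computes the posterior $P(stimulus \mid response)$ for an observed spike count.

\begin{verbatim}
# Toy sketch of Bayesian decoding with assumed (made-up) probability tables.
import numpy as np

stimuli = ["dark", "light"]                  # hypothetical stimulus alphabet
p_stim = np.array([0.5, 0.5])                # prior P(stimulus)
# assumed likelihoods P(response | stimulus); rows = stimuli, columns = 0,1,2 spikes
p_resp_given_stim = np.array([[0.7, 0.2, 0.1],
                              [0.1, 0.3, 0.6]])

def posterior(spike_count):
    """P(stimulus | response) via Bayes' equation."""
    joint = p_resp_given_stim[:, spike_count] * p_stim   # P(response, stimulus)
    return joint / joint.sum()                           # divide by P(response)

print(dict(zip(stimuli, posterior(2))))      # posterior after observing 2 spikes
\end{verbatim}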

However, instead of developing a model of how the retina transforms various stimuli and then inverting the model, we can proceed directly to the inverse problem (which might later inform our search for a neural code). Consider the spike trains of an individual neuron.

If we record a data set consisting of visual stimuli paired with the spike trains which they evoke, we could then feed it to a supervised machine-learning method to learn the mapping from spike trains back to stimuli.

We could also solve analytically for the best least-squares linear reconstruction filter. First, we model our spike train as a continuous signal which is a sum of delta functions at the spikes (which occur at times $t_i$, $i = 1\ldots N$):

\begin{equation} spike\ train(t) = \sum_{i=1}^N \delta(t - t_i) \end{equation}

Next, we will find the Kolmogorov-Wiener filter, the least-squares optimal linear filter for estimating the stimulus signal from the spike train signal. This will be expressed in terms of a kernel $K(\tau)$ (see below). Then, when given a spike train, we will convolve $K$ with the spike train signal to estimate a stimulus signal:

\begin{equation} stimulus(t) \approx \int d\tau\ K(\tau)\ spike\ train(t - \tau) \end{equation}
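As an illustration of this convolution step, here is a sketch in which time is discretized and the integral is carried out numerically; the bin width, kernel, and spike times are all invented for the example, not fitted from data.

\begin{verbatim}
# Sketch of the reconstruction step: bin the "sum of deltas" spike train and
# convolve it with a kernel K.  dt, K, and the spike times are all made up.
import numpy as np

dt = 0.001                                    # 1 ms bins (assumed)
t = np.arange(0.0, 2.0, dt)                   # 2 s of time
spike_times = np.array([0.10, 0.35, 0.40, 1.20])

spike_train = np.zeros_like(t)                # discretized sum of delta functions
spike_train[np.round(spike_times / dt).astype(int)] = 1.0 / dt

tau = np.arange(0.0, 0.3, dt)                 # kernel support (assumed causal)
K = np.exp(-tau / 0.05)                       # toy kernel, not a fitted one

# stimulus(t) ~ sum over tau of K(tau) * spike_train(t - tau) * dt
stimulus_est = np.convolve(spike_train, K)[:len(t)] * dt
\end{verbatim}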

Now let's look at $K$. The Kolmogorov-Wiener filter for this system is:

\begin{equation} K(t) = \int \frac{d\omega}{2 \pi} e^{-i \omega t} \frac{\left< \tilde{s}(\omega) \sum_j e^{-i \omega t_j} \right>}{\left< \sum_{i,j} e^{i \omega (t_i - t_j)} \right>} \end{equation}
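One way to read this equation is as the ratio of a trial-averaged cross-spectrum (stimulus against spike train) to a trial-averaged spike-train power spectrum, inverse-transformed back into the time domain. Below is a rough discrete-time sketch of estimating $K$ that way; the trial data is just random filler, and the variable names are my own, not from the literature.

\begin{verbatim}
# Sketch of estimating K in the frequency domain as
#   K~(w) = < S(w) conj(R(w)) > / < |R(w)|^2 >,
# averaged over trials, then inverse-FFT'd.  Data here is random filler.
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_bins = 50, 2048

num = np.zeros(n_bins, dtype=complex)         # accumulates S(w) * conj(R(w))
den = np.zeros(n_bins)                        # accumulates |R(w)|^2
for _ in range(n_trials):
    stimulus = rng.standard_normal(n_bins)                  # stand-in stimulus
    spike_train = rng.poisson(0.05, n_bins).astype(float)   # stand-in spikes/bin
    S, R = np.fft.fft(stimulus), np.fft.fft(spike_train)
    num += S * np.conj(R)
    den += np.abs(R) ** 2

K = np.real(np.fft.ifft(num / (den + 1e-12)))  # kernel, one entry per time lag
\end{verbatim}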

I'm out of space, but please ask me for some of the intuitive meaning of this equation, or for the derivation, if you're interested.

Estimating the information rate

Motivation

Without knowing what neural coding is being used, the average mutual information rate of some signal provides an upper bound on the amount of information encoded in that signal\footnote{I say upper bound because it's possible that the "neural code" used by the brain ignores some of the Shannon information in the signal}. Therefore, knowing something about the information rate helps us constrain our search for the neural code. Information rate is measured in bits per second.

Experiments

To begin with, we'll look at the spike train of a single neuron, and we'll neglect temporal correlations. Present an ensemble of different visual stimuli, and record the optic nerve spike trains. We'll call this the "ensemble spike train data", or just "ensemble data".

Now, present __one__ visual stimulus many different times, and record the optic nerve spike trains. We'll call this the "conditional spike train data", or just "conditional data".

Discretize both data sets into bins of some length $\tau$. The set of possible events within each bin (i.e. no spikes, 1 spike, 2 spikes, \ldots) will be called our __alphabet__, $A$. Note that our discretized spike train signals are equivalent to strings of symbols from $A$.
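For concreteness, here is a small sketch of this discretization step; the bin width and spike times below are arbitrary choices of mine.

\begin{verbatim}
# Sketch of discretization: count spikes in bins of width tau; each count is
# one symbol from the alphabet A.  The bin width and spike times are made up.
import numpy as np

tau = 0.003                                   # bin width in seconds (assumed)
duration = 1.0
spike_times = np.array([0.011, 0.012, 0.250, 0.251, 0.252, 0.900])

edges = np.arange(0.0, duration + tau, tau)
symbols, _ = np.histogram(spike_times, bins=edges)   # symbols[i] = spikes in bin i
\end{verbatim}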

Definitions

To make things easier to follow later, we'll define some symbols upfront. $f$ denotes information rate. $H$ denotes entropy of the spike train. The entropy of a string of symbols is defined\footnote{In fact, information theory provides an optimal coding scheme in which each symbol can be coded using a string of bits with length $-\log_2 p_i$, and the formula for entropy is derived from that scheme; it is the expected length of each message} in terms of the probability distribution over the alphabet $A$:

\begin{equation}H = \sum_{i \in A} - p_i \log_2 p_i\end{equation}
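As a quick worked special case of my own (a sanity check on the units): if all $|A|$ symbols are equally likely, then $p_i = 1/|A|$ and

\begin{equation} H = \sum_{i \in A} - \frac{1}{|A|} \log_2 \frac{1}{|A|} = \log_2 |A|, \end{equation}

so, for instance, a uniform distribution over 4 symbols has an entropy of 2 bits.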

$h$ denotes entropy rate per symbol. The entropy rate of a signal represents how many bits/symbol you would need to send to retransmit this signal if you used the best possible compression. If each symbol is drawn i.i.d. from some probability distribution (i.e. if there are no temporal correlations between symbols), then $h = H$. $H_{noise}$ and $h_{noise}$ denote the noise entropy, and the noise entropy rate.

In general, a symbol with a caret on top denotes an __estimate__ based on data; e.g. $\hat{h}$ is an estimate of $h$. $\hat{p}_i$ denotes the observed frequency of symbol $i$ in the context of some spike train data set. That is, if you make a histogram of the frequency of the occurrence of each symbol in the data set, then these frequencies are the $\hat{p}_i$s.

Preview

Here is an overview of how we'll come up with our estimate of the information rate, $\hat{f}$. First, we'll make estimates of two entropy rates: the noise entropy rate, $h_{noise}$, and the total entropy rate, $h$. Both of these estimates will use the same estimation procedure; but the noise entropy rate is the result of applying this procedure to the conditional data, whereas the total entropy rate comes from the ensemble data.

Finally, we'll estimate the information rate as $\hat{f} = \hat{h} - \hat{h}_{noise}$.

Estimating the entropy rate of spike train data

Here's how we can estimate an entropy rate for some particular spike train data set.

Since we are neglecting temporal correlations, we'll pretend that $H = h$, and just estimate $H$. The definition for $H$ is straightforward (see "Definitions"); but unfortunately, we don't have direct access to the true probabilities $p_i$; all we know is that the strings of symbols we have observed were drawn from that distribution.

The simplest thing to do is to make a histogram of the frequency of the occurrence of each symbol in the data and then to estimate the entropy from the histogram. We do this by plugging the empirical frequencies into the definition of entropy:

\begin{equation} \hat{h} = \hat{H} = \sum_{i \in A} - \hat{p}_i \log_2 \hat{p}_i \end{equation}
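In code, this plug-in estimate is only a few lines; here is a sketch, with an invented toy symbol string standing in for real data.

\begin{verbatim}
# Sketch of the plug-in ("histogram") entropy estimate for a symbol string.
import numpy as np

def plugin_entropy(symbols):
    """H_hat = -sum_i p_hat_i log2 p_hat_i, with p_hat_i = observed frequencies."""
    _, counts = np.unique(symbols, return_counts=True)
    p_hat = counts / counts.sum()
    return -np.sum(p_hat * np.log2(p_hat))

symbols = [0, 0, 1, 0, 2, 0, 1, 0, 0, 1]      # toy discretized spike train
print(plugin_entropy(symbols))                # estimated entropy in bits/symbol
\end{verbatim}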

In the limit of infinite data, this converges to the true entropy, but with finite data, we are in danger of undersampling. Undersampling is when the empirical distribution (the $\hat{p}_i$s) is significantly more peaked than the true distribution (the $p_i$s), due to lack of data. For example, imagine that there are 10000 symbols which occur with equal probability, and we take 3 data points. At best, 3 of the $\hat{p}_i$s will be nonzero, and the rest will be zero. Notice that the true distribution is very unpredictable, but the empirical distribution looks very predictable ("we know it'll always be one of these three"); in the worst case, with only one data point, the empirical distribution looks perfectly predictable, and its entropy is 0. Hence, when we undersample, $\hat{h}$ can be much less than $h$.
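The example above can also be checked numerically; this sketch just reuses the numbers from the example (a uniform distribution over 10000 symbols, sampled 3 times).

\begin{verbatim}
# Undersampling demo: true distribution is uniform over 10000 symbols
# (true entropy = log2(10000) ~ 13.3 bits), but the plug-in estimate from
# only 3 samples can be at most log2(3) ~ 1.6 bits.
import numpy as np

rng = np.random.default_rng(1)
samples = rng.integers(0, 10000, size=3)
_, counts = np.unique(samples, return_counts=True)
p_hat = counts / counts.sum()
print(-np.sum(p_hat * np.log2(p_hat)), np.log2(10000))   # severe underestimate
\end{verbatim}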

Calculating the information rate from the entropy rates

If there were a deterministic mapping from visual stimuli to optic nerve spike trains, then the optic nerve's information rate would be equal to its entropy rate. However, in reality the same visual stimulus can evoke different spike trains. Therefore, there is something other than the neural code for the stimulus which is affecting the optic nerve; we'll call this "noise". We model this as __two__ signals being sent through the same channel: the neural code for the visual stimulus, and the noise signal. Information theory tells us that the entropy rate of the channel is the sum of the information rate and the noise entropy rate\footnote{where "noise entropy" is the entropy of a subset of the signal data in which the neural code is known to be always transmitting the same stimulus; that is, the conditional data} ($h = f + h_{noise}$).

So, we apply the entropy rate estimation procedure to the ensemble data, yielding $\hat{h}$, and we apply it again to the conditional data, yielding $\hat{h}_{noise}$. Then, we estimate $f$ as

\begin{equation}\hat{f} = \hat{h} - \hat{h}_{noise}\end{equation}
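Putting the pieces together, here is an end-to-end sketch with invented ensemble and conditional symbol strings; the division by the bin width $\tau$, which converts bits/symbol into bits/second, is my own addition, since the plug-in estimate is per bin.

\begin{verbatim}
# End-to-end sketch of f_hat = h_hat - h_hat_noise using the plug-in entropy
# estimate on made-up ensemble and conditional symbol strings.
import numpy as np

def plugin_entropy(symbols):
    _, counts = np.unique(symbols, return_counts=True)
    p_hat = counts / counts.sum()
    return -np.sum(p_hat * np.log2(p_hat))   # bits per symbol (per bin)

tau = 0.003                                   # bin width in seconds (assumed)
ensemble_symbols    = [0, 2, 1, 0, 3, 0, 1, 2, 0, 1, 0, 2]   # varied stimuli
conditional_symbols = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0]   # one repeated stimulus

h_hat       = plugin_entropy(ensemble_symbols)
h_noise_hat = plugin_entropy(conditional_symbols)
f_hat = (h_hat - h_noise_hat) / tau           # information rate in bits/second
\end{verbatim}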

So, we have shown how we might estimate the information rate. I've also researched how to take temporal and inter-neuron correlations into account, but I'm out of space, so please ask me if you're interested.

\newpage __References__

Meister M, Berry MJ 2nd. The neural code of the retina. Neuron. Mar;22(3):435-50. (1999)

F Rieke, D Warland, R de Ruyter van Steveninck and W Bialek. Spikes: Exploring the Neural Code. (book; MIT Press, Cambridge, 1997).

S.P. Strong, R. R. de Ruyter van Steveninck, W. Bialek and R. Koberle. On the Application of Information Theory to Neural Spike Trains. Pacific Symposium on Biocomputing 3:619-630 (1998).

Wessel R, Koch C, Gabbiani F. Coding of time-varying electric field amplitude modulations in a wave-type electric fish. J Neurophysiol. Jun;75(6):2280-93. (1996)

\end{document}