I'm not too clear on this, but apparently the Dirichlet distribution, when used for unsupervised clustering, assumes a small number of topics. What are some corresponding distributions and inference systems that assume a long tail? After all, if you were, say, categorizing books on Amazon, the topics would probably follow some heavy-tailed distribution (especially if books were weighted by sales): most book sales (and probably most books) would be about a few common topics, but there would still be a few books or sales about esoteric yet distinct topics.
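
To make the long-tail picture concrete, here is a rough sketch of what I have in mind. It assumes topic popularity follows a Zipf-like power law, which is purely my guess at the shape and not something I have data for. The topic count, the exponent, and the helper names `zipf_weights` and `head_share` are all made up for illustration.

```python
# Hypothetical illustration: topic popularity under an assumed Zipf-like power law.
# The number of topics and the exponent are invented, not fitted to real sales data.

def zipf_weights(num_topics, exponent=1.5):
    """Return normalized weights proportional to 1 / rank**exponent."""
    raw = [1.0 / (rank ** exponent) for rank in range(1, num_topics + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def head_share(weights, head_size):
    """Fraction of the total mass held by the head_size most popular topics."""
    return sum(sorted(weights, reverse=True)[:head_size])


weights = zipf_weights(num_topics=1000, exponent=1.5)
top = head_share(weights, 10)

# A handful of topics carry most of the mass, yet the other 990 topics
# each keep a small nonzero share instead of vanishing.
print(f"top 10 topics: {top:.1%} of sales")
print(f"remaining 990 topics: {1 - top:.1%} of sales")
```

With these made-up settings the top ten topics hold well over half of the mass, while the remaining topics each keep a small but nonzero share. That is the kind of shape I would like the prior to allow for.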