Does TF-IDF force a normal distribution?

After my recent PowerShell/Accord K-Means text classifier attempt, I wondered about the distribution of the data: within-word character bigrams (no spaces or punctuation), featurized by a self-made TF-IDF function and later normalized.
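For reference, here is a minimal sketch of what TF-IDF over within-word character bigrams could look like. It is written in Python rather than PowerShell, and it is not the self-made function from the original attempt: the names `char_bigrams` and `tfidf`, the length-normalized term frequency, and the `log(N / df)` form of IDF are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def char_bigrams(text):
    """Character bigrams taken inside each word; spaces and punctuation are dropped."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w[i:i + 2] for w in words for i in range(len(w) - 1)]


def tfidf(docs):
    """Return (vocab, rows), where rows[d][j] is tf * idf of bigram vocab[j] in document d."""
    counts = [Counter(char_bigrams(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    n = len(docs)
    # Document frequency: how many documents contain each bigram at least once.
    df = {g: sum(1 for c in counts if g in c) for g in vocab}
    rows = []
    for c in counts:
        total = sum(c.values()) or 1  # avoid dividing by zero for empty documents
        # Assumed weighting: relative frequency times log(N / df).
        rows.append([(c[g] / total) * math.log(n / df[g]) for g in vocab])
    return vocab, rows


vocab, rows = tfidf(["the cat", "the dog"])
print(vocab)
print(rows[0])
```

With this weighting every value is zero or positive, and a bigram that occurs in every document gets a weight of exactly zero. Other TF-IDF variants (smoothed IDF, raw counts for TF) would give different numbers.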

Tonight I decided to plot two dimensions of the featurized data against each other to get an idea of how the values are distributed.

  • What I got was unexpected: there are distinct sloped lines no matter which two (present) dimensions I compare
  • I have the non-normalized and normalized vectors in different variables, so I looked to see how they differ; they don’t
  • I compared the data side by side, and both hold the same values

OK, that shouldn’t be possible. I must have changed the original array by reference or done something else wrong in my code. In fact, I definitely altered the original array inadvertently: TF-IDF values shouldn’t be negative, yet the supposedly non-normalized array contains negative numbers.
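The kind of aliasing bug suspected here can be reproduced in a few lines. This is a sketch in Python, not the original PowerShell code, and it assumes the normalization is a per-column z-score (which would explain the negative numbers); the actual normalization used is not stated. The function name `standardize_in_place` and the sample values are made up for the demonstration.

```python
import copy
import statistics


def standardize_in_place(rows):
    """Z-score each column, overwriting the values stored in `rows`."""
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = statistics.fmean(col)
        sd = statistics.pstdev(col) or 1.0  # constant columns are only centered
        for r in rows:
            r[j] = (r[j] - mean) / sd
    return rows


# Case 1: "two variables" that are really one object.
raw = [[0.0, 0.2], [0.1, 0.0], [0.4, 0.1]]
normalized = raw                  # assignment copies the reference, not the data
standardize_in_place(normalized)
print(raw is normalized)          # both names point at the same list
print(raw)                        # the "raw" data now contains negative values

# Case 2: normalize a deep copy so the original survives.
raw2 = [[0.0, 0.2], [0.1, 0.0], [0.4, 0.1]]
normalized2 = standardize_in_place(copy.deepcopy(raw2))
print(raw2)                       # unchanged, still non-negative
```

A shallow copy such as `list(raw2)` would not be enough in this sketch, because the inner row lists would still be shared. Arrays of arrays in other languages can behave the same way.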

(The next day:) The pre-normalized data shows the same patterns, although I now think the 2D patterns may be an artifact not only of TF-IDF but perhaps more so of the plot itself, which has a point resolution of 100x100 or effectively much worse. My original intent was to see how the data is distributed, so I should back up and simply do some distribution plots.
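A distribution plot for a single dimension can be as simple as a binned count. The sketch below is in Python using only the standard library and prints a text histogram; the function names, the bin count, and the sample column are illustrative assumptions, and a plotting library would do the same job graphically.

```python
def histogram(values, bins=10):
    """Count values into equal-width bins spanning [min, max]; returns (edges, counts)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0   # fall back to 1.0 when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value would land one past the last bin, so clamp it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


def print_histogram(values, bins=10, bar_width=40):
    """Print one line per bin with a bar scaled to the fullest bin."""
    edges, counts = histogram(values, bins)
    peak = max(counts) or 1
    for i, c in enumerate(counts):
        bar = "#" * round(c / peak * bar_width)
        print(f"{edges[i]:8.3f} .. {edges[i + 1]:8.3f} | {bar} {c}")


# Hypothetical values for one feature dimension across documents.
column = [0.0, 0.0, 0.0, 0.0, 0.05, 0.07, 0.1, 0.12, 0.3, 0.31]
print_histogram(column, bins=5)

# Zeros tend to dominate sparse features, so it may help to look at the rest separately.
nonzero = [v for v in column if v != 0]
print_histogram(nonzero, bins=5)
```

Looking at one dimension at a time sidesteps the scatter-plot resolution question entirely, since the counts do not depend on how many pixels the plot has.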