
Sklearn: How To Get The Top 10 Words From Each Topic

I want to get the 10 most frequent words from each topic. After I apply TfidfTransformer, I get a result of type scipy.sparse.csr.csr_matrix, but I don't know how to get the hi…

Solution 1:

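You can use TfidfVectorizer instead, which exposes the get_feature_names method (called get_feature_names_out in scikit-learn 1.0 and later; the old name was removed in 1.2). TfidfTransformer doesn't have this method, but the docs clearly state that TfidfVectorizer is equivalent to CountVectorizer followed by TfidfTransformer. If you don't want to switch, you're going to be stuck building a lookup from column index to word before you vectorize, for example by keeping the CountVectorizer whose counts you fed to the transformer and reading the vocabulary from it.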

TfidfVectorizer in the docs: https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html

Edit: to sort and slice the output of fit_transform from TfidfVectorizer, ordinary sparse matrix operations should work.
