For a task on sentiment analysis, suppose we have some classes $c$ and features indexed by $i$.

We can represent the conditional probability of each class as: $$P(c | w_i) = \frac{P(w_i|c) \cdot P(c)}{P(w_i)}$$ where $w_i$ represents each feature and $c$ is the class. Empirically, we can estimate $$P(w_i|c) = \frac{n_{ci}}{n_c}$$ $$P(w_i) = \frac{n_{i}}{n}$$ The prior for each class is then given by: $$P(c) = \frac{n_c}{n}$$ where:

$n$ is the total number of features in all classes.

$n_{ci}$ is the count of feature $i$ in class $c$,

$n_c$ is the total number of features for the class, and

$n_i$ is the count of feature $i$ summed across all classes.
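To make these counting definitions concrete, here is a minimal Python sketch. The toy corpus, class labels, and word lists are all hypothetical; the point is only to show how $n_{ci}$, $n_c$, $n_i$, and $n$ come out of raw counting:

```python
from collections import Counter

# Hypothetical toy corpus: (class, list of word features) pairs
docs = [
    ("pos", ["good", "great", "good"]),
    ("neg", ["bad", "awful"]),
    ("pos", ["great", "fun"]),
]

n_ci = Counter()  # n_ci[(c, w)]: count of feature w in class c
n_c = Counter()   # n_c[c]: total feature count in class c
for c, words in docs:
    for w in words:
        n_ci[(c, w)] += 1
        n_c[c] += 1

n = sum(n_c.values())  # total feature count over all classes

n_i = Counter()        # n_i[w]: count of feature w over all classes
for (c, w), cnt in n_ci.items():
    n_i[w] += cnt

p_w_given_c = n_ci[("pos", "good")] / n_c["pos"]  # P(good | pos) = 2/5
p_c = n_c["pos"] / n                              # P(pos) = 5/7
p_w = n_i["good"] / n                             # P(good) = 2/7
```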

Is my understanding of the above correct? Then, given these $P(c|w_i)$ probabilities for each word, I think the naive Bayes assumption is that the words are independent, so I simply multiply the probabilities over the words of a document for a given class, i.e. compute $\prod_{i=1}^{N} P(c|w_i)$, where $N$ is the number of words in the document. Is this correct?
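As a sketch, the product described above would look like the following. The per-word probability values here are made-up placeholders, not estimates from any real corpus:

```python
import math

# Hypothetical per-word values of P(c | w_i) for one class and one document
p_c_given_w = [0.8, 0.6, 0.9]

# The product over the N words of the document, as described in the question
score = math.prod(p_c_given_w)
```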

To actually compute the conditional probability numerically, would it suffice to do the following:

$$P(c | w_i) = \frac{P(w_i|c) \cdot P(c)}{P(w_i)} = \frac{n_{ci}}{n_c} \cdot \frac{n_c}{n}\cdot \frac{n}{n_i} = \frac{n_{ci}}{n_i}$$

The last part of the equation looks a bit suspicious to me, as it seems far too simple a computation for a rather complex probability.
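One way to sanity-check the algebraic cancellation is to plug in exact fractions: for any consistent set of counts, $\frac{n_{ci}}{n_c} \cdot \frac{n_c}{n} \cdot \frac{n}{n_i}$ and $\frac{n_{ci}}{n_i}$ agree. The counts below are hypothetical:

```python
from fractions import Fraction as F

# Hypothetical counts: n_ci = 2, n_c = 5, n = 7, n_i = 2
n_ci, n_c, n, n_i = F(2), F(5), F(7), F(2)

bayes = (n_ci / n_c) * (n_c / n) / (n_i / n)  # P(w|c) * P(c) / P(w)
direct = n_ci / n_i                            # the cancelled form
# With exact fractions, bayes == direct, so the cancellation holds
```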

Thanks for your answer. In actual programs, why can't this result be used? I have seen many implementations of naive Bayes, and none of them directly compute $n_{ci}$ for a word. – user19241256 – 2018-01-24T21:34:57.133

not sure I understand the question... in some form or another it would come down to counting. can you give an example? – oW_ – 2018-01-24T22:32:29.247