What is non-classicality?

I'm not sure if there's a universally accepted definition, but the way that I'd define it is: if all possible outcomes of experiments on a particular quantum system can be described by a probability distribution, then the system is classical. Otherwise, it is non-classical. In alternative terminology, for a classical system, people say that there's a (local) hidden variable model that explains the experimental outcomes.

A trivial example is a diagonal density matrix when measured in the computational basis. The diagonal elements just give the probabilities of the different outcomes, so the state is classical.
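To make that concrete, here's a minimal NumPy sketch. The particular state `rho` is just a made-up two-qubit example, not anything from the question. It reads the outcome probabilities off the diagonal and then reproduces the measurement statistics by ordinary classical sampling.

```python
import numpy as np

# Hypothetical diagonal two-qubit density matrix (values chosen for illustration only).
rho = np.diag([0.5, 0.3, 0.2, 0.0])

# Born rule for a computational-basis measurement: p_i = <i|rho|i> = rho[i, i].
probs = np.real(np.diag(rho))

# The diagonal is a genuine probability distribution: non-negative and normalised.
is_distribution = bool(np.all(probs >= 0) and np.isclose(probs.sum(), 1.0))

# A purely classical sampler reproduces the measurement statistics.
rng = np.random.default_rng(0)
samples = rng.choice(len(probs), size=10_000, p=probs)
freqs = np.bincount(samples, minlength=len(probs)) / samples.size
```

The point is only that nothing quantum is needed to simulate these outcomes: drawing from `probs` is the whole model.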

What is negative probability?

This is rather loose terminology. A true probability distribution (in the discrete setting, a set $\{p_i\}$ such that $p_i\geq 0$ and $\sum_ip_i=1$) never contains negative probabilities, by definition.

You only get "negative probability" in some quasi-probability distributions, and so it should probably be called "negative quasi-probability" to avoid misunderstandings. As stated in the question, this is one way of detecting non-classicality. That leads us to...

What is quasi-probability?

(This may be what you mean by pseudoprobability.) These are distributions that behave a lot like probabilities in many ways, but relax at least one of the constraints in the definition, usually the non-negativity of the elements. According to Wikipedia, any density matrix can be written in diagonal form using an over-complete basis (the coherent states of an optical mode are the standard example). Those diagonal elements then form a quasi-probability distribution: some of the elements can be negative.
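As an illustration of negativity, here's a rough sketch using a different, easier-to-compute quasi-probability distribution: the Wigner function, rather than the diagonal representation mentioned above. It assumes the convention $\hbar=1$, $[x,p]=i$, in which the Wigner function of the Fock state $|n\rangle$ is $W_n(x,p)=\frac{(-1)^n}{\pi}e^{-(x^2+p^2)}L_n\big(2(x^2+p^2)\big)$. The function name `wigner_fock` and the grid parameters are my own choices.

```python
import numpy as np
from numpy.polynomial.laguerre import lagval


def wigner_fock(n, x, p):
    """Wigner function of the Fock state |n>, assuming hbar = 1 and [x, p] = i."""
    r2 = x**2 + p**2
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0  # pick out the n-th Laguerre polynomial L_n
    return (-1) ** n / np.pi * np.exp(-r2) * lagval(2 * r2, coeffs)


# Phase-space grid, wide enough that the Gaussian tails are negligible.
grid = np.linspace(-6, 6, 481)
dx = grid[1] - grid[0]
X, P = np.meshgrid(grid, grid)

W1 = wigner_fock(1, X, P)          # single-photon state |1>
total = W1.sum() * dx**2           # Riemann sum: should be close to 1
w_origin = wigner_fock(1, 0.0, 0.0)  # equals -1/pi in this convention
```

So the distribution still integrates to 1 like a probability density, but it dips below zero near the origin, which no true probability density can do.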

What is (non)-contextuality?

Contextuality is another test that can be used to prove the non-classicality of a quantum system. This is a substantial topic that I'm not inclined to address in answer to a small part of a question. You probably want to start by reading about the Kochen-Specker Theorem.

It is worth noting that Bell tests, such as the CHSH test, can be considered as contextuality tests. They're just made a little simpler because they're supplemented with some extra information about the locality of certain measurement operators (they act on separate subsystems), which ensures that they commute. So, with CHSH, you evaluate some expectation value $S$. If $|S|\leq 2$, the results are consistent with a classical description, while if $|S|>2$, they cannot be explained by a local hidden variable model; the state is non-classical.
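Here's a small sketch of evaluating the CHSH value numerically. It assumes the usual combination $S=\langle A_0B_0\rangle+\langle A_0B_1\rangle+\langle A_1B_0\rangle-\langle A_1B_1\rangle$ and one standard choice of measurement settings. The helper name `chsh_value` and the two test states are just illustrative.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def chsh_value(rho, A0, A1, B0, B1):
    """S = <A0 B0> + <A0 B1> + <A1 B0> - <A1 B1> for a two-qubit state rho."""
    def corr(A, B):
        # Expectation value of the product observable A (x) B.
        return np.real(np.trace(rho @ np.kron(A, B)))
    return corr(A0, B0) + corr(A0, B1) + corr(A1, B0) - corr(A1, B1)


# Maximally entangled state (|00> + |11>)/sqrt(2).
phi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho_bell = np.outer(phi, phi.conj())

# Product state |00><00| for comparison.
rho_prod = np.zeros((4, 4), dtype=complex)
rho_prod[0, 0] = 1.0

# One standard choice of settings: Alice measures Z or X, Bob measures rotated axes.
A0, A1 = Z, X
B0, B1 = (Z + X) / np.sqrt(2), (Z - X) / np.sqrt(2)

S_bell = chsh_value(rho_bell, A0, A1, B0, B1)  # 2*sqrt(2), above the bound of 2
S_prod = chsh_value(rho_prod, A0, A1, B0, B1)  # sqrt(2), within the bound
```

With these settings the entangled state reaches $2\sqrt{2}$, while the product state stays below 2, as any state with a local hidden variable description must.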