## Notation for features (general notation for continuous and discrete random variables)

2

I'm looking for the right notation for features of different types. Let us say that my samples have $$m$$ features that can be modeled with $$X_1,...,X_m$$. The features do not share the same distribution (e.g. some are categorical, some numerical, etc.). Therefore, while $$X_i$$ might be a continuous random variable, $$X_j$$ could be a discrete random variable.

Now, given a data sample $$x=(x_1,...,x_m)$$, I want to talk about a probability such as $$P(X_k=x_k)$$. But $$X_k$$ might be a continuous variable (e.g. the height of a person), in which case $$P(X_k=x_k)$$ will always be zero. However, it can also be a discrete variable (e.g. a categorical feature or the number of kids).

I'm looking for a notation that is equivalent to $$P(X_k=x_k)$$ but can work for both continuous and discrete random variables.

0

As far as I am concerned, there is no distinction between a continuous and a discrete variable when it comes to notation. So $$P(X_k=x_k)$$ is perfectly fine for either.

To my knowledge, if $$X$$ is a continuous variable then for each constant $$c$$, $$P(X=c)=0$$. Instead of constants, we should talk about intervals and measure the probability using a probability density function. – Yael M – 2020-05-27T11:46:43.813
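
The comment's point can be written out as a short worked equation, assuming $$X$$ has a probability density function. The symbol $$f_X$$ for that density and the interval endpoints $$a \le b$$ are introduced here only for illustration:

$$P(a \le X \le b) = \int_a^b f_X(t)\,dt, \qquad P(X = c) = \int_c^c f_X(t)\,dt = 0$$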

0

Maybe relying on set notation would work?

$$P(X_k \in s_k)$$ where:

• $$s_k = \{ x_k \}$$ if $$X_k$$ is discrete
• $$s_k = [ x_k-\epsilon , x_k+\epsilon]$$ if $$X_k$$ is continuous, for some small $$\epsilon > 0$$
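
A minimal sketch of how the set-based notation above could be evaluated in practice. Everything named here is an assumption made for illustration: the helper `prob_in_set`, the example distributions (a made-up probability table for the number of kids, and a normal distribution with mean 170 and standard deviation 10 for height), and the choice of $$\epsilon$$.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, written with the standard error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def prob_in_set(x_k, kind, pmf=None, cdf=None, eps=0.5):
    """Evaluate P(X_k in s_k) for the set s_k defined in the answer above.

    Hypothetical helper:
      - discrete:   s_k = {x_k}, so the probability is read off the pmf.
      - continuous: s_k = [x_k - eps, x_k + eps], so the probability is
                    the difference of the CDF at the two endpoints.
    """
    if kind == "discrete":
        return pmf.get(x_k, 0.0)
    if kind == "continuous":
        return cdf(x_k + eps) - cdf(x_k - eps)
    raise ValueError(f"unknown feature kind: {kind!r}")


# Assumed example distributions (not from the original post).
kids_pmf = {0: 0.3, 1: 0.4, 2: 0.2, 3: 0.1}


def height_cdf(h):
    return normal_cdf(h, mu=170.0, sigma=10.0)


p_kids = prob_in_set(2, "discrete", pmf=kids_pmf)
p_height = prob_in_set(170.0, "continuous", cdf=height_cdf, eps=1.0)
p_point = prob_in_set(170.0, "continuous", cdf=height_cdf, eps=0.0)

print(p_kids, p_height, p_point)
```

With `eps=0.0` the continuous case collapses to the single point $$x_k$$ and the result is zero, which is the issue raised in the question. Note that the value in the continuous case depends on the chosen $$\epsilon$$, so it has to be stated alongside the notation.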