What are the differences between Reinforcement Learning (RL) and Supervised Learning?
The main difference lies in how "correct" or optimal results are learned:
In Supervised Learning, the learning model is presented with an input and desired output. It learns by example.
In Reinforcement Learning, the learning agent is presented with an environment and must guess the correct output. Whilst it receives feedback on how good its guess was, it is never told the correct output (and in addition the feedback may be delayed). It learns by exploration, or trial and error.
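The contrast can be sketched in a few lines. In the first half the learner sees the correct label directly; in the second half the agent only ever sees a reward for its own guess. The dataset, the toy "model", and the environment here are all hypothetical, chosen only to make the two feedback signals visible:

```python
import random

random.seed(0)  # for reproducibility of this sketch

# Supervised learning: the learner is shown the input AND the desired output.
dataset = [(0.0, 0), (0.4, 0), (0.8, 1), (1.0, 1)]   # (input, correct label)
model = lambda x: int(x > 0.5)                        # a fixed toy "model"
mistakes = sum(model(x) != y for x, y in dataset)     # direct comparison

# Reinforcement learning: the agent only ever sees a reward for its guess.
def environment(action):
    # Hypothetical environment: returns a reward, never the correct action.
    return 1.0 if action == 1 else 0.0

value = {0: 0.0, 1: 0.0}              # running estimate of each action's value
for _ in range(100):
    action = random.choice([0, 1])    # trial and error (pure exploration here)
    reward = environment(action)      # feedback on how good the guess was
    value[action] += 0.1 * (reward - value[action])
```

After the loop the agent has learned that action 1 is better, but only by trying both actions and comparing rewards; nothing ever told it which action was "correct".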
Could we say RL has more difficulty in [finding] a stable solution?
No. Because the types of problems each approach solves are usually different, you cannot compare like with like.
You could use RL as a framework to produce a predictive model for a dataset (using RL to solve a Supervised Learning problem). This would be inefficient, but it would work and there is no reason to expect it to be unstable.
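The idea above can be sketched as a loop of one-step episodes: each labelled example becomes a state, the prediction becomes an action, and the reward is 1 for a correct prediction. The toy dataset, the discretisation, and all names here are hypothetical illustrations, not a recommended technique:

```python
import random

random.seed(0)

# Hypothetical dataset: label is 1 when x >= 0, else 0.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

q = {}  # (state, action) -> estimated reward (tabular action values)

def state_of(x):
    return 0 if x < 0 else 1   # crude discretisation of the input

for _ in range(500):
    x, label = random.choice(data)          # one-step "episode"
    s = state_of(x)
    if random.random() < 0.1:               # epsilon-greedy exploration
        a = random.choice([0, 1])
    else:
        a = max([0, 1], key=lambda act: q.get((s, act), 0.0))
    reward = 1.0 if a == label else 0.0     # reward replaces the loss signal
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + 0.2 * (reward - old)  # incremental value update

def predict(x):
    # The greedy policy now acts as a classifier.
    s = state_of(x)
    return max([0, 1], key=lambda act: q.get((s, act), 0.0))
```

Note the inefficiency: the labels are sitting right there in the data, but the agent only learns about an action by trying it and observing the reward, which is exactly why this is a roundabout way to solve a supervised problem.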
In general, RL problems present additional challenges compared to supervised learning problems with the same degree of complexity in the relationship between input and output. They are "harder" in that sense: more details need to be managed, and there are more hyperparameters to tune.
RL can become unstable in ways that do not apply in supervised learning. For instance, Q-learning with neural-network approximation tends to diverge and needs special care (often experience replay is enough).
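A minimal sketch of the experience-replay idea (the class and all names are illustrative, not from any particular library): transitions are stored in a bounded buffer, and training minibatches are drawn from it uniformly at random, which breaks up the correlation between consecutive transitions — one of the main sources of divergence when combining Q-learning with function approximation.

```python
import random
from collections import deque

random.seed(0)

class ReplayBuffer:
    """Stores transitions and serves decorrelated minibatches."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are evicted

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling mixes old and new experience, so consecutive
        # updates are no longer computed from consecutive (correlated) steps.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=100)
for t in range(250):                           # dummy transitions for the sketch
    buf.push(t, t % 2, 1.0, t + 1, False)

batch = buf.sample(8)                          # a decorrelated minibatch
```

In a full Q-learning setup, each minibatch would be used to fit the network's Q-value predictions; the buffer itself is the part that tames the instability.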
Could we say that [getting stuck] in [a] local minimum is seen more in supervised learning?
No. Internally, an RL agent will often use some kind of supervised learning algorithm to predict value functions. RL does not include any special features that avoid or work around local minima. In addition, an RL agent can get stuck in ways that don't apply to supervised learning: for instance, if an agent never discovers a high reward, it will create a policy that entirely ignores the possibility of getting to that reward.
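A toy illustration of that failure mode (all numbers hypothetical): on a two-armed bandit, a purely greedy agent that happens to try the low-paying arm first never pulls the high-paying arm at all, so its learned values ignore that reward entirely.

```python
# Hypothetical two-armed bandit: arm 0 always pays 0.1; arm 1 pays 10.0,
# but the agent has to actually pull it to find that out.
def pull(arm):
    return 0.1 if arm == 0 else 10.0

value = [0.0, 0.0]   # estimated value of each arm
counts = [0, 0]      # number of pulls of each arm

# Purely greedy agent with no exploration; ties break toward arm 0.
for _ in range(100):
    arm = 0 if value[0] >= value[1] else 1
    r = pull(arm)
    counts[arm] += 1
    value[arm] += (r - value[arm]) / counts[arm]   # incremental mean

# Arm 1 is never pulled, so the agent's policy ignores the 10.0 reward.
```

After the first pull, arm 0's estimate (0.1) exceeds arm 1's (0.0), and with no exploration the agent is locked in forever — the RL analogue of getting stuck, and one that has no counterpart in supervised learning, where the high-reward "answer" would simply appear in the labels.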
Is this figure correct in saying that Supervised Learning is part of RL?
No. The figure is at best an over-simplified view of one of the ways you could describe relationships between the Supervised Learning, Contextual Bandits and Reinforcement Learning.
The figure is broadly correct in that you could use a Contextual Bandit solver as a framework to solve a Supervised Learning problem, and an RL solver as a framework to solve the other two. However, this is not necessary, and it does not capture the relationships or differences between the three types of algorithm.
Otherwise, the figure contains so much over-simplification, and so many things that I would consider errors (although this might just be missing context that explains the diagram), that I recommend you simply stop trying to interpret it.