I think what you're seeing is normal behaviour:
With only a few samples (like 2000) it's easy for a model to (over)fit the data, but it doesn't generalize well. So you get high training accuracy, but the model might not work well on new data (i.e. low validation/test accuracy).
As you add more samples (like 9000) it becomes harder for the model to fit the data, so you get a lower training accuracy, but the model will work better on new data (i.e. validation/test accuracy starts to rise).
As the training dataset increases, the training accuracy is supposed to decrease because more data is harder to fit well.
As the training dataset increases, the validation/test accuracy is supposed to increase as well since less overfitting means better generalization.
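You can reproduce exactly this effect with scikit-learn's `learning_curve` helper. This is just a minimal sketch on a synthetic dataset (the dataset and the logistic regression model are my assumptions, not your setup); swap in your own model and data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic stand-in for your data (assumption for the example).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Evaluate the same model on growing training-set sizes.
train_sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=[0.1, 0.33, 0.55, 0.78, 1.0],
    cv=5, scoring="accuracy",
)

# Mean accuracy per training-set size; typically training accuracy
# drifts down and validation accuracy drifts up as sizes grow.
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)
print(train_sizes)
print(train_mean)
print(val_mean)
```

Plotting `train_mean` and `val_mean` against `train_sizes` gives you the learning curve (with accuracy on the y axis, like yours).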
Andrew Ng has a video about learning curves. Note that he plots the error on the y axis, while you have the accuracy on the y axis, so the y axis is flipped.
Also take a look at the second half of the video. It explains high bias and high variance problems.
Your model seems to have high variance (that's what the big "gap" between the two curves indicates) - it's still too complex for the small amount of data you've got. Either getting more data, using a simpler model, or adding more regularization to the same model should improve the results.
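To illustrate the "simpler model / more regularization" remedy, here is a small sketch (my own toy example, not your model) comparing an unconstrained decision tree against one whose depth is capped - the constrained tree trades some training accuracy for a smaller train/test gap:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for a small dataset (assumption).
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Unconstrained tree: memorizes the training set -> high variance.
complex_model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Depth-limited tree: a crude form of regularization.
simple_model = DecisionTreeClassifier(max_depth=3,
                                      random_state=0).fit(X_tr, y_tr)

# The train/test accuracy gap is the "gap" between your two curves.
gap_complex = complex_model.score(X_tr, y_tr) - complex_model.score(X_te, y_te)
gap_simple = simple_model.score(X_tr, y_tr) - simple_model.score(X_te, y_te)
print(gap_complex, gap_simple)
```

The same idea applies to whatever model you're using - e.g. increasing `C`-style penalty strength, dropout, or pruning all shrink that gap at the cost of some training accuracy.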