There is a good chance that your DQN is already optimized, but you would have to look at its performance to check whether its actions are actually up to par.
Reasons it may not be optimized: if you are tracking the reward after every episode, unstable rewards are very common just due to random chance. If you are instead averaging the reward over the past 50 episodes or so and it still looks unstable, the culprit may be your learning rate or your epsilon.
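For reference, here is a minimal sketch of tracking a 50-episode moving average (the names like `log_episode` are placeholders, not anything from your code):

```python
from collections import deque

import numpy as np

# Rolling window holding only the most recent 50 episode rewards.
recent_rewards = deque(maxlen=50)

def log_episode(episode, reward):
    """Report the 50-episode moving average instead of the raw reward."""
    recent_rewards.append(reward)
    moving_avg = np.mean(recent_rewards)
    print(f"episode {episode}: reward={reward:.1f}, 50-ep avg={moving_avg:.1f}")
```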
If your learning rate is too high or too low, you may either never reach a well-optimized DQN or get stuck in a local minimum. An easy way to address this is to add a simple learning rate decay: start the learning rate high so the agent doesn't get stuck in local minima early on, then decay it to a value small enough for the agent to settle into the best solution it has found.
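A decay schedule can be as simple as the sketch below; the exact constants are example values you would tune for your own problem:

```python
def decayed_lr(episode, lr_start=1e-3, lr_min=1e-5, decay=0.995):
    """Exponential learning rate decay, clipped at a floor."""
    return max(lr_min, lr_start * decay ** episode)

# If you happen to be using PyTorch, you could apply it each episode like:
# for group in optimizer.param_groups:
#     group["lr"] = decayed_lr(episode)
```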
The other problem could be that your epsilon is too high or too low. A high epsilon never lets the agent settle down and exploit what it has learned, while a low epsilon doesn't let it explore and discover better strategies, so it may be a good idea to experiment with this as well. A common approach is to anneal epsilon from a high starting value down to a small floor.
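A sketch of that annealing plus epsilon-greedy action selection (again, names and constants here are only illustrative):

```python
import random

import numpy as np

def decayed_epsilon(episode, eps_start=1.0, eps_min=0.05, decay=0.99):
    """Anneal epsilon from eps_start toward a small floor."""
    return max(eps_min, eps_start * decay ** episode)

def select_action(q_values, epsilon):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))
```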
The only way to really gauge the agent's performance is to watch it make decisions, either through a rendered video or by analyzing some of its predictions. If it seems to be doing well, it may very well be optimized already; if it is not performing as well as it should, it may be a good idea to try out some of the strategies above.
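If your environment supports rendering, a quick way to watch the greedy policy is something like the following. This assumes a Gymnasium environment and a `policy(obs)` function that returns the greedy action; CartPole-v1 is just a stand-in for whatever you are training on:

```python
import gymnasium as gym

def watch_agent(policy, episodes=3):
    """Render a few fully greedy episodes to eyeball the agent's decisions."""
    # "CartPole-v1" is a placeholder; substitute your own environment id.
    env = gym.make("CartPole-v1", render_mode="human")
    for _ in range(episodes):
        obs, _ = env.reset()
        done, total = False, 0.0
        while not done:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            total += reward
            done = terminated or truncated
        print(f"episode return: {total}")
    env.close()
```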