I've seen both of these techniques used for image search. One difference I can think of is that autoencoders don't rely on labeled data. I'm not sure, but it seems plausible that they could therefore learn more discriminative dimensions for the final vector representation, since the embedding is no longer constrained by the classes defined in the labels.
For my particular problem, I do have labeled data, which is why I'm stuck between the two. To be clear, I'm asking about non-convolutional (non-CNN) autoencoders.
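To make the comparison concrete, here is a minimal sketch of the search step as I understand it, assuming either model ends up producing one fixed-length vector per image (the autoencoder's bottleneck activations, or the activations of the CNN layer just before the classifier). The function names and the random stand-in embeddings are hypothetical and only there for illustration; in practice only the source of the vectors would differ between the two approaches.

```python
import numpy as np

def l2_normalize(vectors, eps=1e-12):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, eps)

def top_k_similar(query, index, k=5):
    """Return (indices, scores) of the k rows of `index` most cosine-similar to `query`."""
    index_n = l2_normalize(index)
    query_n = l2_normalize(query[None, :])[0]
    scores = index_n @ query_n
    k = min(k, len(scores))
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order, scores[order]

# Stand-in embeddings (assumption): in a real system these would come from
# the autoencoder bottleneck or from the CNN's penultimate layer.
rng = np.random.default_rng(0)
index_embeddings = rng.normal(size=(100, 32))

# A query that is a slightly perturbed copy of image 7's embedding.
query_embedding = index_embeddings[7] + 0.01 * rng.normal(size=32)

neighbors, scores = top_k_similar(query_embedding, index_embeddings, k=3)
print(neighbors, scores)
```

Since the retrieval code is identical either way, my question is really about which model produces the better vectors to feed into it.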
What are the advantages and disadvantages of using autoencoders rather than CNNs for image search?