I'm trying to dive into deep learning for image tasks, and I'm trying to figure out how to reuse some well-known structures* that have been published, mainly on GitHub.
( *Here, "structure" can stand for one or more of the concepts discussed below.)
But while reading articles and blog posts, and watching videos or paper presentations on deep learning, especially about ConvNets and research applied to images (classification, object detection, semantic segmentation, or scene understanding), I'm struggling with these concepts: backbones, frontends, models, networks, and architectures.
To me, they are almost interchangeable, except maybe for the model, which, according to my current understanding, is the set of weights resulting from the learning process (and which obviously has to be paired with the network used for that training phase).
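To make my current mental picture concrete, here is a toy PyTorch sketch; the split into `backbone` and `head`, and the names themselves, are my own guesses at the terminology, not something taken from a paper:

```python
import torch
import torch.nn as nn

# A tiny ConvNet "backbone" (feature extractor) — toy example, my own naming.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# A task-specific "head" (is this what people call the frontend?).
head = nn.Linear(8, 10)

# The "network"/"architecture": the composed computation graph, i.e. the code.
network = nn.Sequential(backbone, head)

# The "model" as I understand it: the learned weights tied to that network.
state = network.state_dict()
```

So in my vocabulary, `network` is the architecture defined in code, while `state` (the weights) plus that architecture would be the model. Is that the distinction practitioners actually make?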
I would be very grateful if someone could define these concepts and explain their differences thoroughly and rigorously (ideally with links to papers, if the terms have commonly accepted meanings in the scientific literature).