I'm in the middle of a project in which I want to generate a TV series script (characters answering each other, scene by scene) using SOTA models, and I need some guidance to simplify my architecture.
My current intuition is as follows: for a given character C1, I collect pairs of sentences from the original scripts where C1 answers another character, say C2 (denoted C2 -> C1). These pairs are used to fine-tune a data-driven chatbot for C1. At inference time, the different chatbots simply answer each other and, hopefully, the resulting conversation makes some sense.
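To make the data setup concrete, here is a minimal sketch of how I extract the (C2 -> C1) pairs from a script; the function name and the tuple-based script representation are just illustrative, not part of any library:

```python
# Hypothetical sketch: turn ordered script lines into (incoming line, reply)
# pairs for fine-tuning one chatbot per character.

def build_pairs(script, target):
    """Collect (other character's line, target's reply) pairs.

    script: list of (speaker, line) tuples in scene order.
    target: the character (e.g. "C1") whose chatbot is being fine-tuned.
    """
    pairs = []
    for (prev_speaker, prev_line), (speaker, line) in zip(script, script[1:]):
        if speaker == target and prev_speaker != target:
            pairs.append((prev_line, line))  # e.g. a C2 -> C1 exchange
    return pairs

script = [
    ("C2", "Where were you last night?"),
    ("C1", "Working late. Why?"),
    ("C3", "He's lying."),
    ("C1", "I'm not!"),
]
print(build_pairs(script, "C1"))
# -> [('Where were you last night?', 'Working late. Why?'),
#     ("He's lying.", "I'm not!")]
```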
This is, however, impractical: it requires one fine-tuned model per character, which quickly becomes a mess with many characters, especially if I use heavy models.
Is there a conversational architecture out there that could be trained only once on the whole dataset while still keeping the different characters separate?
I'm open to any ideas!