Commercial API Q: Is there an API for converting vision tags into a caption?


There are many machine learning APIs for scanning images, but they just return a bunch of tags:

{ "tags": [ "train", "platform", "station", "building", "indoor", "subway", "track", "walking", "waiting", "pulling", "board", "people", "man", "luggage", "standing", "holding", "large", "woman", "yellow", "suitcase" ] }

Are there any APIs that combine these into a sentence? MS Cognitive Vision is the only one that produces a full caption:

"captions": [ { "text": "people waiting at a train station", "confidence": 0.833099365 } ]
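When a service does return captions, a client usually wants the single best one and some fallback when none is good enough. The sketch below assumes the `tags` and `captions` arrays sit side by side in one object, as in the excerpt (in the Computer Vision describe response they are nested under a `description` key). The function name `best_caption` and the `min_confidence` threshold are hypothetical choices, not part of any vendor API.

```python
from typing import Any


def best_caption(description: dict[str, Any], min_confidence: float = 0.5) -> str:
    """Return the highest-confidence caption, or fall back to the raw tags.

    Assumes a dict shaped like the excerpt: "tags" is a list of strings and
    "captions" is a list of {"text": ..., "confidence": ...} objects.
    min_confidence is a hypothetical cut-off, not a vendor default.
    """
    captions = description.get("captions", [])
    if captions:
        top = max(captions, key=lambda c: c.get("confidence", 0.0))
        if top.get("confidence", 0.0) >= min_confidence:
            return top["text"]
    # No caption, or none confident enough: fall back to a comma-separated tag list.
    return ", ".join(description.get("tags", []))


description = {
    "tags": ["train", "platform", "station", "people"],
    "captions": [{"text": "people waiting at a train station", "confidence": 0.833099365}],
}
print(best_caption(description))  # people waiting at a train station
```

The fallback only joins tags. It does not turn them into a sentence, which is exactly the gap this question is asking about.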

Google's Natural Language API can split a sentence into its grammatical parts (syntax analysis), but is there any API that does the reverse?

INPUT: "train", "platform", "station", "building", "indoor", "subway", "track", "walking", "waiting", "pulling", "board", "people", "man", "luggage", "standing", "holding", "large", "woman", "yellow", "suitcase"

OUTPUT: "people waiting at a train station"
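To illustrate the INPUT → OUTPUT mapping above, here is a naive template-based sketch. It is not an API and not how MS Cognitive Vision works: that service generates captions from the image with a trained model. The word lists, their preference order, and the "subject action at a place" template below are all hypothetical and hand-picked, so the sketch only shows why tags alone make this hard. It has to guess which tags play which role.

```python
# Hypothetical, hand-picked role vocabularies, ordered by preference.
SUBJECTS = ["people", "man", "woman"]
ACTIONS = ["waiting", "walking", "standing", "holding"]
PLACES = ["station", "platform", "building"]
# Hypothetical modifiers that may precede a place noun ("train station").
PLACE_MODIFIERS = {
    "station": ["train", "subway"],
    "platform": ["train", "subway"],
}


def caption_from_tags(tags: list[str]) -> str:
    """Fill a fixed 'subject action at a place' template from a tag list."""
    tagset = set(tags)
    subject = next((w for w in SUBJECTS if w in tagset), None)
    action = next((w for w in ACTIONS if w in tagset), None)
    place = next((w for w in PLACES if w in tagset), None)

    parts = []
    if subject:
        parts.append(subject)
    if action:
        parts.append(action)
    if place:
        modifier = next((m for m in PLACE_MODIFIERS.get(place, []) if m in tagset), None)
        noun = f"{modifier} {place}" if modifier else place
        article = "an" if noun[0] in "aeiou" else "a"
        parts.append(f"at {article} {noun}")
    return " ".join(parts)


tags = ["train", "platform", "station", "building", "indoor", "subway", "track",
        "walking", "waiting", "pulling", "board", "people", "man", "luggage",
        "standing", "holding", "large", "woman", "yellow", "suitcase"]
print(caption_from_tags(tags))  # people waiting at a train station
```

The output matches only because the preference order was chosen with this example in mind. With a different image the same tags could describe people walking on a platform, and nothing in the tag list says which is right.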


