What are best practices for collaborative feature engineering?



I work in a large company on several data science projects. For each project, my colleagues and I construct features that have some predictive value for that project's specific target.

Some projects are similar in that they predict something for the same kind of entity, for example customers or goods.

It would make sense to me to share features between projects that are about the same entity, or at least to make it easy to reuse features from another project. For example, in one project someone could construct the feature "customer since", which indicates the number of years someone has been a customer. In another project someone constructed a feature "estimated age" that is the outcome of a machine learning pipeline. In a third project I might want to use both features.
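
To make the idea concrete, here is a minimal sketch of the kind of sharing I have in mind. Everything in it is hypothetical: the `FEATURES` registry, the `feature` decorator, `build_features`, and the record fields `signup_date` and `predicted_age` are made-up names, and the `estimated_age` function only stands in for a real machine learning pipeline.

```python
from datetime import date
from typing import Callable, Dict, List

# Hypothetical shared registry: feature name -> function that computes it
# for one customer record as of a given date.
FEATURES: Dict[str, Callable[[dict, date], float]] = {}


def feature(name: str):
    """Register a feature function under a name that is unique across projects."""
    def decorator(func):
        if name in FEATURES:
            raise ValueError(f"feature {name!r} is already registered")
        FEATURES[name] = func
        return func
    return decorator


@feature("customer_since")
def customer_since(customer: dict, as_of: date) -> float:
    """Whole years between the (assumed) signup_date field and as_of."""
    signup = customer["signup_date"]
    # Subtract one year if the anniversary has not been reached yet in as_of's year.
    not_yet = (as_of.month, as_of.day) < (signup.month, signup.day)
    return float(as_of.year - signup.year - int(not_yet))


@feature("estimated_age")
def estimated_age(customer: dict, as_of: date) -> float:
    """Stand-in for a model output; a real version would call the other project's pipeline."""
    return float(customer["predicted_age"])


def build_features(customer: dict, names: List[str], as_of: date) -> Dict[str, float]:
    """Compute the requested shared features for one customer."""
    return {name: FEATURES[name](customer, as_of) for name in names}
```

A third project could then request both features by name, for example `build_features(record, ["customer_since", "estimated_age"], as_of=date(2018, 2, 20))`, without knowing which project originally defined them. This sketch shares code only; it does not address whether the computed values should be materialized and stored.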

What are best practices for sharing these features? Should I share code or materialized outcomes? Are there packages that aid this process? How does your company solve this?


Posted 2018-02-20T21:50:46.160

No answers