The 10 Commandments of Data Science projects

Shraddha
4 min read · Mar 28, 2019
Credits: Shraddha Surana. Purchase Info — https://www.shutterstock.com/g/Shraddha+Surana?rid=227303041&utm_medium=email&utm_source=ctrbreferral-link

Let’s get straight to the point:

ONE

Post-analysis: there is no point in just communicating a number. You have to analyse the numbers your algorithm throws out, tie them back to the domain, and work with the domain experts to understand what the results mean.

E.g. what prices is your algorithm recommending? What does the curve (fitment) for these products look like? The algorithm recommends something; does it make sense? Do the results raise any eyebrows? (That can be good if you have discovered something new, or it could mean you need to re-check your analysis.) Can you, as a human, immediately put a finger on the graph where the price range should be?

How have you verified your results? Do you know which products your algorithm is able to recommend prices for (fish? canned goods? wine? chocolate?)? Does it intuitively make sense for those products?
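One way to make the "does it raise any eyebrows?" check routine is to compare each recommendation against a range the domain experts consider plausible. The sketch below is only an illustration: the categories, price ranges, product names and the `flag_surprises` helper are all hypothetical, and a flagged price is a prompt for a conversation, not proof of an error.

```python
from typing import Dict, List, Tuple

# Hypothetical plausible price ranges per category, as agreed with domain experts.
# All numbers are made up for illustration.
PLAUSIBLE_RANGE: Dict[str, Tuple[float, float]] = {
    "fish": (4.0, 30.0),
    "canned goods": (0.5, 6.0),
    "wine": (5.0, 80.0),
}


def flag_surprises(recommendations: List[Tuple[str, str, float]]) -> List[str]:
    """Return a note for every recommendation that deserves a second look."""
    notes = []
    for product, category, price in recommendations:
        if category not in PLAUSIBLE_RANGE:
            # The algorithm priced something the experts never gave a range for.
            notes.append(f"{product}: no expert range for category '{category}'")
            continue
        low, high = PLAUSIBLE_RANGE[category]
        if not low <= price <= high:
            notes.append(
                f"{product}: {price:.2f} is outside the expected {low:.2f}-{high:.2f}"
            )
    return notes


# Made-up model output: (product, category, recommended price).
recommended = [
    ("salmon fillet", "fish", 12.5),
    ("baked beans", "canned goods", 14.0),
    ("dark chocolate", "chocolate", 3.2),
]
for note in flag_surprises(recommended):
    print(note)
```

Anything this prints is either a discovery or a bug, and it takes a domain expert to tell which.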

TWO

Should you incorporate more features, or add more data (or make the existing data more accurate)? Analyse this after your initial, quick-and-dirty implementation of the algorithm, and proceed accordingly. Don’t go down a rabbit hole that will lead you nowhere.
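A common way to approach the features-versus-data question is a learning curve: measure training and validation error of the quick draft at a few increasing training-set sizes. The sketch below assumes you already have those error values; the `next_step` helper and its thresholds (`gap_ratio`, `min_gain`) are hypothetical and arbitrary, and the result is a rule of thumb, not a verdict. In particular, it cannot tell you whether noisy or inaccurate data is the real problem.

```python
from typing import Sequence


def next_step(
    train_err: Sequence[float],
    val_err: Sequence[float],
    gap_ratio: float = 0.2,   # arbitrary illustrative threshold
    min_gain: float = 0.02,   # arbitrary illustrative threshold
) -> str:
    """Suggest a direction from errors measured at increasing training-set sizes."""
    if len(train_err) != len(val_err) or len(val_err) < 2:
        raise ValueError("need at least two matching measurements")
    final_train, final_val = train_err[-1], val_err[-1]
    # Relative gap between validation and training error at the largest size.
    gap = (final_val - final_train) / final_val if final_val > 0 else 0.0
    # Relative drop in validation error gained from the last batch of extra data.
    gain = (val_err[-2] - final_val) / val_err[-2] if val_err[-2] > 0 else 0.0
    if gap > gap_ratio and gain > min_gain:
        # Large gap that is still closing: the model is data-hungry.
        return "more data"
    if gap <= gap_ratio and gain <= min_gain:
        # Both errors have converged and plateaued: the model lacks signal.
        return "more features"
    return "unclear"


# Made-up error values for illustration.
print(next_step([0.05, 0.08, 0.10], [0.40, 0.30, 0.24]))
print(next_step([0.28, 0.30, 0.31], [0.36, 0.33, 0.325]))
```

When the answer is "unclear", that is itself useful: it is a sign to look at the data and the errors by hand before investing in either direction.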

THREE

Use correct terminology. Using terms incorrectly gives the wrong impression, creates confusion and ambiguity, suggests that you don’t have enough domain knowledge, and opens the door to misinterpretation. Hence, it is extremely important to use the correct terminology.

FOUR

People! Never forget the people aspect of the project. Not only the people on your team: remember that the results are being communicated to people too! Business people. Any basic course or study will tell you that anything is better than tables! Make the best of your algorithms, but if you cannot convey the results to your customer, they are not of much use. Conveying results to your customer is not “These are the numbers”. It does not matter if the result is “in your face”; you have to spend time thinking about the best way to convey the results in a manner the customer can understand. Pie charts, bar charts, histograms: get creative. But also use relevant graphs :) In my view, this is part of a scientist’s job.

FIVE

Clean code principles apply. See my other post on it.

SIX

Use project management software (like Jira or Trello). For transparency, both with the client and within the team. For not stepping on each other’s toes (when you have a team of data scientists working together). For being accountable. For easy collaboration. For easy retrieval of information and decisions. For being able to answer why we chose certain paths or algorithms over others, and where the team has spent its time. For understanding what value each piece of work is delivering.

SEVEN

Do a literature survey. Don’t reinvent the wheel. Don’t spend hours figuring out a solution that already exists. Make use of existing implementations and solutions, and build on them. This way you take from the community and are able to give back too.

EIGHT

What does a number mean? How do your results compare with results across the industry?

E.g. it is not enough to say: “Your score is 172”.

What’s your reaction? Wouldn’t you want to know “172 out of?”; “What does 172 mean? Is it better than others? Is that the average? Is it a good score?”

Another example: once upon a time in India, a class 10 student scoring 80% was fantastic! However, most students now score 90% and above. I guess 98–99% must be a fantastic score these days. So the year of clearing 10th standard is a crucial piece of information here as well.
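A small sketch of what "giving a number its context" can look like in code: report the scale and where the score stands among its peers, not just the raw value. The `describe_score` helper and both cohorts below are made up purely for illustration; they are not real exam data.

```python
from typing import Sequence


def describe_score(score: float, max_score: float, peers: Sequence[float]) -> str:
    """Put a raw score in context: its scale and its standing among peers."""
    if not peers:
        raise ValueError("need at least one peer score")
    # Share of peers who scored strictly lower than this score.
    below = sum(1 for p in peers if p < score)
    percentile = 100.0 * below / len(peers)
    return (
        f"{score:g} out of {max_score:g}, "
        f"higher than {percentile:.0f}% of {len(peers)} peers"
    )


# Made-up cohorts: the same 80% means different things in different years.
older_cohort = [45, 52, 58, 60, 63, 67, 70, 72, 75, 85]
recent_cohort = [78, 82, 85, 88, 90, 91, 93, 95, 97, 99]
print(describe_score(80, 100, older_cohort))   # 80 out of 100, higher than 90% of 10 peers
print(describe_score(80, 100, recent_cohort))  # 80 out of 100, higher than 10% of 10 peers
```

The raw score is identical in both calls; only the comparison group changes, and with it the meaning.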

NINE

I believe in delivering value to the client. In some cases, that need not mean delivering the final model. If you know that the model being developed may or may not work (for various reasons, such as data quantity or quality), what value have you delivered to the client after all those months of work? Have they gained better insight into their data? Can the analysis that has been done be reused at a later point in time? When they are equipped with better data, can your code be reused to accelerate model development at that point? I see at least the above as being of value. If that is in place, the client gets something irrespective of the project’s future direction.

TEN

Presentation matters. Period.

Shraddha

A data scientist & researcher who enjoys painting, crafts, dancing and dreaming