Wednesday, June 17, 2015

On Names and Naming #1: Cheat Sheet on Criteria for Assessing Data Efficacy


This is the first in my series on the accuracy, connotations and denotations of terms and expressions widely known and used in the global society of data management for social change and development.
*

FORMULA:  Data Efficacy = Data Integrity + Data Actionability

Data Integrity = Reliability + Validity + Comprehensiveness

Therefore, Data Efficacy = [Reliability + Validity + Comprehensiveness] + Actionability.

Reliability: Is the thing referred to as the dataset consistently the same thing?

Validity: Is the dataset accurate?

Comprehensiveness: Are the constituent elements of the dataset sufficient to solve the problem?

Actionability: Is the dataset in a form that facilitates quick and productive use of it for your purpose? If not, could the dataset be easily rendered in a format that will facilitate quick and productive use of it for your purpose?
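
For readers who like to see a formula as code, here is a minimal sketch of the cheat sheet in Python. It rests on one assumption of mine: that the "+" in the formula means "all of these must hold together", so a dataset that fails any one criterion is not efficacious. The class and attribute names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EfficacyAssessment:
    """Yes/no answers to the four cheat-sheet questions for one dataset."""
    reliable: bool
    valid: bool
    comprehensive: bool
    actionable: bool

    @property
    def has_integrity(self) -> bool:
        # Data Integrity = Reliability + Validity + Comprehensiveness
        # (assumption: "+" is read as "and")
        return self.reliable and self.valid and self.comprehensive

    @property
    def is_efficacious(self) -> bool:
        # Data Efficacy = Data Integrity + Data Actionability
        return self.has_integrity and self.actionable

    def failed_criteria(self) -> list[str]:
        # Name the criteria that did not pass, in formula order.
        checks = {
            "reliability": self.reliable,
            "validity": self.valid,
            "comprehensiveness": self.comprehensive,
            "actionability": self.actionable,
        }
        return [name for name, passed in checks.items() if not passed]


# A dataset that passes every check except validity.
sample = EfficacyAssessment(
    reliable=True, valid=False, comprehensive=True, actionable=True
)
print(sample.is_efficacious)     # False
print(sample.failed_criteria())  # ['validity']
```

Reading "+" as "and" is a deliberately strict choice: one failed criterion sinks the whole assessment, no matter how well the others score.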
*
DEMO
You have observed that a lot of the citizens in your host town lack vitamin C. An easy fix for that is lemonade. You need lemons to make lemonade. Your host says “no problem! There are lots of lemons in my sister’s orchard about 30 miles away. We will go there next week.” You say “sure, but in the interim, could you describe for me what a lemon is? I want to make sure we are referring to the same thing.” She does.

Imagine the lemons in this scenario as your dataset.  Now let’s assess its efficacy.

Reliability: You ask the host on several occasions to describe to you what a lemon is. Every time, she gives you the same description. You then ask her neighbors to describe to you what a lemon is. They too give you the same or a very similar description. Conclusion: The dataset is reliable.

Validity: Let’s skip this for a minute.

Comprehensiveness: You calculate that you need 100,000 lemons to make the volume of lemonade needed for the amount of vitamin C required. You ask how many lemon trees there are in the orchard and how many lemons each tree produces. They tell you there are 100 trees and that each tree bears 1,000 lemons, which comes to 100,000 lemons in all. Conclusion: The dataset is comprehensive.
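
The comprehensiveness check in this demo is plain arithmetic, and a few lines of Python make it explicit. This is only a sketch: the figures come from the story above, while the variable names are hypothetical.

```python
# Figures from the demo.
lemons_needed = 100_000
trees_in_orchard = 100
lemons_per_tree = 1_000

# Supply is trees times yield per tree.
lemons_available = trees_in_orchard * lemons_per_tree

# The dataset is comprehensive if supply covers the need.
is_comprehensive = lemons_available >= lemons_needed

print(lemons_available)  # 100000
print(is_comprehensive)  # True
```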

Actionability: Time’s short. You need the lemons prepared quickly so that you can make the lemonade. Your host says “no problem. I will call and tell them to squeeze all the lemons into buckets so that they will have the lemon juice ready when we get there.” On the day of your trip to the orchard they call to tell you that the buckets of lemon juice are ready. Conclusion: The dataset is actionable.

[Now back to] Validity: Are the lemons really lemons? When you arrive at the orchard, you are warmly greeted and shown, with great pride and enthusiasm, the buckets of lemon juice. You look at the trees and stare into the buckets. They are not lemons. They are clementines. That’s what your hosts call lemons. Clementines. What you know as lemons they call limes.
