Monday, September 28, 2015

Is it Data or Information or Knowledge or Wisdom? Is a Tomato a Fruit or a Vegetable?

Does it really matter?
That’s what I was left wondering at the close of each of the past six or so conferences/events this year that I have attended and participated in, variably on data management (mostly open data) for civic engagement and global sustainable development.

This perplexing problem of accurate nomenclature is no mere pedantic preoccupation of university academicians. It is in many ways a significant contributing factor to the enduring and arguably enlarging silos and fissures between and among open data practitioners and advocates, information and records managers, and national statisticians/census managers. These silos have long informed project priorities, design and implementation, and the oftentimes loggerheaded shortsightedness that results in inefficiencies such as project duplication across several siloed MDAs (ministries, departments, agencies).

The usual result? Failure to achieve objectives, with huge financial and human resources price tags that governments (and even intergovernmental sponsors/partners) can ill afford.

Much to chew on here. In fact, too much for here. So “since brevity is the soul of wit… [let’s] be brief” and cut to the chase by cutting through the fat to the meat. Or tofu. See, one may well attribute this whole DIKW pyramid craze to the seemingly algorithmic simplicity that makes it readily endearing to data, information and knowledge practitioners/managers of all stripes. My response? Much thanks to David Weinberger whose superb piece in the Harvard Business Review says it all for me. Or nearly all: Bunkum!

Before we talk about data versus information, let’s dispense first with knowledge and wisdom. While these are important concepts or constructs that relate to data, and in at least one case (knowledge) are needed to process it, data does not (as I argue further down) need any of them to exist.

Now to data versus information. First, it is settled that it is singular, no longer plural. Goodbye datum! Second, please don’t say raw data. It is irksomely tautological. It’s like saying PIN number.
 
Which brings us to the third and most important point: Data is neutral. It is amorphously nothing and everything. It only takes precise shape and meaning upon first contact with us: You. Me. How? Let’s…
*
Demo
So your friend shared an article by a celebrity chef on tomato being a fruit and on the highly nutritional benefits of including it in fruit salads. Your friend expressed their opinion by quoting Miles Kington. You liked it so much (the quote that is) that you shared it on Facebook and Twitter, along with a photoshopped salad to convey your own opinion, with a link to the article.
 
Okay. Now. See, at that exact moment when you received the friend’s article with their comment, it was not information to you. It was just data. It only became information when you reacted to it by inferring from it your friend’s reaction to it. It also became information when you manipulated that data by sharing it with others. Though you expressed your opinion and “like” of it with your photoshop, it still would have been manipulation had you only shared your friend’s piece “as is”; that is hook, line and sinker, with zero alteration or modification (pardon the tautology there). The very act of you sharing it transformed it from data to information.

And to access that data, to react to it, to interpret it as desired or differently, to modify it, and to share that information, you need all sorts of ability, know-how, too many to list here, way too many. That ability, that know-how, is called knowledge.


But what you received from your friend at that very moment when you opened it and before you even photoshopped and shared it, that data, was information. Again, though your friend expressed their opinion and “like” or “dislike” of it with the accompanying quote of Miles Kington, it still would have been manipulation had they only shared the piece “as is”. The very act of their sharing it rendered it information.

And what your friend who shared it with you received at first contact, the data, was information to the one who shared it and to the one before and before, and before, and before, and so on and so forth back till the Big Bang or the Creation, and each to whom was data at first contact. And to access that data, to react to it, to interpret it as desired or differently, to modify it, and to share that information, each needed all sorts of ability, know-how, too many to list here, way too many. That ability, that know-how, is called knowledge.

It goes without saying (so why am I saying it?) that the identity remains the same for the interactions and processes after you. And what your friend with whom you shared your information received at first contact was data, which became information when they shared it, which became data at first contact to the one with whom they shared it, the one with whom they in turn shared it and to the one after and after, and after, and after, and so on and so forth till apocalypse or nuclear annihilation. And to access that data, to react to it, to interpret it as desired or differently, to modify it, and to share that information, each would need all sorts of ability, know-how, too many to list here, way too many. That ability… OK, OK, I’ll stop. You get it!



So let’s end with an answer to a paraphrase of Marcus Aurelius’ philosophical question, made famous by that classic scene in Silence of the Lambs:  

Question: “Simplicity, Clarice! Read Marcus Aurelius. What is it, this thing that we [call data/information/knowledge/wisdom]?”

Answer:  Data. We call it data.

It’s the only unchangeable referent, the only denotation guaranteed to clearly and consistently convey and receive the same connotation. Irrespective of sender, source or intent.

Data.
Simple.

*
Oh! Oh! And wisdom? Where does it fit in? Well, you probably guessed it by now with this whole tomato article setup thingy and this whole Miles Kington quote thingy. So let’s get to it and get on with it, shall we?


Knowledge is knowing that tomato is a fruit. Wisdom is not putting it in a fruit salad.

Tuesday, September 15, 2015

Open Letter From Open Data: I am Your Most Valuable Extractive and I am Open for Business


Dear Government

I am your country’s most valuable mineral resource. Better than your bauxite. Manganese. Copper. Your gold even. These swell your coffers, for sure. They have swollen them for a long time now. But few have they enriched. And among those farthing few, quite a few filthy rich at that. And many have they impoverished. And of them, many made dirt poor. And many more have they in their wake left dead. And many more enslaved. Tortured. Raped.

I am your country’s most valuable mineral resource. Better than your diamonds even.  Svelte diamonds may be. But they cut a very rough figure to my finesse. My suave.

A man’s best friend? Yes. But who boasts of a friend with blood in their hands? Literally blood in their hands. From blood from kids’ hands......



 Dear Mother Earth

Let truth be said so that you shall be set free:  You have never been that willing a partner in "progress." Not under the much-to-be-desired stewardship of Excellencies Messrs Drs. President and Prime Minister since independence. Those other extractives, they are as expectant for extrication as is a patient for tooth extraction. They are as enamored by the advances of the drills and dredges that disembowel you, Mother Earth, as you are by those of T-Rex a month without food. Right at that moment when you become breakfast. Those emaciated, denuded, scarified, scary-looking blotches and blights that replace the once lush, verdant landscape that defined you? That, my lass, alas, is you.

But I’m here, Earth dearest. Unlike the other extractives who are co-victims made hapless co-conspirators in your plight and plunder, I am your watchful, vigilant protector. I am your salve, your savior. The more they mine me, the more cures I reveal to heal you. To revitalize you. To rejuvenate you....

Dear Bounty Hunter

I am low-hanging fruit. Pluck me! No, strike that. I am diamond in the sky. Heck! I am diamond that rains from the sky. Sing my praises! I am the chocolate factory, yes. But one better. The product of many wonks, I am at your wink and will pliable, malleable. I am one hundred percent pure power and pleasure, no pain. No guilt. No toothache. No headache.

No nightmare either. Nothing but a lean, mean thoroughbred. All Triple Crown at the races in infinite loop. And you are the jockey! I wake you up from the stupor of illusory growth to the vigor of vital development. I bring peace to war. Knowledge to ignorance. Growth to stagnation. Transparency to kleptocracy. Democracy to oligarchy. Equality to discrimination...

Dear CSOs, Program Sponsors, Implementing Partners, InterGovs...

I am conundrum gift-wrapped in enigma. I am a puzzle solved by a riddle. Ridiculous how I relish that the more you take, the more I have and love to give. Tap [into] me half-empty down and I replenish four times full. Amazing grace indeed that the more you dig and probe and excise me, the more wholesome I become. The more you mine me, the bigger I become. The better I become. The healthier citizens become. The longer they live. The smarter they get. The richer they become.

The happier we all become...

Dear world
I am the most desirable extractive in the world. 

I am open data. And I am open for business.  

Tuesday, July 28, 2015

Situating Open Data Within The Bigger Picture of the Global Data Economy

We make a stronger case for open data when we situate our argument within the larger picture of the global data economy, and then place developing countries within this economy to make the case how and why effective data management is the most efficient engine of sustainable growth and development. 


Global Data Economy Map: Bah 2015


The global data economy comprises seven commercial data brokerage industries. They are those involved with:

1.   Data creation: All sectors of society, particularly government, create data in their day-to-day activities. People Data is what Steve Adler calls the non-government aspects of this process. The creation may be passive or active, known and unknown, local or international—e.g. social media applications widely used in developing countries, such as Facebook, Twitter, WhatsApp and Viber.

2.   Data collection: Companies, in concert with government ministries, departments and agencies (MDAs), non-profits and other companies, provide products and/or services geared toward facilitating efficient data collection. In other words, in a way that would indicate the data’s level of integrity (i.e. reliability, validity and comprehensiveness) and actionability (i.e. in formats that facilitate quick and effective use of the data for their intended purposes).

3.   Data storage: Companies that provide hardware and software products and services to facilitate data storage for MDAs, businesses and non-profits. Cloud computing services typically fit here.

4.     Data sharing:  Companies and MDAs that provide products and services to facilitate data sharing.

5.   Data archiving: Companies, non-profits and MDAs that provide products and services for the archiving of government and business data, per statutory requirements.

6.   Data destruction: Companies, non-profits and MDAs that provide products and services for the destruction of government, non-profit and business data per statutory requirements, e.g. paper-shredding products and multi-billion-dollar data/document shredding companies such as Iron Mountain.

7.   Data safety & security: Companies that provide products and services to individuals, MDAs and businesses to verify data integrity, prevent data breach, and/or prevent or mitigate harm borne of data breach (e.g. identity theft and ransomware). This category serves as the fulcrum to the rest. Some of the biggest businesses in this sector are LifeLock and the three major credit bureaus in the United States (i.e. Equifax, Experian and TransUnion), which have many subsidiaries overseas. A significant revenue source of these companies is now the various credit-monitoring services that they provide to individuals and companies.

Admittedly, the categorization above of the trillion-dollar global data economy is simplistic, serving merely to facilitate description and comprehension. In reality, the vast majority of the companies fall in more than one category, depend on one another to thrive and in many cases serve as B2Bs to one another.
 *
Risk assessment would be the two-word expression that best describes what dictates what we do, with whom, when, where, how, why.  Just like the economy. The greater the perceived risk, the less inclined we tend to be to take such risk. Or when we do, the higher ROI margin we demand. And how do we well assess (the) risk? With data that pass the efficacy test. In other words, with data that are valid, reliable, comprehensive and actionable. Reflect a minute on the compendium of regulation, policies, procedures and institutions governing the financial services industry in developed countries; almost all of them deal with data management in one form or the other, and all geared toward assessing investment risk. Of the individual (e.g. individual credit reference bureaus), the company (e.g. the various stock exchanges), the institution and the state (e.g. credit ratings agencies such as Fitch, Moody’s and Standard & Poor’s).

Impressive economic growth worldwide, attributable in no small measure to the BRIC, MINT and UAE economies, leaves the world flush with financial capital in need of ventures, mostly in developing countries, and mostly in Africa South of the Sahara. “China’s World Bank” all but guarantees this to remain so in the foreseeable future. What is preventing infusion of private investment capital into these regions is not the purse-holders’ inability to see the huge market potential. Nor the erstwhile inhibitors that cluster around the phrase “inadequate infrastructure”; we are now so far technologically advanced that companies from any part of the world could offer well-heeled individuals living in any other part of the world the wherewithal to live off the oft dysfunctional government grid. Think solar panels for electricity, boreholes for water, and Facebook’s drones for internet connectivity.

Nor is the problem investors’ inability to accurately assess risk. Rather, it is their inability to access efficacious data on, within and from these countries to enable them to estimate and monitor at their own level of confidence, the (potential) risks associated with their (potential) investment. We’re talking here, for example, about open data such as court records to show liens and other forms of encumbrances by lending agencies to protect their investment. We are also talking about shared data such as individual credit reports to help lending institutions assess individuals’ likelihood of defaulting. 

My point is, there are a lot of data-management-related activities going on in developing countries. These activities span all scopes and types of data: open, shared, closed and permutations between and betwixt. We make a stronger case for open data when we situate our argument within the larger picture of the global data economy, and then place developing countries within this economy to make the case how and why effective data management is the most efficient engine of growth and development.

I list four benefits of a comprehensive national data management blueprint geared toward optimal competitiveness in the global data economy:

1.      Generation of consistent revenue flow for the country’s coffers from both international and local sources. The government of Sierra Leone is a typical example of developing countries currently losing tens of millions of dollars annually due to lack of effective national data management regulation, policies, procedures and practices. One main revenue source is “copy fees” typically levied by MDAs when fulfilling FOIA requests. Another is licensing fees paid to MDAs by legitimate businesses such as insurance agencies, for access to non-open data such as driving records.

2.      Support and growth of local innovation and entrepreneurship in data brokerage.

3.      Job and overall economic growth, borne primarily of #2 above and of making the country much more business-friendly for local, regional and international investors.

4.      Greater transparency and good governance.
 *
Something to consider.


Thursday, July 9, 2015

On Names and Naming #2: Problems with Open Data



This is the second in my series on the accuracy, connotations and denotations of terms/expressions widely known and used in the global society of data management for social change and development. Here is the first.
*

Below are my impressions so far of how people across diverse sectors of the global community of data management for social change and development react to the term Open Data.

#1. Data is scary
Say “data” and people hear a jamboree of geeks such as statisticians, economists, computer scientists, number-crunchers and evil hackers.  

#2. Data is exclusive and exclusionary
As a result, the knee-jerk reaction of the significantly larger majority of those in the global community of data management for social change and development who do not see themselves as geeks is “Not interested. That’s not my thing.”

#3. Open Data is scarier
Open Data projects a global community of hacktivists who manipulate data to portray in a bad light (in)famous people such as politicians, corporatists, celebrities, barons and cartels. In the eyes of these “embattled victims”, ethical and legal obligations governing data integrity, access and use, or those relating to balance, fairness or defamation, matter little or not at all to the hacktivists.

#4. Open Data is adversarial
As a result, those of the government sector (whose support and participation are critical, especially in developing countries) are wary about the “true intent” of program sponsors and implementing partners. They tend to see them as tools enabling their opponents to cause trouble and remove them from office, rather than effective tools for good governance, sustainable development and local innovation and entrepreneurship. It does not help at all that quite a few of them have a foggy knowledge of this dreaded alien thing called Open Data.

#5. Open Data is (only) digital data accessible online
This view is doubtless informed by the history in developed countries of the collection of citizens’ data, first by government, then by corporations, currently by both in concert. Citizens’ apprehension of technological, economic and political developments and activities on these continues to influence their relentless advocacy on information(al) access, privacy and security. In this information society (more accurately data society), practically all data are created, collected, stored, shared, archived and/or destroyed digitally.

Therefore, for those in developed countries, the focus of Open Data is not at all on the data being digital; this is assumed. Nor is it much on access; many laws guarantee them that. Laws such as court/open public records acts, open meetings/sunshine acts, freedom of information acts and public domain provisions in copyright and patent laws, to name but a few. More precisely, the focus is on making open data actionable to enable citizens to advocate for a government that is more transparent, answerable and responsive to the needs, wants and desires of its citizenry.

This belief influences the design and implementation strategies of most of the open data programs in developing countries. But with two distinct differences.  First, developing countries need to be open-data ready—in fact, readiness is a key indicator in the prestigious global open data barometer study conducted annually by the Open Data Research Network. And for this to happen, much faith is placed on government as the host and primary actor in the creation of laws and institutions similar if not identical to those in developed countries.

Secondly, programs that use Open Data to solve problems relating to health, education, agriculture, poverty alleviation, environmental protection and the like are actively encouraged and supported.  The activities of GODAN readily come to mind.

But as I, like a growing number of others, continue to point out, our understanding of data and of open data needs to be expanded to embrace the reality of the nature and magnitude of data management (i.e. from collection to destruction) in developing countries.

#6. Open Data is a privilege
A significant number (though not the majority) of Open Data advocates in developed countries tend to think that Open Data issues are negligibly applicable to developing countries. “They have more pressing issues to deal with” is the spoken and unspoken belief. Pressing issues like access to basic needs such as safe drinking water, reliable electricity supply, K-12 education, health, gender equality, individual and public safety & security, etc.

This view is easily debunked by countless examples on the ground that show the laudable extent to which Open Data is used to effectively address said pressing issues. Suffice it to say that this perception is strongly undergirded by that of #5 above.

Evidently, with the exception of the few above-noted, all of these beliefs are far from the reality. But then again we tend to be driven more by our perceptions than by the truth. It would help a lot to explore ways of addressing these misperceptions in our workshops, symposia, conferences and the like. It would help in equal measure to bear these (mis)perceptions in mind when we work on policies and programs geared primarily to those in developing countries.

Sunday, June 21, 2015

The Second Coming: On Data Colonialism and the Rush to the New Frontier


The Ebola epidemic wrought untold damage and suffering to the people, politics and economics of Guinea, Sierra Leone and Liberia. It still does. And will for some time into the future. But the Ebola epidemic was a big economic boon for a whole lot of businesses outside of Guinea, Sierra Leone and Liberia. Outside of Africa even. And mostly in the West. It still is. And will be for some time into the future.

And no sector of the economy in the West benefited more than data brokers. How? Let’s go to the Ebola Open Data Jam. This Washington, D.C. event was designed to be held simultaneously with others in New York, Winchester (UK), Kampala (Uganda), Monrovia (Liberia) and Freetown (Sierra Leone). I know that Freetown wasn’t able to jam with us. I don’t know if Kampala and Monrovia were able to join the party, but if they did, their turnout was nowhere close to ours in the West, especially in D.C., which pulled out all the stops to make the event the phenomenal success that it was. There were many reasons for this lopsided attendance, most of which are the usual suspects, namely the intractable problems of unreliable internet connectivity and electricity supply, and (in the case of Freetown and Monrovia, where Ebola was raging) the wisdom of staying away from any contact with any other person.




The DC event, held February 21, 2015, was part of the annual Washington, DC Open Data Day Hackathon. This Saturday event at the World Bank headquarters was jam-packed: King Kong the raging abominable snowman got nothing on us! Practically all members of the major sponsors were there, including World Bank, USAID, WHO, CDC and a much larger contingent from private companies big and small.

The goal of course was well intentioned: To capture online data available in the public domain or with Creative Commons license, render them in actionable formats and make them available to anyone (especially policymakers in Africa) who wants to use them to combat Ebola. As I mentioned in an earlier entry, all the data jammers present could be placed into two categories, namely the for-profiters (i.e. the commercial data brokers) and the rest of us.

Let’s start with the commercial data brokers at the event, mostly highly innovative small-business entrepreneurs with lots of IT savvy. They openly and actively participated in the jam like the rest of us. Some even co-sponsored the event. But their purpose was devoid of any scheme or desire to help African leaders govern better and more responsibly. Or to help eradicate Ebola.  Their goal was simple. They saw in Ebola an opportunity to make money and all their engagements and interactions with any form of data (opened, shared or otherwise) were solely and ultimately directed to that end.

Thus, they collected any and all forms and sources of open data that they could at the event. 
But perhaps much more important to my argument, they later collected data from these African countries, probably without the knowledge or consent of their leaders. How? 

I would use one particular company, which shall remain nameless, as an example. This company reached out to one of the two major mobile carriers in one of the three most affected countries. This carrier allowed them to access and download all the numbers of all their two million-plus subscribers. Using text-based surveys and SMS geotagging, the company was able to accurately identify Ebola hotspots in real-time. It then sold such data on a subscription basis to program sponsors for a very, very handsome profit.

Clearly this example raises a lot of troubling questions relating to individual informational security and privacy, as well as national security. But much more relevant to my argument is that this data broker, like hundreds of its kind sprouting all over the world, and mostly in the West, recognizes and is taking advantage of a huge market opportunity in the exploitation of data on, in and from developing countries, mostly in Africa. And there’s nothing most African governments, and quite a few in Asia, the Caribbean and Latin America, can at present do about this data colonization. The magnitude of this colonization could be gleaned from the list of sponsors at the Third International Open Data Conference in Ottawa, Canada. Quite a few were commercial data brokers who demoed their services there, which in most cases were comprised of data collected from developing countries, thanks to cutting-edge data-management hardware and software tools.

For the rest of us at the Ebola data jam in Washington DC on a very snow-heavy Saturday February 21, 2015, our actions were arguably informed by the following assumptions:

1.      That a significant reason why Ebola was raging out of control in the three most affected countries was because of insufficient or no efficacious data on such basic things as number of hospitals and number of beds per hospital.

2.      That if we made said data available directly to policymakers or through program sponsors or implementing partners, they will readily and effectively use them to successfully combat the epidemic.

These assumptions were troubling to say the least. They were stripped of all the political, economic and sociocultural realities not only on the ground but also, it must be said, within some program sponsors themselves, not least the World Health Organization. But let us go back and critically examine the aforementioned assumptions.

1.         That a significant reason why Ebola was raging out of control in the three most affected countries was because of insufficient or no efficacious data on such basic things as number of hospitals and number of beds per hospital.

·         If this is true, why? To what extent might this be due to bad governance, specifically chronically poor records management by the countries’ leaders?

·         Is this in fact true, given that we here in the West (and anyone anywhere else with Internet connectivity) could access the same data that we did?

·         Okay, one may well counter that the problem is not with access to the data per se, but with the technical know-how to reformulate and (re)package them to enable effective use by policymakers. But to what extent would this be true? Examples abound of highly qualified programmers and other IT specialists, most of them citizens, living and working in the Ebola-affected countries.

2.         That if we made the data available directly to policymakers or through program sponsors or implementing partners, they will readily and effectively use them to successfully combat the epidemic.

·         Again, assuming this is true, and that the Ebola epidemic was due significantly to poor records management by leaders of the affected countries, isn’t our faith woefully misplaced that the same leaders would put the data to their intended use?

The assumptions underlying these well intentioned and well-funded activities to a large extent inform the very high priorities that program sponsors place on such Open Data-related programs in developing countries as Open Government initiatives and Open Portal initiatives. They reflect in short my observation that data in and on developing countries are not seen as valuable assets in and of themselves. That they are seen merely as tools/indicators of good governance, health, agriculture, education, etc. As a result, program sponsors, instead of helping support the nascent but struggling commercial data brokerage industries in these countries, spend an inordinate amount of resources on the establishment of government institutions geared primarily towards transparency, accountability and good governance.

While these activities are doubtless laudable, their success is doubtful, for many reasons. Among these is the fact that their implementation is disproportionately and ill-advisedly placed in the government sector, where institutions newly established to implement the programs are often staffed with unqualified cronies well paid from donor funds. Said institutions atrophy when funds run out and/or when new political parties come to power, in effect robbing them of the continuity critically required for the programs to succeed and thrive.

The inclusion of non-government actors in this game plan rarely goes beyond engagement with local activists and similar civil society members. Doubtless, better recognition and support are needed of local private-sector actors who are more (if not entirely) concerned with making money within the local commercial data brokerage industry, but who, in so doing, are much more likely to help engender the responsible governments that by any indicator we desire, but with far fewer resources and much more bang for our taxpayer bucks.

Clearly, we need a new paradigm of data management for social change and development. Hopefully there will be room for greater discussion of this in future open data conferences, most significantly the Africa Open Data conference in Dar-Es-Salaam, Tanzania.

Wednesday, June 17, 2015

On Names and Naming #1: Cheat Sheet on Criteria for Assessing Data Efficacy


 This is the first in my series on the accuracy, connotations and denotations of terms/expressions widely known and used in the global society of data management for social change and development. 
*

FORMULA:  Data Efficacy = Data Integrity + Data Actionability

Data Integrity = Reliability + Validity + Comprehensiveness

Therefore Data Efficacy = [ Reliability + Validity + Comprehensiveness] + Actionability.

Reliability: Is what is referred to as the dataset consistently the same?

Validity: Is the dataset accurate?

Comprehensiveness: Are the constituent elements of the dataset sufficient to solve the problem?

Actionability: Is the dataset in a form that facilitates quick and productive use of it for your purpose? If not, could the dataset be easily rendered in a format that will facilitate quick and productive use of it for your purpose?
*
DEMO
You have observed that a lot of the citizens in your host town lack vitamin C. An easy fix for that is lemonade. You need lemons to make lemonade. Your host says “no problem! There are lots of lemons in my sister’s orchard about 30 miles away. We will go there next week.”  You said “sure, but in the interim, could you describe for me what a lemon is? I want to make sure we are referring to the same thing.” She does.

Imagine the lemons in this scenario as your dataset.  Now let’s assess its efficacy.

Reliability: You ask the host on several occasions to describe to you what a lemon is. Every time, she gives you the same description. You then ask her neighbors to describe to you what a lemon is. They too give you the same or very similar description. Conclusion: The dataset is reliable.

Validity: Let’s skip this for a minute.

Comprehensiveness: You calculated that you need 100,000 lemons to make the volume of lemonade needed for the amount of vitamin C required. You ask how many lemon trees there are in the orchard and how many lemons each tree produces. They tell you there are 100 trees and that each tree bears 1000 lemons. Conclusion: The dataset is comprehensive.

Actionability:  Time’s short. You need to get the lemons prepared quickly for you to make the lemonade. Your host says “no problem. I will call and tell them to squeeze all the lemons into buckets so that they will have the lemon juice ready when we get there.” On the day of your trip to the orchard they called to tell you that the buckets of lemon juice are ready. Conclusion: The dataset is actionable.

[Now back to] Validity: Are the lemons really lemons? When you arrived at the orchard, you were warmly greeted and shown with great pride and enthusiasm the buckets of lemon juice. You looked at the trees and stared into the buckets. They were not lemons. They were clementines. That’s what your hosts call lemons. Clementines. What you know as lemons they call limes. Conclusion: The dataset is not valid.
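
For readers who think in code, the cheat sheet and the lemon demo can be sketched as a simple checklist. This is only an illustration, not a formal method: the class and field names are hypothetical, and reading the "+" in the formula as "all of these must hold" is my assumption, not something the formula itself states.

```python
from dataclasses import dataclass


@dataclass
class EfficacyCheck:
    """Hypothetical checklist; fields mirror the four criteria in the cheat sheet."""
    reliable: bool
    valid: bool
    comprehensive: bool
    actionable: bool

    @property
    def has_integrity(self) -> bool:
        # Data Integrity = Reliability + Validity + Comprehensiveness
        # (assumption: "+" is read as "all must hold")
        return self.reliable and self.valid and self.comprehensive

    @property
    def is_efficacious(self) -> bool:
        # Data Efficacy = Data Integrity + Data Actionability
        return self.has_integrity and self.actionable


# Figures from the demo: 100 trees, 1000 lemons per tree, 100,000 lemons needed.
lemons_available = 100 * 1000
lemons_needed = 100_000

lemon_dataset = EfficacyCheck(
    reliable=True,    # host and neighbors give the same description
    valid=False,      # the "lemons" turned out to be clementines
    comprehensive=lemons_available >= lemons_needed,
    actionable=True,  # the juice was squeezed and ready on arrival
)

print("Integrity:", lemon_dataset.has_integrity)
print("Efficacy:", lemon_dataset.is_efficacious)
```

Under this reading, a dataset that passes three of the four checks still fails overall, which is the point of leaving validity for last in the demo.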

Tuesday, June 16, 2015

Is Datum Dead?

I got quite a few well-intentioned comments on my "profligate" and sometimes “incorrect” use of data in my two previous entries.  Some of my colleagues in the academic realm are dismayed by my use of the word a few times as singular instead of plural; e.g. “data is”  instead of  “data are” or “datum is.” A few of my colleagues outside of academe (i.e. practitioners, program sponsors, implementing partners, innovators and entrepreneurs) were equally quick to point out that the phrase the data dictate (v) should have been the data dictates. Obviously, to them data is singular.

Quite a large number of you correctly noted that I was invoking poetic license with the word, pointing out my punctuation of God versus god and Data versus data--e.g. Zeus is god but Data is God, to show that my deified anthropomorphic Data is way more powerful than Zeus.

I know of course that data, like media, is plural, their singular forms being datum and medium respectively. But who says Datum is God? Not a god that evokes awe and respect and total obeisance. Clearly this is an ongoing debate about the role and status of language in society. Dictionary and thesaurus companies owe their raison d’être to the interminably vexing question “when does a word become officially accepted and acceptable [to use]?” But what is unarguable is that it is we as a society who collectively decide, over an extended period of time, when words die and get buried in an etymological graveyard, when others go into a coma and get rarely used, and when others morph semantically to gain new life.

To me, datum is in a coma one twitch-of-a-smile short of vegetative. Like medium as singular for [mass/social] media, I use it only to correct students, when in the company of my academic peers or when writing with either as my exclusive target audience.

This brings me to the question for you. Is datum dead? I am literally dying to know. Ha! Literally. Topic for another discussion.


********************************************************************
Join the debate at Facebook or @DataDictate
Post your comments below this entry or directly to us
Send us your article for publication consideration.