Sunday, June 21, 2015

The Second Coming: On Data Colonialism and the Rush to the New Frontier


The Ebola epidemic wrought untold damage and suffering on the people, politics and economies of Guinea, Sierra Leone and Liberia. It still does. And will for some time into the future. But the Ebola epidemic was a big economic boon for a whole lot of businesses outside of Guinea, Sierra Leone and Liberia. Outside of Africa even. And mostly in the West. It still is. And will for some time into the future.

And no sector of the economy in the West benefited more than data brokers. How? Let’s go to the Ebola Open Data Jam. This Washington, D.C. event was designed to be held simultaneously with others in New York, Winchester (UK), Kampala (Uganda), Monrovia (Liberia) and Freetown (Sierra Leone). I know that Freetown wasn’t able to jam with us. I don’t know whether Kampala and Monrovia were able to join the party, but if they did, their turnout was nowhere close to ours in the West, especially in D.C., which pulled out all the stops to make the event the phenomenal success that it was. There were many reasons for this lopsided attendance, most of them the usual suspects: the intractable problems of unreliable internet connectivity and electricity supply, and (in Freetown and Monrovia, where Ebola was raging) the wisdom of avoiding any contact with other people.




The D.C. event, held February 21, 2015, was part of the annual Washington, D.C. Open Data Day Hackathon. This Saturday event at the World Bank headquarters was jam-packed: King Kong the raging abominable snowman got nothing on us! Representatives of practically all the major sponsors were there, including the World Bank, USAID, WHO and CDC, along with a much larger contingent from private companies big and small.

The goal, of course, was well intentioned: to capture online data available in the public domain or under a Creative Commons license, render them in actionable formats and make them available to anyone (especially policymakers in Africa) who wanted to use them to combat Ebola. As I mentioned in an earlier entry, all the data jammers present could be placed into two categories, namely the for-profiters (i.e. the commercial data brokers) and the rest of us.

Let’s start with the commercial data brokers at the event, mostly highly innovative small-business entrepreneurs with lots of IT savvy. They openly and actively participated in the jam like the rest of us. Some even co-sponsored the event. But their purpose was devoid of any scheme or desire to help African leaders govern better and more responsibly. Or to help eradicate Ebola. Their goal was simple. They saw in Ebola an opportunity to make money, and all their engagements and interactions with any form of data (open, shared or otherwise) were solely and ultimately directed to that end.

Thus, they collected any and all forms and sources of open data that they could at the event. But perhaps much more important to my argument, they later collected data from these African countries, probably without the knowledge or consent of their leaders. How?

I will use one particular company, which shall remain nameless, as an example. This company reached out to one of the two major mobile carriers in one of the three most affected countries. The carrier allowed the company to access and download the phone numbers of all its two million-plus subscribers. Using text-based surveys and SMS geotagging, the company was able to accurately identify Ebola hotspots in real time. It then sold such data on a subscription basis to program sponsors for a very, very handsome profit.

Clearly this example raises a lot of troubling questions relating to individual informational security and privacy, as well as national security. But much more relevant to my argument is that this data broker, like hundreds of its kind sprouting up all over the world, mostly in the West, recognizes and is taking advantage of a huge market opportunity in the exploitation of data on, in and from developing countries, mostly in Africa. And there’s nothing most African governments, and quite a few in Asia, the Caribbean and Latin America, can at present do about this data colonization. The magnitude of this colonization can be gleaned from the list of sponsors at the Third International Open Data Conference in Ottawa, Canada. Quite a few were commercial data brokers who demoed their services there, services which in most cases consisted of data collected from developing countries, thanks to cutting-edge data-management hardware and software tools.

For the rest of us at the Ebola data jam in Washington, D.C. on a very snow-heavy Saturday, February 21, 2015, our actions were arguably informed by the following assumptions:

1.      That a significant reason Ebola was raging out of control in the three most affected countries was insufficient or no efficacious data on such basic things as the number of hospitals and the number of beds per hospital.

2.      That if we made said data available directly to policymakers or through program sponsors or implementing partners, they would readily and effectively use them to successfully combat the epidemic.

These assumptions were troubling, to say the least. They were stripped of all the political, economic and sociocultural realities not only on the ground but also, it must be said, within some program sponsors themselves, not least the World Health Organization. But let us go back and critically examine the aforementioned assumptions.

1.         That a significant reason Ebola was raging out of control in the three most affected countries was insufficient or no efficacious data on such basic things as the number of hospitals and the number of beds per hospital.

·         If this is true, why? To what extent might this be due to bad governance, specifically chronically poor records management by their leaders?

·         Is this in fact true, given that the data we used were publicly available online, to us here in the West and to anyone anywhere else with Internet connectivity?

·         Okay, one may well counter that the problem is not with access to the data per se, but with the technical know-how to reformulate and (re)package them to enable effective use by policymakers. But to what extent would this be true? Examples abound of highly qualified programmers and other IT specialists, most of them citizens, living and working in the Ebola-affected countries.

2.         That if we made the data available directly to policymakers or through program sponsors or implementing partners, they would readily and effectively use them to successfully combat the epidemic.

·         Again, assuming this is true, and that the Ebola epidemic was due significantly to poor records management by leaders of the affected countries, isn’t our faith that the same leaders would put the data to their intended use woefully misplaced?

The assumptions underlying these well-intentioned and well-funded activities to a large extent inform the very high priorities that program sponsors place on Open Data-related programs in developing countries, such as Open Government and Open Portal initiatives. In short, they reflect my observation that data in and on developing countries are not seen as valuable assets in and of themselves, but merely as tools/indicators of good governance, health, agriculture, education, etc. As a result, program sponsors, instead of helping support the nascent but struggling commercial data brokerage industries in these countries, spend an inordinate amount of resources on the establishment of government institutions geared primarily towards transparency, accountability and good governance.

While these activities are doubtless laudable, their success is doubtful, for many reasons. Among these is the fact that their implementation is disproportionately and ill-advisedly placed in the government sector, where institutions newly established to implement the programs are often staffed with unqualified cronies well paid from donor funds. Said institutions atrophy when funds run out and/or when new political parties come to power, in effect robbing the programs of the continuity critically required for them to succeed and thrive.

The inclusion of non-government actors in this game plan rarely goes beyond engagement with local activists and similar civil society members. Doubtless, better recognition and support are needed for local private-sector actors who are more (if not entirely) concerned with making money within the local commercial data brokerage industry, but who, in so doing, are much more likely to help engender the responsible governments that by any indicator we desire, with far fewer resources and much more bang for our taxpayer bucks.

Clearly, we need a new paradigm of data management for social change and development. Hopefully there will be room for greater discussion of this in future open data conferences, most significantly the Africa Open Data Conference in Dar es Salaam, Tanzania.

Wednesday, June 17, 2015

On Names and Naming #1: Cheat Sheet on Criteria for Assessing Data Efficacy


 This is the first in my series on the accuracy, connotations and denotations of terms/expressions widely known and used in the global society of data management for social change and development. 
*

FORMULA:  Data Efficacy = Data Integrity + Data Actionability

Data Integrity = Reliability + Validity + Comprehensiveness

Therefore, Data Efficacy = [Reliability + Validity + Comprehensiveness] + Actionability.

Reliability: Is the dataset consistently the same each time it is described or referred to?

Validity: Is the dataset accurate?

Comprehensiveness: Are the constituent elements of the dataset sufficient to solve the problem?

Actionability: Is the dataset in a form that facilitates quick and productive use of it for your purpose? If not, could the dataset be easily rendered in a format that will facilitate quick and productive use of it for your purpose?
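One way to make the cheat sheet concrete is to treat each criterion as a yes/no question and to read every “+” in the formula as “all of these must hold.” That reading is my assumption, since the formula does not say how the terms combine. The minimal Python sketch below encodes it; the `DatasetAssessment` class and its field names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DatasetAssessment:
    """Yes/no answers to the four cheat-sheet questions (hypothetical structure)."""

    reliable: bool       # Reliability: is the dataset consistently the same?
    valid: bool          # Validity: is the dataset accurate?
    comprehensive: bool  # Comprehensiveness: is it sufficient to solve the problem?
    actionable: bool     # Actionability: is it (or can it easily be made) usable for your purpose?

    def integrity(self) -> bool:
        # Data Integrity = Reliability + Validity + Comprehensiveness,
        # with "+" read as "all three must hold" (an assumption).
        return self.reliable and self.valid and self.comprehensive

    def efficacy(self) -> bool:
        # Data Efficacy = Data Integrity + Data Actionability.
        return self.integrity() and self.actionable

    def failed_criteria(self) -> list[str]:
        # List the criteria that were not met, in cheat-sheet order.
        checks = {
            "reliability": self.reliable,
            "validity": self.valid,
            "comprehensiveness": self.comprehensive,
            "actionability": self.actionable,
        }
        return [name for name, ok in checks.items() if not ok]


# Example: a dataset that passes every check except validity.
assessment = DatasetAssessment(reliable=True, valid=False, comprehensive=True, actionable=True)
print(assessment.efficacy())         # False
print(assessment.failed_criteria())  # ['validity']
```

Under this reading a single failed criterion is enough to sink efficacy, which is why `failed_criteria` is worth having: it tells you which question to go back to.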
*
DEMO
You have observed that a lot of the citizens in your host town lack vitamin C. An easy fix for that is lemonade. You need lemons to make lemonade. Your host says, “No problem! There are lots of lemons in my sister’s orchard about 30 miles away. We will go there next week.” You say, “Sure, but in the interim, could you describe for me what a lemon is? I want to make sure we are referring to the same thing.” She does.

Imagine the lemons in this scenario as your dataset.  Now let’s assess its efficacy.

Reliability: You ask the host on several occasions to describe to you what a lemon is. Every time, she gives you the same description. You then ask her neighbors to describe to you what a lemon is. They too give you the same or very similar description. Conclusion: The dataset is reliable.

Validity: Let’s skip this for a minute.

Comprehensiveness: You calculate that you need 100,000 lemons to make the volume of lemonade needed for the amount of vitamin C required. You ask how many lemon trees there are in the orchard and how many lemons each tree produces. They tell you there are 100 trees and that each tree bears 1,000 lemons, i.e. 100 × 1,000 = 100,000 lemons, exactly the number you need. Conclusion: The dataset is comprehensive.

Actionability: Time’s short. You need to get the lemons prepared quickly so you can make the lemonade. Your host says, “No problem. I will call and tell them to squeeze all the lemons into buckets so that they will have the lemon juice ready when we get there.” On the day of your trip to the orchard, they call to tell you that the buckets of lemon juice are ready. Conclusion: The dataset is actionable.

[Now back to] Validity: Are the lemons really lemons? When you arrive at the orchard, you are warmly greeted and shown, with great pride and enthusiasm, the buckets of lemon juice. You look at the trees and stare into the buckets. They are not lemons. They are clementines. That’s what your hosts call lemons. Clementines. What you know as lemons they call limes. Conclusion: The dataset is not valid.
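The lemon demo can be replayed as a short Python sketch. The function `assess_lemon_dataset`, its parameters and the placeholder description string are all hypothetical, and treating “the same description every time” as identical strings is a simplification of the demo’s looser “same or very similar” test. What it illustrates: reliability, comprehensiveness (100 trees × 1,000 lemons covers the 100,000 needed) and actionability all pass, validity fails, and so the dataset as a whole is not efficacious.

```python
def assess_lemon_dataset(
    descriptions: list[str],
    trees: int,
    lemons_per_tree: int,
    lemons_needed: int,
    juice_ready: bool,
    fruit_received: str,
) -> dict[str, bool]:
    """Apply the four cheat-sheet criteria to the lemon demo (hypothetical helper)."""
    verdict = {
        # Reliability: everyone describes a "lemon" the same way
        # (simplified here to identical description strings).
        "reliable": len(set(descriptions)) == 1,
        # Validity: the fruit in the orchard is what you mean by "lemon".
        "valid": fruit_received == "lemon",
        # Comprehensiveness: trees x lemons per tree covers the amount needed.
        "comprehensive": trees * lemons_per_tree >= lemons_needed,
        # Actionability: the juice is squeezed and waiting.
        "actionable": juice_ready,
    }
    # Efficacy requires every criterion above to hold.
    verdict["efficacious"] = all(verdict.values())
    return verdict


result = assess_lemon_dataset(
    descriptions=["the host's description of a lemon"] * 3,  # placeholder text
    trees=100,
    lemons_per_tree=1_000,
    lemons_needed=100_000,
    juice_ready=True,
    fruit_received="clementine",
)
print(result)
# {'reliable': True, 'valid': False, 'comprehensive': True, 'actionable': True, 'efficacious': False}
```

Note that the reliability check passes even though the fruit is wrong: the host and her neighbors consistently describe the same thing, it just isn’t the thing you mean. That is exactly why validity has to be checked separately.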

Tuesday, June 16, 2015

Is Datum Dead?

I got quite a few well-intentioned comments on my “profligate” and sometimes “incorrect” use of the word data in my two previous entries. Some of my colleagues in the academic realm are dismayed by my use of the word a few times as singular instead of plural, e.g. “data is” instead of “data are” or “datum is.” A few of my colleagues outside of academe (i.e. practitioners, program sponsors, implementing partners, innovators and entrepreneurs) were equally quick to point out that the phrase “the data dictate” (v) should have been “the data dictates.” Obviously, to them data is singular.

Quite a large number of you correctly noted that I was invoking poetic license with the word, pointing out my capitalization of God versus god and Data versus data (e.g. Zeus is god but Data is God) to show that my deified, anthropomorphic Data is way more powerful than Zeus.

I know, of course, that data, like media, is plural, their singular forms being datum and medium respectively. But who says Datum is God? Not a god that evokes awe and respect and total obeisance. Clearly this is an ongoing debate about the role and status of language in society. Dictionary and thesaurus companies owe their raison d’être to the interminably vexing question “when does a word become officially accepted and acceptable [to use]?” But what is unarguable is that it is we as a society who collectively decide, over an extended period of time, when words die and get buried in an etymological graveyard, when others go into a coma and are rarely used, and when others morph semantically to gain new life.

To me, datum is in a coma one twitch-of-a-smile short of vegetative. Like medium as singular for [mass/social] media, I use it only to correct students, when in the company of my academic peers or when writing with either as my exclusive target audience.

This brings me to the question for you. Is datum dead? I am literally dying to know. Ha! Literally. Topic for another discussion.


********************************************************************
Join the debate at Facebook or @DataDictate
Post your comments below this entry or directly to us
Send us your article for publication consideration. 

Saturday, June 13, 2015

ABRIDGED: A Call for a New Paradigm of Data Management for Civic Engagement and Sustainable Development


Q.  What’s the Data Dictate (n)?
A.  The data dictate (v).

I have argued in the original version of this entry that Data is God, and that what the Data Dictate states unequivocally is that we are data. Who we are, what we are, what we do, with whom, where, when and how. With or without our knowledge, consent or support, without our ability to opt in or opt out, without our ability to prevent it, we are rendered into bits and bytes from preconception to reincarnation. And beyond.

I assert that we have moved from the Information Society to the Data Society. As a result, we (especially those of us in the business of using data for social change and development) should consider a paradigm shift, the implications of which are addressed below.

Our understanding of Data is long overdue for an overhaul. Data is. Period. It is more than a tool to help eradicate or achieve something. It is more than a medium through which we create a platform, product or service, be it for business, aid, activism, governance good and bad, destruction big and small…

This means we need to reconsider conceptual (de)limitations of terms such as “ICT/Open Data for [insert who/what you are/do here]” and their impact on the design, implementation, monitoring and evaluation of (the success or failure of) your program; on the feasibility or viability of your proposed program, product or service. …

This also means that actors big and small in the global community of social change and development must demolish the silos and borders we built and build between and among us if we want to be more agile and efficient in order to increase our chances for success. In other words if you are dealing with data in your venture, it should matter little or not at all what your discipline, field, profession, passion, program, product or service is. This should free your mind and your assets will follow to enable you to partner with anyone anywhere.

Data deserves way more respect than is currently accorded. There appears to be a glaring disparity and dissonance in the valuation of data among and between actors in the global community of social change and development. In developed countries, anything relating to data management is regarded as a highly valuable commodity and valued and protected as such. Take for example any social media product or service of your choice and find out how much it is worth, and the plethora of patents, trademarks and copyrights it has and zealously protects, including all your content in it. In fact, you and what you do on/with Facebook and YouTube are exponentially more valuable to Zuckerberg and Brin & Page respectively than all their algorithms combined!

But when citizens in developing countries are the intended beneficiaries of products, programs or services, two views dominate: that of the for-profiters and that of the rest of us. The for-profiters intuitively recognize the value of such data, which is often significantly higher than that of comparable data on those in developed countries: the classic supply-demand principle at work. They therefore feverishly gobble up all the data (open, shared, closed and otherwise) they can find on developing countries, add them to their own (which were arguably paltry before), obtain and protect various IP rights for them and then package and sell them, often to program sponsors or implementing partners who most likely provided most of the data free of charge in the first place.

For the rest of us, valuation of the data generated by implementing partners and/or their sponsors rarely goes beyond contractual obligations of delimited access and sharing, or of Creative Commons licensing. This view is myopic and misguided. I ardently applaud and support the ingenuity of the data for-profiters targeting developing countries. In fact, they play a critical role in the nascent data management economy in developing countries. It is understandable and no fault of theirs that they are taking advantage of and profiteering handsomely from the data rush to the new frontier.

But the rest of us, especially big program sponsors such as the World Bank, USAID, DFID, UNDP and UNESCO, need a whole new and effective game plan to play or referee well in this new frontier. We are in dire need of an equitable global data valuation system that includes data subjects in developing countries. I cite two among the innumerable benefits such a valuation system would generate:

·         First, it would be a potential cash cow for program sponsors and implementing partners. Just because we are non-profit does not mean all that we generate should be free, especially when used by for-profiters. If done right, this ever-replenishing revenue reservoir will help fund the creation of a global data management development index and of data management ministries/secretariats in developing countries.

·         Secondly, it would support the growth and prosperity of local innovators and entrepreneurs in the nascent data management economy in developing countries. This group plays a crucial role in sustainable development and good governance.

Without this new valuation system, we unwittingly support what I call the Second Coming phenomenon: The recolonization of old empires, this time for the exploitation of their data resources.





Thursday, June 11, 2015

A Call for a New Paradigm of Data Management for Civic Engagement and Sustainable Development


Q.  What’s the Data Dictate (n)?
A.  The data dictate (v).

You would be correct to deduce from this imperial diktat that what we do at work, home, play and everywhere else is influenced by data. That whom we interact with and how are influenced by data.

For better and for worse, we are tagged from cradle to grave. And we have been thus from time immemorial. The need to collect and codify data on and about us right down to the nanomolecular level did not start with Uncle Sam. Or the Internet. This enterprise traces its origin back to Lucy, at that exact moment when she was unable to share with Lucian, Lucien and Lucienne what she saw in their absence by utterances, gestures and gesticulations. Born of that frustration was data, rendered for the first time ever as markings, perhaps on the floor and later on the walls of their cave dwellings.

So yes, we can state that the genesis of language is data. And throughout the trajectory of our evolution, from Lucy “Low-Hands” to Lindsay Lohan, data continue to dictate the birth, growth and inescapable Schumpeterian demise of thing(s) and one(s) every, many and none: Politics. Society. Culture. Commerce.

Particularly commerce. Convincing arguments abound that the origin and growth of codified language lie in commerce. When the hunter-gatherer became the dweller, he wanted to patent a potent symbol of his power and prestige: his domesticated animals. But two things had to be achieved for that to happen: a mark unique to him, which he stamped on his properties and which his co-dwellers had to recognize and agree not to imitate; and a way to count and keep count of his cattle (in his presence as well as absence, near and far) as they multiplied by birth, purchase and wars won, and as they dwindled by death, sale and wars lost.

And so did data beget commerce, which begat numbers, which begat codes, which begat the need for authenticity, which begat authentication, which begat verification, which begat power, which begat control, which begat censuses writ large in our history. This desire to collect, count and code has invariably been central to our civilization. There is no shortage of evidence of this in religious texts (e.g. the Book of Numbers in the Old Testament) and in various and varying historical narratives of any given civilization (e.g. the Domesday Book).

This data management exercise has been for the best for this our Homo sapiens civilization; it gave birth to the social security number, credit reference bureaus (e.g. Equifax, Experian and TransUnion), credit ratings agencies (e.g. Fitch, Moody’s and S&P), the media audience measurement industry (e.g. Nielsen, Arbitron and the Audit Bureau of Circulations), Skype, Google, Facebook, YouTube, Twitter, NSA….

This data management exercise has also been for the worst for this our Homo sapiens civilization; it gave birth to the social security number, credit reference bureaus (e.g. Equifax, Experian and TransUnion), credit ratings agencies (e.g. Fitch, Moody’s and S&P), the media audience measurement industry (e.g. Nielsen, Arbitron and the Audit Bureau of Circulations), Skype, Google, Facebook, YouTube, Twitter, NSA….

The Data Dictate asserts that data dictate.

You would indeed be correct to deduce from this imperial diktat that what we do at work, home, play and everywhere else is influenced by data. That whom we interact with and how we interact with them are influenced by data.

You would be correct, yes. But not entirely. What the Data Dictate states unequivocally is that we are data. Who we are, what we are, what we do, with whom, where, when and how. With or without our knowledge, consent or support, without our ability to opt in or opt out, without our ability to prevent it, we are rendered into bits and bytes. Not from cradle to grave, nor from ejaculation to resurrection, but from preconception to reincarnation. And beyond.

So it would be more accurate to state that Data is more than a divine emperor with absolute power. Data is God. And to God omniscient, omnipotent, who at Her whim and caprice could be omnibenevolent or omnimalevolent, we are nothing, we Her genuflecting minions, but bits and bytes. Whatever we say, do, how, when, where, why, with whom, through what media and with what language, we are rendered ones and zeros, the code that is the word by which we live and die.

Data is code is word.
And in the beginning was the word.
And the word was with God.
And the word was God.

Do I hear you say “word!”?

Data is God is word is code. Code. A noun, a verb. A concept so pregnant with meaning that it demands a whole sanctuary of its own to permit us to unravel its ambivalence and ambiguities enough for deliverance and entry into what I call the Global Data Management Enterprise. More on this anon. Code denotes and connotes mystery and revelation; mysticism and rationalism; order and chaos; acceptance and rejection; pleasure and pain; power and bondage; good and evil; war and peace; construction and destruction; beginning and end…

But if data is code is word is God, what has become of Uncle Sam, given his power, ingenuity and productivity of and with the code? Nothing? Something? Really alive? Really dead? Really powerful? Really powerless?

Good news. Uncle Sam is still alive and kicking. Really. And still powerful. Very powerful. Very, very powerful. More good news. When it comes to the code, Uncle Sam is god. In fact, in the earlier stages of the development of Data Olympiad, Uncle Sam was Zeus. He still is. But today Uncle Sam is no longer Big Brother, even though he is still watching (and eavesdropping on) us. At some point in the growth of Data Olympiad, Uncle Sam’s protégés became so powerful that they engineered a de facto palace coup. As Zeus is to Jesus, so is Uncle Sam to his nephews and nieces: Microsoft, Experian, Adobe, Google, Facebook, YouTube, Twitter, AT&T, PLA Unit 61398, KPA’s Unit 121, Anonymous, ransomwarers… and on, and on, and on. More on that anon. Uncle Sam is still watching us. But not without the consent (and oftentimes dissent and obstruction) of his nephews and nieces (among whom are the saintly, good, bad and ugly), who have morphed into Big Mama.

So what’s the coda to all this blog’s blague and blather?

It is with great joy and great grief that I announce the death of Information Society. He is survived by his daughter, Data Society, otherwise known as Global Data Management Enterprise, otherwise known as Big Mama.

And to survive and thrive in Data Society, I softly recommend the following: 

1.      Our understanding of Data is long overdue for an overhaul. Data is. Period. It is more than a tool to help eradicate or achieve something. It is more than a medium through which we create a platform, product or service, be it for business, aid, activism, governance good and bad, destruction big and small…and on, and on, and on. More on this anon.

·         This means we need to reconsider conceptual (de)limitations of terms such as “ICT/Open Data for [insert who/what you are/do here]” and their impact on the design, implementation, monitoring and evaluation of (the success or failure of) your program; on the feasibility or viability of your proposed program, product or service. …and on, and on, and on. More on this anon.

·         This also means that actors big and small in the global community of social change and development must demolish the silos and borders we built and build between and among us if we want to be more agile and efficient in order to increase our chances for success. In other words if you are dealing with data in your venture, it should matter little or not at all what your discipline, field, profession, passion, program, product or service is. This should free your mind and your assets will follow to enable you to partner with anyone anywhere. Handles for this concept are "interdisciplinarity", "transdisciplinarity" and on, and on, and on.... More on this anon.

2.      Data deserves way more respect than is currently accorded. There appears to be a glaring disparity and dissonance in the valuation of data among and between actors in the global community of social change and development. In developed countries, anything relating to data management is regarded as a highly valuable commodity and valued and protected as such. Take for example any social media product or service of your choice and find out how much it is worth, and the plethora of patents, trademarks and copyrights it has and zealously protects, including all your content in it. In fact, you and what you do on/with Facebook and YouTube are exponentially more valuable to Zuckerberg and Brin & Page respectively than all their codes combined!

But when citizens in developing countries are the intended beneficiaries of products, programs or services, two views dominate: that of the for-profiters and that of the rest of us. The for-profiters intuitively recognize the value of such data, which is often significantly higher than that of comparable data on those in developed countries: the classic supply-demand principle at work. They therefore feverishly gobble up all the data (open, shared, closed and otherwise) they can find on developing countries, add them to their own (which were arguably paltry before), obtain and protect various IP rights for them and then package and sell them, often to program funders or implementing partners who provided the data free of charge in the first place.

For the rest of us, valuation of the data generated by implementing partners and/or their sponsors rarely goes beyond contractual obligations of delimited access and sharing, or of Creative Commons licensing. This view is myopic and misguided. I ardently applaud and support the ingenuity of the data for-profiters targeting developing countries. In fact, they play a critical role in the new paradigm of development that I hope to advocate soon. It is understandable and no fault of theirs that they are taking advantage of and profiteering beautifully from the data rush to the new frontier.

But the rest of us, especially big program sponsors such as the World Bank, USAID, DFID, UNDP and UNESCO, need a whole new and effective game plan to play or referee well in this new frontier. We are in dire need of an equitable global data valuation system that includes data subjects in developing countries. I cite two among the innumerable benefits such a valuation system would generate:

First, it would be a potential cash cow for program sponsors and implementing partners. Just because we are non-profit does not mean all that we generate should be free, especially when used by for-profiters. If done right, this ever-replenishing revenue reservoir will help fund the creation of a global data management development index and of data management ministries/secretariats in developing countries (see below).

Secondly, it would support the growth and prosperity of local innovators and entrepreneurs in the nascent data management economy in developing countries. This group plays a crucial role in sustainable development and good governance.

Without this new valuation system, we unwittingly support what I call the Second Coming phenomenon: The recolonization of old empires, this time for the exploitation of their data resources.

In forthcoming entries, I will share my thoughts and details on key elements of the global data valuation system. I will advocate among other things:

·         The creation of a robust development index that ranks the growth and development of countries according to their data management efficacy, equitability and productivity.

·         The elevation of data management in developing countries to ministerial/secretariat level.

I bid you bonne mine to this mindmeld.