Thursday, July 30, 2015

Bridging the Digital Divide in Gigabit Cities

Denise Linn conducted this research as an MPP Candidate at the Harvard Kennedy School. She is currently a Program Analyst at the Smart Chicago Collaborative.

With the rise of coalitions like Next Century Cities and Gig.U and the development of groundbreaking networks in cities like Chattanooga and Kansas City, the buzz surrounding gigabit Internet speeds has swelled in the US. Cities are working closely with companies like Google Fiber or even building out fiber-optic infrastructure themselves. The suggested rewards of these investments include stronger local economies, vibrant tech startup scenes, progress in distance learning, telemedicine, research—and the list goes on.

But when superfast gigabit speeds are available in a city, what does that mean for people beyond tech entrepreneurs and other heavy Internet users? How can cities make sure that technological innovation lifts up the lives of every resident? This all leads to the ultimate question I examined in my recent research: What does the availability of high speed Internet mean for the digital divide?

Unpacking public data can shed some light on this important issue. The 2013 American Community Survey’s tract- and city-level demographic data, merged with the Federal Communications Commission’s broadband subscribership data, tell a complex story about what faster speeds do to digital inclusion in metro areas. On the surface, gigabit cities and other cities do not appear to differ greatly in overall broadband adoption, but the data show a significant interaction between poverty and gigabit infrastructure. In other words, the presence of gigabit infrastructure correlates significantly with higher connectivity in lower-income neighborhoods. Poorer cities and poorer census tracts are predicted to fare better when there is gigabit availability.
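One way to read an interaction between poverty and gigabit infrastructure is as a linear model with a poverty-times-gigabit term. The sketch below is illustrative only: the coefficient values are made-up placeholders, not estimates from the ACS/FCC analysis, and the function names are hypothetical.

```python
# Illustrative only: these coefficients are made-up placeholders,
# NOT estimates from the ACS/FCC analysis described in this post.
B0 = 0.80          # baseline adoption share: no gigabit, zero poverty
B_POVERTY = -0.60  # change in adoption per unit of poverty rate
B_GIGABIT = -0.02  # gigabit effect when the poverty rate is zero
B_INTERACT = 0.30  # extra gigabit effect per unit of poverty rate


def predicted_adoption(poverty_rate, gigabit):
    """Linear model with a poverty x gigabit interaction term.

    poverty_rate: share of households in poverty (0.0 to 1.0)
    gigabit: 1 if gigabit service is available, else 0
    """
    return (B0
            + B_POVERTY * poverty_rate
            + B_GIGABIT * gigabit
            + B_INTERACT * poverty_rate * gigabit)


def gigabit_difference(poverty_rate):
    """Predicted adoption with gigabit minus without, at a given poverty rate."""
    return predicted_adoption(poverty_rate, 1) - predicted_adoption(poverty_rate, 0)


for rate in (0.10, 0.40):
    print(f"poverty={rate:.2f}  gigabit difference={gigabit_difference(rate):+.3f}")
```

With a positive interaction coefficient, the predicted difference between gigabit and non-gigabit areas grows as the poverty rate rises, even when the overall averages look similar.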

Why is this? There are a few possible explanations:

  1. Increased competition: It’s possible that faster speeds spur competition, lower prices, and make at-home broadband subscriptions possible for more people.
  2. Greater awareness of why the Internet is important: According to Pew, the number one barrier for broadband adoption in the home is lack of awareness or understanding of how the Internet is relevant to everyday activities. It’s possible that the community organizing process required to build gigabit networks engages low-income neighborhoods and heightens awareness of why the Internet is important throughout a city.
  3. Empowered anchor institutions in low-income areas: Within gigabit cities, anchor institutions (community-based organizations and libraries) deliver critical services to help get people online. In my research I saw interesting outliers: very poor census tracts that were walkable and had easy access to public amenities or programs saw higher rates of Internet connectivity. For example, Hamilton County’s census tract 20 in Chattanooga, TN is both dense and home to four churches and Howard High School. In 2013, 46% of households in this tract were living in poverty, but over 80% subscribed to broadband service.

The data analysis also points to weaknesses in high-speed Internet cities: broadband adoption in concentrated populations of non-English speakers and communities with low educational attainment. Interestingly, these residents are predicted to be worse off in gigabit cities. This observation points to what many might already suspect—that the relevancy and skill barriers to broadband adoption cannot be solved by faster speeds alone.

Fortunately, cities can understand and take ownership over their own digital divides, whether they are gigabit cities or aspiring gigabit cities. The public sector has a major role to play in digital inclusion. For example, cities can hire a digital inclusion specialist to work full time on the issue or create a grants program for local nonprofits. It’s clear that city governments can set the tone for broadband adoption. You can see my recommended digital inclusion actions for city governments here.

The National League of Cities, in partnership with Next Century Cities and Google Fiber, is conducting a webinar on August 6th to provide practical steps and specific case examples for city governments seeking to heighten their work in this area. Also, cities with great programs or programming ideas will have the opportunity to win a first-ever Digital Inclusion Leadership Award and share their success stories at the NLC conference in November.

To learn more about digital inclusion and dive deeper into the subjects covered in this post, see A Data-Driven Digital Inclusion Strategy for Gigabit Cities, or the summary here.

Wednesday, July 29, 2015

Mapping youth well-being worldwide with open data

Ryan Swanstrom is a blogger at Data Science 101. This post originally appeared on DataKind's blog.

How does mapping child poverty in Washington DC help inform efforts to support child and young adult well being in the UK and Kentucky?

Back in March 2012, a team of DataKind volunteers in Washington DC worked furiously to finish their final presentation at a weekend DataDive. Little did they know, the impact of their work would extend far beyond DC and far beyond the weekend. Their prototyped visualization ultimately became a polished tool that would impact communities worldwide.

DC Action for Children's Data Tools 2.0 is an interactive visualization tool to explore the effects of income, healthcare, neighborhoods, and population on child well-being in the Washington DC area. The source code for Data Tools 2.0 and open data sources have since been used by DataKind UK and Code for America volunteers to benefit their local partners. There is now potential for it to reach even more communities through DataLook's #openimpact Marathon.

See how far a solution can spread when you bring together open data, open code and open hearted volunteers around the world.

What a difference a DataDive makes

DC Action for Children, a Washington DC nonprofit focusing on child well-being, needed help understanding how Washington DC could be one of the most affluent cities in the United States, yet have one of the highest child poverty rates. Could mapping child poverty help uncover patterns and insights to drive action to address it?

A team of DataDive volunteers, led by Data Ambassador Sisi Wei, took on the challenge and, in less than 24 hours, created a prototype that wrangled data in a multitude of forms from government agencies, the Census and DC Action for Children's own databases. Those 24 hours then evolved into a multi-month DataCorps project involving many DataKind volunteers. The team unveiled a more polished version to a large and influential audience in Washington DC, including the Mayor of DC himself! They then completed the final enhancements to create Data Tools 2.0, which is now live on DC Action for Children’s website.

The project has since released the source code on Github, and the team has continued to collaborate and advance the project to where it is today. In fact, if you’re local, check out the August 5th DataKind DC Meetup to join in and continue improving the tool.

This story alone is remarkable. It speaks to the incredible commitment of these volunteers and to the importance of having a strong partner like DC Action for Children to implement and use the work as an integrated part of its mission.

And that's usually where the story ends. Thanks to DataKind’s global network though, the impact of this work was just starting to spread.

A Visualization Goes Viral

Because the visualization used open data (freely available data for public use) and open source software or code (freely available code that can be viewed, modified, and reused), other volunteers could quickly repurpose the work and apply it to their local community.

DataKind UK London DataDive

The first time the visualization was replicated was in October 2014 for The North East Child Poverty Commission. The Commission had a similar challenge of wanting to better understand child poverty in the North East of England. A team at the London DataDive reused the code from DataTools 2.0 and created a similar visualization for the North East of England. This enabled the team to quickly produce valuable results that “thrilled” NECPC. One of the team’s Data Ambassadors continued to work with the organization and has since migrated the visualization to a different platform in Tableau.

DataKind UK Leeds DataDive

In April 2015, DataKind UK hosted another DataDive in Leeds with three charity partners, Volition, Voluntary Action Leeds and the Young Foundation, to tackle the structural causes of inequality in the city. All three charity teams came together to create a visualization tool that allows people to explore financial inequality, mental health inequality, and inequality affecting young NEETs (those Not in Education, Employment, or Training). But they did not reinvent the wheel: they leveraged past work and repurposed code from DC Action for Children. Read more about the event in this recap from DataDive attendee Andy Dickinson.

Beyond the DataKind Network

Now, it’s great to see a solution scale within an organization’s network, but it’s even more impressive to see it scale beyond, in this case, into Kentucky and maybe one day India or Finland.

#HackForChange with Code For America

In June 2015, the city of Louisville, Kentucky teamed with Civic Data Alliance to host a hackathon in honor of the National Day of Civic Hacking. Kentucky Youth Advocates, a nonprofit organization focused on "making Kentucky the best place in America to be a kid," wanted to visually explore the factors affecting successful child outcomes across Council Districts. There is a large variance in child resources throughout the city, which is having an effect on child well-being. The volunteers repurposed the original code and used local publicly available data to create the Kentucky Youth Advocates Data Visualization, which is now helping the city of Louisville better distribute resources for children.

#openimpact Marathon

DC Action for Children is also one of the projects selected for the #openimpact Marathon hosted by DataLook. The goal of the marathon is to get people and groups to replicate existing data-driven projects for social good. So far, there is interest in replicating the Data Tools 2.0 visualization for child crimes in India and another potential replication for senior citizens in Finland. There is no telling where this visualization will end up helping next. Get involved!

Ok ok, but what is the impact of all this really?

Aren’t these just visualizations? Yes, as any good data scientist knows, data visualizations are not an end in and of themselves. In fact, they’re typically just part of the overall process of gaining insight into data for some larger end goal. Similarly, open data in and of itself does not automatically mean impact. The data has to be easy to access, in the right formats, and people have to apply it to real-world challenges. Just because you build it (or open it) does not necessarily mean impact will come.

Yet visualizations and open data sources are often a critical first step to bigger outcomes. So what makes the difference between a flashy marketing tool and something that will help improve real people’s lives? The strength of the partner organization that will ultimately use it to create change in the world.

Data visualizations, open data and open source code alone are not going to end child poverty. People are going to end child poverty. The strength of the tool itself is less important than the strength of an organization’s strategy of how to use it to inform decision-making and conversation around a given issue.

Thankfully, DC Action for Children has been a tremendous partner and is using Data Tools 2.0 as a key part of its efforts to improve the lives of children in DC. It’s exciting to see the tool now spreading to equally impressive partners around the world.

Monday, June 29, 2015

Data for Good in Bangalore

Miriam Young is a Communications Specialist at DataKind.

At DataKind, we believe the same algorithms and computational techniques that help companies generate profit can help social change organizations increase their impact. As a global nonprofit, we harness the power of data science in the service of humanity by engaging data scientists and social change organizations on projects designed to address critical social issues.

Our global Chapter Network recently wrapped up a marathon of DataDives, helping local organizations with their data challenges over the course of a weekend. This post highlights two of the projects from DataKind Bangalore’s first DataDive earlier this year, where volunteers used data science to help support rural agriculture and combat urban corruption.

Digital Green

Founded in 2008, Digital Green is an international, nonprofit development organization that builds and deploys information and communication technology to amplify the effectiveness of development efforts to effect sustained social change. They have a series of educational videos of agricultural best practices to help farmers in villages succeed.

The Challenge

Help farmers more easily find videos relevant to them by developing a recommendation engine that suggests videos based on open data on local agricultural conditions. The team was working with a collection of videos, each focused on a specific crop, along with descriptions, but each description was in a different regional language. The challenge, then, was parsing and interpreting this information to use it as a descriptive feature for the video. To add another challenge, they needed geodata with the geographical boundaries of different regions to map the videos to a region with specific soil types and environmental conditions, but the data didn’t exist.

The Solution

The volunteers got to work preparing this dataset and published boundaries of 103,344 Indian villages and geocoded 1,062 Digital Green villages in Madhya Pradesh (MP) to 22 soil polygons. They then clustered MP districts into 5 agro-climatic clusters based on 179 feature vectors, mapping villages that Digital Green works with into these agro-climatic clusters. Finally, the team developed a Hinglish parser that parses the Hindi titles of available videos and translates them to English to help the recommender system understand which crop the videos relate to.
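Assigning a geocoded village to a soil polygon comes down to a point-in-polygon test. The post does not say which method the volunteers used; the sketch below uses the standard ray-casting rule, treats coordinates as planar for simplicity, and uses invented region names and coordinates.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many polygon edges a horizontal ray
    from (x, y) crosses; an odd count means the point is inside.

    polygon: list of (x, y) vertices in order (the closing edge is implied).
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_region(point, regions):
    """Return the name of the first region containing the point, else None."""
    x, y = point
    for name, polygon in regions.items():
        if point_in_polygon(x, y, polygon):
            return name
    return None


# Hypothetical soil polygons and village location (not real data).
soil_regions = {
    "soil_A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "soil_B": [(4, 0), (8, 0), (8, 4), (4, 4)],
}
print(assign_region((5.5, 2.0), soil_regions))
```

In practice a GIS library would handle map projections and points that fall exactly on a boundary, which this sketch ignores.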

I Change My City / Janaagraha

Janaagraha was established in 2001 as a nonprofit that aims to combine the efforts of the government and citizens to ensure better quality of life in cities by improving urban infrastructure, services and civic engagement. Their civic portal, IChangeMyCity, promotes civic action at a neighborhood level by enabling citizens to report a complaint that then gets upvoted by the community and flagged for government officials to take action.

The Challenge

Deal with duplicate complaints that can clog the system and identify factors that delay open issues from being closed out.

The Solution

To deal with the problem of duplicate complaints, the team used Jaccard similarity and Cosine similarity on vectorized complaints to cluster similar complaints together. Disambiguation was performed by ward and geography. The model they built delivered a precision of more than 90%.
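To make the two similarity measures concrete, here is a minimal sketch of both, applied to word tokens. It is not the team's code: the tokenizer, the 0.5 threshold, the pairwise matching (rather than full clustering), and the example complaints are all assumptions for illustration. Restricting comparisons to the same ward loosely mirrors the disambiguation step described above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_similarity(a, b):
    """Shared tokens divided by all distinct tokens across both texts."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a, b):
    """Cosine of the angle between the two term-count vectors."""
    vec_a, vec_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a)
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def likely_duplicates(complaints, threshold=0.5):
    """Return index pairs in the same ward whose Jaccard score meets the threshold."""
    pairs = []
    for i in range(len(complaints)):
        for j in range(i + 1, len(complaints)):
            ward_i, text_i = complaints[i]
            ward_j, text_j = complaints[j]
            if ward_i == ward_j and jaccard_similarity(text_i, text_j) >= threshold:
                pairs.append((i, j))
    return pairs


# Invented example complaints as (ward, text) pairs.
complaints = [
    (12, "Streetlight not working on 5th Main Road"),
    (12, "Street light on 5th Main Road not working"),
    (12, "Garbage not collected for a week"),
    (40, "Streetlight not working on 5th Main Road"),
]
print(likely_duplicates(complaints))
```

Jaccard ignores how often a word appears, while cosine similarity on counts rewards repeated terms, so the two can disagree on longer complaints; using both gives a second opinion before merging reports.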

To deal with the problem of identifying factors affecting closure by user and authorities, the team used two approaches. The first approach involved analysis using Decision Trees by capturing attributes like Comments, Vote-ups, Agency ID, Subcategory and so on. The second approach involved logistic regression to predict closure probability. Closure probability was modeled as a function of complaint subcategory, ward, comment velocity, vote-ups and similar other factors.
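A logistic regression turns a weighted sum of complaint attributes into a probability between 0 and 1. The sketch below shows only the prediction step; the feature names and weights are placeholders chosen for illustration, not values the team fitted.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def closure_probability(features, weights, intercept):
    """Logistic regression prediction: sigmoid(intercept + sum of weight * value)."""
    z = intercept + sum(weights[name] * value for name, value in features.items())
    return sigmoid(z)


# Placeholder weights for illustration; a real model would learn these
# from historical complaints. Feature names are hypothetical.
weights = {"vote_ups": 0.05, "comments_per_day": 0.40, "is_pothole": -0.30}
intercept = -1.0

complaint = {"vote_ups": 10, "comments_per_day": 1.5, "is_pothole": 1}
print(round(closure_probability(complaint, weights, intercept), 3))
```

The sign of each fitted weight indicates whether that factor is associated with faster or slower closure, which is what makes this approach useful for identifying delay factors as well as predicting outcomes.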

With these new features, IChangeMyCity will be able to better handle the large volume of incoming requests and Digital Green will be better able to serve farmers.

These initial findings are certainly valuable, but DataDives are actually much bigger than just weekend events. The weeks of preparation that go into them and months of impact that ripple out from them make them a step in an organization’s larger data science journey. This is certainly the case here, as both of these organizations are now exploring long-term projects with DataKind Bangalore to expand on this work.

Stay tuned for updates on these exciting projects to see what happens next!

Interested in getting involved? Find your local chapter and sign up to learn more about our upcoming events.

Wednesday, June 24, 2015

The Price of Data Localization

Forced data localization laws require data be stored in a specific country, rather than in a distributed “cloud” spread across global networks. As we see the development of more cloud-based products and services, these laws run counter to the direction of technological innovation.

In fact, many studies have shown that forced data localization could negatively impact privacy as well as security and integrity of data. Other studies, like one by the European Centre for International Political Economy, have shown that data localization has negative impacts on the economies that require it.

Adding to the mounting evidence against data localization, new research by Leviathan Security Group shows the harms at a smaller scale: the direct cost of forced data localization to local businesses, rather than whole economies. The costs can be pretty dramatic:

...[W]e find that for many countries that are considering or have considered forced data localization laws, local companies would be required to pay 30-60% more for their computing needs than if they could go outside the country's borders.

Leviathan looked at the major public cloud providers who allow on-demand self-service provisioning through their infrastructure. The group includes Amazon Web Services, DigitalOcean, Google Compute Engine, HP Public Cloud, Linode, Microsoft Azure, and Rackspace Cloud Servers. Consumers in affected countries might be able to find other cloud providers, but many of these providers don't allow self-service provisioning, instead requiring a confidentiality agreement, a full business-to-business agreement, or other paperwork. In many countries, cloud providers won't be available at all, so businesses must make major capital investments in computer hardware and infrastructure, rather than being able to take advantage of flexible and cost-saving per-use models.

Leviathan created an interactive visualization that allows anyone to compare all the cloud vendors by location and price around the world. You can check out this study and the visualization, along with their previous work on cloud security, at

Monday, June 8, 2015

Smart Maps for Smart Cities: India’s $8 Billion+ Opportunity

Gaurav Gupta is Dalberg's Regional Director for Asia.

Did you know that India is expected to see the greatest migration to cities of any country in the world in the next three decades, with over 400 million new inhabitants moving into urban areas? To accommodate this influx of city dwellers, India’s urban infrastructure will have to grow, too.

That growth has already begun. In the last six years alone, India’s road network has already expanded by one-quarter, while the number of total businesses increased by one-third.

To better understand how smart maps—citizen-centric maps that crowdsource, capture, and share a broad range of detailed data—can help India develop smarter and more efficient cities, our team at Dalberg Global Development Advisors worked with the Confederation of Indian Industry on a new study, Smart Maps for Smart Cities: India’s $8 Billion+ Opportunity. What we found was that even for a select set of use cases, smart maps can help India gain over USD $8 billion in savings and value, save 13,000 lives, and reduce one million metric tons of carbon emissions a year in cities alone. Their aggregate impact is likely to be several multiples higher.

Our research shows that simple improvements in basic maps can lead to significant social impact: smart maps can also help businesses attract more consumers, increase foreign tourist spending and even help women feel safer.

In these quickly changing cityscapes, online tools like maps need to be especially dynamic, able to update faster and quickly expand coverage of local businesses in order to serve as highly useful tools for citizens. Yet today, most cities lack sophisticated online tools that make changing information, like road conditions and new businesses, easy to find online. Only 10-20% of India’s businesses, for instance, are listed on online maps.

So what will it take to continue developing smart maps to help power these cities? Our study shows that India will need to embrace a new policy framework that truly encourages scalable solutions and innovation by promoting crowdsourcing and creating a single accessible point of contact between government and the local mapping industry.

Friday, June 5, 2015

Moving beyond the binary of connectivity

Back in April, we shared a post from designer and Internet researcher An Xiao Mina about the "sneakernet." She has a new post on The Society Pages in which she sets out to define a concept she calls the binary of connectivity.

But what exactly is this binary of connectivity? Attendees at my talk asked me to define it, and I’d like to propose a working definition:

The connectivity binary is the view that there is a single mode of connecting to the internet — one person, one device, one always-on subscription.

The connectivity binary is grounded in a Western, urban, middle class mode of connectivity; this mode of connecting is seen as the penultimate realization of our relationship to the internet and communications technologies. Thinking in a binary way renders other modes of access invisible, both to makers and influencers on the internet and to advertising engines and big data, and it limits our understanding of the internet and its global impact.

I can imagine at least two axes of a connectivity spectrum: single vs. shared usage, and continuous vs. intermittent access. For many readers of Cyborgology, single usage, continuous access to the web is likely the norm. The most extreme example of this might be iconized in the now infamous image of Robert Scoble wearing Google Glass in the shower–we are always connected, always getting feeds of data our way.

Here’s how other sections of those axes might map to practices I’ve observed in different parts of the world. Imagine these at differing degrees away from the center of a matrix:

  • Shared Usage, Continuous Access: I saved up to buy a laptop with a USB stick that my family of four can use. We take turns using it, and our connection is pretty stable.

  • Single, Intermittent: I have a low-cost Chinese feature phone (maybe a Xiaomi), and I pay a few dollars each month for 10 MB of access. I keep my data plan off most of the time.

  • Shared, Intermittent: I walk all day to visit an internet cafe once every few months to check my Facebook account, listen to music on YouTube and practice my typing skills. I don’t own a computer myself.

For the purposes of simplicity, I’m assuming that we’re talking about devices that have one connection. But, of course, some devices have multiple connections (think of a phone with multiple SIMs) and some connections have multiple devices (think of roommates sharing a wifi router).

Read the full post here.

Wednesday, May 27, 2015

Housing Data Hub - from Open Data to Information

Joy Bonaguro is Chief Data Officer for the City and County of San Francisco. This is a repost from April announcing the launch of their Housing Data Hub.

Housing is a complex issue and it affects everyone in the City. However, there is not a lot of broadly shared knowledge about the existing portfolio of programs. The Hub puts all housing data in one place, visualizes it, and provides the program context. This is also the first of what we hope to be a series of strategic open data releases over time. Read more about that below or check out the Hub, which took a village to create!

Evolution of Open Data: Strategic Releases

The Housing Data Hub is also born out of a belief that simply publishing data is no longer sufficient. Open data programs need to take on the role of adding value to open data versus simply posting it and hoping for its use. Moreover, we are learning how important context is to understanding government datasets. While metadata is an essential part of context, it’s a starting point, not an endpoint.

For us a strategic release is one or more key datasets + a data product. A data product can be a report, a website, an analysis, a package of visualizations... you get the idea. The key point: you have done something beyond simply publishing the data. You provide context and information that transforms the data into insights or helps inform a conversation. (P.S. That’s also why we are excited about Socrata’s new dataset user experience for our open data platform.)

Will we only do strategic releases?

No! First off - it’s a ton of work and requires amazing partnerships. Strategic (or thematic) releases should be a key part of an open data program but not the only part. We will continue to publish datasets per department plans (coming out formally this summer). And we’ll also continue to take data nominations to inform department plans.

We’ll reserve strategic releases to:

  • Address a pressing information gap or need
  • Inform issues of high public interest or concern
  • Tie together disparate data that may otherwise be used in isolation
  • Unpack complex policy areas through the thoughtful dissemination of open data
  • Pair data with the content and domain expertise that we are uniquely positioned to offer (e.g., answer the questions we receive over and over again in a scalable way)
  • Build data products that are unlikely to be built by the private sector
  • Solve cross-department reporting challenges

And leverage the open data program to expose the key datasets and provide context and visualizations via data products.

We also think this is a key part of broadening the value of open data. Open data portals have focused more on a technical audience (what we call our citizen programmers). Strategic releases can help democratize how governments disseminate their data for a local audience that may be focused on issues in addition to the apps and services built on government data. It can also be a means to increase internal buy-in and support for open data.

Next steps

As part of our rolling release, we will continue to work to automate the datasets feeding the hub. You can read more about our rollout process, inspired by the UK Government Digital Service. We’ll also follow up with a technical post on the platform, which is available on GitHub, including how we are consuming the data via our open data APIs.
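For readers curious what consuming data through an open data API can look like: Socrata-hosted portals expose datasets over the SODA API, where a dataset is addressed as /resource/<dataset-id>.json and filtered with $-prefixed query parameters. The sketch below only builds such a URL. The dataset identifier and column name are placeholders, not real Housing Data Hub datasets, and this is not necessarily how the Hub itself consumes data.

```python
from urllib.parse import urlencode


def soda_url(domain, dataset_id, where=None, limit=100):
    """Build a SODA query URL for a dataset on a Socrata-hosted portal."""
    params = {"$limit": limit}
    if where:
        params["$where"] = where
    # safe="$" keeps the parameter names readable instead of encoding "$" as %24
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params, safe='$')}"


# "abcd-1234" and "year" are placeholders, not a real dataset or column.
url = soda_url("data.sfgov.org", "abcd-1234", where="year = 2014", limit=10)
print(url)
```

Fetching that URL with any HTTP client would return rows as JSON, provided the dataset identifier and column exist on the portal.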