Meetup recap | Civic Data Lab: Pioneering Open Data Platforms in India

This is a recap of our latest CKAN Monthly Live meetup with CivicDataLab, an Indian organization working on data, tech, and social good. They use open-source tools like CKAN to help citizens engage with public reforms.

📖 Useful files:
- Meeting notes
- CivicDataLab's presentation

Introduction

At our most recent CKAN Monthly Live meetup, we had the opportunity to hear from CivicDataLab, a pioneering organization in India that works at the intersection of data, technology, design, and social science. They work to harness the potential of the open-source movement to enable citizens to engage better with public reforms. Their mission revolves around empowering civil society organizations, governments, non-profits, think tanks, media houses, universities, and other stakeholders through data-driven decision-making.

Read on to discover the success stories behind Open Budgets India, Open City, and JusticeHub and how CivicDataLab leveraged CKAN to innovate in public policy and civic engagement - practical insights you can apply yourself!

We were happy to have a panel of dedicated professionals from CivicDataLab joining us: Deepthi Chand, DC (Co-Founder), Abhinav Singh (Tech Lead), Shoaib Ahmed (Associate Lead Engineer - FrontEnd), and Sai Krishna (Data Engineer).

The approach

CivicDataLab has a multi-disciplinary approach to solving societal issues. Starting with public finance, the organization has expanded its efforts to include law and justice, climate change, and digital public goods, among other areas. The CivicDataLab model is iterative: begin by opening up datasets, collaborate with various stakeholders to develop tailored data solutions, and then work on enhancing the data literacy of these stakeholders. Importantly, CivicDataLab prioritizes capacity-building for various stakeholders to adopt these data solutions. The cycle then recommences, to identify more data to liberate and new solutions to develop.

Strengthening the Data-Driven Governance

Diverse Initiatives

Open Budgets India (OBI)

CivicDataLab introduced Open Budgets India (OBI), a platform that provides stakeholders with a unified view of budget documents from different levels of the government.

  • Purpose: To improve transparency in India’s governance and budgetary processes.
  • Features:
    • Over 20,000 budget datasets in an open and accessible format.
    • The core is built on CKAN, supplemented by CKAN Pages Plugin for basic navigation.
    • Complex pages for additional resources.
    • Offers an education platform called "Budget Basics" to help users understand budget terminologies.
    • Uses CKAN's organizational features to categorize these large datasets at multiple levels—union budget, state budget, etc.—to make navigation easier for users.
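
As a rough illustration of how organization-level categorization can be queried, the sketch below builds a CKAN Action API `package_search` URL that filters datasets by organization. The base URL and the organization slug `union-budget` are hypothetical placeholders, not OBI's actual configuration.

```python
from urllib.parse import urlencode


def build_org_search_url(base_url: str, organization: str, rows: int = 10) -> str:
    """Build a package_search URL limited to one CKAN organization."""
    params = {
        # fq is a filter query; "organization:<slug>" restricts results
        # to datasets owned by that organization.
        "fq": f"organization:{organization}",
        "rows": rows,
    }
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{urlencode(params)}"


# Hypothetical portal and organization slug, for illustration only.
url = build_org_search_url("https://ckan.example.org/", "union-budget", rows=5)
print(url)
```

A front end could call one such URL per budget level (union, state, and so on) to populate its navigation.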

JusticeHub

JusticeHub serves as a data exchange platform in the legal ecosystem. It diverges significantly from the OBI platform by taking customization to a higher level, particularly in dataset creation and presentation. It stands out for its customized user interface and metadata designed to meet the needs of various stakeholders.

  • Purpose: To offer a highly customized interface and data organization for various stakeholders in the legal field, enriching both information access and interaction.
  • Features:
    • Customized UI specific to the needs of the legal ecosystem.
    • Focuses on more customized metadata and datasets designed for diverse stakeholder requirements.
    • Advanced dataset creation and presentation capabilities, incorporating special presentation styles and data flows.
    • Features research reports, process methodologies, and a data dictionary to add depth to the data provided.
    • Custom JavaScript to guide user flows and integration with other external systems, including a Discourse forum for judicial discussions.
    • Uses CKAN's group feature for improved data categorization.
    • Integrated plugins to elevate the data exploration experience.

Open City

CivicDataLab's Open City initiative is designed for ordinary citizens. It shifts the focus from merely presenting data to enhancing user engagement and education, particularly for those not traditionally data-savvy.

  • Purpose: To make city data accessible and understandable to the general public, thereby demystifying city-based issues and enhancing civic participation.
  • Features:
    • UI tailored to urban communities.
    • Designed for individuals who may not have data expertise.
    • Focuses on urban ecosystems, with most data coming from Bangalore.
    • Provides more than just raw data, including guides and explainers on city issues.
    • Includes a community layer for managing conversations between users. Community engagement is fueled by integration with WordPress CMS, which handles guides, announcements, and stories.
    • Includes customized data views, with a focus on map-based views. Also integrates grievance mechanisms.

Challenges

CivicDataLab faced several challenges in making data useful and accessible. First, the platforms needed to do more than just list data; they had to provide useful insights tailored to different needs. Second, the platforms had to be flexible enough to allow custom views based on the user's role or specific use case. Third, the way data is organized needed to be more specific to the situation, going beyond basic categories like 'resources' and 'datasets'. Finally, while the data may be stored in standard formats like CSV, it should be displayed in various formats to meet different user needs.
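
To make the last point concrete, here is a minimal sketch of serving one CSV source in several presentation formats: JSON records for a chart component, grouped totals, and a plain-text table. The sample data and all function names are invented for illustration and are not taken from CivicDataLab's code.

```python
import csv
import io
import json
from collections import defaultdict

# Invented sample data standing in for a stored CSV resource.
SAMPLE_CSV = """department,year,amount
Health,2021,120
Health,2022,150
Education,2021,200
Education,2022,210
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def as_json_records(rows):
    """Row-oriented JSON, e.g. for a front-end chart component."""
    return json.dumps(rows, indent=2)


def totals_by(rows, key, value="amount"):
    """Sum the value column for each distinct value of `key`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row[value])
    return dict(totals)


def as_text_table(totals):
    """Fixed-width text table, e.g. for a plain report."""
    width = max(len(name) for name in totals)
    return "\n".join(f"{name.ljust(width)}  {total:>8.1f}" for name, total in totals.items())


rows = load_rows(SAMPLE_CSV)
by_department = totals_by(rows, "department")
print(as_text_table(by_department))
```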

Decoupling CKAN's front end from its back end

During deployments, a key realization was the need for custom presentation methods to effectively reach different stakeholders, especially those who aren't tech-savvy. That’s when they began to explore decoupling CKAN's front end from its back end. This allows for a more customized interface while leveraging CKAN's robust data management strengths.
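
One way to picture the decoupled setup: a custom front end (or a thin service in front of it) reads JSON from CKAN's Action API and reshapes it into whatever the interface needs. The sketch below parses a `package_show`-style response into a simplified "card". The `DatasetCard` type and the sample payload are assumptions made for this example, not CivicDataLab's actual implementation.

```python
import json
from dataclasses import dataclass


@dataclass
class DatasetCard:
    """Hypothetical view model consumed by a custom front end."""
    title: str
    description: str
    download_urls: list


def parse_action_response(body: str) -> dict:
    """Unwrap the Action API envelope, raising if the call failed."""
    envelope = json.loads(body)
    if not envelope.get("success"):
        raise ValueError(f"CKAN action failed: {envelope.get('error')}")
    return envelope["result"]


def to_card(package: dict) -> DatasetCard:
    """Reduce a CKAN dataset dict to the fields the interface shows."""
    return DatasetCard(
        title=package.get("title") or package.get("name", ""),
        description=package.get("notes") or "",
        download_urls=[r["url"] for r in package.get("resources", []) if r.get("url")],
    )


# Invented payload shaped like a package_show response.
SAMPLE_BODY = json.dumps({
    "success": True,
    "result": {
        "name": "state-budget-2022",
        "title": "State Budget 2022",
        "notes": "Hypothetical budget dataset.",
        "resources": [
            {"format": "CSV", "url": "https://ckan.example.org/files/budget.csv"},
            {"format": "PDF", "url": ""},
        ],
    },
})

card = to_card(parse_action_response(SAMPLE_BODY))
print(card.title, card.download_urls)
```

Keeping the reshaping step separate from CKAN means the presentation can change per audience while the catalogue and storage stay untouched.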

Use cases

CivicDataLab has gone the extra mile to create specialized dashboards to suit unique needs - Sector Dashboards, Constituency Dashboards, Budgets for Justice, Zombie Tracker, and Data 4 Districts, to name a few. These dashboards enable nuanced, domain-specific analysis and queries, serving as hubs for stakeholder engagement.

  • Union Budget Explorer: A tool for understanding the national budget, with views that can switch dynamically between a finance-oriented perspective and other perspectives.
  • Constituency Dashboard: This explores data at a geographical level, showcasing how different types of data interact within specific constituencies.
  • Budgets for Justice: This explores the financial aspects of the judicial system, offering a different hierarchical structure to explore the data.
  • Zombie Tracker: An example of how the platform can be tailored for extremely specific cases, focusing on a specific issue within the country and offering a data-centric view.

Key Learnings

The key learnings from CivicDataLab's approach include:

  • Data Insights as Important as Data Availability
    • Data insights are as important as making the data available to end-users.
  • Domain-Specific Analysis and Queries
    • Platform-specific analyses are crucial. The example of Zombie Tracker demonstrates the need for domain-specific analysis. Avoid a one-size-fits-all approach and tailor data and its presentation to specific user needs, such as specialized queries for targeted segments.
  • Engagement with the End Users
    • Engagement with end-users is essential for the platform's success.
    • Open platforms like Discourse have been leveraged for user engagement, but there's an ongoing exploration to allow users to engage directly within the platform. It's better to limit the necessity for users to navigate multiple platforms for varied functionalities.
  • Data Standardization
    • Data must adhere to common data standards for easier integration across ecosystems. Open Contracting India serves as an example.
  • Data Processing Based on Queries
    • Data processing needs to be adaptable for presentation in various forms, requiring a custom processing layer.
  • Context-Specific API
    • There's a growing need for data to be contextualized based on specifics like budgets, law, and demographics.
    • Content needs to be customizable and specific to various platforms.
  • Data Reports
    • Detailed, time-series data reports are crucial for stakeholders, particularly decision-makers. These should highlight trends and patterns in data and its interaction.
  • Custom Licenses and Data Policies
    • The platforms aim to accommodate multiple types of datasets that come with their own licensing requirements. This means not just adopting universal or open licenses but allowing for custom ones that serve specific use cases.
    • Different data policies are essential for governing how users interact with the data. This can range from who has access to what, to how data can be used or disseminated.
    • Such customizability in licenses and policies is particularly crucial when one platform serves multiple stakeholders, including public entities and private organizations, each with distinct data handling and compliance needs.
  • Usage Reports and Moderation
    • Usage reports offer visibility into how data is being accessed and used. This is particularly important for the data providers, who may need to track how their data is interacted with.
    • Implementing moderation mechanisms to govern data access is essential. For instance, in the realm of sensitive or personally identifiable information, these mechanisms act as a safeguard, ensuring that only authorized users can access certain portions of the data.
  • Data Sharing Agreements
    • Mechanisms are being explored for more efficient and customizable data-sharing agreements to manage how data is disseminated among different entities, ensuring privacy and compliance.
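
As a small illustration of the moderation point above, the sketch below filters a record's fields according to the requesting user's role. The roles, field names, and policy table are entirely hypothetical; a real deployment would tie this to the platform's own authorization system.

```python
# Hypothetical policy: which fields each role may see.
FIELD_POLICY = {
    "public": {"case_type", "district", "year"},
    "researcher": {"case_type", "district", "year", "judge_id"},
    "admin": {"case_type", "district", "year", "judge_id", "litigant_name"},
}


def visible_fields(record: dict, role: str) -> dict:
    """Return only the fields the role is allowed to see.

    Unknown roles fall back to the most restrictive ("public") policy.
    """
    allowed = FIELD_POLICY.get(role, FIELD_POLICY["public"])
    return {key: value for key, value in record.items() if key in allowed}


record = {
    "case_type": "civil",
    "district": "Bengaluru Urban",
    "year": 2022,
    "judge_id": "J-104",
    "litigant_name": "REDACTED-EXAMPLE",
}

print(visible_fields(record, "public"))
```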

Open Publishing (OPub)

The team is working on a specialized solution called Open Publishing (OPub). This effort is guided by core principles including privacy by design, design for scale, open for scale, and a commitment to open-source. The effort is focused on modularity and being data-driven, with the aim of adhering to diverse data standards, indicators, frameworks, custom licenses, etc. The team also envisions implementing customizable feedback loops and alerts, as well as incorporating user insights and real-time data analytics into the platform. The ultimate goal is to facilitate the easier and more robust adoption of the platform across various sectors, including climate, health, legal aid, and governance.

Conclusion

CivicDataLab's achievements mark a pivotal shift in how data can empower society. Using open data, they've made crucial information accessible to everyone, revolutionizing civic participation and transparency in governance. As they continue to innovate, they're setting a new standard for how data can drive social progress.

CKAN serves as the reliable data management backbone in their ventures and has adapted to the unique needs of projects ranging from budget transparency to urban planning. The bottom line? A solid backend like CKAN is the foundation for creating user-focused experiences that truly make an impact. If you want to see how data can revolutionize society, keep an eye on CivicDataLab and the power of CKAN.


Q&A

Q1: How much time did it take to go through all the processes?
A1: The projects have various time spans—eight years for Open Budgets India (OBI), three years for JusticeHub, and two years for Open City. The work presented is an accumulation of long-term learnings.

Q2: For the storytelling, did you use any other visualization tools to help facilitate the storytelling?
A2: WordPress was used for textual content moderation and country-specific content development. For visual elements, D3 was used for Union Budget Explorer, Apache Superset for some solutions, and Apache ECharts is being explored for specific use cases that require greater control over visual representation.

Q3: Are any of these innovations open source as well?
A3: All the platforms discussed are open source and come under the umbrella of the CivicDataLab organization.

Q4: When you engage with end users, how do you define data standardization with them?
A4: Different strategies are adopted depending on the dataset. For procurement data, Open Contracting Partnership standards were employed. For end-users, data standardization is not the focus; instead, they are interested in specific use cases. The objective is to align data standards with the insights that the end users wish to derive.

Q5: What other technologies are you using alongside CKAN?
A5: In addition to CKAN, Next.js and React are used for custom views. Python service layers are sometimes placed in between to process data. Other visualization tools like D3, ECharts, and Superset are also used. Django CMS and WordPress have also been deployed on certain platforms.

Q6: Regarding ETL, are you building your own pipelines, or using any technology?
A6: Initially, Airflow was used for ETL tasks. However, to gain more programmatic control over pipelines, the team switched to using Prefect.
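
For readers new to the idea, here is a plain-Python sketch of the extract, transform, and load steps such a pipeline is made of. It deliberately avoids any orchestration library so it runs as-is; with Prefect, each step would typically be wrapped with its `@task` decorator and the pipeline function with `@flow`. The sample data and the in-memory "store" are invented for illustration and do not reflect CivicDataLab's pipelines.

```python
import csv
import io

# Invented raw input with untidy headers and a missing value.
RAW_CSV = """Department , Amount
Health,120
Education,
Transport,75.5
"""


def extract(csv_text):
    """Read raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Normalize header names, drop rows without an amount, cast amounts to float."""
    cleaned = []
    for row in rows:
        normalized = {key.strip().lower(): (value or "").strip() for key, value in row.items()}
        if not normalized.get("amount"):
            continue  # skip rows with no amount
        normalized["amount"] = float(normalized["amount"])
        cleaned.append(normalized)
    return cleaned


def load(rows, store):
    """Append rows to an in-memory list standing in for a real data store."""
    store.extend(rows)
    return len(rows)


def run_pipeline(csv_text, store):
    """Chain the three steps; an orchestrator would schedule and retry these."""
    return load(transform(extract(csv_text)), store)


store = []
loaded = run_pipeline(RAW_CSV, store)
print(loaded, store)
```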

Q7: Who are the current users of the platform? Is there a revenue stream, or is it a non-profit contribution to the general public?
A7: The platforms are public goods with no revenue streams derived directly from them. Revenue is primarily generated through engagements with government entities, often centered around data analysis and promoting the open data ecosystem.

Q8: Are you integrating WordPress with CKAN and could you elaborate more on that method?
A8: WordPress is integrated with CKAN at specific touchpoints, primarily for data exchange, such as fetching the list of groups or datasets. However, the authentication mechanisms have not been integrated; CKAN remains the primary authentication system.

Q9: What is the approach you take to do the data modeling?
A9: The approach to data modeling is dictated by the platform's specific use case and the insights it aims to provide. For example, in a sector dashboard, the sector becomes the first-class citizen of the data model. The overarching strategy is to develop data models that best serve the users' needs and the insights the platform seeks to convey.
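
To illustrate what "the sector becomes the first-class citizen" could look like, here is a minimal sketch in which the sector is the top-level object and budget allocations hang off it. The class names, fields, and figures are assumptions made for this example, not the team's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Allocation:
    """One budget line belonging to a sector (hypothetical fields)."""
    scheme: str
    year: int
    amount: float


@dataclass
class Sector:
    """The sector is the entry point; everything else is reached through it."""
    name: str
    allocations: list = field(default_factory=list)

    def add(self, scheme, year, amount):
        self.allocations.append(Allocation(scheme, year, amount))

    def yearly_totals(self):
        """Total allocation per year, ordered by year."""
        totals = defaultdict(float)
        for allocation in self.allocations:
            totals[allocation.year] += allocation.amount
        return dict(sorted(totals.items()))


# Invented figures for illustration.
health = Sector("Health")
health.add("Scheme A", 2021, 100.0)
health.add("Scheme B", 2021, 50.0)
health.add("Scheme A", 2022, 120.0)

print(health.yearly_totals())
```

Modeling around the sector makes sector-level questions (totals, trends) cheap to answer, at the cost of extra work for queries that cut across sectors.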