
CKAN 3.0 Product Strategy Research (part 3)

It's time for another dose of insights from the interviews Alexander Gostev has conducted with various stakeholders during the engagement process. Learn what they had to say and how it'll make CKAN 3.0 even better than before. Stay tuned for more updates!


STAKEHOLDER ENGAGEMENT RESULTS 11-15 of 37


Respondent 11: Business Developer

Interview date: 22 June 2022

OVERVIEW

The consultancy operates locally. The interviewee’s job is to take care of customers. She works mostly on innovative and complex projects.

  • CKAN is open-source, and it’s a big advantage
    • Sustainability (active community) 🟢
  • A small open data team is there
    • The respondent lost several leads on open data 🔴
    • Smaller cantons don’t have the resources to invest in a comprehensive tech
    • Several cantons asked for CKAN but didn’t move on
  • CKAN is still a tool for advanced users
    • It worked great for national-level projects
    • Frontend decoupling helped a lot for one of their big national-level clients 🟢
  • They use a lot of harvesters 💡
  • Data visualization is an area of improvement.
    • It’s possible to pre-set some dimensions, but you don’t know the data well 💡
      • Without dimensions, it’s empty
      • It takes a long while to figure out what’s there
  • Data preview is crucial
    • As a consumer, I don’t want to download CSV or JSON, I want an easy way to preview the data 💡
  • It’s always a question of where to store data. People need access to the primary data. A tool for publishing data also needs somewhere to store it. 💡
  • In [european country], in the next two years, there would be more data hubs:
    • Users need data hubs more than catalogues. They want to access it through API 💡
    • Not only links to sources
    • Access the data itself
    • It’s not cool how data upload works via CKAN API 🔴
  • Data hub - an application where the primary function is that you can retrieve data without having to know what the source is. So it merges data from several sources. Geo data is a good example as it has a high data standardization; data either have the same structure or… You can merge them easily.
    • Example: Data hub from Google for the automotive industry: Gaia-X
  • With triple stores, you can set a query and can do creative queries and federated queries going to several (triple stores) sources.
    • A lot of federal offices publish data this way (💡💡 client segment workflow)
    • It’s great for advanced developers.
  • To sell CKAN broadly as an internal tool, it lacks
    • User management
    • Security features
  • The issue with CKAN is that it uses relational databases only 🔴
    • The interviewee concentrated on the DCAT-AP standard
      • [european country] adopted and updated the DCAT standard
      • Issue: DCAT-AP is a graph, and you can’t implement it as one because you have to turn it into tables
      • DCAT-AP is important: it is the European standard for open data (💡💡 regulation)
      • The DCAT plugin is cumbersome if you want to change things in the data model (hard to customize for customer needs), but it’s quick and easy with a graph database
    • It takes time to establish a model; it should be thought through from the start
      • With a graph database, projects will be more Agile
      • The interviewee built a prototype with Neo4J, and it changes CKAN so much that it’s not CKAN anymore. This shows how deeply relational databases are integrated.
  • The common European standard for open (government) data is DCAT-AP. The Swiss flavour of it is DCAT-AP-CH. CKAN could be better at supporting DCAT-AP, since it's such an important standard for European open data.
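
To make the graph-versus-table point above concrete, here is a minimal sketch in plain Python. It is not how CKAN or its DCAT plugin stores data; the triples, identifiers and helper names are made up for illustration. It pivots a few DCAT-style triples into one row per subject and lists the properties that carry more than one value, which are the ones that need extra tables or list columns in a relational model.

```python
from collections import defaultdict

# Hypothetical DCAT-style triples (subject, predicate, object).
# Identifiers and values are invented for this example.
TRIPLES = [
    ("dataset:1", "dct:title", "Air quality"),
    ("dataset:1", "dcat:keyword", "air"),
    ("dataset:1", "dcat:keyword", "environment"),
    ("dataset:1", "dcat:distribution", "dist:1"),
    ("dataset:1", "dcat:distribution", "dist:2"),
    ("dist:1", "dct:format", "CSV"),
    ("dist:2", "dct:format", "JSON"),
]


def to_rows(triples):
    """Pivot triples into one row per subject, keeping every value in a list."""
    rows = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        rows[subject][predicate].append(obj)
    return {subject: dict(props) for subject, props in rows.items()}


def multi_valued(rows):
    """Return the predicates that hold more than one value for some subject."""
    return sorted(
        {p for props in rows.values() for p, values in props.items() if len(values) > 1}
    )


rows = to_rows(TRIPLES)
# These predicates do not fit a single flat column per dataset.
print(multi_valued(rows))
```

In this toy graph, keywords and distributions are the multi-valued properties, so a flat "one dataset, one row" table cannot hold them without a join table or a packed column. That is roughly the friction the interviewee describes when the data model has to change.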

Software the respondent is looking into

  • Neo4J is the graph database she’s looking into
    • Similar to the Linked Data type, triple data stores [+]
    • It has better tools [+]
    • It’s a specific (not universal) query [-]
  • Linked Data
    • Used by schema.org [+]
    • It’s compatible [+]
    • Tools aren’t that advanced [-]
  • Entryscape
    • Good for visualization [+]
    • Open source AND open data standards (linked data) [+]
    • Not a large community, basically one company [-]

Success metrics

  • Digitalization / Open data (Chief Digital Officer)
    • The interviewee needs to prove that she is publishing
    • Amount of datasets published
  • Public administration (public servants)
    • Time to find the data
    • Increasing the quality of the output
    • In the old days (and still in some places): call colleagues; “yes, we have the data”; “how do we send it to you?”
    • It’s saving them a lot of time
  • Utility company with an internal catalogue
    • Legal compliance
    • Time to satisfy the request
    • 1-2 working days instead of 1-2 months

Respondent 12: Social Entrepreneur

Interview date: 27 June 2022

OVERVIEW

The interviewee hosts a catalogue (data hub) around one topic, where 45-50 people/organizations contribute data. She hosts around 200 datasets. The volume of data will increase in the next year. Smaller groups are publishing data and making it publicly available. The respondent’s goal is for local governments to make decisions based on her data.

  • CKAN is also used as an archive. Historical data is important!
  • The interviewee considered several software options
    • Had a big steering community
    • CKAN was selected through a rigorous process 🎉
    • She’s happy with what CKAN is doing 🟢
  • Uses a couple of custom plugins
    • Map search
    • Data loader
    • XLS support (doesn’t work well) 🔴

Issues

  • The main challenge - a lot of contributors aren’t sophisticated in digital technology.
    • The interviewee spends a lot of time checking and helping 🔴
    • The current CKAN UI is easy to use for 50% of users but hard for the other 50% 🔴
    • Dataset / Resource - people struggle to understand the difference and the management process 🔴
  • The respondent worked with a UX designer
    • For data publishers - the process is repetitive
    • Eliminate repetition of the same steps 💡
      • Add the ability to have similar metadata for different datasets 💡
  • One of the reasons is data ownership and data governance
  • Add the ability to add more info for each organization (to the profile pages) 💡
    • Website field (not intuitive)
    • Photos
    • More Description

How to make CKAN support easier

  • User roles +1
    • Data publisher role
    • Add to all orgs, batch add 💡
    • Our contributors sometimes don’t publish data 💡
  • Have better support for Excel files 💡
    • Lab results, multiple sheets, hard to convert it to CSV
  • Secure data sharing 💡
    • Eliminating steps → creating accounts, adding them to the organization
    • Private datasets
    • Encrypted from unwanted access
    • Private companies - mining company
      • Data to be supplied to a small group of people
  • Success metric for interviewee’s customers
    • Number of contributors
    • Downloads, citations, backlinks, requests to the datasets
    • Long-term datasets

Respondent 13: Tech Lead

Interview date: 28 June 2022

OVERVIEW

The interviewee is a developer, contributor and integrator with a tech background. He encourages scientists to use data and teaches them about metadata. He does a lot of work on data policy and general research data management. He likes CKAN as it’s an easy store for metadata 🟢 For the respondent it’s important to meet a journal’s requirements for publishing data.

Plugins in use

  • mapping
  • geospatial
  • one that allows embedding PDF, Word, Excel files
  • harvesting

CKAN issues

  • Keeping the CKAN extension catalogue updated is the interviewee’s job. But it’s hard to tell whether an extension is up to date and ready to work. 💡
  • User management / Authentication 💡
    • Group access to a particular dataset
    • Allowing one user to be in a couple of groups
    • Levels of access
  • Migration between Pylons and Flask (2.8-2.9)
    • Documentation sometimes doesn’t have a date 🔴
    • Sometimes he does the job twice to cover both Pylons and Flask 🔴
  • It would be nice to add Python 3 support faster 🔴
  • When the respondent is doing something simple but unusual, the core team can take 1-2 weeks to respond 🔴
  • Datasets versioning is important for him

Jobs customers do

  • Publishing geospatial data
  • Publishing metadata (sensitive data not posted)

Metric for publishing success

  • Increase in citations
  • Decrease overhead for publishers (accessibility for publishers)


Respondent 14: Open-data team

Interview date: 29 June 2022

OVERVIEW

They oversee open data for 17 provinces. They harvest data from different sources and then submit it to the national portal based in the capital. They supply 20% of the open data in [European country]. They manage 8000 datasets and plan to double that number over the year 📈 Handling the volume of data is their main concern.

Areas they suggest to tackle

  • No single admin interface. There should be a back office to administer all plugins and URLs in one place
    • There’s no direct link to “all users” 💡
      • You have to know the URL and type it in
      • You have to go to the documentation to find out where it is in the platform
      • All links are in the documentation but not in the interface
  • Sometimes metadata is missing
    • The data dictionary is erased every time they update data 💡
    • CKAN handles dataset updates badly
  • Normalization of forms 💡
    • It would be convenient to have a list of what we should add
    • It would be easier for users to have options in the list
    • List of tags
    • Normalization according to national standard
  • Relationships between resources and datasets 💡
  • Searches and filters
    • add AND and OR operators 💡
    • Keywords required to make search better (invisible tags to support searching but not overload user) 💡
  • Have API documentation in Swagger 💡
  • They work with d-cat to create custom metadata. It’d be nice to have a specific metadata field for the update frequency of each dataset. 💡
    • yearly
    • monthly
    • weekly
  • Possibility to download meta-data for a specific dataset in different formats 💡
    • CSV
    • JSON
    • currently with d-cat you have the full catalogue of datasets for download
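
The per-dataset metadata download requested above could look roughly like the sketch below. It is only an illustration in plain Python, not an existing CKAN or d-cat feature: the `DATASET` dictionary, its `frequency` field and the `to_json` / `to_csv` helpers are assumptions made for this example.

```python
import csv
import io
import json

# Hypothetical metadata for a single dataset. The field names are assumptions
# for this sketch, including "frequency" for the update frequency.
DATASET = {
    "name": "air-quality",
    "title": "Air quality",
    "frequency": "monthly",
    "tags": [{"name": "air"}, {"name": "environment"}],
}


def to_json(dataset):
    """Serialize the metadata of one dataset as JSON."""
    return json.dumps(dataset, indent=2, sort_keys=True)


def to_csv(dataset):
    """Serialize the metadata of one dataset as a header row plus one data row."""
    flat = {
        "name": dataset["name"],
        "title": dataset["title"],
        "frequency": dataset.get("frequency", ""),
        # Multi-valued fields are packed into one cell, separated by ";".
        "tags": ";".join(tag["name"] for tag in dataset.get("tags", [])),
    }
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(flat), lineterminator="\n")
    writer.writeheader()
    writer.writerow(flat)
    return buffer.getvalue()


print(to_csv(DATASET))
```

Packing tags into a single cell is one possible design choice for the CSV variant; a real export would need an agreed convention for multi-valued fields.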

Additionally

  • Plugins used
    • Harvest
    • Showcase
    • D-CAT
  • Meta-data
    • To fill it in they use custom profiles
      • d-cat has a standard for data
      • if they meet something outside the standard
      • they use a custom profile, which is part of d-cat
  • They work mainly with data administrator profiles
    • Now they want to start with members, they are at the beginning
    • They’re experimenting with roles
    • It makes sense to give a specific task to a member, having more flexibility 💡

Respondent 15: tech, product and business professional insights

Interview date: 1 July 2022

OVERVIEW

  • The open data ecosystem in the USA is captured by vendors - it’s not cool
  • The catalogue now isn’t as transformative as it was when it was initiated
    • Catalogue = open share
    • It’s better if you can provide data and results in the form of a community 💡
    • Deeper experience is needed
  • Who pays for it?
    • Build something that can be sold
    • Being OK to use is enough; success isn’t in perfect UX
  • Integrating CKAN into a data pipeline will streamline its adoption
    • CKAN + Zapier → Zapier snippets 💡
  • JKAN - you can use GitHub Pages to dramatically decrease the cost of deploying an open-data strategy
    • Few cities adopted it
    • Explored it technically and operationally over years
    • The project stuck with GitHub, but it’s perfectly fine
  • The cost of CKAN went down over the years 🟢
  • Used the CKAN API, was satisfied 6-7 years ago 🟢
  • Basic CKAN provides good enough dashboards 🟢