It's time for another dose of insights from the interviews Alexander Gostev has conducted with various stakeholders during the engagement process. Learn what they had to say and how it'll make CKAN 3.0 even better than before. Stay tuned for more updates!
The consultancy operates locally. The interviewee’s job is to take care of customers. She works mostly on innovative and complex projects.
CKAN is open source, which is a big advantage
Sustainability (active community) 🟢
A small open data team is there
The respondent lost several leads on open data 🔴
Smaller cantons don’t have the resources to invest in comprehensive technology
Several cantons asked for CKAN but didn’t move on
CKAN is still a tool for advanced users
It worked great for national-level projects
Frontend decoupling helped a lot for one of their big national-level clients 🟢
They use a lot of harvesters 💡
Data visualization is an area of improvement.
It’s possible to pre-set some dimensions, but you don’t know the data well 💡
Without dimensions, it’s an empty
It takes a long while to figure out what’s there
Data preview is crucial
As a consumer, I don’t want to download CSV or JSON, I want an easy way to preview the data 💡
It’s always a question of where to store data. People need access to the primary data. They need a tool to publish their data, and that data has to be stored somewhere. 💡
In [european country], there will be more data hubs in the next two years:
Users need data hubs more than catalogues. They want to access it through API 💡
Not only links to sources
Access the data itself
It’s not cool how data upload works via CKAN API 🔴
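As a rough illustration of what "accessing the data itself" through an API can look like, here is a minimal Python sketch that queries rows from CKAN's DataStore instead of following a link to a file. It assumes the portal has the DataStore extension enabled and that the resource has been loaded into it; the portal URL and resource ID are hypothetical placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def datastore_search_url(portal, resource_id, limit=5):
    """Build a CKAN Action API URL that returns rows from the DataStore."""
    query = urlencode({"resource_id": resource_id, "limit": limit})
    return f"{portal.rstrip('/')}/api/3/action/datastore_search?{query}"


def fetch_records(portal, resource_id, limit=5):
    """Fetch rows over HTTP (needs network access and a DataStore-backed resource)."""
    with urlopen(datastore_search_url(portal, resource_id, limit)) as response:
        payload = json.load(response)
    return payload["result"]["records"]


# Hypothetical portal and resource ID, used only to show the URL shape.
url = datastore_search_url("https://data.example.org/", "hypothetical-resource-id")
print(url)
```

Only the URL is built here; calling `fetch_records` against a real portal would return the rows as a list of dictionaries.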
Data hub - an application where the primary function is that you can retrieve data without having to know what the source is. So it merges data from several sources. Geo data is a good example as it has a high data standardization; data either have the same structure or… You can merge them easily.
Example: Data hub from Google for the automotive industry: Gaia-X
With triple stores, you can write creative queries and federated queries that span several sources (triple stores).
A lot of federal offices publish data this way (💡💡 client segment workflow)
It’s great for advanced developers.
To be sold broadly as an internal tool, CKAN lacks:
User management
Security features
The issue with CKAN is that it uses relational databases only 🔴
The interviewee concentrated on the DCAT-AP standard
[european country] adopted and updated the DCAT standard
Issue: DCAT-AP is a graph, and you can’t implement it as one because you have to turn it into tables
DCAT-AP is important: it is the European standard for open data (💡💡 regulation)
The DCAT plugin is cumbersome if you want to change things in the data model (hard to customize for customer needs), but it’s quick and easy with a graph database
It takes time to establish a model; it should be thought through from the start
With a graph database, projects will be more agile
The interviewee built a prototype with Neo4j, but it changed CKAN so much that it was not CKAN anymore. This shows how deeply relational databases are integrated.
The common European standard for open (government) data is DCAT-AP. The Swiss flavour of it is DCAT-AP-CH. CKAN could be better at supporting DCAT-AP, since it's such an important standard for European open data.
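To make the graph-versus-table point more concrete, here is a small Python sketch. It is an illustration only, not how CKAN or its DCAT plugin actually stores data: the triples, identifiers, and helper functions are all hypothetical. It shows that repeated properties and links between nodes have to be squeezed into cells once a dataset is flattened into a single row, while the same question can be answered by following edges in the graph.

```python
# Hypothetical DCAT-style triples (subject, predicate, object); not from a real catalogue.
triples = [
    ("dataset/1", "dct:title", "Air quality"),
    ("dataset/1", "dcat:keyword", "air"),
    ("dataset/1", "dcat:keyword", "environment"),
    ("dataset/1", "dcat:distribution", "dist/1"),
    ("dataset/1", "dcat:distribution", "dist/2"),
    ("dist/1", "dcat:mediaType", "text/csv"),
    ("dist/2", "dcat:mediaType", "application/json"),
]


def flatten(subject, triples):
    """Squeeze one subject's properties into a single row, one column per predicate."""
    row = {}
    for s, p, o in triples:
        if s == subject:
            row.setdefault(p, []).append(o)
    # A table cell holds one value, so repeated values are joined into one string.
    return {p: "; ".join(values) for p, values in row.items()}


def media_types(dataset, triples):
    """Follow dataset -> distribution -> mediaType edges directly in the graph."""
    dists = {o for s, p, o in triples if s == dataset and p == "dcat:distribution"}
    return sorted(o for s, p, o in triples if s in dists and p == "dcat:mediaType")


row = flatten("dataset/1", triples)
print(row)
print(media_types("dataset/1", triples))
```

In the flattened row, the distributions are reduced to a joined string of identifiers, so adding a new property to a distribution means changing the table layout; in the triple list it is just one more triple.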
Software the respondent is looking into
Neo4j is the graph database she’s looking into
Similar to Linked Data triple stores [+]
Open source AND open data standards (linked data) [+]
Not a large community, basically one company [-]
Success metrics
Digitalization / Open data (Chief Digital Officer)
The interviewee needs to prove that she is publishing
Amount of datasets published
Public administration (public servants)
Time to find the data
Increasing the quality of the output
In the old days (and still in some places): call colleagues; "yes, we have the data; how do we send it to you?"
It’s saving them a lot of time
Utility company with an internal catalogue
Legal compliance
Time to satisfy the request
1-2 working days instead of 1-2 months
Respondent 12: Social Entrepreneur
Interview date: 27 June 2022
OVERVIEW
The interviewee hosts a catalogue (data hub) around one topic, where 45-50 people/organizations contribute data. She hosts around 200 datasets. The volume of data will increase in the next year. Smaller groups are publishing data and making it publicly available. The respondent’s goal is for local governments to make decisions based on her data.
CKAN is also used as an archive. Historical data is important!
The interviewee considered several software options
Had a big steering committee
CKAN was selected through a rigorous process 🎉
She’s happy with what CKAN is doing 🟢
Uses a couple of custom plugins
Map search
Data loader
XLS support (doesn’t work well) 🔴
Issues
The main challenge - a lot of contributors aren’t sophisticated in digital technology.
The interviewee spends a lot of time checking and helping 🔴
The current CKAN UI is easy to use for 50% of users but hard for the other 50% 🔴
Dataset / Resource - people struggle to understand the difference and the management process 🔴
The respondent worked with a UX designer
For data publishers - the process is repetitive
Eliminate repetition of the same steps 💡
Add the ability to have similar metadata for different datasets 💡
One of the reasons is data ownership and data governance
Add the ability to add more info for each organization (to the profile pages) 💡
Website field (not intuitive)
Photos
More Description
How to make CKAN support easier
User roles +1
Data publisher role
Add to all orgs, batch add 💡
Our contributors sometimes don’t publish data 💡
Have better support for Excel files 💡
Lab results, multiple sheets, hard to convert it to CSV
Secure data sharing 💡
Eliminating steps → creating accounts, adding them to the organization
Private datasets
Encrypted from unwanted access
Private companies - mining company
Data to be supplied to a small group of people
Success metrics for the interviewee’s customers
Number of contributors
Downloads, citations, backlinks, requests to the datasets
Long-term datasets
Respondent 13: Tech Lead
Interview date: 28 June 2022
OVERVIEW
The interviewee is a developer, contributor, and integrator with a tech background. He encourages scientists to use data and teaches them about metadata. He does a lot of work on data policy and general research data management. He likes CKAN because it is easy storage for metadata 🟢 For the respondent, it’s important to meet a journal’s requirements for publishing data.
Plugins in use
mapping
geospatial
one that allows embedding PDF, Word, Excel files
harvesting
CKAN issues
Keeping the CKAN extension catalogue up to date is the interviewee’s job, but it’s hard to tell whether an extension is up to date and ready to work. 💡
User management / Authentication 💡
Group access to a particular dataset
Allowing one user to be in a couple of groups
Levels of access
Migration between Pylons and Flask (2.8-2.9)
Documentation sometimes has no date 🔴
Sometimes he does the job twice to cover both Pylons and Flask 🔴
It would be nice if Python 3.0 support were added faster 🔴
If the respondent is doing something simple but unusual, the core team may take 1-2 weeks to respond 🔴
Dataset versioning is important for him
Jobs customers do
Publishing geospatial data
Publishing metadata (sensitive data not posted)
Metric for publishing success
Increase in citations
Decrease overhead for publishers (accessibility for publishers)
Respondent 14: Open-data team
Interview date: 29 June 2022
OVERVIEW
They oversee open data for 17 provinces. They harvest data from different sources and then submit it to the national portal based in the capital. They supply 20% of the open data in [European country]. They manage 8000 datasets and plan to double that number over the year 📈 Handling the volume of data is their main concern.
Areas they suggest tackling
No single admin interface. There should be a back office to administer all plugins and URLs in one place
There’s no direct link to “all users” 💡
You have to know the URL and type it
You have to go to the documentation to find out where it is in the platform
All links are in the documentation but not in the interface
Sometimes metadata is lacking
Data dictionary is erased every time they update data 💡
CKAN handles dataset updates badly
Normalization of forms 💡
It would be convenient to have a list of what we should add
It would be easier for users to have options in the list