In Category on 11 Oct 2024
Enhancing DCAT support in CKAN (DCAT-AP v3, scheming integration, and more)
A review of the recent developments in CKAN's DCAT support, and how you can get involved
It's unusual for Google to talk much about search feature plans in advance, but in this case I can say with confidence "we are still figuring out the details!", and that the shape of actual real-world data will be a critical part of that. That is why we put up the documentation as early as possible. If all goes according to plan, we will indeed make it substantially easier for people to find datasets via Google; whether that is via the main UI or a dedicated interface (or both) is yet to be determined. Dataset search has various special challenges, which is why we need to be non-committal on the details at this stage, and why we hope publishers will engage with the effort even if it's in its early stages...

This feature is deployed on the CKAN demo instance, so let’s look at an example. I can use the API to get a dataset as JSON-LD. So for the dataset Energy in Málaga, I could build the URL like this:
{
  "@context": {
    "adms": "http://www.w3.org/ns/adms#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "gsp": "http://www.opengis.net/ont/geosparql#",
    "locn": "http://www.w3.org/ns/locn#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "schema": "http://schema.org/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "time": "http://www.w3.org/2006/time",
    "vcard": "http://www.w3.org/2006/vcard/ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  },
  "@graph": [
    {
      "@id": "https://demo.ckan.org/dataset/c8689e49-4fb2-43dd-85dd-ee243104a2a9",
      "@type": "dcat:Dataset",
      "dcat:contactPoint": { "@id": "_:N71006d3e0205458db0cc7ced676f91e0" },
      "dcat:distribution": [
        { "@id": "https://demo.ckan.org/dataset/c8689e49-4fb2-43dd-85dd-ee243104a2a9/resource/c3c5b857-24e7-4df7-ae1e-8fbe29db93f3" },
        { "@id": "https://demo.ckan.org/dataset/c8689e49-4fb2-43dd-85dd-ee243104a2a9/resource/5ecbfa6c-9ea0-4f5f-9fbe-eb39964c0f7f" },
        { "@id": "https://demo.ckan.org/dataset/c8689e49-4fb2-43dd-85dd-ee243104a2a9/resource/b74584c7-9a9a-4528-9c73-dc23b29c084d" }
      ],
      "dcat:keyword": [ "energy", "málaga" ],
      "dct:description": "Some energy related sources from the city of Málaga",
      "dct:identifier": "c8689e49-4fb2-43dd-85dd-ee243104a2a9",
      "dct:issued": { "@type": "xsd:dateTime", "@value": "2017-06-25T17:02:11.406471" },
      "dct:modified": { "@type": "xsd:dateTime", "@value": "2017-06-25T17:05:24.777086" },
      "dct:publisher": { "@id": "https://demo.ckan.org/organization/f0656b3a-9802-46cf-bb19-024573be43ec" },
      "dct:title": "Energy in Málaga"
    },
    {
      "@id": "https://demo.ckan.org/organization/f0656b3a-9802-46cf-bb19-024573be43ec",
      "@type": "foaf:Organization",
      "foaf:name": "BigMasterUMA1617"
    },
    {
      "@id": "https://demo.ckan.org/dataset/c8689e49-4fb2-43dd-85dd-ee243104a2a9/resource/b74584c7-9a9a-4528-9c73-dc23b29c084d",
      "@type": "dcat:Distribution",
      "dcat:accessURL": { "@id": "http://datosabiertos.malaga.eu/recursos/energia/ecopuntos/ecoPuntos-23030.csv" },
      "dct:description": "Ecopuntos de la ciudad de málaga",
      "dct:format": "CSV",
      "dct:title": "Ecopuntos"
    },
    {
      "@id": "https://demo.ckan.org/dataset/c8689e49-4fb2-43dd-85dd-ee243104a2a9/resource/c3c5b857-24e7-4df7-ae1e-8fbe29db93f3",
      "@type": "dcat:Distribution",
      "dcat:accessURL": { "@id": "http://datosabiertos.malaga.eu/recursos/ambiente/telec/201706.csv" },
      "dct:description": "Los datos se corresponden a la información que se ha decidido historizar de los sensores instalados en cuadros eléctricos de distintas zonas de Málaga.",
      "dct:format": "CSV",
      "dct:title": "Lecturas cuadros eléctricos Junio 2017"
    },
    {
      "@id": "https://demo.ckan.org/dataset/c8689e49-4fb2-43dd-85dd-ee243104a2a9/resource/5ecbfa6c-9ea0-4f5f-9fbe-eb39964c0f7f",
      "@type": "dcat:Distribution",
      "dcat:accessURL": { "@id": "http://datosabiertos.malaga.eu/recursos/ambiente/telec/nodos.csv" },
      "dct:description": "Destalle de los cuadros eléctricos con sensores instalados para su gestión remota.",
      "dct:format": "CSV",
      "dct:title": "Cuadros eléctricos"
    },
    {
      "@id": "_:N71006d3e0205458db0cc7ced676f91e0",
      "@type": "vcard:Organization",
      "vcard:fn": "Gabriel Requena",
      "vcard:hasEmail": "gabi@email.com"
    }
  ]
}

Google even provides a Structured Data Testing Tool where you can submit a URL and it will tell you if the data is valid. Of course, knowing the CKAN API is good if you’re a developer, but it is not really the way to go if you want a search engine to find your datasets. So the JSON-LD that you can see above is already embedded on the dataset page (check out the testing tool with just the dataset URL). If you have enabled this feature, every time a search engine visits your portal, it’ll get structured information about the dataset it crawls instead of simply the HTML of the page. Check the documentation for more information, but most importantly: if you’re running CKAN, give it a try!
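To make the idea of embedded structured data a bit more concrete, here is a minimal Python sketch of what a crawler might do with a dataset page. It assumes the JSON-LD sits in a `<script type="application/ld+json">` tag and uses the same `@graph` layout as the example above; the HTML snippet is a trimmed, hypothetical stand-in for a real page, and `JsonLdExtractor` and `summarize` are illustrative names, not part of CKAN or ckanext-dcat.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so buffer them.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def summarize(doc):
    """Return (dataset title, [(distribution title, access URL), ...])."""
    # Index every node by its @id so references can be resolved.
    nodes = {node["@id"]: node for node in doc["@graph"]}
    dataset = next(n for n in doc["@graph"] if n.get("@type") == "dcat:Dataset")
    refs = dataset.get("dcat:distribution", [])
    if isinstance(refs, dict):  # a single distribution may not be wrapped in a list
        refs = [refs]
    distributions = []
    for ref in refs:
        dist = nodes[ref["@id"]]
        distributions.append((dist["dct:title"], dist["dcat:accessURL"]["@id"]))
    return dataset["dct:title"], distributions


# Hypothetical, heavily trimmed dataset page (assumption: not the real demo markup).
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{
  "@graph": [
    {
      "@id": "https://demo.ckan.org/dataset/c8689e49-4fb2-43dd-85dd-ee243104a2a9",
      "@type": "dcat:Dataset",
      "dcat:distribution": [
        {"@id": "https://demo.ckan.org/dataset/c8689e49-4fb2-43dd-85dd-ee243104a2a9/resource/b74584c7-9a9a-4528-9c73-dc23b29c084d"}
      ],
      "dct:title": "Energy in Málaga"
    },
    {
      "@id": "https://demo.ckan.org/dataset/c8689e49-4fb2-43dd-85dd-ee243104a2a9/resource/b74584c7-9a9a-4528-9c73-dc23b29c084d",
      "@type": "dcat:Distribution",
      "dcat:accessURL": {"@id": "http://datosabiertos.malaga.eu/recursos/energia/ecopuntos/ecoPuntos-23030.csv"},
      "dct:title": "Ecopuntos"
    }
  ]
}
</script>
</head><body>Energy in Málaga</body></html>
"""

parser = JsonLdExtractor()
parser.feed(SAMPLE_HTML)
parser.close()

title, distributions = summarize(parser.documents[0])
print(title)
for dist_title, url in distributions:
    print(" -", dist_title, url)
```

The sketch only resolves the `dcat:distribution` references; the publisher and contact point nodes could be looked up through the same `@id` index.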
By the way: if you already have a custom profile for ckanext-dcat (e.g. for a DCAT application profile), check my current pull request for a mapping DCAT-AP Switzerland to schema.org/Dataset.
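As a rough illustration of what such a mapping involves, the sketch below converts a DCAT-flavoured JSON-LD graph like the one above into a schema.org/Dataset dictionary. The field correspondences (for example `dct:title` to `name`, `dcat:accessURL` to `url`) are assumptions chosen for illustration; this is not the mapping used by ckanext-dcat or by the pull request mentioned above, and `dcat_to_schema_org` is a hypothetical helper.

```python
import json


def dcat_to_schema_org(doc):
    """Build a schema.org/Dataset dict from a DCAT JSON-LD document.

    Illustrative mapping only (assumed, not taken from ckanext-dcat).
    """
    nodes = {node["@id"]: node for node in doc["@graph"]}
    dataset = next(n for n in doc["@graph"] if n.get("@type") == "dcat:Dataset")

    refs = dataset.get("dcat:distribution", [])
    if isinstance(refs, dict):  # tolerate a single, unwrapped distribution
        refs = [refs]

    downloads = []
    for ref in refs:
        dist = nodes[ref["@id"]]
        downloads.append({
            "@type": "DataDownload",
            "name": dist.get("dct:title"),
            "description": dist.get("dct:description"),
            "encodingFormat": dist.get("dct:format"),
            "url": dist.get("dcat:accessURL", {}).get("@id"),
        })

    return {
        "@context": "http://schema.org",
        "@type": "Dataset",
        "@id": dataset["@id"],
        "name": dataset.get("dct:title"),
        "description": dataset.get("dct:description"),
        "identifier": dataset.get("dct:identifier"),
        "keywords": dataset.get("dcat:keyword", []),
        "distribution": downloads,
    }


# Trimmed copy of the example graph shown earlier in this post.
dcat_doc = {
    "@graph": [
        {
            "@id": "https://demo.ckan.org/dataset/c8689e49-4fb2-43dd-85dd-ee243104a2a9",
            "@type": "dcat:Dataset",
            "dcat:distribution": [
                {"@id": "https://demo.ckan.org/dataset/c8689e49-4fb2-43dd-85dd-ee243104a2a9/resource/5ecbfa6c-9ea0-4f5f-9fbe-eb39964c0f7f"}
            ],
            "dcat:keyword": ["energy", "málaga"],
            "dct:description": "Some energy related sources from the city of Málaga",
            "dct:identifier": "c8689e49-4fb2-43dd-85dd-ee243104a2a9",
            "dct:title": "Energy in Málaga",
        },
        {
            "@id": "https://demo.ckan.org/dataset/c8689e49-4fb2-43dd-85dd-ee243104a2a9/resource/5ecbfa6c-9ea0-4f5f-9fbe-eb39964c0f7f",
            "@type": "dcat:Distribution",
            "dcat:accessURL": {"@id": "http://datosabiertos.malaga.eu/recursos/ambiente/telec/nodos.csv"},
            "dct:format": "CSV",
            "dct:title": "Cuadros eléctricos",
        },
    ]
}

schema_dataset = dcat_to_schema_org(dcat_doc)
print(json.dumps(schema_dataset, indent=2, ensure_ascii=False))
```

A real application profile would typically also cover publisher, contact point, dates and licence, and would operate on the RDF graph rather than on compacted JSON.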