Semantic Data Publication
Support data professionals and consumers in discovering your data.
Data is useless if it’s too difficult to be used.
Your enterprise’s data is useless if your data professionals don’t know the kind of data you have, what is used, who is using it and how it needs to be protected.
As data volumes continue to grow and sources diversify, it is essential to make data professionals more productive by helping them:
→ discover which datasets exist, where they can be found, and where they come from
→ understand how data is produced, transformed and consumed across the organisation
→ identify who creates and edits data, and who owns it
→ figure out how to use these datasets to test new hypotheses and generate new insights
A solution is to publish your data as openly as possible with the Linked Data Publication process to allow it to be linked and queried in a standard way.
This process describes and links data using open standards, including RDF (Resource Description Framework) and HTTP (Hypertext Transfer Protocol), making the data easier for machines and humans to understand and navigate.
Its purpose is to simplify the integration and reuse of data from a wide range of sources, which will increase the value of the data and facilitate the creation of new applications and services.
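To illustrate how RDF and HTTP work together, here is a minimal sketch of how a client might ask for the RDF description of a published dataset using HTTP content negotiation. The dataset URI and the helper name `build_rdf_request` are hypothetical, and the request is only built, never sent; it is an illustration under these assumptions, not a description of a specific platform.

```python
from urllib.request import Request

# Hypothetical dataset URI; example.org is a placeholder, not a real endpoint.
DATASET_URI = "https://data.example.org/dataset/population-2023"


def build_rdf_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Build (but do not send) a GET request that asks for an RDF serialization."""
    # The Accept header tells the server which representation the client prefers.
    return Request(uri, headers={"Accept": media_type}, method="GET")


request = build_rdf_request(DATASET_URI)
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

The same URI can serve an HTML page to a browser and Turtle to a machine client, which is one way published data stays navigable for both humans and machines.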
What technologies do we use for Semantic Data Publication?
We don’t always select the same tools to publish and share data online. The choice depends on the project and is shaped by user needs, business needs, the data sources, and so on. There is also more to metadata management than simply implementing a data catalog solution. That’s why we value being able to provide our customers with a hybrid solution that fits their data use case.
Just to name a few technologies we like to use to link and publish datasets:
- RDF (Resource Description Framework)
RDF is a standard for representing data on the web, based on the idea of expressing data as a set of triples, each consisting of a subject, a predicate, and an object. These triples can be used to describe the relationships between entities.
- Enterprise Knowledge Graphs
An Enterprise Knowledge Graph (EKG) uses a similar model to extract and store data in knowledge graphs, which are databases specifically designed for storing and querying RDF data. It integrates an organization’s information assets and makes them accessible using both data and metadata.
- Hanami
Hanami is a data editor solution for data quality validation and lineage traceability of RDF data. It automates the creation of data models and data editing using SHACL.
- Ontologies and Vocabularies
To enable data to be linked and understood consistently, it is important to reuse existing vocabularies and ontologies, such as DBpedia, schema.org, and FOAF, to describe the data and the relationships between entities.
- Elasticsearch
To implement powerful search engines, we use Elasticsearch, which provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. It is often used for web search, log analysis, and analyzing large data sets.
- Custom Data Catalog Development
A performant data catalog is essential for data discovery. Drawing on our experience, we can build a custom solution that offers a centralized location for publishing and discovering linked data, backed by a performant search engine.
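The triple model and the reuse of shared vocabularies described in the list above can be sketched in a few lines. This is a minimal, standard-library-only illustration: the `Triple` type, the `to_ntriples` helper and the `https://data.example.org/` namespace are hypothetical names chosen for the example, while the schema.org and FOAF terms are real vocabulary terms. A production setup would normally rely on an RDF library and a triple store.

```python
from typing import NamedTuple

# Real vocabulary namespaces.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
SCHEMA = "https://schema.org/"
# Hypothetical namespace for our own resources.
EX = "https://data.example.org/"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    obj_is_literal: bool = False  # False means the object is another resource (IRI)


def to_ntriples(triple: Triple) -> str:
    """Serialize one triple as a line of N-Triples (plain string literals only)."""
    if triple.obj_is_literal:
        escaped = (
            triple.obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        obj = f'"{escaped}"'
    else:
        obj = f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


triples = [
    Triple(EX + "dataset/1", RDF_TYPE, SCHEMA + "Dataset"),
    Triple(EX + "dataset/1", SCHEMA + "name", "Population statistics", True),
    Triple(EX + "dataset/1", SCHEMA + "creator", EX + "person/alice"),
    Triple(EX + "person/alice", FOAF + "name", "Alice", True),
]

for t in triples:
    print(to_ntriples(t))
```

Because the predicates come from widely used vocabularies, another system that already understands `schema:creator` or `foaf:name` can interpret these statements without a custom mapping.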
Data catalogs are key to achieving the benefits of data publication
A data catalog is a collection of metadata about data sources and the relationships between them. It provides a centralized location for publishing and discovering linked data. It allows organizations to manage and publish their data in a consistent way, making it more discoverable and accessible to others. It also supports management of the data lifecycle, such as tracking usage, maintaining quality, and controlling access.
A data catalog typically includes tools for searching and automating the discovery of relevant data sets and can include features for curating the data, organising data sets, and enriching the data sets.
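As a rough illustration of the ideas above, the following sketch models a catalog as a set of metadata entries that can be searched and traced back to their sources. All names (`CatalogEntry`, `DataCatalog`, the sample datasets and owners) are hypothetical, and the in-memory search stands in for what a real catalog would delegate to a search engine.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    identifier: str
    title: str
    owner: str
    keywords: list[str] = field(default_factory=list)
    derived_from: list[str] = field(default_factory=list)  # upstream dataset ids


class DataCatalog:
    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.identifier] = entry

    def search(self, term: str) -> list[CatalogEntry]:
        """Case-insensitive match against titles and keywords."""
        needle = term.lower()
        return [
            e for e in self._entries.values()
            if needle in e.title.lower() or any(needle in k.lower() for k in e.keywords)
        ]

    def lineage(self, identifier: str) -> list[str]:
        """Return every upstream dataset id, following derived_from transitively."""
        seen: list[str] = []
        stack = list(self._entries[identifier].derived_from)
        while stack:
            current = stack.pop()
            if current in seen:
                continue  # guard against cycles and duplicates
            seen.append(current)
            if current in self._entries:
                stack.extend(self._entries[current].derived_from)
        return seen


catalog = DataCatalog()
catalog.register(CatalogEntry("raw", "Raw sales export", "IT", ["sales"]))
catalog.register(CatalogEntry("cleaned", "Cleaned sales", "Data team", ["sales"], ["raw"]))
catalog.register(CatalogEntry("report", "Quarterly report", "Finance", ["finance"], ["cleaned"]))

print([e.identifier for e in catalog.search("finance")])
print(catalog.lineage("report"))
```

Even in this toy form, the entry records who owns a dataset and where it comes from, which are the two questions a catalog user typically asks first.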
Data catalogs are also crucial to creating a data-driven culture.
For example, in a company with several departments, a data catalog allows any employee to use resources outside their usual sphere of knowledge and to incorporate, with confidence, work done and validated by others. Employees from different departments can share and compare data to create solutions and ideas that benefit everyone.
Another relevant example is data from a government agency being linked to data from a private company, enabling the creation of new services that would not be possible with either dataset alone. Several governments have relied on Cognizone to create their ecosystem of applications for Linked Open Legislation: Fedlex, for the Federal Chancellery of Switzerland, and Legilux, for the Grand Duchy of Luxembourg, are great examples of data from different sources being integrated and reused in innovative ways.
By adopting a data catalog, organizations can improve the accessibility, interoperability, and reusability of their data. However, publishing data online and developing efficient applications requires weighing many factors before you can gain new insights and opportunities and increase transparency and accountability.
We are very familiar with this type of challenge. Thanks to our technical knowledge, with a particular focus on semantic technologies, we can provide you with solutions to semantically link and share your data while providing safe access to it.
Some examples of Data Platforms and Catalogs we've implemented
We enhance our clients’ data ecosystem by connecting data, strategy, and value.