Unofficial Draft
Copyright © 2026 the document editors/authors. Text is available under the Creative Commons Attribution 4.0 International Public License; additional terms may apply.
This document defines the data model of BELTRANS.
This document is a draft of a potential specification. It has no official standing of any kind and does not represent the support or consensus of any standards organization.
This document was created as part of the BELSPO-funded BRAIN2.0 research project BELTRANS.
The project Intra-Belgian literary translations since 1970 (BELTRANS) studies the untold history of literary translation flows in Belgium between French and Dutch in the period 1970-2020.
As part of the research activities, a corpus of bibliographic and authority metadata was created based on various data sources. For semantic interoperability we used the Resource Description Framework (RDF) to integrate the data.
In essence, we use concepts of the W3C Provenance Ontology (PROV-O) as a basis and, as much as possible, reuse RDF terms from the common schema.org vocabulary and from more specialized vocabularies such as the BIBFRAME ontology. Where necessary, we defined our own terms.
This document specifies the RDF terms used, provides examples for the key concepts,
and describes how these terms were serialized in one or more RDF named graphs.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MAY, MUST, MUST NOT, OPTIONAL, RECOMMENDED, REQUIRED, SHALL, SHALL NOT, SHOULD, and SHOULD NOT in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
Conformance requirements are expressed with a combination of descriptive assertions and [RFC2119] terminology. The key words MAY, MUST, MUST NOT, OPTIONAL, RECOMMENDED, REQUIRED, SHALL, SHALL NOT, SHOULD, and SHOULD NOT in the normative parts of this document are to be interpreted as described in RFC 2119.
This data model follows the Open World Assumption (OWA), which is common on the web and in Linked Open Data. Hence, a missing property does not mean that the value does not exist, only that it is not known. For example, when we do not know the year of publication of a book, we do not state "unknown" or something similar. Instead, we omit the property altogether.
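The following minimal Python sketch illustrates this principle. The helper `book_triples` and the example IRIs and titles are hypothetical and only serve as an illustration: a property with an unknown value is left out instead of being filled with a placeholder.

```python
def book_triples(book_iri, title, year_published=None):
    """Return (subject, predicate, object) triples for a book.

    Properties whose value is unknown are omitted entirely.
    """
    triples = [(book_iri, "schema:name", title)]
    # Open World Assumption: an unknown year is simply not stated,
    # rather than being encoded as a placeholder such as "unknown".
    if year_published is not None:
        triples.append((book_iri, "schema:datePublished", str(year_published)))
    return triples


known = book_triples("ex:book1", "Example title", 1985)
unknown = book_triples("ex:book2", "Another title")
print(len(known), len(unknown))
```

A consumer of such data should therefore treat the absence of schema:datePublished as "not known" and not as "never published".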
Throughout the document, the following terminology is used (sorted alphabetically).
200 Theology. Religions.
or 800 Literature. History and literary criticism.
810 Poetry. or 850 Comics..
"Brussels"@en or "Bruxelles"@fr or
be described with a data type like "2024-01-01"^^xsd:date
ex:kbr, whereas additional location information is in the named graph ex:geo.
ex:book1 a schema:CreativeWork ex:kbr .
ex:book2 a schema:CreativeWork ex:kbr .
ex:book1 schema:locationCreated ex:brussels ex:geo .
ex:brussels a schema:Place ex:geo .
ea8c2233-8694-4e13-998a-c3592eddad5f. The third block starts with 4, indicating that this is a UUID of version 4, and the first digit of the fourth block (here 9) encodes the variant.
All remaining digits of a version 4 UUID are generated randomly. In version 1 UUIDs, by contrast, the first three blocks are based on a timestamp, the fourth block contains clock sequence bits, and the last block is based on the MAC address of the machine that generated the UUID.
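As a small illustration, the structure of such an identifier can be inspected with the `uuid` module of the Python standard library. This is only a sketch and not necessarily how the BELTRANS identifiers were generated.

```python
import uuid

# The example identifier used in this document.
example = uuid.UUID("ea8c2233-8694-4e13-998a-c3592eddad5f")

# The version is encoded in the first hex digit of the third block.
print(example.version)                    # 4
# The variant is encoded in the leading bits of the fourth block.
print(example.variant == uuid.RFC_4122)   # True

# Minting a fresh random (version 4) identifier.
new_id = uuid.uuid4()
print(new_id.version)                     # 4
```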
This section provides an overview of a translation and then focuses on the following aspects and links to other concepts: activities, contributors, originals and locations.
The following figure provides an overview of the different concepts and how they relate to each other. In the remainder of the section we briefly discuss the underlying structure and in the next section we zoom into the different aspects of the data model.
Terms from the PROV ontology (PROV-O), which is based on the PROV data model (PROV-DM), are used or extended. More importantly, the general principles of PROV are applied in the BELTRANS data model,
namely that we have agents, activities and entities.
A PROV agent is the overall concept that we use for actors. Concretely we use the following two classes of schema.org to indicate such agents:
schema:Person
schema:Organization
We use the following PROV activities, which are subclasses that we defined on prov:Activity.
btm:TranslationActivity: Used to specify the creation of a translation based on an original, together with the related contributor roles.
btm:CorrelationActivity: Within BELTRANS, data from different sources are integrated via common identifiers (e.g. two book records that share an ISBN). However, whenever there was no common identifier or where we could not alter one of the data sources, we used correlation lists. We use this class to indicate that certain entities in our data come from such a manually curated spreadsheet.
btm:CorrelationRemovalActivity: Similar to the explanation above, but such activities are used to indicate which manually curated entities should be removed.
We use the following PROV entities:
The following sections specify which concrete classes we use to indicate such entities.
We sometimes have to make statements about something based on a certain motivation. For this we make use of the W3C Web Annotations standard.
Translations are one of the key concepts of the corpus. In the following, we describe their properties and provide examples.
Types of translations
We reuse the generic class schema:CreativeWork to declare translations.
Furthermore, we use the following classes, which we defined ourselves, to describe
specific subsets of translations.
Initially we used the property schema:isPartOf to denote specific subsets,
for example ex:book schema:isPartOf btid:beltransCorpus.
However, applications such as SAMPO-UI require classes to specify
what is shown in a user interface perspective.
btm:BeltransTranslation (schema:isPartOf btid:beltransCorpus)
btm:BeltransGenreTranslation (schema:isPartOf btid:beltransGenre)
btm:MultilingualManifestation
One can use a combination of these classes to query different subsets.
For example, all translations of a BELTRANS genre that also pass the nationality filter (instances of btm:BeltransTranslation AND btm:BeltransGenreTranslation).
The main difference is that the project focused mainly on instances of btm:BeltransTranslation, with special attention to instances of btm:BeltransGenreTranslation.
This means that instances of those classes underwent more manual refinement.
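Combining classes amounts to intersecting their sets of instances. The following Python sketch shows this on a few hypothetical in-memory triples. The resources ex:book1 to ex:book3 are invented for illustration, and in practice such a selection would typically be expressed as a SPARQL query.

```python
RDF_TYPE = "rdf:type"

# Hypothetical example data: (subject, predicate, object) triples.
triples = [
    ("ex:book1", RDF_TYPE, "btm:BeltransTranslation"),
    ("ex:book1", RDF_TYPE, "btm:BeltransGenreTranslation"),
    ("ex:book2", RDF_TYPE, "btm:BeltransTranslation"),
    ("ex:book3", RDF_TYPE, "btm:BeltransGenreTranslation"),
]


def instances_of(cls):
    """Return all subjects that are declared to be instances of cls."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == cls}


# Translations that are instances of both classes.
both = instances_of("btm:BeltransTranslation") & instances_of(
    "btm:BeltransGenreTranslation"
)
print(sorted(both))  # ['ex:book1']
```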
Identifiers of translations
Each translation has a unique UUID identifier.
As we integrated data from different data sources, a translation may have several additional local identifiers.
Additionally, books are usually identified by the International Standard Book Number (ISBN),
either in its old variant, the 10-digit ISBN-10, or in the modern 13-digit ISBN-13.
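The two ISBN variants are related by a simple rule: an ISBN-13 is obtained from an ISBN-10 by prefixing 978 to the first nine digits and recomputing the check digit. The following Python sketch shows the standard check-digit arithmetic. The function names are hypothetical, and the sketch does not claim to reflect how the BELTRANS pipeline normalizes ISBNs.

```python
def isbn10_is_valid(isbn10):
    """Check the ISBN-10 check digit (weights 10 down to 1, modulo 11)."""
    digits = isbn10.replace("-", "").upper()
    if len(digits) != 10 or not digits[:9].isdigit():
        return False
    if not (digits[9].isdigit() or digits[9] == "X"):
        return False
    # The last position may be "X", which stands for the value 10.
    values = [int(c) for c in digits[:9]]
    values.append(10 if digits[9] == "X" else int(digits[9]))
    return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0


def isbn10_to_isbn13(isbn10):
    """Convert an ISBN-10 to its ISBN-13 form with the 978 prefix."""
    digits = isbn10.replace("-", "")
    core = "978" + digits[:9]
    # ISBN-13 check digit: alternating weights 1 and 3, modulo 10.
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(core))
    return core + str((10 - total % 10) % 10)


print(isbn10_is_valid("0-306-40615-2"))   # True
print(isbn10_to_isbn13("0-306-40615-2"))  # 9780306406157
```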
The unique BELTRANS ID is indicated with the property dcterms:identifier.
Other identifiers are indicated by using the BIBFRAME ontology.
#
# A book linking to different instances of bf:Identifier
# as well as a direct link to an ISBN-10
#
ex:book a schema:CreativeWork ;
    dcterms:identifier "ea8c2233-8694-4e13-998a-c3592eddad5f" ;
    bibo:isbn10 "..." ;
    bf:identifiedBy ex:bookKBR ;
    bf:identifiedBy ex:bookKB ;
    bf:identifiedBy ex:bookBnF ;
    bf:identifiedBy ex:bookUnesco ;
    bf:identifiedBy ex:bookISBN10 .

ex:bookKBR a bf:Identifier ;
    rdfs:label "KBR" ;
    rdf:value "..." .

ex:bookBnF a bf:Identifier ;
    rdfs:label "BnF" ;
    rdf:value "..." .

ex:bookUnesco a bf:Identifier ;
    rdfs:label "Unesco" ;
    rdf:value "..." .

ex:bookISBN10 a bf:Identifier ;
    rdfs:label "ISBN-10" ;
    rdf:value "..." .
The BIBFRAME ontology also has specific subclasses of bf:Identifier, such as bf:Isni which then does not require a dedicated rdfs:label (as the name is implied by the specific class).
However, not all identifiers have a specific subclass and because we want a generic
SPARQL query to obtain all identifiers and their name we always indicate the generic class bf:Identifier
as well as the related rdfs:label and rdf:value.
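The benefit of this uniform pattern is that a single generic lookup returns every identifier together with its name. The following Python sketch mimics such a lookup on in-memory triples. The identifier values `kbr-123` and `bnf-456` are invented placeholders, and an actual implementation would use a SPARQL query against the RDF data.

```python
# Hypothetical data following the bf:Identifier pattern described above.
triples = [
    ("ex:book", "bf:identifiedBy", "ex:bookKBR"),
    ("ex:book", "bf:identifiedBy", "ex:bookBnF"),
    ("ex:bookKBR", "rdf:type", "bf:Identifier"),
    ("ex:bookKBR", "rdfs:label", "KBR"),
    ("ex:bookKBR", "rdf:value", "kbr-123"),
    ("ex:bookBnF", "rdf:type", "bf:Identifier"),
    ("ex:bookBnF", "rdfs:label", "BnF"),
    ("ex:bookBnF", "rdf:value", "bnf-456"),
]


def objects(subject, predicate):
    """Return all objects of the triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def identifiers(resource):
    """Return sorted (name, value) pairs for all identifiers of a resource."""
    pairs = []
    for node in objects(resource, "bf:identifiedBy"):
        # Only nodes typed with the generic class are considered.
        if "bf:Identifier" not in objects(node, "rdf:type"):
            continue
        for label in objects(node, "rdfs:label"):
            for value in objects(node, "rdf:value"):
                pairs.append((label, value))
    return sorted(pairs)


print(identifiers("ex:book"))  # [('BnF', 'bnf-456'), ('KBR', 'kbr-123')]
```

Because every identifier carries the generic class, the lookup does not need to know the list of identifier types in advance.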
We indicate one or more genres of translations with the property schema:about
and a URI that represents a genre of the Belgian Bibliography.
The URIs are built from the internal KBR LEXICON code.
For example, the code LEXICON_000000090 represents the genre 850 Comics.
In our PROV-O based data model, translations are the result of a translation activity.
Todo
A translation links to contributors in a redundant way.
schema:author, schema:translator and schema:publisher.
prov:Association instances with the property prov:hadRole and the related translation activity.
Currently all three ways of indicating contributors are explicitly part of the RDF data. These annotations were created with a single RML mapping file. Another option would be to indicate the role only with the third technique and to derive the first two automatically, either via SPARQL INSERT queries or, at query time, with reasoning rules.
#
# Definition of a translation with direct links to contributor with role-specific attributes
#
ex:book a schema:CreativeWork ;
    prov:wasGeneratedBy ex:bookTranslationActivity ;
    schema:author ex:person1 ;
    marcrel:aut ex:person1 ;
    marcrel:ill ex:person2 .

#
# The translation activity linking to one association per contributor
#
ex:bookTranslationActivity a prov:Activity ;
    prov:generated ex:book ;
    prov:qualifiedAssociation ex:bookPerson1Assoc ;
    prov:qualifiedAssociation ex:bookPerson2Assoc .

#
# Associations indicating in which role a contributor contributed to an activity
#
ex:bookPerson1Assoc a prov:Association ;
    prov:hadRole btid:role_aut ;
    prov:activity ex:bookTranslationActivity .

ex:bookPerson2Assoc a prov:Association ;
    prov:hadRole btid:role_ill ;
    prov:activity ex:bookTranslationActivity .
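The alternative of deriving the direct contributor links from the associations can be sketched as follows in Python. This sketch rests on two assumptions that are not part of the example above: each association names its contributor via prov:agent, and the mapping `ROLE_TO_PROPERTY` from roles to role-specific properties is hypothetical. In practice this derivation would be a SPARQL INSERT query or a reasoning rule.

```python
# Assumed mapping from role resources to role-specific properties.
ROLE_TO_PROPERTY = {
    "btid:role_aut": "marcrel:aut",
    "btid:role_ill": "marcrel:ill",
}

# Hypothetical triples; the prov:agent statements are an assumption.
triples = [
    ("ex:bookTranslationActivity", "prov:generated", "ex:book"),
    ("ex:bookTranslationActivity", "prov:qualifiedAssociation", "ex:bookPerson1Assoc"),
    ("ex:bookTranslationActivity", "prov:qualifiedAssociation", "ex:bookPerson2Assoc"),
    ("ex:bookPerson1Assoc", "prov:hadRole", "btid:role_aut"),
    ("ex:bookPerson1Assoc", "prov:agent", "ex:person1"),
    ("ex:bookPerson2Assoc", "prov:hadRole", "btid:role_ill"),
    ("ex:bookPerson2Assoc", "prov:agent", "ex:person2"),
]


def objects(subject, predicate):
    """Return all objects of the triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def derive_direct_links():
    """Derive (book, role property, contributor) triples from associations."""
    derived = set()
    for activity, predicate, book in triples:
        if predicate != "prov:generated":
            continue
        for assoc in objects(activity, "prov:qualifiedAssociation"):
            for role in objects(assoc, "prov:hadRole"):
                prop = ROLE_TO_PROPERTY.get(role)
                if prop is None:
                    continue  # unknown roles are not translated
                for agent in objects(assoc, "prov:agent"):
                    derived.add((book, prop, agent))
    return derived


print(sorted(derive_direct_links()))
```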
Location information is stored either as literal values or, after an enrichment step, as entities identified by URLs.
A translation has the following literal properties that we have not yet discussed in detail:
schema:name
schema:datePublished
bibo:isbn10
bibo:isbn13
rdfs:comment
rdfs:label
Persons and Organizations
Mention the different roles and how roles are assigned based on specific properties or via general prov:role association
btm:hasNameVariant
btm:hasPseudonym
btm:isPseudonymOf
How geo information is encoded
btm:isoCode
btm:matchCandidate
During the project we used several named graphs to store the data. However, this makes SPARQL queries more complex, and hence we also provide a serialization in a single graph for easier access.
During the project we used several named-graphs to store the data.
i.e. statements about a resource
We used the following named graphs to store (generated) RDF per data source.
http://master-data
http://isni-sru
http://kbr-syracuse
http://kbr-linked-authorities
http://kbr-originals
http://bnf-originals
http://bnf-publications
http://kb-publications
http://kb-linked-authorities
http://kb-originals
http://unesco
The following named graphs contain the integrated data, i.e. the BELTRANS database with dedicated records, referring to source records in the respective data source named graphs.
http://beltrans-manifestations
http://beltrans-contributors
http://beltrans-geo
http://beltrans-originals
http://beltrans-works
For example, based on properties with textual location information such as ex:book schema:locationCreated "Brussel", stored in one named graph,
a Python script created structured reference data, which we stored in another named graph, e.g. ex:book schema:locationCreated ex:brussel . ex:brussel rdf:type schema:Place ; rdfs:label "Brussel" .
In a SPARQL query, one can select which schema:locationCreated values are of interest by specifying one of the named graphs.
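The effect of selecting a named graph can be sketched in Python by treating the data as quads, i.e. triples with a fourth element naming the graph. The graph names ex:kbr and ex:geo follow the terminology example of this document. The assignment of statements to graphs is an assumption made for illustration only.

```python
# Hypothetical quads: (subject, predicate, object, named graph).
quads = [
    ("ex:book", "schema:locationCreated", '"Brussel"', "ex:kbr"),
    ("ex:book", "schema:locationCreated", "ex:brussel", "ex:geo"),
    ("ex:brussel", "rdf:type", "schema:Place", "ex:geo"),
    ("ex:brussel", "rdfs:label", '"Brussel"', "ex:geo"),
]


def values(subject, predicate, graph):
    """Return the objects of matching statements from a single named graph."""
    return [
        o for s, p, o, g in quads
        if s == subject and p == predicate and g == graph
    ]


# The literal from the source data versus the enriched place entity.
print(values("ex:book", "schema:locationCreated", "ex:kbr"))  # ['"Brussel"']
print(values("ex:book", "schema:locationCreated", "ex:geo"))  # ['ex:brussel']
```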
The consolidated corpus data are available in a single graph.
This required a data migration from the richer multi-graph setup to a single graph
in a semantically sound way.
For example, if we simply copied everything over, querying schema:locationCreated values from our corpus in a single graph
would return literal values such as "Brussels" as well as instances of schema:Place.
Hence, we have to ensure uniform query results, for example by using different properties
or by migrating only one of the representations. In this example, this is either the literal or the object.
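The second option, migrating only one of the representations, can be sketched as follows in Python. The `Literal` helper class, the `migrate` function and the example quads are hypothetical, and the sketch does not claim to reproduce the actual BELTRANS migration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Marks an object as a literal value; plain strings denote IRIs."""
    value: str


# Hypothetical quads: (subject, predicate, object, named graph).
quads = [
    ("ex:book", "schema:locationCreated", Literal("Brussel"), "ex:kbr"),
    ("ex:book", "schema:locationCreated", "ex:brussel", "ex:geo"),
    ("ex:brussel", "rdf:type", "schema:Place", "ex:geo"),
    ("ex:brussel", "rdfs:label", Literal("Brussel"), "ex:geo"),
]


def migrate(quads, prop="schema:locationCreated", keep="object"):
    """Flatten quads into triples, keeping one representation of prop.

    keep="object" keeps only IRI values of prop, keep="literal" keeps
    only literal values. All other properties are copied unchanged.
    """
    merged = set()
    for s, p, o, _graph in quads:
        is_literal = isinstance(o, Literal)
        if p == prop and is_literal != (keep == "literal"):
            continue  # skip the representation that is not migrated
        merged.add((s, p, o))
    return merged


single_graph = migrate(quads)
locations = [o for s, p, o in single_graph if p == "schema:locationCreated"]
print(locations)  # ['ex:brussel']
```

With this choice, a query for schema:locationCreated on the single graph returns only schema:Place instances, and the textual name remains reachable via rdfs:label.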