The Semantic Web and Cultural Heritage: Ontologies and Technologies Help in Accessing Museum Information

Oreste Signore, <oreste@w3.org>
CNR-ISTI
Area della Ricerca CNR - via Moruzzi, 1 - 56124 Pisa


Information Technology for the Virtual Museum

December 6-7, 2006 - Sønderborg, Denmark


Slides: http://www.weblab.isti.cnr.it/talks/2006/itvm2006/


Content

Information integration: role of ontologies

Interoperability (technological)

Interoperability (semantic)

Cultural heritage applications: content issues

Cultural heritage applications: users' issues

Conceptual problems

Implementation issues (summary)

Links

Some links are more equal than others

Dictionaries and authority files

Dictionaries

Authority files

Using the Italian Authors Dictionary

On the World Wide Web there is also a multilinguality issue to consider

Navigating a thesaurus

Temporal algebra

Temporal algebra: chronological order

Interaction metaphors

Basically:

Can be implemented by creating a "conceptual level" supporting intensional links

Example of (historical) map interaction metaphor

An example of (iconographic) association

Power (and danger) of association!

Information Integration

Standard vocabularies
  • definition difficult and time consuming
  • once defined, standards don't adapt well
  • people don't implement standards correctly anyway
Common schema
  • in principle the simplest way
  • different schemas, different cultural traditions
  • failure!
Metadata level
  • a typical example: Dublin Core
  • the number of metadata vocabularies will continue to grow (M. Doerr)
  • it is doubtful that metadata vocabularies can exploit the full richness of possible associations

Metadata vs ontology

A base for understanding

Core metadata
  • intended for integration
  • created, edited, viewed by humans
  • human factors play a primary role
Core ontology
  • underlying formal model for tools that integrate source data and perform a variety of extended functions
  • higher levels of complexity are tolerable
  • completeness and logical correctness are the driving forces
  • base for deriving knowledge

CIDOC-CRM is a formal ontology which can be used to perform reasoning.

What is an ontology? (1)

Neches et al. (1991)

An ontology defines the basic terms and relations comprising the vocabulary of a topic area as well as the rules for combining terms and relations to define extensions to the vocabulary.

Gruber (1993)

An ontology is an explicit specification of a conceptualization

Borst (1997)

Ontologies are defined as a formal specification of a shared conceptualization

Studer et al. (1998) (Merging and explaining Gruber and Borst)

An ontology is a formal, explicit specification of a shared conceptualisation. A 'conceptualisation' refers to an abstract model of some phenomenon in the world by having identified the relevant concepts of that phenomenon. 'Explicit' means that the type of concepts used, and the constraints on their use are explicitly defined. For example, in medical domains, the concepts are diseases and symptoms, the relations between them are causal and a constraint is that a disease cannot cause itself. 'Formal' refers to the fact that the ontology should be machine readable, which excludes natural language. 'Shared' reflects the notion that an ontology captures consensual knowledge, that is, it is not private to some individual, but accepted by a group.

What is an ontology? (2)

Guarino

A logical theory which gives an explicit, partial account of a conceptualization

A set of logical axioms designed to account for the intended meaning of a vocabulary.

A specific artifact designed with the purpose of expressing the intended meaning of a vocabulary

Jim Hendler

A set of knowledge terms, including the vocabulary, the semantic interconnections and some simple rules of inference and logic for some particular topic

Agreement or disagreement?

Classification of ontologies

Lightweight
  • mainly taxonomies
  • include concepts, concept taxonomies, relationships between concepts, properties that describe concepts
Heavyweight
  • provide more restrictions on domain semantics
  • add axioms and constraints to lightweight ontologies
  • axioms and constraints clarify the intended meaning of the terms gathered on the ontology
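
The distinction can be sketched in a few lines of Python. This is only an illustration, not material from the talk: the concept names and the helpers `is_a` and `check_axioms` are hypothetical, and the axiom is the "a disease cannot cause itself" constraint quoted in the Studer et al. definition. The taxonomy alone plays the role of a lightweight ontology; the axiom check is what a heavyweight ontology adds.

```python
# Lightweight part: a concept taxonomy (child -> parent). Names are illustrative.
TAXONOMY = {
    "Influenza": "Disease",
    "Fever": "Symptom",
    "Disease": "MedicalConcept",
    "Symptom": "MedicalConcept",
}

def is_a(concept, ancestor):
    """Return True if `concept` equals `ancestor` or lies below it in the taxonomy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = TAXONOMY.get(concept)
    return False

# Heavyweight part: an axiom that constrains how a relation may be used.
def check_axioms(facts):
    """Return the facts that violate the axiom 'a disease cannot cause itself'."""
    return [
        (s, p, o)
        for (s, p, o) in facts
        if p == "causes" and s == o and is_a(s, "Disease")
    ]

facts = [
    ("Influenza", "causes", "Fever"),
    ("Influenza", "causes", "Influenza"),  # deliberately wrong statement
]
violations = check_axioms(facts)
print(violations)
```

With the taxonomy alone, the wrong statement would pass unnoticed; the axiom is what lets a program reject it.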

Ontologies: knowledge modeling techniques

Highly informal
  • expressed in natural language (hence not machine readable)
Semi-informal
  • restricted and structured form of natural language
Semi-formal
  • artificial and formally defined language (e.g. OWL)
Rigorously formal
  • meticulously defined terms with formal semantics, theorems and proofs of properties such as soundness and completeness

Levels of knowledge representation

What is the best?

Motivations for CIDOC CRM

(quoted from M. Doerr)

CIDOC CRM is ...

CIDOC CRM terminology (partial)

Properties can have properties, such as in the case of an Activity (E7) carried out (P14) by an Actor (E39).

CIDOC CRM: reasoning about spatial information

CIDOC CRM: reasoning about temporal information

CIDOC CRM: termini ante quem and post quem

Semantic Web in a nutshell

The Semantic Web stack

[Figure: the layers of the Semantic Web]

Semantic Web is ...
  • a metadata based infrastructure for reasoning on the Web
  • an extension, not a replacement of the current web

What are Metadata?

What is RDF?

RDF Data Model

The fundamental notion is:

An RDF statement

The person identified by the Codice Fiscale SGNRST99A99X111Y has Name Oreste Signore, Email oreste@w3.org, and Affiliation C.N.R.
The http://www.w3c.it/Oreste/DocX resource has Author this person.
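
As a minimal sketch (plain Python, no RDF library), the statement above can be written as a list of subject, predicate, object triples. The `ex:` property names and the `_:person` label for the anonymous node are assumptions made for this illustration; only the Codice Fiscale, name, email, affiliation and document URL come from the slide.

```python
# Each RDF statement is a (subject, predicate, object) triple.
# "_:person" stands for the anonymous node describing the person; the
# "ex:" property names are invented for this sketch.
PERSON = "_:person"
DOC = "http://www.w3c.it/Oreste/DocX"

triples = [
    (PERSON, "ex:codiceFiscale", "SGNRST99A99X111Y"),
    (PERSON, "ex:name", "Oreste Signore"),
    (PERSON, "ex:email", "oreste@w3.org"),
    (PERSON, "ex:affiliation", "C.N.R."),
    (DOC, "ex:author", PERSON),
]

def objects(subject, predicate):
    """Return all objects of statements with the given subject and predicate."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]

# Follow the structured property: document -> author node -> name.
author = objects(DOC, "ex:author")[0]
author_name = objects(author, "ex:name")[0]
print(author_name)
```

The value of the author property is itself a node with its own properties, which is what makes the property "structured".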

RDF structured property diagram

RDF Schemas

Classes, Resources, ...

... and something more ...

RDFS is useful, but does not solve all the issues.
Complex applications may want more possibilities.

Reasoning
can a program reason about some terms? E.g.:
  • "if «A» is father of «B» and «B» is father of «C», is «A» grand-parent of «C»?"
  • obviously true for humans, not obvious for a program ...
  • ... programs should be able to deduce such statements
Equivalences
  • if somebody else defines a set of terms: are they the same?
  • obvious issue in an international context
Classes and constraints
  • construct classes, not just name them
  • restrict a property range when used for a specific class
  • etc.
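
The reasoning and equivalence points can be illustrated with a small hand-written rule in Python. This is a sketch under stated assumptions, not how an OWL reasoner works internally: the property names `fatherOf`, `grandParentOf` and the Italian `padreDi`, as well as the helper functions, are hypothetical.

```python
# Asserted facts as (subject, predicate, object) triples; names are placeholders.
facts = {
    ("A", "fatherOf", "B"),
    ("B", "fatherOf", "C"),
}

# Equivalence between vocabularies: another source may use a different term.
EQUIVALENT = {"padreDi": "fatherOf"}

def normalise(triples):
    """Map equivalent property names onto a single preferred name."""
    return {(s, EQUIVALENT.get(p, p), o) for (s, p, o) in triples}

def infer_grandparents(triples):
    """Apply the rule: fatherOf(x, y) and fatherOf(y, z) => grandParentOf(x, z)."""
    fathers = [(s, o) for (s, p, o) in triples if p == "fatherOf"]
    return {
        (x, "grandParentOf", z)
        for (x, y1) in fathers
        for (y2, z) in fathers
        if y1 == y2
    }

other_source = {("C", "padreDi", "D")}
all_facts = normalise(facts | other_source)
inferred = infer_grandparents(all_facts)
print(sorted(inferred))
```

Without the equivalence mapping, the statement from the second source would not take part in the rule, and the grandparent of D would never be deduced.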

Ontologies

Ontologies are on the Web

Here comes the Web Ontology Language (OWL)

OWL: three sublanguages

OWL Lite
  • supports those users primarily needing a classification hierarchy and simple constraints
  • provides a quick migration path for thesauri and other taxonomies
  • has a lower formal complexity than OWL DL
OWL DL
  • supports those users who want the maximum expressiveness while retaining computational completeness (all conclusions are guaranteed to be computable) and decidability (all computations will finish in finite time)
OWL Full
  • for users who want maximum expressiveness and the syntactic freedom of RDF with no computational guarantees

A sample: Epitaphios GE34604

[Figure: epitaphios]

Formal description using CIDOC-CRM

Epitaphios GE34604 (entity Iconographic Object)
  is identified by
    TA 959a (entity Object Identifier)
    GE 34604 (entity Object Identifier)
  preferred identifier is
    GE 34604 (entity Object Identifier)
  has type
    ecclesiastical embroidery
    liturgical cloth
  current owner
    Museum Benaki (entity Legal Body)
      has type
        private museum
      has contact points
        <Ifigenia Dionissiadu> ifi@benaki.gr (entity Contact Point)
        Koumbari Street 1, Athens (entity Address)
...

(1998 Martin Doerr and Ifigenia Dionissiadou)

As a collection of triples

In machine domain! Not friendly to humans
(rdf:#type is a shorthand for: http://www.w3.org/1999/02/22-rdf-syntax-ns#type)

subject (or resource) | predicate (or property) | value (or object)
Epitaphios GE34604 | rdf:#type | cidoc:#E84.Information_Carrier
ecclesiastical embroidery | rdf:#type | cidoc:#E55.Type
Epitaphios GE34604 | cidoc:#P2F.has_type | ecclesiastical embroidery
liturgical cloth | rdf:#type | cidoc:#E55.Type
Epitaphios GE34604 | cidoc:#P2F.has_type | liturgical cloth
Museum Benaki | rdf:#type | cidoc:#E40.Legal_Body
Epitaphios GE34604 | cidoc:#P52F.has_current_owner | Museum Benaki
Creation of Epitaphios GE34604 | rdf:#type | cidoc:#E12.Production_Event
handwork | rdf:#type | cidoc:#E55.Type
Creation of Epitaphios GE34604 | cidoc:#P2F.has_type | handwork
none | rdf:#type | cidoc:#E52.Time-Span
none | cidoc:#P81F.ongoing_throughout | "1682"@en
none | cidoc:#P82F.at_some_time_within | "1682"@en
Creation of Epitaphios GE34604 | cidoc:#P4F.has_time-span | none
Istanbul | rdf:#type | cidoc:#E53.Place
Creation of Epitaphios GE34604 | cidoc:#P7F.took_place_at | Istanbul
Epitaphios GE34604 | cidoc:#P108B.was_produced_by | Creation of Epitaphios GE34604
... | ... | ...
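
Once the description is a set of triples, a program can answer questions by following properties from node to node. The sketch below uses plain Python and a subset of the triples listed above; the helpers `match` and `production_place` are hypothetical names introduced for this illustration, and a real application would more likely use an RDF store and a query language.

```python
# A subset of the triples listed above, kept as plain strings.
TRIPLES = [
    ("Epitaphios GE34604", "rdf:#type", "cidoc:#E84.Information_Carrier"),
    ("Epitaphios GE34604", "cidoc:#P52F.has_current_owner", "Museum Benaki"),
    ("Epitaphios GE34604", "cidoc:#P108B.was_produced_by", "Creation of Epitaphios GE34604"),
    ("Creation of Epitaphios GE34604", "rdf:#type", "cidoc:#E12.Production_Event"),
    ("Creation of Epitaphios GE34604", "cidoc:#P7F.took_place_at", "Istanbul"),
    ("Istanbul", "rdf:#type", "cidoc:#E53.Place"),
]

def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

def production_place(obj):
    """Follow the path: object -> production event -> place."""
    places = []
    for (_, _, event) in match(s=obj, p="cidoc:#P108B.was_produced_by"):
        for (_, _, place) in match(s=event, p="cidoc:#P7F.took_place_at"):
            places.append(place)
    return places

print(production_place("Epitaphios GE34604"))
```

The place is not attached to the object directly: it is reached through the production event, which is typical of the event-centred modelling in CIDOC-CRM.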

A graph visualization

[Figure: epitaphiosClasses]

Scenarios

Searching vs. Linking

Back to the "roots" of the web

The typical search scenario

Effective search is important ...
... but reducing complexity in query formulation and supporting user's interests is more important

Searching the Web

Present traditional approach
  • based on term matching
  • ranking (tf/idf or Google PageRank)
  • syntactic approach
Semantic Search
  • an application of Semantic Web to search
  • users prefer to formulate queries using high level semantic concepts, more consistent with standards and tacit knowledge
  • navigational search: the user provides the search engine a phrase or combination of words which s/he expects to find in the documents. The user is using the search engine as a navigation tool to navigate to a particular intended document.
  • research search: the user provides the search engine with a phrase which is intended to denote an object about which the user is trying to gather/research information. There is no particular document which the user knows about that s/he is trying to get to. Rather, the user is trying to locate a number of documents which together will give him/her the information s/he is trying to find.
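
To make the "syntactic approach" concrete, here is a minimal sketch of term matching with tf/idf ranking in plain Python. The three documents, the tokenisation by whitespace and the particular tf and idf formulas are assumptions chosen for brevity; real search engines use more refined variants.

```python
import math
from collections import Counter

# Tiny illustrative corpus (invented for this sketch).
DOCS = {
    "d1": "epitaphios embroidery from istanbul",
    "d2": "liturgical cloth and embroidery in athens museum",
    "d3": "museum opening hours in athens",
}

def tf_idf_scores(query, docs):
    """Rank documents by the summed tf*idf of the query terms they contain."""
    tokenised = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokenised)
    scores = {}
    for name, tokens in tokenised.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query.split():
            # Document frequency: in how many documents the term occurs.
            df = sum(1 for t in tokenised.values() if term in t)
            if df == 0:
                continue
            tf = counts[term] / len(tokens)
            idf = math.log(n_docs / df)
            score += tf * idf
        scores[name] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranking = tf_idf_scores("embroidery istanbul", DOCS)
print(ranking)
```

The ranking depends only on the strings that occur in the text: a document that mentions Constantinople instead of Istanbul would score nothing for that term, which is the kind of gap semantic search aims to close.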

Semantic Search and ontologies

A possible scenario

Semantic annotation of documents
  • using CIDOC-CRM as a reference ontology
  • annotation can reside anywhere
  • intelligent agents can use the annotation
User mental model
  • preferred interaction metaphors
  • stating the classes and properties of the ontology the user is interested in (weighted)
Architecture
  • browser enriched by reasoner and finder
  • reasoner compares userModel and currentResource
  • reasoner looks at a trusted ontology and trusted data
  • finder searches the Web appropriately
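
One possible reading of "reasoner compares userModel and currentResource" is sketched below in Python. The slides do not specify the comparison, so everything here is an assumption: the weights, the threshold, the shape of the annotations and the function `suggest_links` are hypothetical.

```python
# User model: weights (0..1) for the ontology classes the user cares about.
user_model = {
    "cidoc:E53.Place": 0.9,
    "cidoc:E52.Time-Span": 0.4,
    "cidoc:E39.Actor": 0.1,
}

# Annotations of the current resource: (property, class of value, value).
current_resource = [
    ("cidoc:P7F.took_place_at", "cidoc:E53.Place", "Istanbul"),
    ("cidoc:P4F.has_time-span", "cidoc:E52.Time-Span", "1682"),
    ("cidoc:P52F.has_current_owner", "cidoc:E40.Legal_Body", "Museum Benaki"),
]

def suggest_links(model, annotations, threshold=0.3):
    """Return (weight, value) pairs for annotations whose class the user
    weights at or above the threshold, highest weight first."""
    hits = [
        (model[cls], value)
        for (_prop, cls, value) in annotations
        if model.get(cls, 0.0) >= threshold
    ]
    return sorted(hits, reverse=True)

suggestions = suggest_links(user_model, current_resource)
print(suggestions)
```

In this reading, the values that survive the comparison are what the finder would then use as starting points for searching the Web.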

Examples of user interaction

Suppose you are located on a resource ...

Temporal browsing
  • events in a neighbourhood of the time of creation
  • objects created in the same period
Spatial browsing (from place of production)
  • objects created in the same place or nearby
  • paintings depicting the same place
Spatio-temporal browsing
  • artists alive or active in a time interval around time of creation and in the area
Iconographical
  • related subject
... and more ...
  • the limit is your imagination (or your needs)
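
The temporal and spatial browsing modes above amount to simple filters around the current resource. The following Python sketch assumes a flat catalogue with a creation year and a place of production per item; apart from the epitaphios, the records are invented, the 25-year window is arbitrary, and "nearby" places are not handled (only identical place names match).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    title: str
    year: int    # year of creation
    place: str   # place of production

# Illustrative records; only the first comes from the example in the talk.
CATALOGUE = [
    Item("Epitaphios GE34604", 1682, "Istanbul"),
    Item("Hypothetical icon", 1690, "Istanbul"),
    Item("Hypothetical chalice", 1685, "Venice"),
    Item("Hypothetical manuscript", 1450, "Istanbul"),
]

def temporal_neighbours(current, items, window=25):
    """Objects created within `window` years of the current one."""
    return [i for i in items if i != current and abs(i.year - current.year) <= window]

def spatial_neighbours(current, items):
    """Objects produced in the same place as the current one."""
    return [i for i in items if i != current and i.place == current.place]

def spatio_temporal_neighbours(current, items, window=25):
    """Objects produced in the same place and within the time window."""
    return temporal_neighbours(current, spatial_neighbours(current, items), window)

current = CATALOGUE[0]
print([i.title for i in spatio_temporal_neighbours(current, CATALOGUE)])
```

Each browsing mode is a different composition of the same filters, which is why new modes can be added as the user's needs evolve.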

Conclusion

Thanks for your attention

Questions?


If it isn't on the web it doesn't exist ...

... so you will find the slides (http://www.weblab.isti.cnr.it/talks/2006/itvm2006/) on the WebLab web site (http://www.weblab.isti.cnr.it/)