Content
- Interoperability
- Scholars' needs
- Information integration
- Ontologies and knowledge representation
- CIDOC CRM
- Semantic Web in a nutshell
- An example
- A possible scenario
Content
Information integration: role of ontologies
- Interoperability
- Scholars' needs
- Information integration
- Ontologies and knowledge representation
- CIDOC CRM
Interoperability (technological)
- Interoperability means that:
  - applications can exchange data and services in a consistent and effective way
  - across different hardware and software platforms
  - ...
- ... it is a key success factor ...
- Some advantages:
  - protecting investments (coping with hw/sw evolution)
  - enlarging the market (compatibility with other vendors' solutions)
- It is a real quality issue
- The key point: a consistent framework/technology
- Interoperability is not merely a technological issue ...
Interoperability (semantic)
- Web for Everyone: access for everyone, overcoming differences in culture, language, education, ability, material resources, and physical limitations of users on all continents
- Consider "cultural barriers"
- A message is:
  - Content: the true content of the message, which the originator wants to communicate
  - Structure: the way the information is organized (e.g. title, author, body, signature)
  - Presentation: the way the information is presented to the user (fonts, colours, page layout, etc.)
- Semantic interoperability is a must
- W3C technologies are supporting our needs
Cultural heritage applications: content issues
- Associativity
  - geographic context
  - cultural context
  - temporal context
  - subject represented
  - ...
- Multidisciplinarity
  - history
  - geography
  - religion
  - ...
- Many possible associations among documents
  - different links leading to different document types
Cultural heritage applications: users' issues
- Different users:
- professionals
- practitioners
- researchers
- tourists (with different competences)
- specific interests in contexts (spatial,
temporal, cultural, etc.)
- ...
- Different search needs:
- gathering information about an object
- very specific (high precision)
- looking for loose or marginal references (high recall)
- different needs in different cases for the same user
- Need for adaptive and intelligent systems
Conceptual problems
- Data structuring
  - importance of the conceptual level
  - explicit vs. implicit semantics and knowledge
- Language normalization
- Differences among schools, cultures, traditions
  - all well established
  - in many cases complete agreement on the semantics of information elements ...
  - ... but differences in information structure
- Spatio-temporal context
  - storing time-dependent information
  - displaying chronologically consistent maps
Implementation issues (summary)
- Structural vs. associative links
- Dictionaries and "authority files"
- Navigating thesauri
- Temporal algebra
- Interaction metaphors
- User interface
Links
- Structural
  - inherent to the specific item
  - easy implementation
  - extensional links
- Associative
  - high value
  - make it possible to put information in the right spatio-temporal context
  - high in number, driven by many different associative mechanisms
  - intensional links
  - can be implemented via a "conceptual level"
  - the user can navigate from the "data level" up to the "conceptual level" and vice versa
  - both levels are navigable
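The difference between extensional (stored) and intensional (computed) links can be illustrated with a minimal sketch. The `Record` fields, the sample data and the "same place, close in time" rule are all assumptions made for illustration; they are not taken from any real system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """A catalogue record; all field names are illustrative assumptions."""
    id: str
    place: str
    year: int
    part_of: Optional[str] = None  # structural link stored with the item


# Hypothetical sample data, not taken from any real catalogue.
RECORDS = [
    Record("r1", "Place A", 1500),
    Record("r2", "Place A", 1510),
    Record("r3", "Place B", 1505),
    Record("r4", "Place A", 1700, part_of="r1"),
]


def structural_links(item, records):
    """Extensional links: explicitly stored item-to-item references."""
    return [r.id for r in records if r.part_of == item.id]


def associative_links(item, records, max_years=20):
    """Intensional links: computed from a rule at the conceptual level
    (same place, close in time) instead of being stored pair by pair."""
    return [r.id for r in records
            if r.id != item.id
            and r.place == item.place
            and abs(r.year - item.year) <= max_years]
```

Changing the rule in `associative_links` changes the whole set of links at once, which is why a conceptual level keeps the number of associations manageable.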
Some links are more equal than others
Dictionaries and authority files
Dictionaries
- needed to formulate "precise" queries
- reduce the risk of "silence" in searching
- scholars must perform a task that is not always simple
- no implementation problems
Authority files
- more complex than dictionaries
- not just a list of terms
- identifying the correct term to use for searching requires browsing a set of records and selecting from it
Using the Italian Authors Dictionary
On the World Wide Web there is also a multilinguality issue to consider
Navigating a thesaurus
- A thesaurus is a simple and straightforward
implementation of the "concept space".
- A thesaurus can be represented as a graph:
  - nodes = thesaurus terms
  - edges = semantic associations
- The user must be able to:
  - display the graph
  - filter it according to the associations relevant to her/his interest
  - move on the graph (getting explanations and images, if significant)
  - select terms to use for querying
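A thesaurus graph with filterable associations can be sketched as below. This is a minimal illustration: the `Thesaurus` class, its method names and the sample terms are hypothetical, and the association codes BT/NT/RT (broader, narrower, related term) are the conventional thesaurus relations, assumed here as edge labels.

```python
from collections import defaultdict


class Thesaurus:
    """Terms are nodes; each edge carries the kind of semantic association."""

    def __init__(self):
        # term -> list of (association, other term)
        self._edges = defaultdict(list)

    def relate(self, term, association, other):
        self._edges[term].append((association, other))

    def neighbours(self, term, associations=None):
        """Terms one step away, optionally filtered by association kind."""
        return [other for assoc, other in self._edges.get(term, [])
                if associations is None or assoc in associations]


# Hypothetical fragment of a thesaurus.
thesaurus = Thesaurus()
thesaurus.relate("textile", "NT", "embroidery")
thesaurus.relate("embroidery", "BT", "textile")
thesaurus.relate("embroidery", "RT", "liturgical cloth")

# Move on the graph along "narrower term" edges and select a query term.
position = "textile"
position = thesaurus.neighbours(position, {"NT"})[0]
selected_terms = [position]
```

Displaying the graph is left out; the sketch covers only filtering, moving and selecting.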
Temporal algebra: chronological order
- Ordering heterogeneous dates is a non-trivial semantic issue
- Are the following dates in chronological order?
- 1st January 1800
- probably 2 February 1800 (anyway between January
and April 1800)
- beginning year 1800
- mid year 1800
- first half of 19th century
- probably in 1810 (certainly after end of year
1800 and before end year 1832)
- 19th century
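One possible way to reason about such dates is to treat each of them as an interval between an earliest and a latest possible day, and to call two dates ordered only when their intervals do not overlap. The sketch below assumes this interval model; the `FuzzyDate` type, the `compare` function and the chosen interval bounds (for example, the 19th century taken as 1801 to 1900) are illustrative assumptions, not the temporal algebra of any specific system.

```python
from datetime import date
from typing import NamedTuple, Optional


class FuzzyDate(NamedTuple):
    label: str
    earliest: date  # earliest day the event can have happened
    latest: date    # latest day the event can have happened


def compare(a: FuzzyDate, b: FuzzyDate) -> Optional[int]:
    """Return -1 if a is certainly before b, 1 if certainly after,
    and None when the intervals overlap and the order is undetermined."""
    if a.latest < b.earliest:
        return -1
    if b.latest < a.earliest:
        return 1
    return None


jan_1800 = FuzzyDate("1st January 1800", date(1800, 1, 1), date(1800, 1, 1))
feb_1800 = FuzzyDate("probably 2 February 1800",
                     date(1800, 1, 1), date(1800, 4, 30))
about_1810 = FuzzyDate("probably in 1810",
                       date(1801, 1, 1), date(1832, 12, 31))
century_19 = FuzzyDate("19th century", date(1801, 1, 1), date(1900, 12, 31))
```

Under these assumptions `compare(jan_1800, about_1810)` is certain, while `compare(jan_1800, feb_1800)` and `compare(about_1810, century_19)` are undetermined, so the list above has no single chronological order.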
Interaction metaphors
Basically:
- map (geographic or topographic)
- time
- classification system
- searching by access points
- ... and their combinations
Can be implemented by creating a "conceptual level" supporting intensional links
Example of (historical) map interaction metaphor
- localities stored with:
  - geographic reference
  - historical name (together with time span)
  - political jurisdiction (together with time span)
  - ecclesiastical jurisdiction (together with time span)
  - bibliographic references for each piece of information
- quite simple database schema
- Getty TGN was born from here! (search on the Library of Congress for the book published in 1989)
- a sample (query about political jurisdiction)
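The locality schema described above can be sketched with time-spanned attribute values. The class and field names, the coordinates and the placeholder values ("Name A", "State X", "ref-1") are hypothetical and only show the shape of the data; they are not the schema of the original system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TimedValue:
    value: str
    start: int      # first year the value holds (inclusive)
    end: int        # last year the value holds (inclusive)
    reference: str  # bibliographic reference supporting the statement


@dataclass
class Locality:
    latitude: float
    longitude: float
    names: List[TimedValue] = field(default_factory=list)
    political: List[TimedValue] = field(default_factory=list)
    ecclesiastical: List[TimedValue] = field(default_factory=list)


def valid_in(values, year):
    """Values whose time span contains the given year."""
    return [v.value for v in values if v.start <= year <= v.end]


# Placeholder data: no real place or jurisdiction is implied.
locality = Locality(
    latitude=43.0, longitude=11.0,
    names=[TimedValue("Name A", 1000, 1499, "ref-1"),
           TimedValue("Name B", 1500, 1900, "ref-2")],
    political=[TimedValue("State X", 1000, 1405, "ref-3"),
               TimedValue("State Y", 1406, 1900, "ref-4")],
)
```

A query about political jurisdiction in a given year is then `valid_in(locality.political, 1450)`; drawing a chronologically consistent map amounts to running the same query for every locality.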
An example of (iconographic) association
- After query execution, the user selects a record (id=0900119742): “Ercole che libera Esione”.
- The image
- From the result set, the user selects a new item (0900119743): “Adamo e Eva con i figli Caino e Abele”.
- The image
- The user can find the Iconclass notation (71 A 72 1)
- Query to the iconographic classification system for “71 A 72 1”. Selecting the associated keyword "child" returns additional notations referred to by the same keyword
- Moving from the notation “11 HH (MARY MAGDALENE) 51 2”, the user gets its full description. Afterwards, moving on the classification tree, the user points to “11 HH (MARY MAGDALENE)” and saves it as a query term
- Continuing the navigation, the user points to “11 HH (MARY MAGDALENE) 36” (penitent Saint Mary Magdalene).
- The selected term is added as an additional constraint to the previous query. The query is executed and the results are returned
- The image
Power (and danger) of association!
Information Integration
- Standard vocabularies
  - definition is difficult and time consuming
  - once defined, standards don't adapt well
  - people don't implement standards correctly anyway
- Common schema
  - in principle the simplest way
  - different schemas, different cultural traditions
  - failure!
- Metadata level
  - a typical example: Dublin Core
  - the number of metadata vocabularies will continue to grow (M. Doerr)
  - it is doubtful that metadata vocabularies can exploit the full richness of possible associations
Metadata vs ontology
A base for understanding
- Core metadata
  - intended for integration
  - created, edited, viewed by humans
  - human factors play a primary role
- Core ontology
  - underlying formal model for tools that integrate source data and perform a variety of extended functions
  - higher levels of complexity are tolerable
  - completeness and logical correctness are the driving forces
  - base for deriving knowledge
CIDOC-CRM is a formal ontology which can be used
to perform reasoning.
What is an ontology? (1)
Neches et al. (1991)
An ontology defines the basic terms and relations
comprising the vocabulary of a topic area as
well as the rules for combining terms and
relations to define extensions to the
vocabulary.
Gruber (1993)
An ontology is an explicit specification of a
conceptualization
Borst (1997)
Ontologies are defined as a formal specification of a
shared conceptualization
Studer et al. (1998) (Merging and explaining Gruber and
Borst)
An ontology is a formal, explicit specification of a
shared conceptualisation. A
'conceptualisation' refers to an abstract
model of some phenomenon in the world by having
identified the relevant concepts of that phenomenon.
'Explicit' means that the type of concepts
used, and the constraints on their use are explicitly
defined. For example, in medical domains, the concepts
are diseases and symptoms, the relations between them
are causal and a constraint is that a disease cannot
cause itself. 'Formal' refers to the fact that
the ontology should be machine readable, which excludes
natural language. 'Shared' reflects the notion
that an ontology captures consensual knowledge, that
is, it is not private to some individual, but accepted
by a group.
What is an ontology? (2)
Guarino
A logical theory which gives an explicit, partial
account of a conceptualization
A set of logical axioms designed to account
for the intended meaning of a vocabulary.
A specific artifact designed with the purpose
of expressing the intended meaning of a
vocabulary
Jim Hendler
A set of knowledge terms, including the
vocabulary, the semantic interconnections and
some simple rules of inference and logic for
some particular topic
Agreement or disagreement?
- Different definitions, but consensus among the ontology community
- An ontology includes:
  - terms explicitly defined
  - knowledge we can infer
- An ontology aims to capture consensual knowledge, to be reused and shared across software applications and by groups of people.
Classification of ontologies
- Lightweight
  - mainly taxonomies
  - include concepts, concept taxonomies, relationships between concepts, properties that describe concepts
- Heavyweight
  - provide more restrictions on domain semantics
  - add axioms and constraints to lightweight ontologies
  - axioms and constraints clarify the intended meaning of the terms gathered in the ontology
Ontologies: knowledge modeling techniques
- Highly informal
  - expressed in natural language (hence not machine readable)
- Semi-informal
  - restricted and structured form of natural language
- Semi-formal
  - artificial and formally defined language (e.g. OWL)
- Rigorously formal
  - meticulously defined terms with formal semantics, theorems and proofs of properties such as soundness and completeness
Levels of knowledge representation
- The degree of formalization of concepts and
their relations varies considerably between various
domains of knowledge
- Lower end
  - lexicons and simple taxonomies (ordered classification systems where terms are related hierarchically)
  - example: Iconclass
- Middle level
  - thesauri (controlled vocabularies that are structured to show relationships between terms and concepts, and, for example, allow for retrieving them from a database)
  - example: Art & Architecture Thesaurus (AAT)
- High end
  - axiomatised logic theories, which include rules to ensure the well-formedness and logical validity of statements expressed in the language of the scientific discipline
  - example: CIDOC object-oriented Conceptual Reference Model (CIDOC CRM)
What is the best?
- Semi-formal ontologies have proved to be effective for information integration
- Semi-formal ontologies require significantly less development effort, and are more abundant and more useful than formal ontologies.
- Semi-formal ontologies can accommodate partial (incomplete) and possibly inconsistent information, especially in the assertions of an ontology.
- The GoPubMed example
- "A little semantics goes a long way" (Jim Hendler)
Motivations for CIDOC CRM
- Reality of semantic interoperability is frustrating:
  - many standards
  - many proprietary metadata
  - many proprietary data structures
  - many terminology systems
- Core systems like Dublin Core represent a common denominator by far too small to fulfil advanced requirements
- Overstretching of semantics in order to capture complex semantics leads to further loss of meaning, even though most of the contents encoded in the various structures seems to be pretty well comprehensive to common sense
(quoted from M. Doerr)
CIDOC CRM is ...
- A collaboration with the International
Council of Museums
- An ontology of 81 classes and 132
properties for culture and more
- With the capacity to explain many (meta)data
formats
- Accepted by ISO TC46 in Sept. 2000, now
ISO/CD 21127 accepted Committee Draft, proposed as DIS
- Serving as:
  - an intellectual guide to create schemata, formats, profiles
  - a language for analysis of existing sources for integration ("Identify elements with common meaning")
  - a transportation format for data integration / migration / Internet
CIDOC CRM terminology (partial)
- class: identified by a number preceded by the letter "E"
- subclass: a specialization of another class (its superclass)
- superclass: a generalization of one or more other classes (its subclasses)
- multiple inheritance: a class may have more than one immediate superclass
- property: identified by a number preceded by the letter "P" (defines a relationship between two classes)
- subproperty: a specialization of another property (its superproperty)
- superproperty: a generalization of one or more other properties (its subproperties)
- …
CIDOC CRM terminology (partial)
- intension (of a class or property): its intended meaning
- extension (of a class): the set of real-life instances belonging to the class that fulfil the criteria of its intension (under the "Open World Assumption")
- Open World Assumption: the information stored is incomplete relative to the universe of discourse it is intended to describe
Properties can have properties, such as in the
case of an Activity (E7) carried out
(P14) by an Actor (E39).
- Example: the painting of the Sistine Chapel (E7)
was carried out by (P14.1) Michelangelo Buonarroti (E21)
in the role of master craftsman (E55)
- (Note that E21 Person is a subclass of E39
Actor)
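The class hierarchy and the property example above can be sketched as follows. This is a deliberately partial illustration: only the classes and the property named on this slide are included, the dictionary layout and function names are assumptions, and the real CIDOC CRM hierarchy contains many more classes and superclass relations.

```python
# Partial hierarchy: class -> immediate superclasses (a list, because a
# class may have more than one immediate superclass).
SUPERCLASSES = {
    "E21 Person": ["E39 Actor"],  # stated in the text
    "E39 Actor": [],
    "E7 Activity": [],
    "E55 Type": [],
}

# property -> (domain class, range class), as described in the text.
PROPERTIES = {
    "P14 carried out by": ("E7 Activity", "E39 Actor"),
}


def is_a(cls, ancestor):
    """True if cls equals ancestor or specializes it, directly or not."""
    if cls == ancestor:
        return True
    return any(is_a(parent, ancestor) for parent in SUPERCLASSES.get(cls, []))


def is_valid(subject_class, prop, object_class):
    """Check a statement against the domain and range of the property."""
    domain, range_ = PROPERTIES[prop]
    return is_a(subject_class, domain) and is_a(object_class, range_)


# The Sistine Chapel example; the "role" entry stands for the property
# of the property (P14.1), whose value is an instance of E55 Type.
statement = {
    "subject": ("painting of the Sistine Chapel", "E7 Activity"),
    "property": "P14 carried out by",
    "object": ("Michelangelo Buonarroti", "E21 Person"),
    "role": ("master craftsman", "E55 Type"),
}
```

Because E21 Person is a subclass of E39 Actor, `is_valid("E7 Activity", "P14 carried out by", "E21 Person")` holds even though the property is declared on E39.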
CIDOC CRM: reasoning about spatial information
CIDOC CRM: reasoning about temporal information
CIDOC CRM: termini ante quem post quem
Content
Semantic Web in a nutshell
- Metadata
- RDF
- RDFS
- OWL
- An example
The Semantic Web stack
- Semantic Web is ...
  - a metadata-based infrastructure for reasoning on the Web
  - an extension, not a replacement, of the current web
What are Metadata?
- (Machine understandable) information about a web resource or something else
- ... data about data
- ... intelligent software agents can make use
of them to make the best use of the resources available
on the Web
- ... data ...
- ... that can be described by other metadata
...
What is RDF?
- basis for coding, exchanging and
reusing structured metadata
- allows interoperability among applications
exchanging machine-understandable information on
the web
RDF Data Model
The fundamental notion is:
- Statement: a tuple of a subject (or resource), a predicate (or property) and a value (or object).
- sometimes referred to as an (s-p-o) triple
An RDF statement
The person identified by the Codice Fiscale
SGNRST99A99X111Y has Name Oreste
Signore, Email oreste@w3.org, and
Affiliation C.N.R..
The http://www.w3c.it/Oreste/DocX resource has
Author this person.
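The statements above can be written down as plain (subject, predicate, object) tuples. The sketch below uses no RDF library; the `urn:example:` identifier for the person and the `ex:` property names are hypothetical, chosen only to make the triples readable.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# Hypothetical identifiers; only the document URL comes from the text.
PERSON = "urn:example:codice-fiscale:SGNRST99A99X111Y"
DOCUMENT = "http://www.w3c.it/Oreste/DocX"

statements = [
    Triple(PERSON, "ex:name", "Oreste Signore"),
    Triple(PERSON, "ex:email", "oreste@w3.org"),
    Triple(PERSON, "ex:affiliation", "C.N.R."),
    Triple(DOCUMENT, "ex:author", PERSON),
]


def objects(subject, predicate):
    """All values of a property for a given resource."""
    return [t.object for t in statements
            if t.subject == subject and t.predicate == predicate]


# Follow two statements: document -> author -> name.
author = objects(DOCUMENT, "ex:author")[0]
author_name = objects(author, "ex:name")[0]
```

Note that the object of the last statement is itself a resource, which is what lets statements chain into a graph.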
RDF Schemas
- Adding metadata and using it from a program works ...
- ... provided the program knows what terms to
use!
- When we use terms like:
- person
- has Author
- property of
- ...
- Are they all known? Are they all
correct?
- This is where RDF Schemas come in
- officially: RDF Vocabulary Description
Language
Classes, Resources, ...
- Think of what is well known from traditional ontologies:
- use the term "mammal"
- "every dog is a mammal"
- "Attila is a dog"
- RDFS defines the terms of resources and
classes:
- everything in RDF is a "resource"
- "classes" are also resources, but ...
- they are also a collection of possible resources
(i.e. individuals), as, for example: "mammal", "dog"
- Relationships are defined among classes/resources:
  - "typing": an individual belongs to a specific class (e.g.: "Attila is a dog")
  - "subclassing": every instance of one class is also an instance of the other (e.g.: "every dog is a mammal")
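How typing and subclassing combine can be shown with a small inference loop over triples. This is a sketch in plain Python, not an RDFS engine: the `ex:` names are hypothetical, and only the single rule "an instance of a class is also an instance of its superclasses" is applied.

```python
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"

triples = {
    ("ex:Attila", RDF_TYPE, "ex:Dog"),             # "Attila is a dog"
    ("ex:Dog", RDFS_SUBCLASS_OF, "ex:Mammal"),     # "every dog is a mammal"
    ("ex:Mammal", RDFS_SUBCLASS_OF, "ex:Animal"),  # extra level, for illustration
}


def infer_types(triples):
    """Add (x rdf:type D) whenever (x rdf:type C) and (C rdfs:subClassOf D),
    repeating until no new statement appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for c, p2, d in list(inferred):
                if p2 == RDFS_SUBCLASS_OF and c == o:
                    new = (s, RDF_TYPE, d)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred


closure = infer_types(triples)
```

After the loop, `closure` contains the statement that Attila is a mammal, although nobody asserted it directly.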
... and something more ...
RDFS is useful, but does not solve all the issues.
Complex applications may want more
possibilities.
- Reasoning
  - can a program reason about some terms? E.g.:
  - "if «A» is father of «B» and «B» is father of «C», is «A» grand-parent of «C»?"
  - obviously true for humans, not obvious for a program ...
  - ... programs should be able to deduce such statements
- Equivalences
  - if somebody else defines a set of terms: are they the same?
  - obvious issue in an international context
- Classes and constraints
  - construct classes, not just name them
  - restrict a property range when used for a specific class
  - etc.
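The grand-parent question above is a one-rule deduction, which a program can carry out once the rule is stated explicitly. The sketch below hard-codes that rule in Python; the property names `fatherOf` and `grandParentOf` are hypothetical, and in a real setting the rule would be expressed in an ontology or rule language instead of in code.

```python
facts = {
    ("A", "fatherOf", "B"),
    ("B", "fatherOf", "C"),
}


def deduce_grandparents(facts):
    """Rule: if x fatherOf y and y fatherOf z, then x grandParentOf z."""
    return {
        (x, "grandParentOf", z)
        for (x, p1, y) in facts
        for (y2, p2, z) in facts
        if p1 == "fatherOf" and p2 == "fatherOf" and y == y2
    }


deduced = deduce_grandparents(facts)
```

The deduced statement is not stored anywhere in `facts`; it exists only because the rule was made machine readable.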
Ontologies
Ontologies are on the Web
- The Semantic Web needs the support of ontologies; an ontology
"defines the concepts and relationships used to
describe and represent an area of knowledge"
- We need a Web Ontology Language to define:
- the terminology used in a specific
context
- more constraints on properties
- the logical characteristics of
properties
- the equivalence of terms across
ontologies
- etc.
- The language should be a compromise between:
- rich semantics for meaningful applications
- feasibility, implementability
Here comes the Web Ontology Language (OWL)
- A layer on top of RDFS with additional
possibilities
- Outcome of various projects (DAML, OIL)
OWL: three sublanguages
- OWL Lite
  - supports those users primarily needing a classification hierarchy and simple constraints. Provides a quick migration path for thesauri and other taxonomies. OWL Lite also has a lower formal complexity than OWL DL
- OWL DL
  - supports those users who want the maximum expressiveness while retaining computational completeness (all conclusions are guaranteed to be computable) and decidability (all computations will finish in finite time)
- OWL Full
  - for users who want maximum expressiveness and the syntactic freedom of RDF with no computational guarantees
A sample: Epitaphios GE34604
Formal description using CIDOC-CRM
Epitaphios GE34604 (Entity Iconographic Object)
  is identified by
    TA 959a (entity Object Identifier)
    GE 34604 (entity Object Identifier)
  preferred identifier is
    GE 34604 (entity Object Identifier)
  has type
    ecclesiastical embroidery
    liturgical cloth
  current owner
    Museum Benaki (Legal Body)
      has type
        private museum
      has contact points
        <Ifigenia Dionissiadu> ifi@benaki.gr (entity Contact Point)
        Koumbari Street 1, Athens (entity Address)
...
(1998 Martin Doerr and Ifigenia Dionissiadou)
As a collection of triples
In machine domain! Not friendly to humans
(rdf:#type is a shorthand for:
http://www.w3.org/1999/02/22-rdf-syntax-ns#type)
| subject (or resource) | predicate (or property) | value (or object) |
| Epitaphios GE34604 | rdf:#type | cidoc:#E84.Information_Carrier |
| ecclesiastical embroidery | rdf:#type | cidoc:#E55.Type |
| Epitaphios GE34604 | cidoc:#P2F.hastype | ecclesiastical embroidery |
| liturgical cloth | rdf:#type | cidoc:#E55.Type |
| Epitaphios GE34604 | cidoc:#P2F.hastype | liturgical cloth |
| Museum Benaki | rdf:#type | cidoc:#E40.Legal_Body |
| Epitaphios GE34604 | cidoc:#P52F.has_current_owner | Museum Benaki |
| Creation of Epitaphios GE34604 | rdf:#type | cidoc:#E12.Production_Event |
| handwork | rdf:#type | cidoc:#E55.Type |
| Creation of Epitaphios GE34604 | cidoc:#P2F.has_type | handwork |
| none | rdf:#type | cidoc:#E52.Time-Span |
| none | cidoc:#P81F.ongoing_throughout | "1682"@en |
| none | cidoc:#P82F.at_some_time_within | "1682"@en |
| Creation of Epitaphios GE34604 | cidoc:#P4F.has_time-span | none |
| Istanbul | rdf:#type | cidoc:#E53.Place |
| Creation of Epitaphios GE34604 | cidoc:#P7F.took_place_at | Istanbul |
| Epitaphios GE34604 | cidoc:#P108B.was_produced_by | Creation of Epitaphios GE34604 |
| ... | ... | ... |
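Although the triples are not friendly to humans, they are easy for a program to traverse. The following sketch loads a subset of the triples listed above as plain tuples and follows a path of properties from the object to the place and date of its production. The `follow` helper is a hypothetical name, and labels are used as node identifiers exactly as they appear in the listing (including the node labelled "none").

```python
# A subset of the triples listed above, copied verbatim.
triples = [
    ("Epitaphios GE34604", "rdf:#type", "cidoc:#E84.Information_Carrier"),
    ("Epitaphios GE34604", "cidoc:#P108B.was_produced_by",
     "Creation of Epitaphios GE34604"),
    ("Creation of Epitaphios GE34604", "rdf:#type",
     "cidoc:#E12.Production_Event"),
    ("Creation of Epitaphios GE34604", "cidoc:#P7F.took_place_at", "Istanbul"),
    ("Creation of Epitaphios GE34604", "cidoc:#P4F.has_time-span", "none"),
    ("none", "cidoc:#P82F.at_some_time_within", '"1682"@en'),
]


def follow(start, *path):
    """Follow a chain of predicates from a start node; return end nodes."""
    nodes = [start]
    for predicate in path:
        nodes = [o for s, p, o in triples if s in nodes and p == predicate]
    return nodes


place_of_production = follow(
    "Epitaphios GE34604",
    "cidoc:#P108B.was_produced_by",
    "cidoc:#P7F.took_place_at",
)
date_of_production = follow(
    "Epitaphios GE34604",
    "cidoc:#P108B.was_produced_by",
    "cidoc:#P4F.has_time-span",
    "cidoc:#P82F.at_some_time_within",
)
```

The place and the date are reached through the production event, not stored on the object itself, which is the event-centred style of the CIDOC CRM description.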
A graph visualization
Content
Scenarios
- Searching and linking
- Semantic search
- Ontology driven access to information
Searching vs. Linking
Back to the "roots" of the web
- association is the most important thing
- (semantically rich) links are the real added value
- in the original proposal links had semantics!
The typical search scenario
- search the web
- browse results
- from one of them, follow links
- perhaps, return to the list of results and resume the browse and link process ...
- ... or issue a new query
Effective search is important ...
... but reducing complexity in query formulation and supporting the user's interests is more important
Searching the Web
- Present traditional approach
  - based on term matching
  - ranking (tf/idf or Google PageRank)
  - syntactic approach
- Semantic Search
  - an application of the Semantic Web to search
  - users prefer to formulate queries using high-level semantic concepts, more consistent with standards and tacit knowledge
  - navigational search: the user provides the search engine with a phrase or combination of words which s/he expects to find in the documents. The user is using the search engine as a navigation tool to reach a particular intended document.
  - research search: the user provides the search engine with a phrase which is intended to denote an object about which the user is trying to gather/research information. There is no particular document which the user knows about and is trying to get to. Rather, the user is trying to locate a number of documents which together will give him/her the information s/he is trying to find.
Semantic Search and ontologies
- Better results due to the availability of machine understandable structured knowledge
- Better identification of concepts to search for
- Enriching result list
- Text understanding and processing
- Need for supporting ontologies
- Ontological modeling of user interests
- Ontology driven access to (museum) information
A possible scenario
- Semantic annotation of documents
  - using CIDOC-CRM as a reference ontology
  - annotation can reside anywhere
  - intelligent agents can use the annotation
- User mental model
  - preferred interaction metaphors
  - stating the (weighted) classes and properties of the ontology the user is interested in
- Architecture
  - browser enriched by a reasoner and a finder
  - the reasoner compares userModel and currentResource
  - the reasoner looks at a trusted ontology and trusted data
  - the finder searches the Web appropriately
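The comparison between userModel and currentResource could look roughly like the sketch below. Everything here is an assumption made for illustration: the weights, the class labels used as keys, the candidate links and the additive scoring are not part of the scenario as described, which only states that the user's interests are weighted.

```python
def rank_links(user_model, candidate_links):
    """Score each candidate link by the weights of the ontology terms it
    involves; drop links the user has expressed no interest in."""
    scored = []
    for description, terms in candidate_links:
        score = sum(user_model.get(term, 0.0) for term in terms)
        if score > 0:
            scored.append((score, description))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [description for _, description in scored]


# Hypothetical user model: weighted ontology classes of interest.
user_model = {"E53.Place": 0.9, "E52.Time-Span": 0.4}

# Hypothetical links the reasoner could propose from the current resource,
# each tagged with the ontology classes it relies on.
current_resource_links = [
    ("objects created in the same period", {"E52.Time-Span"}),
    ("objects created in the same place", {"E53.Place"}),
    ("artists active nearby at that time", {"E53.Place", "E52.Time-Span"}),
    ("objects of the same type", {"E55.Type"}),
]

suggestions = rank_links(user_model, current_resource_links)
```

The finder would then search the Web for the top suggestions; that step is outside this sketch.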
Examples of user interaction
Suppose you are located on a resource ...
- Temporal browsing
  - events in a neighbourhood of the time of creation
  - objects created in the same period
- Spatial browsing (from place of production)
  - objects created in the same place or nearby
  - paintings depicting the same place
- Spatio-temporal browsing
  - artists alive or active in a time interval around the time of creation and in the area
- Iconographical
- ... and more ...
  - the limit is your imagination (or your needs)
Conclusion
- Effective access to museum information is a challenge
- Information integration is fundamental
- Ontologies can effectively support information integration and reasoning
- Semantic Web technologies support distributed ontologies, exporting and sharing of knowledge
- Semantic search is based on availability of ontologies