An interactive pattern-based approach for extracting
non-taxonomic relations from texts
Marie Chagnoux, Nathalie Hernandez, Nathalie Aussenac
IRIT-IC3 Toulouse, France
OLP - 22/07/ 2008
Marie Chagnoux, Nathalie Hernandez, Nathalie Aussenac Pattern-based Approach
State of the art
Pattern Approach Contribution
Framework
What do we build ontologies for?
Information Retrieval
automatic semantic indexing of texts
need for ontologies
with a structure (concepts, relations, instances)
a large lexical component
Ontology and Texts
How to build an ontology?
General Issues
how to assist the ontologist?
how to articulate learning strategies and manual techniques?
how to exploit text richness (and how to deal with specific text issues)?
Progression
1 State of the art
Semantic relations
Relation Extraction
Pattern Approaches
2 Pattern Approach Contribution
Cameleon
Evaluation
Going further evaluation
3 Framework
General Overview
Processing Steps
Interactive Interfaces
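Relations within Ontologies
What is a relation?
[Figure: example ontology with the concepts aircraft, Boeing, Airbus, A320, person, personnel, passenger, crew and pilot. Legend: taxonomic relations, transversal relations. Transversal relation shown: DRIVE (pilot, steer, operate, ...).]
What is "really" a relation?
[Figure: the concepts FlightPhases, TakingOff, InFlight, Landing and pilot, with the candidate relations BEGIN, INITIATE, REFUSE, END and IS DOING.]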
Relation extraction from texts
Linguistic Issues
polysemy: "plant"
metonymy: "the pilot climbed to height 600 m"
anaphora: "The Airbus A310 is a medium to long-range widebody airliner manufactured by Airbus SAS. It was the second model to be introduced by Airbus after the A300"
Main conceptual issue
"American General Electric CF6-50 engines powered the A300"
"The Airbus A318 Elite will be powered by CFM engines"
POWER(ci, cj)
[Figure: ontology fragment with the labels Aircraft, Component, Engine, CFM-Engine, CF6-50, Airbus, A300, A320Family and A318, and the relation POWER.]
Existing Approaches
engineering approaches
resource exploitation
manual approaches
expert building
fusion approaches
ontology mapping
dictionaries, texts
thesauri, lexicons...
databases and semi-structured data
NLP Statistics & Learning
hybrid approaches
Distributional Analysis
Pattern Approaches
* clustering
* neural networks
* Bayesian networks
* Markov
* genetic algorithms
* Kohonen maps
Linguistic Approaches
Distributional Approaches
terms sharing similar contexts may be related
e.g. "ontology" and "taxonomy" share the contexts: insert in X, organize in X, structure in X, build X, etc.
Pattern-based Approaches
reproducible morphosyntactic patterns
e.g. ONTOLOGY ...is a kind of..., ...is richer than... TAXONOMY
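The distributional idea above can be illustrated with a minimal Python sketch. The context sets are written by hand for illustration (an assumption; a real system would collect them from a parsed corpus), and Jaccard overlap is only one possible similarity measure, not necessarily the one used in the work surveyed here.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of contexts common to both terms (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical context sets: the frames in which each term was observed.
contexts = {
    "ontology": {"insert in X", "organize in X", "structure in X", "build X"},
    "taxonomy": {"insert in X", "organize in X", "structure in X", "build X"},
    "pilot": {"X climb", "instruct X"},
}

sim_ontology_taxonomy = jaccard(contexts["ontology"], contexts["taxonomy"])
sim_ontology_pilot = jaccard(contexts["ontology"], contexts["pilot"])
```

A high score only suggests that two terms may be related; it does not say which relation holds between them, which is where patterns come in.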
Pattern based Approaches : Hearst Proposal
“...identify a set of lexico-syntactic patterns that are easily recognisable, that occur frequently and
across text genre boundaries, and that indisputably indicate the lexical relation of interest...”
Illustration
"The bow lute, such as the Bambara ndang, is plucked [...]"
hyponymy relation inherent to the construction "X such as Y"
potential patterns for a specific kind of semantic relation are identified through human interpretation of their occurrences
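As a rough illustration of the "X such as Y" construction, here is a small Python sketch built on a regular expression. It is a simplification under stated assumptions: terms are limited to one or two words, articles are stripped, and no syntactic analysis is performed. The names `SUCH_AS` and `extract_such_as` are hypothetical.

```python
import re

# "<hypernym>, such as <hyponym>": each term is one or two words, optional article.
SUCH_AS = re.compile(
    r"(?:\b(?:the|a|an)\s+)?(?P<hypernym>\w+(?:\s+\w+)?)\s*,?\s+such\s+as\s+"
    r"(?:(?:the|a|an)\s+)?(?P<hyponym>\w+(?:\s+\w+)?)",
    re.IGNORECASE,
)


def extract_such_as(sentence: str):
    """Return (hyponym, hypernym) for the first 'X such as Y' found, else None."""
    m = SUCH_AS.search(sentence)
    return (m.group("hyponym"), m.group("hypernym")) if m else None


pair = extract_such_as("The bow lute, such as the Bambara ndang, is plucked")
```

On the sentence quoted above, the sketch pairs "Bambara ndang" (hyponym) with "bow lute" (hypernym).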
Pattern based relation extraction
Example
"The heads report to the vice president"
lexical pattern : X report to Y
lexico-syntactic patterns :
P_REPORT 1 : X report to the Y
P_REPORT 2 : X VB IN DT Y
P_REPORT 3 : X MD VB to DT Y
P_REPORT 3a : X MD VB to DT Y
P_REPORT 4 : X MD to the Y
P_REPORT 5 : X (JJ)? VB to the Y
P_REPORT 6 : X MD VB to (**)? Y
"The heads will directly report to the executive vice president"
compromise between
constrained and marginal structures
excessively permissive structures
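The trade-off between constrained and permissive patterns can be made concrete with a small matcher over lemmatized, POS-tagged tokens. This is a sketch under assumptions, not the Caméléon matcher: the tags are assigned by hand, multi-word terms are pre-merged into one token, X and Y accept any noun, and the relaxed pattern with optional elements (`MD?`, `RB?`, `JJ?`) is a hypothetical variant of the patterns listed above.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    lemma: str
    tag: str  # Penn Treebank style tag, assigned by hand in this sketch


class Elem(NamedTuple):
    kind: str   # "slot" (X or Y), "tag" (POS tag) or "lemma"
    value: str
    optional: bool


def parse(pattern: str) -> list[Elem]:
    """Read 'X MD? report to DT Y': X/Y are slots, upper case is a tag, the rest lemmas."""
    elems = []
    for raw in pattern.split():
        name = raw.rstrip("?")
        kind = "slot" if name in ("X", "Y") else "tag" if name.isupper() else "lemma"
        elems.append(Elem(kind, name, raw.endswith("?")))
    return elems


def _fits(elem: Elem, tok: Token) -> bool:
    if elem.kind == "slot":
        return tok.tag.startswith("NN")  # assumption: terms are nouns
    if elem.kind == "tag":
        return tok.tag == elem.value
    return tok.lemma == elem.value


def _match(elems, tokens, i, j, slots) -> Optional[dict]:
    if i == len(elems):
        return slots
    elem = elems[i]
    if j < len(tokens) and _fits(elem, tokens[j]):
        bound = {**slots, elem.value: tokens[j].lemma} if elem.kind == "slot" else slots
        found = _match(elems, tokens, i + 1, j + 1, bound)
        if found is not None:
            return found
    # An optional element may also be skipped without consuming a token.
    return _match(elems, tokens, i + 1, j, slots) if elem.optional else None


def find(pattern: str, tokens: list[Token]) -> Optional[dict]:
    """Return the X/Y bindings of the first match, or None."""
    elems = parse(pattern)
    for start in range(len(tokens)):
        found = _match(elems, tokens, 0, start, {})
        if found is not None:
            return found
    return None


SIMPLE = [Token("the", "DT"), Token("head", "NNS"), Token("report", "VBP"),
          Token("to", "TO"), Token("the", "DT"), Token("vice_president", "NN")]
VARIANT = [Token("the", "DT"), Token("head", "NNS"), Token("will", "MD"),
           Token("directly", "RB"), Token("report", "VB"), Token("to", "TO"),
           Token("the", "DT"), Token("executive", "JJ"), Token("vice_president", "NN")]

strict_on_simple = find("X report to the Y", SIMPLE)
strict_on_variant = find("X report to the Y", VARIANT)
relaxed_on_variant = find("X MD? RB? report to DT JJ? Y", VARIANT)
```

The strict lexical pattern matches the first sentence but misses the variant with a modal, an adverb and an adjective; the relaxed pattern covers both, at the cost of being more permissive.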
Patterns for ontology enrichment
NLP Preprocessing
tokenization
lemmatization
POS tagging
Processing steps
1 Pattern building
2 Pattern matching process
3 Mapping of the terms related by the pattern to the ontology lexicon
Identification of the concepts
Identification of the structural level
Hypothesis
several patterns express the same relation
possible relation types depend on the corpus
for a given relation type, patterns depend on the corpus
A support for relation extraction
[Figure: from the Corpus to Lexical relations in phrases, then Conceptual relations, then Formal relations]
1 - Identify semantically rich phrases
2 - Concept and relation definition
3 - Normalization
4 - Knowledge representation (e.g. Def-concept A:B, role a-pour-partie [has-part])
Text selection defines a corpus
NLP helps during step 1
Human interpretation is required for steps 2 and 3
Caméléon supports steps 1 and 2
Pattern definition in Caméléon
[Screenshot: Caméléon pattern definition form with the fields BEGIN, Context, X, Context, Y, END]
Evaluation
Method
Define 71 reusable patterns for French (adapted from linguistic studies)
Relation types : hyperonymy, meronymy, definition, ...
Select 8 corpora in 8 different domains of 3 genres
Technical writings , Scientific papers, Handbooks
Evaluation measures
Recall : only available for definition patterns
Precision : correct phrases / matched phrases
No measure of the relevance for the ontology
Results
Pattern efficiency and meaning depend on the textual genre
Requires manual evaluation
Precise but not very productive (silence, i.e. low recall)
Adapted to well written, pedagogical texts
Requires capitalizing on patterns and on experiments with pattern performance
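The precision measure used in this evaluation (correct phrases / matched phrases) can be written as a short Python function. The phrases below are invented placeholders rather than data from the eight corpora, and the manual judgement of correctness is represented by a hand-built set.

```python
def precision(matched: list[str], correct: set[str]) -> float:
    """Precision = correct phrases / matched phrases."""
    if not matched:
        raise ValueError("precision is undefined when nothing was matched")
    return sum(1 for phrase in matched if phrase in correct) / len(matched)


# Hypothetical output of one pattern, and the phrases a human judged correct.
matched_phrases = ["phrase 1", "phrase 2", "phrase 3", "phrase 4"]
judged_correct = {"phrase 1", "phrase 2", "phrase 4"}

score = precision(matched_phrases, judged_correct)
```

Recall would additionally require knowing every relevant phrase in the corpus, which is why it was only available for definition patterns.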
Conclusions from the evaluation
Claims about pattern reuse
There is no generic (universal) pattern
But any pattern should be reusable after adaptation
Claims about manual pattern definition
Manual pattern definition is time consuming
Pattern definition requires linguistic competency
Reuse of already defined patterns reduces this cost
Adapting patterns to each corpus improves efficiency
Claims about pattern evaluation
Capitalize previous uses of patterns together with the corpus and evaluation scores
Pattern efficiency may vary a lot according to the corpus
Our goals from the first evaluation
Answers to evaluation
Claims about pattern reuse
validation of "generic" patterns through uses
construction of annotated pattern base depending on domain
Claims about manual pattern definition
learning to speed up pattern acquisition
NLP assistance to the user at each step (building and validating)
capitalization of patterns
Toward more assistance
=> how to improve the existing tool with learning and NLP techniques?
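Processing Steps
[Figure: processing workflow. INPUT: Corpus, Ontology, Pattern Base. An NLP chain turns the corpus into tagged text. Pattern Matching against the Pattern Base yields Rel(ti, tj), and Concept Identification yields RELC(Ci, Cj). An identified relation leads to ontology enrichment; an unknown relation leads to a Relation Proposal (REL; the Watson box is labeled ∂REL(Ci, Cj)) and a Pattern Proposal (PRELi), i.e. to pattern base enrichment. After User validation and Ontology Integration, the OUTPUT is an Enriched Ontology and an Enriched Pattern Base.]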
Input
Required Resources
Lightweight Ontology
Pattern Base
Corpus
Inputs
A lightweight Ontology defined with
a structure: a tuple S := {C, R, ≤_C, σ_R}
a lexicon: a tuple L := {L_C, L_R, F, G}
For our approach: at least a hierarchy of concepts defined with
a lexicon L_0 := {L_C, F} and a structure S_0 := {C, ≤_C}.
A corpus
extracted by experts from existing document collections
tagged with a morpho-syntactic analysis (tokenization, lemmatization and POS tagging)
A pattern base
storing existing patterns (mostly adapted from Caméléon)
organised according to the relation captured by the patterns (relation
type, label ...)
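The minimal ontology input described above (a concept hierarchy plus a lexicon of concept labels) could be held in a structure like the following Python sketch. The class and method names are hypothetical, the toy hierarchy is an assumption made for illustration, and each label is mapped to a single concept, which simplifies the general case where a label may be ambiguous.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LightweightOntology:
    concepts: set[str] = field(default_factory=set)        # the concept set C
    parent: dict[str, str] = field(default_factory=dict)   # hierarchy: child -> parent
    lexicon: dict[str, str] = field(default_factory=dict)  # label (lower case) -> concept

    def add_concept(self, name: str, parent: Optional[str] = None,
                    labels: tuple[str, ...] = ()) -> None:
        """Register a concept, its optional super-concept and its labels."""
        self.concepts.add(name)
        if parent is not None:
            self.concepts.add(parent)
            self.parent[name] = parent
        for label in (name, *labels):
            self.lexicon[label.lower()] = name

    def ancestors(self, name: str) -> list[str]:
        """Super-concepts of a concept, nearest first."""
        chain = []
        while name in self.parent:
            name = self.parent[name]
            chain.append(name)
        return chain

    def concept_of(self, label: str) -> Optional[str]:
        """Concept denoted by a label, or None if the label is unknown."""
        return self.lexicon.get(label.lower())


# Toy hierarchy, assumed for illustration only.
onto = LightweightOntology()
onto.add_concept("Aircraft")
onto.add_concept("Airbus", parent="Aircraft")
onto.add_concept("A320", parent="Airbus", labels=("Airbus A320",))
```

The `ancestors` lookup is what allows a relation found between two specific concepts to be considered at a more general level of the hierarchy.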
Overview on the relation identification process
For each pair of distinct ontology concepts (ci, cj), we look for each sentence s containing ti, tj (labels of ci and cj defined in the lexicon of the ontology).
If a base pattern PRELi can be matched on the sentence s, we store the relation REL, RELc(ci, cj) and s.
Else, we search for a relation that could be defined in an existing ontology. If such a relation is found, we store RELnew, RELc_new(ci, cj) and s.
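The identification loop described above can be sketched in Python as follows. Everything concrete here is an assumption made for illustration: each concept has a single label, sentences are plain strings, a pattern is a callable that tries both term orders, and `lookup` is a hypothetical stand-in for querying an existing ontology (the slides mention Watson; no real API is used here).

```python
from itertools import combinations
from typing import Callable, Optional

Matcher = Callable[[str, str, str], bool]     # (sentence, ti, tj) -> matched?
Lookup = Callable[[str, str], Optional[str]]  # (ci, cj) -> relation label or None


def identify_relations(labels: dict[str, str], sentences: list[str],
                       pattern_base: dict[str, Matcher], lookup: Lookup):
    """Return (known, proposed) lists of (relation, ci, cj, sentence) tuples."""
    known, proposed = [], []
    for ci, cj in combinations(sorted(labels), 2):
        ti, tj = labels[ci], labels[cj]
        for s in sentences:
            if ti not in s or tj not in s:
                continue  # the two labels must co-occur in the sentence
            rel = next((r for r, m in pattern_base.items() if m(s, ti, tj)), None)
            if rel is not None:
                known.append((rel, ci, cj, s))         # REL and RELc(ci, cj)
            else:
                new_rel = lookup(ci, cj)               # search an existing ontology
                if new_rel is not None:
                    proposed.append((new_rel, ci, cj, s))  # RELnew
    return known, proposed


def communicate(s: str, ti: str, tj: str) -> bool:
    # Toy pattern "X instructed the Y", tried in both term orders.
    return f"{ti} instructed the {tj}" in s or f"{tj} instructed the {ti}" in s


def toy_lookup(ci: str, cj: str) -> Optional[str]:
    # Hypothetical external knowledge: one relation between Pilot and Plant.
    return "WORK_IN" if {ci, cj} == {"Pilot", "Plant"} else None


labels = {"Controller": "controller", "Pilot": "pilot", "Plant": "plant"}
sentences = ["the controller instructed the pilot", "the pilot visited the plant"]
known, proposed = identify_relations(labels, sentences,
                                     {"COMMUNICATE": communicate}, toy_lookup)
```

Both result lists keep the supporting sentence, since the ontologist needs it later to validate relations and patterns.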
Relation proposal
1 Detection of co-occurring concepts (ci, cj) in sentences for which no relation was extracted by pattern matching
2 Search for a relation between the concepts in an existing ontology
3 Proposal of the relation RELnew between (ci, cj)
Example between the concepts "Person" and "Company"
"Airbus Names New Chief Managers for A380, A320 Programs"
"Aircraft maker Airbus has named Mario Heinen senior vice president and chief manager of the A380 aircraft program"
"Laurence Barron, the vice senior president of Airbus"
=> Watson proposes the "Work_for" relation.
Pattern proposal
Proposal of a set of patterns
Analysis of each sentence containing a concept pair RELnew(ci, cj)
Proposal of patterns according to each element known by the system (grammatical category, semantic category or lemma).
"Work_for"
"Airbus Names New Chief Managers for A380, A320 Programs"
"Aircraft maker Airbus has named Mario Heinen senior vice president and chief manager of the A380 aircraft program"
Proposed patterns :
1 X name Y
2 X name (NP) ? (NP) ? Y
3 X (VB=choose) Y
4 X (MD) ? (VB=choose) Y
5 X name NE_Person Y (with NE_Person semantic class of Named Person)
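One way to picture how candidate patterns at different abstraction levels could be generated from a sentence is the Python sketch below. It is an assumption about the general idea, not the system's actual procedure: it only uses lemmas and POS tags (semantic categories such as NE_Person are left out), the tags are assigned by hand, and the function name `propose_patterns` is hypothetical.

```python
def propose_patterns(tokens: list[tuple[str, str]], x: int, y: int) -> list[str]:
    """Propose patterns from the (lemma, tag) tokens between positions x and y (x < y)."""
    between = tokens[x + 1:y]

    def render(parts: list[str]) -> str:
        return " ".join(["X", *parts, "Y"])

    proposals = [
        # 1. fully lexical: every intermediate lemma is kept
        render([lemma for lemma, _ in between]),
        # 2. fully grammatical: every intermediate token becomes its POS tag
        render([tag for _, tag in between]),
        # 3. mixed: verbs stay lexical, other tokens become optional tags
        render([lemma if tag.startswith("VB") else f"({tag})?" for lemma, tag in between]),
    ]
    return list(dict.fromkeys(proposals))  # drop duplicates, keep order


# "Airbus Names New Chief Managers ...", lemmatized and tagged by hand.
headline = [("Airbus", "NNP"), ("name", "VBZ"), ("new", "JJ"), ("chief_manager", "NNS")]
candidates = propose_patterns(headline, x=0, y=3)
```

The candidates range from very specific to very general, and it is left to the ontologist to decide which ones are worth adding to the pattern base.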
The ontologist’s validation (1)
Relation validation
1 For each relation detected (REL and RELnew), the system displays
the relation labels
the pair of related concepts
the sentences where they occur.
2 The user can decide to
add the relation between ci and cj
add the relation between an ancestor of ci and an ancestor of cj
add the relation between a concept linked to ci in the ontology and a concept linked to cj
reject the relation for the pair
The ontologist’s validation (2)
Pattern validation
1 For each new relation RELnew, the system displays the relation and all the pairs of concepts RELc_new(ci, cj).
2 The user is asked to validate the relevance of the relation.
3 If the user validates the relation, the system proposes the set of patterns generated for the relation, together with the sentences.
4 The user validates the relevance of the patterns. The validated patterns are added to the base.
When?
AFTER the processing, so that the ontologist can validate with an overview of all the proposals
Output
Produced Resources
Enriched Ontology
Enriched Pattern Base
The need for an interactive interface
[Screenshot: input dialog titled "Camel Léon", with Browse ("Parcourir") buttons for the Corpus (EADS) and the Ontology (WebContent) and an OK button. INPUT: text corpus.]
[Screenshot: "Labels discovery" view. Number of labels: 320. Number of pairs: 73. A table with the columns Nb, Term 1 and Term 2 lists pairs built from the terms Aircraft, Pilot, Airbus, Company, Seat and A320, with the counts 3, 1 and 5. Buttons: Visualization, Pattern Matching, Export.]
[Screenshot: "Pattern Matching" view. Number of pairs: 73. Number of successful matchings: 35. Number of orphan pairs: 38. A table with the columns Relation, Term 1 and Term 2 shows the relations BUY, COMMUNICATE and OWN over the terms Aircraft, Pilot, Airbus, Company, Controller and A320. Tabs: Summary, New Relations, Relation Discovery. Buttons: Ontology Building, Relation Discovery.]
[Screenshot: "Relation Integration" view for the relation REPORT. A table with the columns Terms, Context, Pattern and OK lists the terms Head, Vice_President, Co-Pilote, Local_Government, Executive_Head and Pilote. A detail panel shows Term 1, Term 2, the context "Heads will report to the vice-president and the pilot" and the candidate patterns "VB ? TO", "VB Adj" and "VMOD VB". Buttons: Add, More options...]
The WebContent project : a case study
A project for building a computing environment to explore and use Semantic Web technologies for applications
Inputs
a lightweight ontology dedicated to aeronautics
a corpus of news releases
a pattern base composed of patterns adapted from Caméléon
Examples of extracted relations
Relation  ci                     cj          Example of textual segment
Comm.     Pilote                 Controller  "the controller instructed the pilot"
Build     Aircraft Manufacturer  Plant       "Airbus is expected to begin to build plant"
Build     Factory                Plane       "the plant will have the capacity to assemble four aircrafts per month"
Own       Airway Company         Plane       "the Airbus A320 belonging to Armenian Airlines"
Conclusion
"Work in progress"
Evaluation by users and tasks
Main concerns
being modest
concerned by quality more than quantity
not building the whole ontology, but enriching an existing hierarchy of concepts
being ambitious
covering all the steps of Ontology Enrichment (Pattern Identification,
Representation of relations)
Future works
Interface Implementation
User validation (comparison Caméléon I - Caméléon II)
Improvement of the pattern set proposal process