Basic queries
Odinson basic queries allow for specifying a condition for the start of a "path", a valid end of the path, and the traversals that are licensed for the path itself. These conditions can be specified in terms of token constraints (surface patterns), path constraints (graph traversals), or both.
Surface pattens
An example of a surface pattern is shown here:
[tag=/N.*/] and [lemma=dog]
This pattern will match any occurrence in the corpus of a noun (as specified by the tag beginning with N) followed by and, and finally being followed immediately by a word whose lemma is dog.
Named Captures
To capture aspects of the match, we can add a named capture. To do this, you need to specify the name of the capture and surround the portion of the pattern that is to be captured with
(?<name> ... )
. For example, when this query:
(?<animal> [tag=/N.*/]) and [lemma=dog]
is applied to the sentence "I like cats and dogs", the system
will find the mention "cats and dogs", and this mention would contain a named capture with the label animal
containing "cats".
Adding syntax through graph traversals
Here is an example of a pattern that captures a subject-verb-object relation involving phosphorylation:
(?<controller> [entity=PROTEIN]) <nsubj phosphorylates >dobj (?<theme> [entity=PROTEIN])
This pattern will look for a sentence in which a token tagged as a PROTEIN (though a hypothetical NER component) is the subject of the verb "phosphorylates", and in which that same verb has a direct object which is also tagged as a PROTEIN. To put it another way, reading the pattern from left-to-right, Odinson will look for a token tagged as a PROTEIN, try to traverse backwards against an incoming nsubj
dependency arc, land on "phosphorylates", and then traverse an outgoing dobj
dependency arc to land on a token also tagged as a PROTEIN. If it finds such a sentence, the first PROTEIN will be extracted with the label controller
, and the second will have the label theme
(because of the named captures).
Combining representations
Note that in Odinson, patterns can hop between surface and syntax representations arbitrarily often, as is done in this query:
Jack and Jill <nsubj went >nmod_up [] to fetch >dobj >nmod_of water
which has a successful match in the sentence "Jack and Jill went up the hill to fetch a pail of water."