Annie-based Annotation Format

The webLyzard/WISDOM annotation format is based on the data structures used by the GATE project. A detailed description of these data structures can be found in the Gate Documentation on Language Resources: Corpora, Documents and Annotations.

Classes

  • Annotation Set(type:String) - an Annotation Set contains “n” Annotations
  • Annotation(start:int, end:int, type:String, feature=Map<String, String>)

Sentence-level annotations

Running example:

Andreas Wieland, CEO, Hamilton Bonaduz AG said: «We are very excited ...
012345678901234567890123456789012345678901234567890123456789012345678901
0.........1.........2.........3.........4.........5.........6.........7.

* Definition of the used JSON Fields * * sentence: the sentence’s MD5 sum * start: the annotation’s start position within the sentence * end: the annotation’s end position within the sentence * type: the annotation type * features: a dictionary of annotation features

Geonames

[{
        "start":31,
        "end":38,
        "sentence": "777081b7ebe4a99b598ac2384483b4ab",
        "type":"ch.htwchur.wisdom.entityLyzard.GeoEntity",
        "features":{
                "entities":[{
                        "confidence":7.0,
                        "url":"http://sws.geonames.org/2661453/",
                        "preferredName":"Bonaduz"
                },{
                        "confidence":6.0,
                        "url":"http://sws.geonames.org/7285286/",
                        "preferredName":"Bonaduz"
                }],
                "profile":"Cities.CH.de"
        }
}]

People

[{
        "start":0,
        "end":15,
        "sentence": "777081b7ebe4a99b598ac2384483b4ab",
        "type":"ch.htwchur.wisdom.entityLyzard.PersonEntity",
        "features":{
                "entities":[{
                        "confidence":1646.4685797722482,
                        "url":"http://www.semanticlab.net/proj/wisdom/ofwi/person/Andreas_Wieland_(014204)",
                        "preferredName":"Andreas Wieland"
                },{
                        "confidence":2214.9741075564775,
                        "url":"http://www.semanticlab.net/proj/wisdom/ofwi/person/Andreas_Wieland_(059264)",
                        "preferredName":"Andreas Wieland"
                },{
                        "confidence":1646.4685797722482,
                        "url":"http://www.semanticlab.net/proj/wisdom/ofwi/person/Andreas_Wieland_(047517)",
                        "preferredName":"Andreas Wieland"
                },{
                        "confidence":1646.4685797722482,
                        "url":"http://www.semanticlab.net/proj/wisdom/ofwi/person/Andreas_Wieland_(050939)",
                        "preferredName":"Andreas Wieland"
                },{
                        "confidence":2165.3683447585117,
                        "url":"http://www.semanticlab.net/proj/wisdom/ofwi/person/Andreas_Wieland_(049748)",
                        "preferredName":"Andreas Wieland"
                }],
                "profile":"ofwi.people"
        }
}]

Organizations

[{
        "start":22,
        "end":41,
        "sentence": "777081b7ebe4a99b598ac2384483b4ab",
        "type":"ch.htwchur.wisdom.entityLyzard.OrganizationEntity",
        "features":{
                "entities":[{
                        "confidence":438.9253911579335,
                        "url":"http://www.semanticlab.net/proj/wisdom/ofwi/teledata/company/7246",
                        "preferredName":"Hamilton Bonaduz AG"
                }],
                "profile":"ofwi.organizations"
        }
}]

Part-of-speech Tags

Please refer to used part-of-speech (POS) tags for a list of the POS-Tags used within webLyzard.

Anna is a student.
012345678901234567
0.........1.......
[
  {sentence="fbb1a44c0d422e496d87c3c8d23b4480", start=0, end=3,  type="Token", features={ 'POS': 'NN' } }
  {sentence="fbb1a44c0d422e496d87c3c8d23b4480", start=5, end=6,  type="Token", features={ 'POS': 'VRB' } }
  {sentence="fbb1a44c0d422e496d87c3c8d23b4480", start=8, end=8,  type="Token", features={ 'POS': 'ART' } }
  ...
]