weblyzard_api Package

The webLyzard API package.

Provides support for webLyzard web services. Please refer to the client module for a list of the available web services.

xml_content Module

Created on Feb 27, 2013

Handles the new webLyzard XML format (http://www.weblyzard.com/wl/2013#).

Functions added:
  • support for sentence tokens and POS iterators
Functions removed:
  • compatibility fixes for namespaces, encodings etc.
  • support for the old POS tag mapping
class weblyzard_api.xml_content.LabeledDependency(parent, pos, label)

Bases: tuple

label

Alias for field number 2

parent

Alias for field number 0

pos

Alias for field number 1
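The "Bases: tuple" line and the "Alias for field number N" entries are what collections.namedtuple generates, so LabeledDependency appears to behave like a named tuple. The stand-in below is inferred from this documentation rather than copied from the package; it only shows that named and positional access agree.

```python
from collections import namedtuple

# Stand-in assumed to mirror weblyzard_api.xml_content.LabeledDependency;
# the field order (parent, pos, label) follows the documented signature.
LabeledDependency = namedtuple('LabeledDependency', ['parent', 'pos', 'label'])

dep = LabeledDependency(parent='-1', pos='PRP', label='ROOT')

# Named fields are aliases for tuple positions 0, 1 and 2.
assert dep.parent == dep[0]
assert dep.pos == dep[1]
assert dep.label == dep[2]

# Being a tuple, it can be unpacked directly.
parent, pos, label = dep
print(parent, pos, label)  # prints: -1 PRP ROOT
```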

class weblyzard_api.xml_content.Sentence(md5sum=None, pos=None, sem_orient=None, significance=None, token=None, value=None, is_title=False, dependency=None)

Bases: object

The sentence class used for accessing single sentences.

Note

The class provides convenient properties for accessing POS tags and tokens:

  • s.sentence: the sentence text
  • s.tokens: the sentence’s tokens as an iterator (yielding e.g. 'A', 'new', 'day')
  • s.pos_tags: a list of POS tags (e.g. ['DT', 'JJ', 'NN'])
as_dict()
Returns: a dictionary representation of the sentence object.
dependency_list
Returns: the dependencies of the sentence as a list of LabeledDependency objects
Return type: list of weblyzard_api.xml_content.LabeledDependency objects
>>> s = Sentence(pos='RB PRP MD', dependency='1:SUB -1:ROOT 1:OBJ')
>>> s.dependency_list
[LabeledDependency(parent='1', pos='RB', label='SUB'), LabeledDependency(parent='-1', pos='PRP', label='ROOT'), LabeledDependency(parent='1', pos='MD', label='OBJ')]
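The doctest suggests that the dependency attribute holds one parent:label pair per token, aligned position by position with the space-separated pos string. Under that assumption, the parsing could be sketched as follows; parse_dependencies is a hypothetical helper, not part of weblyzard_api, and it reproduces the list shown in the doctest.

```python
from collections import namedtuple

# Stand-in for weblyzard_api.xml_content.LabeledDependency (assumed namedtuple).
LabeledDependency = namedtuple('LabeledDependency', ['parent', 'pos', 'label'])


def parse_dependencies(pos, dependency):
    """Hypothetical helper: pair each POS tag with its 'parent:label' entry.

    Assumes both strings are space-separated and have the same number of items.
    """
    pos_tags = pos.split()
    entries = dependency.split()
    if len(pos_tags) != len(entries):
        raise ValueError('pos and dependency have different lengths')
    result = []
    for tag, entry in zip(pos_tags, entries):
        # Split only on the first colon, so labels containing ':' stay intact.
        parent, label = entry.split(':', 1)
        result.append(LabeledDependency(parent=parent, pos=tag, label=label))
    return result


print(parse_dependencies('RB PRP MD', '1:SUB -1:ROOT 1:OBJ'))
```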
get_dependency_list()
Returns: the dependencies of the sentence as a list of LabeledDependency objects
Return type: list of weblyzard_api.xml_content.LabeledDependency objects
>>> s = Sentence(pos='RB PRP MD', dependency='1:SUB -1:ROOT 1:OBJ')
>>> s.get_dependency_list()
[LabeledDependency(parent='1', pos='RB', label='SUB'), LabeledDependency(parent='-1', pos='PRP', label='ROOT'), LabeledDependency(parent='1', pos='MD', label='OBJ')]
get_pos_tags()

Get the POS tags as a list.

>>> sentence = Sentence(pos='PRP ADV NN')
>>> sentence.get_pos_tags()
['PRP', 'ADV', 'NN']
get_pos_tags_list()
Returns: a list of the sentence’s POS tags
>>> sentence = Sentence(pos='PRP ADV NN')
>>> sentence.get_pos_tags_list()
['PRP', 'ADV', 'NN']
get_pos_tags_string()
Returns: a string of the sentence’s POS tags
>>> sentence = Sentence(pos='PRP ADV NN')
>>> sentence.get_pos_tags_string()
'PRP ADV NN'
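get_pos_tags_list() and get_pos_tags_string() expose the same data in two shapes. Assuming the tags are stored as a single space-separated string, as in the pos='PRP ADV NN' examples, the conversion in both directions amounts to the following; both function names are hypothetical.

```python
def pos_string_to_list(pos):
    """Hypothetical helper: 'PRP ADV NN' -> ['PRP', 'ADV', 'NN']."""
    # str.split() with no argument also collapses repeated whitespace;
    # a missing (None or empty) value yields an empty list.
    return pos.split() if pos else []


def pos_list_to_string(pos_tags):
    """Hypothetical helper: ['PRP', 'ADV', 'NN'] -> 'PRP ADV NN'."""
    return ' '.join(pos_tags)


tags = pos_string_to_list('PRP ADV NN')
print(tags)                      # prints: ['PRP', 'ADV', 'NN']
print(pos_list_to_string(tags))  # prints: PRP ADV NN
```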
get_sentence()
get_tokens()
Returns: an iterator providing the sentence’s tokens
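get_tokens() returns an iterator rather than a list. How tokens are stored is not documented on this page; the sketch below assumes, hypothetically, that the token attribute holds space-separated start,end character offsets into the sentence text, and yields the corresponding substrings lazily with a generator. iter_tokens is an illustrative name, not a library function.

```python
def iter_tokens(sentence, token):
    """Hypothetical sketch: yield substrings of `sentence` from offset pairs.

    Assumes `token` looks like '0,1 2,5 6,9' (start,end per token, end exclusive).
    """
    if not token:
        return  # no token information: the generator yields nothing
    for span in token.split():
        start, end = (int(x) for x in span.split(','))
        yield sentence[start:end]


tokens = iter_tokens('A new day', '0,1 2,5 6,9')
print(next(tokens))  # prints: A
print(list(tokens))  # prints: ['new', 'day']  (the iterator resumes after 'A')
```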
pos_tag_string
Returns: a string of the sentence’s POS tags
>>> sentence = Sentence(pos='PRP ADV NN')
>>> sentence.pos_tag_string
'PRP ADV NN'
pos_tags

Get the POS tags as a list.

>>> sentence = Sentence(pos='PRP ADV NN')
>>> sentence.pos_tags
['PRP', 'ADV', 'NN']
pos_tags_list
Returns: a list of the sentence’s POS tags
>>> sentence = Sentence(pos='PRP ADV NN')
>>> sentence.pos_tags_list
['PRP', 'ADV', 'NN']
sentence
set_dependency_list(dependencies)

Sets the sentence’s dependencies from a list of weblyzard_api.xml_content.LabeledDependency objects.

Parameters: dependencies (list) – the dependencies to set for this sentence.

Note

The list must contain items of the type weblyzard_api.xml_content.LabeledDependency

>>> s = Sentence(pos='RB PRP MD', dependency='1:SUB -1:ROOT 1:OBJ')
>>> s.dependency_list
[LabeledDependency(parent='1', pos='RB', label='SUB'), LabeledDependency(parent='-1', pos='PRP', label='ROOT'), LabeledDependency(parent='1', pos='MD', label='OBJ')]
>>> s.dependency_list = [LabeledDependency(parent='-1', pos='MD', label='ROOT')]
>>> s.dependency_list
[LabeledDependency(parent='-1', pos='MD', label='ROOT')]
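Setting the list is the inverse of parsing it. The doctest output (pos='MD' after assignment, although the original tags were 'RB PRP MD') indicates that the POS tags are rewritten along with the dependencies. The helper below is a hypothetical illustration of that serialization, assuming dependencies are stored as parent:label pairs aligned with a space-separated POS string; it is not the library's implementation.

```python
from collections import namedtuple

# Stand-in for weblyzard_api.xml_content.LabeledDependency (assumed namedtuple).
LabeledDependency = namedtuple('LabeledDependency', ['parent', 'pos', 'label'])


def serialize_dependencies(dependencies):
    """Hypothetical helper: turn LabeledDependency items into (pos, dependency) strings."""
    for dep in dependencies:
        # Mirror the documented requirement that items are LabeledDependency objects.
        if not isinstance(dep, LabeledDependency):
            raise TypeError('expected LabeledDependency, got %r' % (dep,))
    pos = ' '.join(dep.pos for dep in dependencies)
    dependency = ' '.join('%s:%s' % (dep.parent, dep.label) for dep in dependencies)
    return pos, dependency


print(serialize_dependencies([LabeledDependency(parent='-1', pos='MD', label='ROOT')]))
# prints: ('MD', '-1:ROOT')
```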
set_pos_tags(new_pos_tags)
set_pos_tags_list(pos_tags_list)
set_pos_tags_string(new_value)
set_sentence(new_sentence)
tokens
Returns: an iterator providing the sentence’s tokens
class weblyzard_api.xml_content.XMLContent(xml_content, remove_duplicates=True)

Bases: object

SUPPORTED_XML_VERSIONS = {'deprecated': <class 'weblyzard_api.xml_content.parsers.xml_deprecated.XMLDeprecated'>, 2005: <class 'weblyzard_api.xml_content.parsers.xml_2005.XML2005'>, 2013: <class 'weblyzard_api.xml_content.parsers.xml_2013.XML2013'>}
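SUPPORTED_XML_VERSIONS maps a format version to the parser class that handles it. A minimal sketch of this registry pattern follows; the three classes are empty stubs standing in for XMLDeprecated, XML2005 and XML2013 (only their VERSION attribute, which this page references as XML2005.VERSION and XML2013.VERSION, is modelled), and get_parser is a hypothetical helper.

```python
class XMLDeprecated:  # stub standing in for the real parser class
    VERSION = 'deprecated'


class XML2005:  # stub
    VERSION = 2005


class XML2013:  # stub
    VERSION = 2013


# Registry keyed by version, shaped like SUPPORTED_XML_VERSIONS above.
SUPPORTED_XML_VERSIONS = {cls.VERSION: cls for cls in (XMLDeprecated, XML2005, XML2013)}


def get_parser(version):
    """Hypothetical helper: return the parser class registered for `version`."""
    try:
        return SUPPORTED_XML_VERSIONS[version]
    except KeyError:
        raise ValueError('unsupported XML version: %r' % (version,)) from None


print(get_parser(2013).__name__)  # prints: XML2013
```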
add_attribute(key, value)
classmethod apply_dict_mapping(attributes, mapping=None)
as_dict(mapping=None, ignore_non_sentence=False, add_titles_to_sentences=False)

Converts the XML content to a dictionary.

Parameters:
  • mapping – an optional mapping by which to restrict/rename the keys of the returned dictionary
  • ignore_non_sentence – if true, sentences without POS tags are omitted from the result
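The mapping parameter restricts and renames the keys of the returned dictionary. One plausible reading, assumed here rather than taken from the source, is that only keys present in the mapping are kept and each is renamed to its mapped name; apply_mapping below is a hypothetical stand-in for apply_dict_mapping, and the sample values are made up.

```python
def apply_mapping(attributes, mapping=None):
    """Hypothetical sketch: keep keys listed in `mapping`, renamed to their mapped names.

    With no mapping, the attributes are returned unchanged (as a copy).
    """
    if mapping is None:
        return dict(attributes)
    return {new_key: attributes[old_key]
            for old_key, new_key in mapping.items()
            if old_key in attributes}


doc = {'content_id': 42, 'lang': 'en', 'nilsimsa': 'abc123'}
print(apply_mapping(doc, {'content_id': 'id', 'lang': 'language'}))
# prints: {'id': 42, 'language': 'en'}
```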
content_id
content_type
classmethod convert(xml_content, target_version)
get_content_id()
get_content_type()
get_lang()
get_nilsimsa()
get_plain_text()
Returns: the plain text of the XML content
get_sentences()
classmethod get_text(text)
Returns: the UTF-8 encoded text
get_title()
get_xml_document(header_fields='all', sentence_attributes=('pos_tags', 'sem_orient', 'significance', 'md5sum', 'pos', 'token', 'dependency'), xml_version=2013)
Parameters:
  • header_fields – the header fields to include
  • sentence_attributes – the sentence attributes to include
  • xml_version – the version of the webLyzard XML format to use (XML2005.VERSION or XML2013.VERSION)
Returns: the XML representation of the webLyzard XML object
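get_xml_document() serializes the object while keeping only the listed sentence attributes. The sketch below illustrates that filtering with the standard library's xml.etree.ElementTree. Only the namespace URI comes from this page; the wl prefix, the page and sentence element names, and build_document itself are placeholders chosen for illustration, so the output does not reproduce the real webLyzard schema.

```python
import xml.etree.ElementTree as ET

WL_NS = 'http://www.weblyzard.com/wl/2013#'  # namespace named in the module description
ET.register_namespace('wl', WL_NS)


def build_document(sentences, sentence_attributes=('md5sum', 'pos')):
    """Hypothetical sketch: one element per sentence, keeping only selected attributes."""
    root = ET.Element('{%s}page' % WL_NS)  # placeholder element name
    for sent in sentences:
        # Drop attributes that are not requested or have no value.
        attrs = {'{%s}%s' % (WL_NS, key): str(value)
                 for key, value in sent.items()
                 if key in sentence_attributes and value is not None}
        elem = ET.SubElement(root, '{%s}sentence' % WL_NS, attrs)
        elem.text = sent['value']  # the sentence text becomes the element text
    return ET.tostring(root, encoding='unicode')


print(build_document([{'value': 'A new day', 'pos': 'DT JJ NN',
                       'md5sum': 'abc', 'sem_orient': 0.3}]))
```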

classmethod get_xml_version(xml_content)
lang
nilsimsa
classmethod parse_xml_content(xml_content, remove_duplicates=True)
plain_text
Returns: the plain text of the XML content
sentences
title
update_attributes(new_attributes)

Updates the existing attributes with the new ones.

update_sentences(sentences)

Updates the values of the existing sentences. If the list of sentence objects is empty, sentence_objects is set to the new sentences.

Parameters: sentences – list of Sentence objects

Warning

This function does not add new sentences.
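The warning follows from how update_sentences() merges: existing sentences are updated in place, and incoming sentences without an existing counterpart are ignored unless the list was empty. The matching key is not documented on this page; the sketch below assumes, hypothetically, that sentences are matched by md5sum, and uses a minimal dataclass in place of Sentence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SimpleSentence:  # minimal stand-in for weblyzard_api.xml_content.Sentence
    md5sum: str
    value: str
    pos: Optional[str] = None


def update_sentences(existing: List[SimpleSentence],
                     new: List[SimpleSentence]) -> List[SimpleSentence]:
    """Hypothetical sketch of the documented behaviour (matching by md5sum is assumed)."""
    if not existing:
        # Empty list: adopt the new sentences wholesale.
        return list(new)
    by_md5 = {s.md5sum: s for s in new}
    for sentence in existing:
        replacement = by_md5.get(sentence.md5sum)
        if replacement is not None:
            # Update the values of a sentence that already exists.
            sentence.value = replacement.value
            sentence.pos = replacement.pos
    # Sentences in `new` without a match are never appended.
    return existing


current = [SimpleSentence('a1', 'Hello world')]
updated = update_sentences(current, [SimpleSentence('a1', 'Hello world', 'UH NN'),
                                     SimpleSentence('b2', 'Extra sentence')])
print(len(updated), updated[0].pos)  # prints: 1 UH NN
```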