LogosLink User's Manual · LogosLink version 2.0.0.2

Dataset

A dataset in LogosLink is a collection of information that you work with as a whole.

There are two major kinds of datasets: models and corpora. In turn, there are different kinds of models: contexts, ontologies, argumentation models, question sets and agency models.

Details

A dataset is composed of a collection of elements plus the relationships between them. Different kinds of datasets contain different kinds of elements. For example, corpora contain documents, labels and authors, whereas ontologies contain categories, atoms and properties. Although each kind of dataset is different, they all share some commonalities that are described below.

Kinds of datasets

The kinds of datasets are as follows:

  • Corpus. A corpus is a collection of documents and related elements (labels, topics and authors, mostly) that you can use to analyse discourses across multiple sources, genres and speakers. A corpus is an excellent way to consolidate the analyses of multiple texts and study them together. A corpus is stored as a directory on your computer. All the files and sub-directories in the corpus are automatically managed by LogosLink. A corpus can own many dependent models.
  • Model. A model is a representation of a discourse or a set of discourses from a specific point of view. A model is stored on disk either in its own file or embedded in the file of another dataset (see below). There are different kinds of models:
    • Context. A context is a model that represents the overall social environment where your discourses take place. A context is composed of elements such as themes, positions and agents.
    • Ontology. An ontology is a model that represents what a text, or a collection of texts, is about. An ontology is composed of elements such as categories, atoms, properties and associations.
    • Argumentation model. An argumentation model represents how a text justifies what it says. An argumentation model is composed of elements such as locutions, transitions, propositions, inferences and conflicts.
    • Question set. A question set contains organised questions that you want to ask a text in order to obtain an agency model (see below). A question set is composed of elements such as entity lists, question groups and questions.
    • Agency model. An agency model represents the beliefs, desires and intentions of the speakers in the text. An agency model is composed of elements such as entities, responses and response parts.
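
The taxonomy above can be summarised as a small data structure. The following Python sketch is purely illustrative: the names DatasetKind, ELEMENT_KINDS and is_model are hypothetical and are not part of LogosLink, and the element lists only repeat the examples given above, so they are not exhaustive.

```python
from enum import Enum


class DatasetKind(Enum):
    """Kinds of datasets described on this page (hypothetical names)."""
    CORPUS = "corpus"
    CONTEXT = "context"
    ONTOLOGY = "ontology"
    ARGUMENTATION_MODEL = "argumentation model"
    QUESTION_SET = "question set"
    AGENCY_MODEL = "agency model"


# Example element kinds for each dataset kind, as listed above (not exhaustive).
ELEMENT_KINDS = {
    DatasetKind.CORPUS: ("document", "label", "topic", "author"),
    DatasetKind.CONTEXT: ("theme", "position", "agent"),
    DatasetKind.ONTOLOGY: ("category", "atom", "property", "association"),
    DatasetKind.ARGUMENTATION_MODEL: (
        "locution", "transition", "proposition", "inference", "conflict",
    ),
    DatasetKind.QUESTION_SET: ("entity list", "question group", "question"),
    DatasetKind.AGENCY_MODEL: ("entity", "response", "response part"),
}


def is_model(kind: DatasetKind) -> bool:
    """Every kind of dataset other than a corpus is a kind of model."""
    return kind is not DatasetKind.CORPUS
```

For instance, is_model(DatasetKind.ONTOLOGY) is true, whereas is_model(DatasetKind.CORPUS) is false.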

Dependent, independent, stand-alone and embedded models

Models can be dependent or independent:

  • An independent model does not depend on other datasets. That is, there are no references to this model from other datasets. You are free to rename or move the file where an independent model is stored, as this will not break any references to it.
  • A dependent model depends on another dataset, called its owner. Dependent models can be stand-alone or embedded:
    • A stand-alone dependent model is stored in its own file. However, because there is another dataset that keeps a reference to it, you should never rename or move this file. This is the case with models that belong to a corpus. The corpus manages these files, and you access these models through the corpus.
    • An embedded dependent model is stored as part of its owner's file. That is, there is no file on disk for an embedded model. This is the case, for example, with a corpus ontology, which is embedded in the corpus database file; or an agency model's question set, which is stored as part of the agency model file. You access embedded models through the owner dataset.
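
The rules above amount to a simple decision: a model with no owner is independent, and a model with an owner is either stand-alone or embedded depending on whether it has its own file. The following Python sketch illustrates that decision only. The names ModelInfo, Storage, classify and safe_to_rename_or_move are hypothetical and are not part of LogosLink.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Storage(Enum):
    INDEPENDENT = "independent"
    STAND_ALONE_DEPENDENT = "stand-alone dependent"
    EMBEDDED_DEPENDENT = "embedded dependent"


@dataclass(frozen=True)
class ModelInfo:
    """Hypothetical description of where a model lives."""
    name: str
    owner: Optional[str] = None  # name of the owner dataset, or None if there is none
    has_own_file: bool = True    # False when stored inside the owner's file

    def __post_init__(self) -> None:
        # A model without an owner has nowhere to be embedded.
        if self.owner is None and not self.has_own_file:
            raise ValueError("a model without an owner must have its own file")


def classify(model: ModelInfo) -> Storage:
    """Apply the rules above: owner or not, then own file or not."""
    if model.owner is None:
        return Storage.INDEPENDENT
    if model.has_own_file:
        return Storage.STAND_ALONE_DEPENDENT
    return Storage.EMBEDDED_DEPENDENT


def safe_to_rename_or_move(model: ModelInfo) -> bool:
    """Only the file of an independent model can be renamed or moved freely."""
    return classify(model) is Storage.INDEPENDENT
```

Under these assumptions, a corpus ontology would be described as ModelInfo("ontology", owner="corpus", has_own_file=False) and classified as embedded, so there is no file of its own to rename or move.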

The pages for each particular kind of dataset provide additional details on whether it can be dependent or independent, and in what scenarios.

Contents distributed under a Creative Commons Attribution 4.0 International License · last updated on 19/12/2024 13:17