LogosLink User's Manual
·
LogosLink version 2.0.0
Corpus Elements
Corpus elements are the atomic components of a corpus.
There are multiple kinds of corpus elements.
In addition, corpus elements are connected via many kinds of relationships, forming a mesh that you can navigate and explore.
Details
There are multiple kinds of corpus elements, which are described in the following sections.
In addition to documents, a corpus may contain document references, folders, labels, topics, authors, option lists, custom properties and collections.
Documents constitute the main elements in a corpus.
A document consists of some content (a piece of text) plus some data such as its title, publication date, language, etc.
In addition, each document in a corpus may have a set of associated files:
- A text file, which contains the plain-text content of the document. Most operations that LogosLink can perform on documents operate on this text file.
- A source file, which is the original source of the document. The source file can be in any format, such as PDF, a web page, an audio or video file, etc. LogosLink can extract text from most source file formats, so in most cases you can obtain the text file from the source file automatically by using the Extract Text button in the Document or Active Documents ribbon tabs of the Corpus window.
- A context file, which contains the dependent context for the document, if there is one.
- An ontology file, which contains the dependent ontology for the document, if there is one.
- An argumentation model file, which contains the dependent argumentation model for the document, if there is one.
- An agency model file, which contains the dependent agency model for the document, if there is one.
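If it helps to picture a document and its associated files as a data structure, the following is a minimal sketch in Python. It is only a conceptual illustration: the class, field and method names are assumptions made for this example and do not correspond to LogosLink's internal format or to any LogosLink API. It also assumes that every associated file is optional.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Document:
    """Conceptual model of a corpus document (hypothetical, not LogosLink's format)."""
    title: str
    language: Optional[str] = None
    # Associated files. Treating all of them as optional is an assumption.
    text_file: Optional[Path] = None
    source_file: Optional[Path] = None
    context_file: Optional[Path] = None
    ontology_file: Optional[Path] = None
    argumentation_model_file: Optional[Path] = None
    agency_model_file: Optional[Path] = None

    def needs_text_extraction(self) -> bool:
        """True when a source file exists but no text file has been obtained yet."""
        return self.source_file is not None and self.text_file is None


# A document that has only a source file is a candidate for text extraction.
doc = Document(title="Interview 12", language="en",
               source_file=Path("interview12.pdf"))
print(doc.needs_text_extraction())  # prints True
```

In this sketch, a document for which `needs_text_extraction()` returns `True` is in the situation where you would typically use the Extract Text button.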
Documents may hold references to other documents.
Document references are useful to record the fact that a document explicitly refers to another, for example through citation, parody or translation.
Folders allow you to organise documents in a hierarchy of nested containers, very much like you do with the folders on your computer.
Corpus folders may correspond to physical folders (or directories) on your computer, but they don't need to.
You can define as many folders as you want, and nest them in a hierarchy as deep as you need.
Folders constitute the home of documents.
Although you can move documents from one folder to another, corpus folders are designed to be stable, so once you put a document in a folder, you shouldn't need to move it very often.
You can think of folders as the primary and most basic mechanism for providing structure to your corpus.
Bear in mind that you may not need folders at all.
A small corpus can be fully stored in a single root folder.
You only need folders if your corpus has a complex structure or needs to be partitioned into clearly distinct subsets.
In addition to their name, folders may have some additional data such as a description.
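The nesting of folders described above can be pictured as a simple tree. The following Python sketch is a conceptual illustration only; the `Folder` class and its members are hypothetical names chosen for this example, not part of LogosLink.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Folder:
    """Conceptual model of a corpus folder (hypothetical, for illustration)."""
    name: str
    description: str = ""
    # The parent link is excluded from repr and comparison to keep them simple.
    parent: Optional["Folder"] = field(default=None, repr=False, compare=False)
    subfolders: List["Folder"] = field(default_factory=list)
    documents: List[str] = field(default_factory=list)  # document titles

    def add_subfolder(self, name: str) -> "Folder":
        """Create a folder nested directly inside this one and return it."""
        child = Folder(name=name, parent=self)
        self.subfolders.append(child)
        return child

    def path(self) -> str:
        """Return the folder names from the root down to this folder."""
        parts = []
        node: Optional[Folder] = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


root = Folder("Corpus")                      # a small corpus needs only this
wars = root.add_subfolder("Wars")
ww2 = wars.add_subfolder("Second World War")
ww2.documents.append("Radio address")        # the folder is the document's home
print(ww2.path())  # prints Corpus/Wars/Second World War
```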
Labels allow you to categorise or classify documents in any manner.
A label is just a short name tag that you can attach to documents, very much like a hashtag in social networks.
Labels are lightweight, so you can define as many as you want, and apply them to many documents.
For example, you may have a label "pending" to mark documents you still need to revise, or a label "feminism" to classify documents that are related to feminism.
In addition to their name, labels may have some additional data such as a description.
Topics also work as a categorisation mechanism for documents, but differ from labels in some important ways.
Most importantly, topics can be organised as a hierarchy, so a topic can have sub-topics, each of which can have its own sub-topics, and so on.
Also, a topic may have a dependent context, ontology and question set.
For example, you may define a "Second World War" topic in a corpus about 20th century wars, and develop an ontology for that topic that describes the major entities and relationships in World War II.
In addition, and unlike labels, topics can have custom properties.
As with labels, you can define as many topics as you want, and apply them to many documents.
You can think of topics as the main way to create major compartments or divisions within a corpus, whereas labels are finer-grained classifiers.
Once a corpus is created, its topics rarely change much.
However, its labels are often created, altered and deleted as the work with the corpus progresses.
Finally, as with labels, topics may have additional data besides their name, such as a description.
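To make the difference between labels and topics concrete, the following Python sketch models labels as flat name tags and topics as a hierarchy that can carry custom properties. It is a conceptual illustration only: the class names, the `walk` method, the "Pacific theatre" sub-topic and the custom property shown are all hypothetical and do not come from LogosLink.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class Label:
    """A flat, lightweight name tag (hypothetical model)."""
    name: str
    description: str = ""


@dataclass
class Topic:
    """A hierarchical category that may carry custom properties (hypothetical model)."""
    name: str
    description: str = ""
    custom_properties: Dict[str, str] = field(default_factory=dict)
    subtopics: List["Topic"] = field(default_factory=list)

    def add_subtopic(self, name: str) -> "Topic":
        """Create a sub-topic directly under this topic and return it."""
        child = Topic(name=name)
        self.subtopics.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, str]]:
        """Yield (depth, name) for this topic and all its descendants."""
        yield depth, self.name
        for sub in self.subtopics:
            yield from sub.walk(depth + 1)


pending = Label("pending")                    # labels have no hierarchy
wars = Topic("20th century wars")
ww2 = wars.add_subtopic("Second World War")
ww2.custom_properties["period"] = "1939-1945"  # made-up custom property
ww2.add_subtopic("Pacific theatre")            # made-up sub-topic

for depth, name in wars.walk():
    print("  " * depth + name)
```

Running the sketch prints the topic tree with one level of indentation per nesting level, which is the kind of structure that labels cannot express.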
Authors represent the people who created the documents in the corpus.
You use authors to identify who created each document.
Depending on your project, working with authors may be more or less important.
You can define as many authors as needed, and assign them to documents as necessary.
Authors may have given and family names, a nickname, a date of birth, a country of origin and other data.
Option lists are lists of terms that LogosLink uses to categorise other elements.
For example, there is a Document Kinds option list that can enumerate what kinds of documents (such as reports, interviews, articles, etc.) there are in the corpus.
You can add, modify and remove option list items through the Option Lists button in the Corpus ribbon tab of the Corpus window.
Custom properties are properties of documents, topics or authors that you define to address your needs.
Please see the topic on custom properties for details.
Collections are groups of documents that you can treat as a whole for some operations.
Please see the topic on collections for details.
Contents distributed under a Creative Commons Attribution 4.0 International License
last updated on 02/01/2025 11:53