LogosLink User's Manual · LogosLink version 2.0.1

Add and Remove Documents in a Corpus

A corpus is a collection of documents that you can analyse as a whole. Each corpus resides in a directory, which is set when you create the corpus. At any time, you can add files to that directory and sync the corpus with them.

Follow these steps to add and sync documents in a corpus:

  1. Open LogosLink Desktop.
  2. Open a corpus by clicking on the Open button in the Home ribbon tab. Alternatively, open the backstage and click on Open to select or browse for a corpus. LogosLink Desktop shows the corpus in a new window.
  3. Open the backstage and click Properties to display the corpus properties.
  4. Locate the Saved to field and click its button to navigate to the corpus directory. A File Explorer window opens with the directory highlighted.
  5. Double-click the directory to browse its contents. You will see the files that make up the corpus, including source, text and dependent model files.
  6. Add some new files to the corpus directory or any of its sub-directories, as needed.
  7. Close the File Explorer and switch back to the corpus window in LogosLink Desktop.
  8. Make sure that the corpus is in read/write mode. You can check and change this via the Access Mode button in the Home ribbon tab.
  9. Close the backstage if still open.
  10. Switch to the Insert ribbon tab.
  11. Click Sync Documents. The Sync Documents dialog box opens.
  12. Choose the sync options as you wish:
    • Select the directory that you want to sync. If you select (base), the base directory will be synced. You can also check the Recurse box if you want to sync any sub-directories.
    • Make sure Add new documents is checked to add new documents to the corpus from any new files.
    • Select a Label if you want to label any newly created documents so that you can easily find them later. Leave it as (none) if you don't want to label new documents.
    • Check Auto title to set the titles of any new documents automatically. A number of mechanisms are used to set the title, such as getting it from the PDF or HTML file, or from one of the first lines in a text file. You can set the line number where the title is located in text files, or leave it as zero to have LogosLink Desktop guess it automatically.
    • Check Auto source id to set each document's source id automatically from its file.
    • Check Extract text to extract any text content from HTML, PDF or other source files into a text file. You can also check Clean up text to clean up the extracted or existing text. There are additional text clean-up options that you can set.
    • Check Determine language to have LogosLink Desktop automatically determine the language for each text file.
    • Finally, check Add empty folders if you want LogosLink Desktop to add folders for empty sub-directories, and Delete nonexistent documents to delete documents in the corpus that are not backed by any files on disk.
  13. When you are ready, click OK to sync the documents. LogosLink Desktop will sync them and show the updated document list in the Documents pane on the left.
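
Conceptually, a sync with Add new documents and Delete nonexistent documents checked reconciles the files on disk with the documents already in the corpus. The Python sketch below illustrates only that idea; it is not LogosLink's implementation or API. The function `plan_sync` and its parameters are hypothetical, and the sketch assumes every regular file under the synced directory is a candidate source file, whereas the real corpus directory also holds text and dependent model files that a real sync would need to tell apart from source files.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable


def plan_sync(base: Path, known: Iterable[str], recurse: bool = False,
              add_new: bool = True, delete_missing: bool = False) -> dict[str, list[str]]:
    """Work out which documents a sync would add or delete (hypothetical sketch).

    base:  the directory being synced.
    known: paths of files already backed by documents, relative to base,
           written with "/" separators.
    """
    # Files currently on disk, as relative POSIX-style paths.
    pattern = "**/*" if recurse else "*"
    on_disk = {p.relative_to(base).as_posix() for p in base.glob(pattern) if p.is_file()}

    # Without recursion, ignore documents that live in sub-directories.
    in_scope = {k for k in known if recurse or "/" not in k}

    plan: dict[str, list[str]] = {"add": [], "delete": []}
    if add_new:
        # Files with no matching document become candidates for new documents.
        plan["add"] = sorted(on_disk - in_scope)
    if delete_missing:
        # Documents whose file no longer exists become candidates for deletion.
        plan["delete"] = sorted(in_scope - on_disk)
    return plan
```

In this sketch, leaving Recurse unchecked restricts both sides of the comparison to the top level of the selected directory, so documents in sub-directories are never flagged for deletion just because they were not scanned.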

You can also delete documents from a corpus. Follow these steps to delete documents:

  1. Open LogosLink Desktop.
  2. Open a corpus by clicking on the Open button in the Home ribbon tab. Alternatively, open the backstage and click on Open to select or browse for a corpus. LogosLink Desktop shows the corpus in a new window.
  3. Make sure that the corpus is in read/write mode. You can check and change this via the Access Mode button in the Home ribbon tab.
  4. Select the documents you want to delete in the Documents pane on the left.
  5. Right-click the selected documents to display the context menu. Click Delete on the menu.
  6. LogosLink Desktop asks you to confirm the deletion. You can choose to delete the files associated with the documents as well, or to delete only the documents in the corpus and leave the files on disk.
  7. Click the desired option. LogosLink Desktop will delete the documents and show the updated list in the Documents pane on the left.
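
The two deletion options differ only in whether the backing files survive. The minimal Python sketch below shows that distinction, using a plain dictionary as a hypothetical stand-in for the corpus (mapping a document id to the files behind it). `delete_documents` is illustrative only and is not part of LogosLink.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable


def delete_documents(corpus: dict[str, list[Path]], doc_ids: Iterable[str],
                     delete_files: bool = False) -> None:
    """Remove documents from a hypothetical in-memory corpus.

    With delete_files=False only the corpus entries are removed and the files
    stay on disk; with delete_files=True the backing files are removed too.
    """
    for doc_id in doc_ids:
        # Drop the document from the corpus, keeping its list of files.
        files = corpus.pop(doc_id, [])
        if delete_files:
            for path in files:
                # missing_ok avoids an error if a file was already removed.
                path.unlink(missing_ok=True)
```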

Contents distributed under a Creative Commons Attribution 4.0 International License · last updated on 21/04/2025 16:00