API Reference

The core of the lsa program is its parsers, which convert the different kinds of database documents into manageable dictionaries that contain only the metadata fields of interest. They are implemented in the record module.

The record API

The main class of the record API is RecordParser. It outlines an API that parses a raw string or a raw data structure into a dictionary holding the fields of interest. The details of how to extract that information belong in the RecordParser child classes.

RecordParser is complemented by the RecordIterator class, which outlines an interface for iterating over a file that contains several records and yielding them one at a time in a memory-efficient fashion.
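The split between the two classes can be illustrated with a minimal sketch. Only the class names RecordParser and RecordIterator come from the record module; the method names, the KeyValueParser and BlankLineIterator child classes, and the "key: value" record format are assumptions made up for this example and may not match the real implementation.

```python
import io
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, Iterator


class RecordParser(ABC):
    """Turns one raw record into a dictionary of metadata fields."""

    @abstractmethod
    def parse(self, raw: Any) -> Dict[str, Any]:
        """Return only the fields of interest found in `raw`."""


class RecordIterator(ABC):
    """Yields the raw records contained in a source, one at a time."""

    @abstractmethod
    def __iter__(self) -> Iterator[Any]:
        """Iterate lazily so the whole file never sits in memory."""


# Hypothetical child classes for a made-up "key: value" format in which
# records are separated by blank lines.
class KeyValueParser(RecordParser):
    FIELDS = ("title", "author")  # assumed fields of interest

    def parse(self, raw: str) -> Dict[str, Any]:
        pairs = (line.split(":", 1) for line in raw.splitlines() if ":" in line)
        found = {key.strip(): value.strip() for key, value in pairs}
        # Keep only the fields of interest, dropping everything else.
        return {key: found[key] for key in self.FIELDS if key in found}


class BlankLineIterator(RecordIterator):
    def __init__(self, lines: Iterable[str]):
        self.lines = lines  # any iterable of lines, e.g. an open file

    def __iter__(self) -> Iterator[str]:
        chunk = []
        for line in self.lines:
            if line.strip():
                chunk.append(line.rstrip("\n"))
            elif chunk:
                yield "\n".join(chunk)
                chunk = []
        if chunk:  # last record when the source has no trailing blank line
            yield "\n".join(chunk)


source = io.StringIO("title: A\nauthor: X\nyear: 1999\n\ntitle: B\n")
parser = KeyValueParser()
records = [parser.parse(raw) for raw in BlankLineIterator(source)]
```

In this sketch the iterator only decides where one record ends and the next begins, while the parser decides which fields to keep, so a new document format needs only a new pair of child classes.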

Implementations of the record API

Parser implementations

Iterator implementations

The utility module

The scripts package

Database manipulation

Utilities for working with a Mongo database. The module holds a global connection to the database, so that a new connection is not created for every request, which would add considerable overhead. It also provides a tool for using a pymongo collection as a context manager.

lsa.scripts.dbutil.collection(name, dbname='program', delete=True)

Yields the Mongo collection with the given name from the specified database. Its advantage is that the collection does not have to be created everywhere in the program.

  • name (str like) – name of the collection
  • dbname (str like) – name of the database to get the collection from
  • delete (bool) – whether to delete the content of the collection

Returns: the collection, as a context manager

Prepends 'lsa-' to the given name, so that all collections of the lsa program have consistent names.
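The context-manager pattern described above might look roughly like the following sketch. It is not the code in lsa.scripts.dbutil: here the database is passed in explicitly instead of coming from the module's global connection and the dbname argument, the helper name prefixed is hypothetical, and delete=True is assumed to mean "empty the collection on entry". FakeDatabase and FakeCollection are stand-ins so that the example runs without a Mongo server; a pymongo Database, which also supports db[name] and whose collections provide delete_many, could be passed instead.

```python
from contextlib import contextmanager


def prefixed(name):
    """Prepend 'lsa-' so that collection names stay consistent."""
    return "lsa-" + str(name)


@contextmanager
def collection(db, name, delete=True):
    """Yield db['lsa-<name>'], optionally emptied first.

    `db` is anything that supports db[name], such as a pymongo Database.
    """
    coll = db[prefixed(name)]
    if delete:
        # Assumption: "delete" means the content is removed on entry.
        coll.delete_many({})
    yield coll


# In-memory stand-ins used only to make this example runnable.
class FakeCollection:
    def __init__(self):
        self.docs = []

    def insert_one(self, doc):
        self.docs.append(doc)

    def delete_many(self, query):
        self.docs.clear()  # the sketch only ever passes an empty filter


class FakeDatabase(dict):
    def __missing__(self, name):
        # Create collections lazily on first access, as MongoDB does.
        self[name] = FakeCollection()
        return self[name]


db = FakeDatabase()
with collection(db, "records") as coll:
    coll.insert_one({"title": "A"})
with collection(db, "records", delete=False) as coll:
    kept = list(coll.docs)      # content survives when delete=False
with collection(db, "records") as coll:
    cleared = list(coll.docs)   # content is removed when delete=True
```

Wrapping the lookup in a context manager keeps the naming rule and the optional clean-up in one place, so calling code only deals with the collection object it receives.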

Script entry points

The entry points are organized in modules. This leads to some code duplication, which can be reduced in the future. The populate script, which provides the lsapopulate command, is located in the populate module and contains the information described below.

The model script, which provides the lsamodel command, is located in the model module and contains the information described below.

The query script, which provides the lsaquery command, is located in the query module and contains the information described below.