minsci.xmu package

Submodules

minsci.xmu.fields module

Reads and returns information about EMu’s schema

class minsci.xmu.fields.XMuFields(schema_path=None, whitelist=None, blacklist=None, cache=True, verbose=False)[source]

Bases: object

Reads and stores metadata about fields in EMu

Parameters:
  • schema_path (str) – path to EMu schema file. If None, looks for a copy of the schema stored in files.
  • whitelist (list) – list of EMu modules to include. If None, anything not on the blacklist is included.
  • blacklist (list) – list of EMu modules to exclude. If None, no modules are excluded.
  • cache (str) – path to cache file. If specified, the script checks that path for a cache file and creates one if none is found.
  • verbose (bool) – triggers verbose output
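
The whitelist and blacklist rules described above can be sketched as a small filter. This is an illustration only, not the XMuFields implementation: the helper name include_module is hypothetical, and letting the blacklist win when a module appears on both lists is an assumption.

```python
def include_module(module, whitelist=None, blacklist=None):
    """Decide whether a module passes the whitelist/blacklist filters."""
    # Assumption: a blacklist entry excludes the module even if it is
    # also on the whitelist
    if blacklist is not None and module in blacklist:
        return False
    # With no whitelist, anything not on the blacklist is included
    if whitelist is None:
        return True
    return module in whitelist


modules = ["ecatalogue", "eparties", "emultimedia"]
kept = [m for m in modules if include_module(m, blacklist=["emultimedia"])]
print(kept)
```

Passing neither list keeps every module, which matches the documented defaults.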
schema

dict – path-keyed dicts of field data

tables

dict – module-keyed lists of paths to tables

map_tables

dict – path-keyed lists of paths to tables

verbose

bool – triggers verbose output

add_table(columns)[source]

Update table containers with new table

Parameters:columns (list) – columns in the table being added
get(*args)[source]

Return data for an EMu export path

Modified from DeepDict.pull() to jump to a different module when a reference is encountered.

Parameters:*args – the path to a value in the dictionary, with one component of that path per arg
Returns:Dictionary with information about the given path
static get_xpath(*args)[source]

Reformat plain-text path to xpath

Parameters:path (str) – an XMuFields path
Returns:Path string reformatted as in an EMu export
static read_fields(fp)[source]

Reads paths from the schema in an EMu XML export

Parameters:fp (str) – path to the EMu XML report
Returns:List of paths in the EMu schema
set_alias(alias, path)[source]

Adds an alias: path pair to self.schema

Parameters:
  • alias (str) – name of alias
  • path (str) – path that the alias points to
minsci.xmu.fields.is_reference(*args)[source]

Checks whether a path is a reference

Parameters:path (str) – period-delimited path to a given field
Returns:Boolean
minsci.xmu.fields.is_table(*args)[source]

Checks whether a path points to a table

Parameters:path (str) – period-delimited path to a given field
Returns:Boolean
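
As a rough illustration of what these two checks test for, the sketch below classifies a period-delimited path by the suffix of its last component. It assumes the common EMu column-naming conventions (table columns ending in _tab, _nesttab, or 0, and reference columns ending in Ref before any table suffix); the helper names and example paths are hypothetical, and the real functions may use the schema instead of names.

```python
# Suffixes that mark a table column under the assumed naming convention
TABLE_SUFFIXES = ("_nesttab", "_tab", "0")


def looks_like_table(path):
    """Check whether the last component of a path is named like a table."""
    field = path.split(".")[-1]
    return field.endswith(TABLE_SUFFIXES)


def looks_like_reference(path):
    """Check whether the last component of a path is named like a reference."""
    field = path.split(".")[-1]
    # Drop a table suffix first so that e.g. SomethingRef_tab still counts
    for suffix in TABLE_SUFFIXES:
        if field.endswith(suffix):
            field = field[: -len(suffix)]
            break
    return field.endswith("Ref")


print(looks_like_table("ecatalogue.IdeTaxonRef_tab"))
print(looks_like_reference("ecatalogue.CatPrefix"))
```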

minsci.xmu.xmu module

Reads and writes XML formatted for Axiell EMu

class minsci.xmu.xmu.ABCEncoder(*args, **kwargs)[source]

Bases: json.encoder.JSONEncoder

default(abc)[source]

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
class minsci.xmu.xmu.Grid(fields, operator)

Bases: tuple

fields

Alias for field number 0

operator

Alias for field number 1
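
The "Bases: tuple" and "Alias for field number" entries are the signature of a named tuple. A minimal equivalent definition is sketched below; the field values are hypothetical, since this page does not say what an operator looks like.

```python
from collections import namedtuple

# Equivalent two-field named tuple; values below are illustrative only
Grid = namedtuple("Grid", ["fields", "operator"])

grid = Grid(fields=["IdeTaxonRef_tab", "IdeComments0"], operator="+")
print(grid.fields)  # access by name
print(grid[1])      # access by position, same as grid.operator
```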

class minsci.xmu.xmu.XMu(path, fields=None, container=None, module=None)[source]

Bases: object

Read and search XML export files from EMu

fields

XMuFields – based on fields kwarg

module

str – name of base module

record

dict – the currently active record

schema

dict – XMuFields.schema

tables

dict – XMuFields.tables

verbose

bool – triggers verbose output

xpaths

list – paths from source file

Parameters:
  • path (str) – path to EMu XML report or directory containing multiple reports. If multiple reports are found, they are handled from newest to oldest.
  • fields (XMuFields) – contains data about fields
  • container (DeepDict) – class to use to store EMu data
autoiterate(keep=None, **kwargs)[source]

Automatically iterates over the source file and caches the result

container(*args)[source]

Wraps dict in custom container with attributes needed for export

fast_iter(func=None, report=0, skip=0, limit=0, callback=None, callback_kwargs=None, **kwargs)[source]

Use callback to iterate through an EMu export file

Parameters:
  • func (function) – name of iteration function
  • report (int) – number of records at which to report progress. If 0, no progress report is made.
  • skip (int) – number of records to skip before processing
  • limit (int) – number of records at which to stop processing the file
  • callback (function) – name of function to run upon completion
Returns:

Boolean indicating whether the entire file was processed successfully.
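
How skip, limit, report, and callback interact can be sketched over a plain iterable. This is not the XMu.fast_iter() implementation: the function name is hypothetical, and it assumes that limit counts processed (not skipped) records, that an iteration function returning False aborts the run, and that reaching the limit counts as successful completion.

```python
def fast_iter_sketch(records, func, report=0, skip=0, limit=0, callback=None):
    """Apply func to each record, honoring skip, limit, and report."""
    processed = 0
    for position, record in enumerate(records, start=1):
        # Pass over the first `skip` records without processing them
        if position <= skip:
            continue
        # Assumption: func signals failure by returning False
        if func(record) is False:
            return False
        processed += 1
        if report and processed % report == 0:
            print(f"{processed:,} records processed")
        # Assumption: limit counts processed records only
        if limit and processed >= limit:
            break
    # Run the completion hook only if iteration was not aborted
    if callback is not None:
        callback()
    return True


seen = []
ok = fast_iter_sketch(range(10), seen.append, skip=2, limit=3)
print(ok, seen)
```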

finalize()[source]

Placeholder for finalize method run at end of iteration

find(rec, *args)[source]

Return value(s) for a given path in the EMu XML export

Parameters:
  • rec (lxml.etree.ElementTree) – XML formatted for EMu
  • *args (str) – strings comprising the path to a field
Returns:

String (for atomic field) or list (for table) containing value(s) along the path given by *args. Blank rows that follow the last populated row in a table are not populated!

harmonize(new_val, old_val, path, action='fill')[source]

Harmonize new values with existing values on the same path

Parameters:
  • new_val (str) – new or replacement value
  • old_val (str) – existing value
  • path (str) – path to field in XMuSchema
  • action (str) – one of ‘fill’ (add the new value only if the existing value is blank), ‘append’ (append the new value using either a new row or a delimiter), or ‘replace’. The default is ‘fill’.
Returns:

Tuple containing (revised value, update boolean)
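
The three actions can be illustrated for plain string values as follows. This sketch leaves out the path argument and the new-row case for tables, and the delimiter is an assumption; it only shows how each action could yield the (revised value, update boolean) pair.

```python
def harmonize_sketch(new_val, old_val, action="fill", delim="; "):
    """Return (revised value, update flag) for the given action."""
    if action == "fill":
        # Keep an existing value; use the new one only if the field is blank
        if old_val:
            return old_val, False
        return new_val, bool(new_val)
    if action == "append":
        if not new_val:
            return old_val, False
        if not old_val:
            return new_val, True
        # Assumption: atomic values are joined with a delimiter
        return old_val + delim + new_val, True
    if action == "replace":
        return new_val, new_val != old_val
    raise ValueError(f"Unrecognized action: {action!r}")


print(harmonize_sketch("quartz", ""))
print(harmonize_sketch("quartz", "beryl", action="append"))
```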

iterate(element)[source]

Placeholder for iteration method

load(fp=None)[source]

Load data from the JSON file created by self.save

parse(element)[source]

Converts XML record to XMu dictionary

read(root, keys=None, result=None, counter=None)[source]

Read an EMu XML record to a dictionary

This is much faster than iterating through the XMu.xpaths list.

Parameters:
  • root (lxml.etree) – an EMu XML record
  • keys (list) – parents of the current key
  • result (XMuRecord) – path-keyed representation of root updated as the record is read
  • counter (dict) – tracks row counts by path
Returns:

Path-keyed dictionary representing root

read1(root, keys=None, result=None, counter=None)[source]

Read an EMu XML record to a dictionary

This is much faster than iterating through the XMu.xpaths list.

Parameters:
  • root (lxml.etree) – an EMu XML record
  • keys (list) – parents of the current key
  • result (XMuRecord) – path-keyed representation of root updated as the record is read
  • counter (dict) – tracks row counts by path
Returns:

Path-keyed dictionary representing root

save(fp=None)[source]

Save attributes listed in self.keep as JSON

set_carryover(*args)[source]

Update the list of carryover attributes

set_keep(fields)[source]

Sets the attributes to load/save when using JSON functions
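
The save, load, and set_keep methods together describe a simple attribute cache. The sketch below shows one way such a round trip could work; the Keeper class, its attribute names, and the explicit file path argument are hypothetical stand-ins, not the XMu implementation.

```python
import json
import os
import tempfile


class Keeper:
    """Hypothetical stand-in for a reader that caches selected attributes."""

    def __init__(self):
        self.keep = []

    def set_keep(self, fields):
        # Names of the attributes to write on save
        self.keep = list(fields)

    def save(self, fp):
        # Write only the attributes named in self.keep
        data = {attr: getattr(self, attr) for attr in self.keep}
        with open(fp, "w", encoding="utf-8") as f:
            json.dump(data, f)

    def load(self, fp):
        # Restore each saved attribute onto this instance
        with open(fp, encoding="utf-8") as f:
            for attr, val in json.load(f).items():
                setattr(self, attr, val)


first = Keeper()
first.set_keep(["counts"])
first.counts = {"quartz": 3}
fp = os.path.join(tempfile.mkdtemp(), "keeper.json")
first.save(fp)

second = Keeper()
second.load(fp)
print(second.counts)
```

Only JSON-serializable attributes survive this round trip, which is presumably why the module also defines a custom encoder.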

minsci.xmu.xmu.check_columns(*args)[source]

Check if columns in the same table are the same length

Parameters:*args – Lists of values for each column
minsci.xmu.xmu.check_table(rec, *args)[source]

Check that the columns in a table are all the same length
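
Both checks above come down to comparing column lengths. A minimal sketch of that comparison follows; the function name is hypothetical, and returning a boolean (rather than raising an error, which the real functions may do) is an assumption.

```python
def columns_match(*columns):
    """Return True if every column has the same number of rows."""
    # A set of lengths has at most one member when all columns agree
    lengths = {len(col) for col in columns}
    return len(lengths) <= 1


taxa = ["quartz", "beryl"]
comments = ["massive", "gem quality"]
print(columns_match(taxa, comments))
print(columns_match(taxa, comments + ["extra row"]))
```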

minsci.xmu.xmu.emuize(records, module=None)[source]

Checks record set and formats as EMu XML

Parameters:
  • records (list) – list of records
  • module (str) – name of module
minsci.xmu.xmu.write(fp, records, module=None)[source]

Convenience function for formatting and writing EMu XML

Parameters:
  • fp (str) – path to file
  • records (list) – list of XMuRecord() objects
  • module (str) – name of module

minsci.xmu.xmungo module

Reads data from NMNH MongoDB collections database

class minsci.xmu.xmungo.MongoBot(username, password, instance=None, container=None)[source]

Bases: object

Contains methods to connect and interact with NMNH MongoDB

static change_password(username, database)[source]

Changes password on db

connect(nickname)[source]

Store connection to a server in a dict

set_collection(instance, collection)[source]

Selects the collection to use

sync(sync_from, sync_to, collection, query=None)[source]

Synchronizes development server to production

class minsci.xmu.xmungo.MongoDoc(*args, **kwargs)[source]

Bases: dict

Dict subclass with methods supporting Mongo-style paths

getpath(path, default=None)[source]

Retrieves value from Mongo-style path
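
Mongo-style paths are dot-delimited, with numeric components indexing into lists. The lookup can be sketched on a plain dict as below; this is an illustration under that assumption, not the MongoDoc code, and the document keys are made up.

```python
def getpath_sketch(doc, path, default=None):
    """Walk a dot-delimited path through nested dicts and lists."""
    current = doc
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif isinstance(current, list) and key.isdigit() and int(key) < len(current):
            # Numeric components index into lists
            current = current[int(key)]
        else:
            # Any missing step falls back to the default
            return default
    return current


doc = {"catnb": {"catnm": 12345}, "idetx": [{"name": "quartz"}]}
print(getpath_sketch(doc, "idetx.0.name"))
print(getpath_sketch(doc, "catnb.missing", default="n/a"))
```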

pprint()[source]

Pretty prints the dict

class minsci.xmu.xmungo.XMungo(*args, **kwargs)[source]

Bases: minsci.xmu.xmungo.MongoBot

Contains methods to interact with Mongo data using XMu tools

container(*args)[source]

Wraps dict in custom container with attributes needed for export

fast_iter(query=None, func=None, report=0, skip=0, limit=0, callback=None, **kwargs)[source]

Use function to iterate through a MongoDB record set

This method reproduces most (but not all) of the functionality of the XMu.fast_iter() method.

Parameters:
  • func (function) – name of iteration function
  • report (int) – number of records at which to report progress. If 0, no progress report is made.
  • limit (int) – number of records at which to stop
  • callback (function) – name of function to run upon completion
Returns:

Boolean indicating whether the entire record set was processed successfully.

finalize()[source]

Placeholder for finalize method run at end of iteration

iterate(element)[source]

Placeholder for iteration method

load()[source]

Load data from the JSON file created by self.save

parse(doc)[source]

Converts Mongo document to XMu dictionary

save()[source]

Save attributes listed in self.keep as JSON

set_keep(fields)[source]

Sets the attributes to load/save when using JSON functions

set_skip(skip)[source]

Sets the attributes to load/save when using JSON functions

minsci.xmu.xmungo.mongo2xmu(doc, container)[source]

Maps Mongo document to EMu XML format

Parameters:doc (dict) – sample data from MongoDB
Returns:Sample data as container

Module contents

Provides tools to read, write, and otherwise process EMu XML files