minsci.xmu.tools package

Submodules

minsci.xmu.tools.audits module

Tools to parse audits data from EMu

class minsci.xmu.tools.audits.Auditor(*args, **kwargs)[source]

Bases: minsci.xmu.xmu.XMu

Processes an eaudits XML export into HTML for easy viewing

Parameters:
  • keep (int) – percent of records to include in the output
  • whitelist (list) – list of fields to output. All other fields will be ignored. The whitelist supersedes the blacklist.
  • blacklist (list) – list of fields to exclude from output. All other fields will be included.
  • modules (list) – list of modules to include in the report
  • users (list) – list of users to include in the report
keep

int – percent of records to include in the output

whitelist

list – list of fields to output. All other fields will be ignored. The whitelist supersedes the blacklist.

blacklist

list – list of fields to exclude from output. All other fields will be included.
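
The precedence between whitelist and blacklist can be illustrated with a small standalone sketch. This is not the Auditor implementation: the helper name select_fields and the field names are hypothetical, and the only behavior taken from the description above is that a whitelist, when given, supersedes the blacklist.

```python
def select_fields(fields, whitelist=None, blacklist=None):
    """Return the fields to output (hypothetical helper, not part of minsci)."""
    if whitelist:
        # A whitelist wins outright: only the listed fields are kept,
        # and the blacklist is not consulted at all
        return [f for f in fields if f in whitelist]
    if blacklist:
        # Without a whitelist, everything except blacklisted fields is kept
        return [f for f in fields if f not in blacklist]
    return list(fields)


# Field names below are made up for illustration
fields = ['irn', 'CatNumber', 'NotNotes']
both = select_fields(fields, whitelist=['irn'], blacklist=['irn', 'NotNotes'])
only_black = select_fields(fields, blacklist=['NotNotes'])
```

Here both contains only 'irn' even though 'irn' also appears in the blacklist, because the whitelist takes priority.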

combine(records=None, keep_all=False)[source]

Parses audit records into HTML

finalize()[source]

Placeholder for finalize method run at end of iteration

static format_value(field, val)[source]

Formats values pulled from the old/new table for printing

iterate(element)[source]

Groups audit records by module and irn

itermodified(element)[source]

to_html(rec)[source]

Converts an audit record to HTML for display

write_html(fp, html=None)[source]

Writes HTML document containing the HTMLized records

minsci.xmu.tools.describer module

Tools to describe and link multimedia using data from ecatalogue

class minsci.xmu.tools.describer.Description(object, caption, keywords, summary)

Bases: tuple

caption

Alias for field number 1

keywords

Alias for field number 2

object

Alias for field number 0

summary

Alias for field number 3
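
The "Bases: tuple" and "Alias for field number" entries are what Sphinx typically emits for a named tuple, so Description most likely behaves like the stand-in below. The definition is reconstructed from the signature above rather than copied from the package, and the sample values are invented.

```python
from collections import namedtuple

# Stand-in with the same field order as the documented signature
Description = namedtuple('Description',
                         ['object', 'caption', 'keywords', 'summary'])

# Sample values are illustrative only
desc = Description(object='quartz',
                   caption='Quartz from Brazil',
                   keywords=['quartz', 'Brazil'],
                   summary='Quartz (Brazil)')

# Each attribute is an alias for a tuple position
caption_by_name = desc.caption
caption_by_index = desc[1]
```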

minsci.xmu.tools.describer.clean_caption(caption)[source]

Cleans vestigial phrases from caption

minsci.xmu.tools.describer.format_caption(descriptors)[source]

Formats caption based on the information in descriptors

minsci.xmu.tools.describer.format_colors(rec)[source]

Formats colors

minsci.xmu.tools.describer.format_gems(rec)[source]

Formats setting and cut of jewellery

minsci.xmu.tools.describer.format_locality(country, state, county)[source]

Formats locality info as a comma-delimited string
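
A rough sketch of a comma-delimited locality string follows. The ordering (most specific part first) and the dropping of empty parts are assumptions made for illustration; the actual format_locality may order or filter the parts differently. The name format_locality_sketch is hypothetical.

```python
def format_locality_sketch(country, state, county):
    """Join non-empty locality parts with commas (assumed behavior)."""
    # Assumption: list the most specific part first
    parts = [county, state, country]
    # Skip parts that are None, empty, or whitespace only
    return ', '.join(p.strip() for p in parts if p and p.strip())


full = format_locality_sketch('United States', 'Arizona', 'Cochise Co.')
partial = format_locality_sketch('United States', 'Arizona', '')
```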

minsci.xmu.tools.describer.format_modifier(modifier)[source]

Formats a string as a compound modifier

minsci.xmu.tools.describer.get_caption(rec=None, descriptors=None)[source]

Derives a simple description of an object

minsci.xmu.tools.describer.get_descriptors(rec)[source]

Parses basic descriptive information about a record into a dict

minsci.xmu.tools.describer.get_keywords(rec=None, descriptors=None)[source]

Sets multimedia keywords for the given object

minsci.xmu.tools.describer.get_tags(rec=None, descriptors=None)[source]

Sets tags with special information useful in identifying objects

minsci.xmu.tools.describer.is_adverb(word)[source]

Simplistically checks if a word is an adverb

minsci.xmu.tools.describer.is_multiple(phrase)[source]

Simplistically checks if a phrase contains multiple items

minsci.xmu.tools.describer.summarize(rec)[source]

Summarizes basic information about an object

minsci.xmu.tools.groups module

Writes import for egroups based on a list of irns

minsci.xmu.tools.groups.write_group(module, irns, fp='group.xml', irn=None, name=None)[source]

Create EMu import for egroups based on a list of irns

Parameters:
  • module (str) – the backend name of the module (ecatalogue, eparties, etc)
  • irns (list) – list of irns to include in the group
  • fp (str) – path to write import file to
  • irn (int or str) – irn of existing group. Either this or name must be specified.
  • name (str) – name of new group. Either this or irn must be specified.
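
The requirement that either irn or name be given can be sketched as a simple argument check. The helper below is hypothetical and standalone; it does not write any import file, and the 'existing'/'new' labels are only an interpretation of the parameter descriptions above.

```python
def check_group_args(irn=None, name=None):
    """Decide which group is targeted (hypothetical helper, not part of minsci)."""
    if irn is None and name is None:
        raise ValueError('Either irn or name must be specified')
    if irn is not None:
        # irn may be an int or a str, so normalize it to a string
        return ('existing', str(irn))
    return ('new', name)


by_irn = check_group_args(irn=1234567)
by_name = check_group_args(name='Quartz to photograph')
```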

minsci.xmu.tools.legacy module

Tools to parse legacy data from an EMu export

class minsci.xmu.tools.legacy.Legacy(*args, **kwargs)[source]

Bases: minsci.xmu.xmu.XMu

Methods to parse legacy data from an EMu export

group()[source]

Create group of problematic records

iterate(element)[source]

Compares current and legacy data

class minsci.xmu.tools.legacy.Result(result, emu_value, orig_value)

Bases: tuple

emu_value

Alias for field number 1

orig_value

Alias for field number 2

result

Alias for field number 0

minsci.xmu.tools.legacy.create_receipt(fp, contents, creator, module, rec_id=None, title=None)[source]

Creates a receipt for a record

Parameters:
  • fp (str) – the path to the receipt file
  • contents (list) – a list of the form ['# File metadata', 'key: val', …]
  • creator (str) – the name of the cataloger/record creator
  • module (str) – the name of the module
  • rec_id (str) – the identifier of the record (usually a catalog number or irn)
  • title (str) – the title of the multimedia resource
minsci.xmu.tools.legacy.skip(rec, *args, **kwargs)[source]

Placeholder function for fields that have not been mapped

minsci.xmu.tools.legacy.standardize(val)[source]

Normalize a string to improve comparisons

minsci.xmu.tools.legacy.verify_analysis(rec, orig)[source]

Verifies chemical analysis against legacy data

minsci.xmu.tools.legacy.verify_country(rec, orig)[source]

Verifies country against legacy data

minsci.xmu.tools.legacy.verify_event(rec, orig)[source]

Verifies collection event against legacy data

minsci.xmu.tools.legacy.verify_mine(rec, orig)[source]

Verifies mine name against legacy data

minsci.xmu.tools.legacy.verify_ocean(rec, orig)[source]

Verifies ocean against legacy data

minsci.xmu.tools.legacy.verify_state(rec, orig)[source]

Verifies state/province against legacy data

minsci.xmu.tools.legacy.verify_taxon(rec, orig)[source]

Verifies classification against legacy data

minsci.xmu.tools.legacy.verify_volcano(rec, orig)[source]

Verifies volcano name against legacy data
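
How the pieces of this module might fit together is sketched below: normalize both values, compare them, and report the outcome as a Result. This is an assumption about the workflow, not the package's code. The normalization rule (lowercase, collapsed whitespace), the boolean result field, and the names ending in _sketch are all hypothetical; only the Result field order comes from the signature above.

```python
from collections import namedtuple

# Stand-in with the same field order as the documented Result
Result = namedtuple('Result', ['result', 'emu_value', 'orig_value'])


def standardize_sketch(val):
    """Lowercase and collapse runs of whitespace (assumed normalization)."""
    return ' '.join(val.lower().split())


def verify_sketch(emu_value, orig_value):
    """Compare a current EMu value to a legacy value after normalization."""
    matched = standardize_sketch(emu_value) == standardize_sketch(orig_value)
    # The original, unnormalized values are kept for reporting
    return Result(matched, emu_value, orig_value)


same = verify_sketch('United  States', 'united states')
different = verify_sketch('Canada', 'United States')
```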

minsci.xmu.tools.mapper module

Alias handling for processing EMu data that doesn’t use full paths

class minsci.xmu.tools.mapper.FieldMapper(module)[source]

Bases: object

Map field aliases to full paths in EMu

aliases

dict – maps aliases to full paths in EMu schema

module

str – the name of the EMu module being matched against

references

dict – maps aliases to reference fields

schema

dict – the EMu schema

tables

dict – maps columns to table fields

Parameters: module (str) – the name of the module being matched against
expand(rec)[source]

Expand fields in record based on known aliases

This should be used instead of the DeepDict.expand() function for records constructed from spreadsheets using the Mineral Sciences alias set.

Parameters: rec (dict) – record data
get_alias(path)[source]

Returns the alias for a given path

Parameters: path (str) – the full path to an EMu field
Returns: Alias for a given path, if it exists
get_data(*args)[source]

Returns data for a given path or alias

get_path(alias, schema_path=False)[source]

Returns the path for a given alias

Parameters:
  • alias (str) – the alias for a given path
  • schema_path (bool) – if true, uses the format needed for schema
Returns:

If schema_path is True, returns a list containing the path. If not, returns a tuple with the path formatted for schema.

get_references(fields)[source]

Map columns in references

Parameters: fields (list) – list of fields and aliases
Returns: List of references
get_tables(fields)[source]

Map columns in tables

Parameters: fields (list) – list of fields and aliases
Returns: List of tables
read_aliases(module)[source]

Read aliases for the given module from file

Parameters: module (str) – the backend name of an EMu module
Returns: A dict mapping aliases to paths
set_alias(alias, path)[source]

Sets the path for a given alias in class-wide lookups

Parameters:
  • alias (str) – the alias to assign to the given path
  • path (str or iterable) – the full path
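
The alias lookups described for FieldMapper can be pictured with a much smaller stand-in class. Everything here is illustrative: AliasMapperSketch is hypothetical, the '/' path separator is an assumption, the alias and path values are made up, and returning None for an unknown path is a guess at what "if it exists" means.

```python
class AliasMapperSketch:
    """Toy alias-to-path mapper (not the minsci FieldMapper)."""

    def __init__(self):
        self.aliases = {}

    def set_alias(self, alias, path):
        # Accept either a string or an iterable of path segments
        if not isinstance(path, str):
            path = '/'.join(path)  # separator is an assumption
        self.aliases[alias] = path

    def get_path(self, alias):
        return self.aliases[alias]

    def get_alias(self, path):
        # Reverse lookup; None if no alias maps to this path
        for alias, known in self.aliases.items():
            if known == path:
                return alias
        return None


mapper = AliasMapperSketch()
mapper.set_alias('country', ['EventRef', 'Country'])  # made-up names
path = mapper.get_path('country')
alias = mapper.get_alias('EventRef/Country')
```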

minsci.xmu.tools.matcher module

Tools to match EMu records for making attachments

class minsci.xmu.tools.matcher.Matcher(module, include=None, exclude=None)[source]

Bases: minsci.xmu.xmu.XMu

Match data from a given record to existing EMu records

fields

list – the subset of EMu fields used to perform the match. If fields is None, all fields in the source will be considered.

from_json

bool – specifies whether fields lookup was read from a pre-existing JSON file

module

str – the name of the module

new

list – records that do not exist in EMu

attach(rec, fields, mapper)[source]

Attach a record from another module to the provided record

Parameters:
  • rec (XMuRecord) – an expanded XMu record
  • mapper (Mapper) – a Mapper object for the current record
iterate(element)[source]

Populate dict used for matching

keyer(rec)[source]

Format a value as a standard key to use for matching

Parameters: rec (XMuRecord) – the record to match or match against
Returns: A JSON-encoded string representing the desired fields from the source record
match(match_data, match_once=False)[source]

Match record against the existing record set

Parameters:
  • match_data (dict) – object data
  • match_once (bool) – if true, the record in the match dictionary will be deleted once it is matched
Returns:

Record modified to include irn if match can be made
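
The keyer and match methods can be read together as a key-based lookup. The sketch below uses plain dicts in place of XMuRecord and passes the field list explicitly, whereas the real keyer takes only rec and presumably reads the fields attribute. The function names, field names, and the structure of the lookup dict are assumptions.

```python
import json


def keyer_sketch(rec, fields):
    """Build a stable JSON key from the chosen fields of a record."""
    subset = {field: rec.get(field) for field in fields}
    # sort_keys makes the key independent of dict ordering
    return json.dumps(subset, sort_keys=True)


def match_sketch(lookup, match_data, fields, match_once=False):
    """Return match_data with an irn added if its key is in lookup."""
    key = keyer_sketch(match_data, fields)
    # With match_once, a matched entry is removed so it cannot match again
    irn = lookup.pop(key, None) if match_once else lookup.get(key)
    if irn is not None:
        match_data = dict(match_data, irn=irn)
    return match_data


fields = ['name', 'country']
existing = {'irn': '1001', 'name': 'Quartz', 'country': 'Brazil'}
lookup = {keyer_sketch(existing, fields): existing['irn']}

first = match_sketch(lookup, {'name': 'Quartz', 'country': 'Brazil'},
                     fields, match_once=True)
second = match_sketch(lookup, {'name': 'Quartz', 'country': 'Brazil'},
                      fields, match_once=True)
```

The first call finds the record and picks up its irn; the second call finds nothing because match_once removed the entry from the lookup.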

minsci.xmu.tools.matcher.rower(rec, cols)[source]

Group data from different fields into rows
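
One way to picture grouping fields into rows is to treat each field as a column of values and zip the columns together. This is a guess at the idea, not the rower implementation: the record layout (field name mapped to a list), the padding with empty strings, and the name rower_sketch are assumptions.

```python
from itertools import zip_longest


def rower_sketch(rec, cols):
    """Zip the values of several fields into row tuples."""
    columns = [rec.get(col, []) for col in cols]
    # Pad shorter columns so every row has one value per column
    return list(zip_longest(*columns, fillvalue=''))


# Made-up record with two parallel columns of unequal length
rec = {'name': ['quartz', 'calcite'], 'color': ['clear']}
rows = rower_sketch(rec, ['name', 'color'])
```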

minsci.xmu.tools.matcher.standardize_taxon(species)[source]

Standardize formatting of classification to improve matching

minsci.xmu.tools.operations module

Schedules operations using the EMu Operations module

class minsci.xmu.tools.operations.Operation(*args, **kwargs)[source]

Bases: minsci.xmu.xmu.XMu

Contains methods to construct an Operations import file

read_notes(element)[source]

Read notes from EMu record

retired_merged(element, lookup)[source]

Writes import to retire merged records

Parameters:
  • element (etree.XML) – an EMu record as XML
  • lookup (dict) – contains existing notes keyed to irn
minsci.xmu.tools.operations.delete(module, username, irns_to_delete, name='Delete', date=None, delay=0)[source]

Creates an operation to delete a set of records

Parameters:
  • module (str) – the backend name for an EMu module
  • username (str) – the user whose account will be used to import and run the operation
  • irns_to_delete (list) – list of irns to delete
  • name_key (str) – the EMu field name, if any, used to name the operation
  • date (datetime.datetime) – the base date/time for a set of operations
  • delay (int) – the number of seconds to wait between operations
Returns:

xmu.DeepDict object containing the delete operation

minsci.xmu.tools.operations.merge(module, username, primary, duplicates, mask=None, date=None, delay=0)[source]

Creates an operation to merge a set of duplicates

Parameters:
  • module (str) – the backend name for an EMu module
  • username (str) – the user whose account will be used to import and run the operation
  • primary (int) – the irn of the primary record (i.e., the record to merge the duplicates into)
  • duplicates (list) – list of irns to merge into the primary record
  • name_key (str) – the EMu field name, if any, used to name the operation
  • date (datetime.datetime) – the base date/time for a set of operations
  • delay (int) – the number of seconds to wait between operations
Returns:

xmu.DeepDict object containing the merge operation
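
The date and delay parameters suggest that a batch of operations is spread out from a base time. The sketch below shows one plausible reading, in which the n-th operation runs n times delay seconds after the base date. How the package actually applies delay is not stated here, so treat the arithmetic and the name schedule_sketch as assumptions.

```python
from datetime import datetime, timedelta


def schedule_sketch(base, count, delay=0):
    """Return run times for count operations spaced delay seconds apart."""
    return [base + timedelta(seconds=delay * i) for i in range(count)]


base = datetime(2020, 1, 1, 12, 0, 0)
times = schedule_sketch(base, 3, delay=30)
```

With delay=0, the default in the signatures above, every operation would share the base time.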

minsci.xmu.tools.operations.retire_merged(merged_path, record_path, output='retire.xml')[source]

Write EMu import to retire merged records

Parameters:
  • merged_path (str) –
  • record_path (str) – path to records(?)
  • output (str) – path to EMu import file containing the retired records
minsci.xmu.tools.operations.write_operation(func, module, username, records, date, outpath='operations.xml')[source]

Writes an EMu import file containing a set of operations

Parameters:
  • func (callable) – the function used to create the operation
  • module (str) – the backend name for an EMu module
  • username (str) – the user whose account will be used to import and run the operation
  • records (list) – a list of records to be operated upon
  • date (mixed) – the date on which to run the operation as either a datetime.datetime object or a parseable date string
  • outpath (str) – the path to which to write the import file
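
Accepting date as either a datetime.datetime object or a string implies a coercion step along these lines. The sketch only understands ISO 8601 strings through the standard library; the package may accept a wider range of formats, and coerce_date is a hypothetical name.

```python
from datetime import datetime


def coerce_date(date):
    """Return a datetime from a datetime or an ISO 8601 string."""
    if isinstance(date, datetime):
        return date
    # Assumption: strings are ISO 8601, e.g. '2020-01-01T12:00:00'
    return datetime.fromisoformat(date)


from_string = coerce_date('2020-01-01T12:00:00')
from_object = coerce_date(datetime(2020, 1, 1, 12, 0, 0))
```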

Module contents

Provides tools to work with specific data and operations in EMu

Examples of special cases include embedding metadata in images; writing descriptive titles and captions; working with audits, legacy data, and groups; and scheduling operations.