minsci.xmu.containers package

Submodules

minsci.xmu.containers.auditrecord module

Subclass of DeepDict with methods specific to eaudits

class minsci.xmu.containers.auditrecord.AuditRecord(*args)[source]

Bases: minsci.xmu.containers.xmurecord.XMuRecord

Contains methods for reading data from EMu XML exports

parse_changes(whitelist=None, blacklist=None)[source]

Parse values in the old and new values table in Audits

Parameters:rec (xmu.DeepDict) – data from eaudits
Returns:List of named tuples containing the old and new values
parse_field(field)[source]

Parse values found in a single field in the old/new table

class minsci.xmu.containers.auditrecord.Change(field, old, new)

Bases: tuple

field

Alias for field number 0

new

Alias for field number 2

old

Alias for field number 1
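
Because Change is a named tuple, each value can be read by name or by position. The sketch below rebuilds an equivalent tuple with collections.namedtuple and applies a whitelist/blacklist filter similar in spirit to the arguments parse_changes accepts. The filter_changes helper and the sample field names are hypothetical and are not part of minsci; the real filtering logic may differ.

```python
from collections import namedtuple

# Stand-in with the same three fields as the documented Change tuple
Change = namedtuple("Change", ["field", "old", "new"])


def filter_changes(changes, whitelist=None, blacklist=None):
    """Hypothetical helper: keep changes whose field passes both lists."""
    kept = []
    for change in changes:
        if whitelist is not None and change.field not in whitelist:
            continue
        if blacklist is not None and change.field in blacklist:
            continue
        kept.append(change)
    return kept


# Sample data invented for illustration
changes = [
    Change("MulTitle", "Old title", "New title"),
    Change("AdmDateModified", "2017-01-01", "2017-02-01"),
]
title_only = filter_changes(changes, blacklist=["AdmDateModified"])
```

Here title_only keeps only the MulTitle change, and changes[0].old is the same value as changes[0][1].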

minsci.xmu.containers.bibliorecord module

Subclass of XMuRecord with methods specific to ebibliography

class minsci.xmu.containers.bibliorecord.BiblioRecord(*args, **kwargs)[source]

Bases: minsci.xmu.containers.xmurecord.XMuRecord

Subclass of XMuRecord with methods specific to ebibliography

static clean_reference(ref)[source]

Cleans punctuation, etc. left behind when ref has empty fields

FIXME: This is horrible

static format_name(last, first, middle, use_initials=True, mask='{last}, {first} {middle}')[source]

Formats a name according to the given mask
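
One way a mask-based name formatter with this signature could behave is sketched below. This is an illustrative re-implementation, not the code in minsci: how the real method abbreviates names and how it cleans up empty parts are assumptions.

```python
def format_name(last, first, middle, use_initials=True,
                mask="{last}, {first} {middle}"):
    """Illustrative sketch; the real static method may behave differently."""
    if use_initials:
        # Assumption: initials are the first letter followed by a period
        first = first[0] + "." if first else ""
        middle = middle[0] + "." if middle else ""
    name = mask.format(last=last, first=first, middle=middle)
    # Collapse whitespace and stray commas left behind by empty name parts
    return " ".join(name.split()).rstrip(",")


with_initials = format_name("Smith", "John", "Quincy")
full_name = format_name("Smith", "John", "Quincy", use_initials=False,
                        mask="{first} {middle} {last}")
```

Under these assumptions, with_initials is "Smith, J. Q." and full_name is "John Quincy Smith".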

format_reference()[source]

Formats reference according to publication type

get_authors()[source]

Formats a list of author names

get_prefix()[source]

Get prefix based on record type or keys

get_source()[source]

Determines the type of the source/parent publication

is_biblio()[source]

Checks if record is a reference

minsci.xmu.containers.mediarecord module

Subclass of XMuRecord with methods specific to emultimedia

class minsci.xmu.containers.mediarecord.EmbedFromEMu(*args, **kwargs)[source]

Bases: minsci.xmu.tools.multimedia.embedder.Embedder

Tools to embed metadata into a file based on existing data in EMu

get_caption(rec)[source]

Placeholder function returning the caption

Placeholder function returning copyright info

get_creator(rec)[source]

Placeholder function returning the creator

get_credit_line(rec)[source]

Returns short credit line

get_date_created(rec)[source]

Placeholder function returning the date created

get_datetime_created(rec)[source]

Placeholder function returning the full date and time created

get_headline(rec)[source]

Placeholder function returning the headline

get_inventory_numbers(rec)[source]

Returns a list of catalog numbers

get_job_id(rec)[source]

Returns the import identifier

get_keywords(rec)[source]

Returns a list of keywords

get_media_topics(rec)[source]

Returns relevant media topics

get_object_name(rec, mask='include_code')[source]

Returns the photo identifier or list of pictured objects

get_object_numbers(rec)[source]

Returns list of catalog numbers

get_object_sources(rec, source='NMNH-Smithsonian Institution')[source]

Returns list with museum name

get_object_titles(rec)[source]

Returns list of object titles

get_object_urls(rec)[source]

Returns list of object URLs

static get_objects(rec, field='MulTitle')[source]

Returns list of catalog numbers parsed from MulTitle

get_source(rec)[source]

Returns source of the multimedia file

get_special_instructions(rec)[source]

Returns long credit line for special instructions

get_subjects(rec)[source]

Returns media topics for this record

get_time_created(rec)[source]

Placeholder function returning the time created

get_transmission_reference(rec)[source]

Returns the import identifier

set_job_id(job_id)[source]

Sets job id manually for images not imported into EMu yet

class minsci.xmu.containers.mediarecord.MediaFile(irn, filename, path, hash, size, width, height, is_image, row)

Bases: tuple

filename

Alias for field number 1

hash

Alias for field number 3

height

Alias for field number 6

irn

Alias for field number 0

is_image

Alias for field number 7

path

Alias for field number 2

row

Alias for field number 8

size

Alias for field number 4

width

Alias for field number 5

class minsci.xmu.containers.mediarecord.MediaRecord(*args)[source]

Bases: minsci.xmu.containers.xmurecord.XMuRecord

Subclass of XMuRecord with methods specific to emultimedia

add_cataloger(cataloger)[source]

Add a Cataloger instance to the MediaRecord

add_embedder(embedder, **kwargs)[source]

Create an Embedder instance for the MediaRecord

check_filename(primary=True)[source]

Verifies that filename follows best practices

copy_to(path, overwrite=False, verify_image=False)[source]

Copies the primary file to a new location

Parameters:
  • path (str) – the directory to copy the image to
  • overwrite (bool) – specifies whether to overwrite existing file
  • verify_image (bool) – specifies whether to verify copied file
embed_metadata(verify_image=True)[source]

Updates metadata in the primary and supplementary images

fix_filename(fn=None)[source]

Fixes filename to conform with best practices

get_all_media()[source]

Gets the filepaths for all media in this record

get_catalog_numbers(field='MulTitle', **kwargs)[source]

Find catalog numbers in the given field

get_photo_numbers()[source]

Gets the photo number

get_primary()[source]

Gets properties for the primary asset

get_supplementary()[source]

Gets supplementary assets and their basic properties

match(ignore_suffix=False)[source]

Returns list of catalog objects matching data in MulTitle

match_and_fill(strict=True)[source]

Updates record if unique match in catalog found

match_one()[source]

Returns a matching catalog object if exactly one match found

set_default(key)[source]
set_filename(mask)[source]
set_mask(key, mask)[source]
smart_caption()[source]

Derives image caption from catalog

smart_collections()[source]

Populates DetCollectionName_tab based on catalog record

smart_keywords(whitelist=None)[source]

Derives keywords from catalog

smart_note()[source]

Updates note based on catalog record

Populates DetRelation_tab with info about matching catalog records

smart_title()[source]

Derives image title from catalog

strip_derived()[source]

Strips fields derived by EMu from the record

verify_import(images, strict=True, test=False)[source]

Verifies import against path

verify_master(media=None)[source]

Verifies download/copy of master file by comparing hashes
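
Verifying a copy by comparing hashes can be sketched with the standard library alone. The file_hash and copy_and_verify helpers below are hypothetical, and the choice of MD5 is an assumption; this page does not say which algorithm minsci uses.

```python
import hashlib
import os
import shutil
import tempfile


def file_hash(path, algorithm="md5", chunk_size=65536):
    """Hash a file in chunks so large media files need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def copy_and_verify(src, dst_dir):
    """Hypothetical helper: copy src into dst_dir, then compare hashes."""
    dst = os.path.join(dst_dir, os.path.basename(src))
    shutil.copy2(src, dst)
    if file_hash(src) != file_hash(dst):
        raise IOError("Hash mismatch: {}".format(dst))
    return dst


# Demonstration with a throwaway file standing in for a master image
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "image.jpg")
    with open(src, "wb") as f:
        f.write(b"not really an image")
    dst_dir = os.path.join(tmp, "copy")
    os.makedirs(dst_dir)
    copied = copy_and_verify(src, dst_dir)
    hashes_match = file_hash(src) == file_hash(copied)
```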

minsci.xmu.containers.minscirecord module

Subclass of XMuRecord with methods specific to Mineral Sciences

class minsci.xmu.containers.minscirecord.MinSciRecord(*args)[source]

Bases: minsci.xmu.containers.xmurecord.XMuRecord

Subclass of XMuRecord with methods specific to Mineral Sciences

antmet = <_sre.SRE_Pattern object>
describe()[source]

Derives a short description of the object suitable for a caption

geotree = None
get_age(pretty_print=True)[source]

Gets geological age as string

get_catalog_number(include_code=True, include_div=False)[source]

Returns the catalog number of the current object

get_catnum(include_code=True, include_div=False)[source]

Returns the catalog number of the current object

get_classification(standardized=True)[source]

Gets classification of object based on record

Parameters:standardized (bool) – if True, use GeoTaxa to try to group and standardize classification terms
Returns:List of classification terms
get_collectors()[source]

Gets all the collectors for a record

get_field_numbers()[source]

Gets all the collector’s field numbers for a record

get_guid(kind='EZID', allow_multiple=False)[source]

Gets value from the GUID table for a given key

Parameters:
  • kind (str) – name of GUID
  • allow_multiple (bool) – if False, raises error if multiple values with same type are found
Returns:

First match from the GUID table for the key (if allow_multiple is False) or the full set of matches (if allow_multiple is True)

get_identifier(include_code=True, include_div=False, force_catnum=False)[source]

Derives sample identifier based on record

Parameters:
  • include_code (bool) – specifies whether to include museum code
  • include_div (bool) – specifies whether to include division
Returns:

String of NMNH catalog number or Antarctic meteorite number

get_name(taxa=None, force_derived=False)[source]

Derives object name based on record

Parameters:taxa (list) – list of taxa. Determined automatically if omitted.
Returns:String with object name
get_political_geography()[source]

Gets political geographic info for an object

Returns:List of place names in order of decreasing specificity
get_stratigraphy(pretty_print=True)[source]

Gets stratigraphy as string

is_antarctic(metname=None)[source]

Checks if record is an Antarctic meteorite based on regex pattern

summarize()[source]

Derives and formats basic information about an object

minsci.xmu.containers.taxonrecord module

minsci.xmu.containers.xmurecord module

Subclass of DeepDict with methods specific to XMu

class minsci.xmu.containers.xmurecord.Row(irn, field, row, val)

Bases: tuple

field

Alias for field number 1

irn

Alias for field number 0

row

Alias for field number 2

val

Alias for field number 3

class minsci.xmu.containers.xmurecord.XMuRecord(*args)[source]

Bases: minsci.dicts.deepdict.DeepDict

Contains methods for reading data from EMu XML exports

add(path, val, delim='|')[source]
delete_row(key, i)[source]

Deletes the row matching the given index

delete_rows(key, indexes=None, conditions=None)[source]

Deletes any rows matching the given conditions from a table

expand(keep_empty=False)[source]

Expands and verifies a flattened record

finalize(*args, **kwargs)[source]

Runs any functions that require a carryover attribute

get_created_time(timezone_id='US/Eastern', mask=None)[source]

Gets datetime of record creation

get_current_weight(decimal_places=2)[source]

Gets the current weight of the object

Parameters:decimal_places (int) – the number of decimal places to which to round the weight
Returns:Unicode-encoded string with the weight and unit, if any
get_date(date_from, date_to=None, date_format='%Y-%m-%d')[source]

Returns dates and date ranges

Parameters:
  • date_from (mixed) – path to date from field
  • date_to (mixed) – path to date to field
  • date_format (str) – formatting mask for date
Returns:

Date or date range as a string
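
The single-date versus date-range behavior can be sketched as follows. The real method takes paths to date fields in the record; this hypothetical format_date_range helper takes date objects directly, and the " to " separator is borrowed from the default conjunction shown in the get_datetime signature, which is an assumption for get_date.

```python
from datetime import date


def format_date_range(date_from, date_to=None, date_format="%Y-%m-%d",
                      conjunction=" to "):
    """Hypothetical helper: format one date, or a range if the ends differ."""
    start = date_from.strftime(date_format)
    if date_to is None or date_to == date_from:
        return start
    return start + conjunction + date_to.strftime(date_format)


single = format_date_range(date(1969, 7, 20))
ranged = format_date_range(date(1969, 7, 20), date(1969, 7, 24))
by_year = format_date_range(date(1969, 7, 20), date_format="%Y")
```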

get_datetime(date_from, date_to=None, date_modifier=None, time_from=None, time_to=None, time_modifier=None, conjunction=' to ', format='%Y%m%dT%H%M%S')[source]
get_guid(kind='EZID', allow_multiple=False)[source]

Gets value from the GUID table for a given key

Parameters:
  • kind (str) – name of GUID
  • allow_multiple (bool) – if False, raises error if multiple values with same type are found
Returns:

First match from the GUID table for the key (if allow_multiple is False) or the full set of matches (if allow_multiple is True)
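
The effect of allow_multiple can be illustrated with a plain list of (type, value) pairs standing in for the GUID table. The table layout, the sample identifiers, the exception type, and the None return for a missing key are all assumptions made for this sketch.

```python
def get_guid(rows, kind="EZID", allow_multiple=False):
    """Illustrative sketch over a list of (guid_type, guid_value) pairs."""
    matches = [value for guid_type, value in rows if guid_type == kind]
    if allow_multiple:
        return matches
    if len(matches) > 1:
        # Assumption: the real method may raise a different exception type
        raise ValueError("Multiple {} values found".format(kind))
    return matches[0] if matches else None


# Sample identifiers invented for illustration
rows = [("EZID", "ark:/00000/example1"), ("IGSN", "XXX000001")]
first = get_guid(rows)
everything = get_guid(rows, kind="IGSN", allow_multiple=True)
```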

get_location(current=False, keyword=None)[source]

Returns the current or permanent location of a specimen

get_matching_rows(match, label_field, value_field)[source]

Helper function to find rows in any table matching a kind/label

Parameters:
  • match (str) – the name of the label to match
  • label_field (str) – field in a table containing the label
  • value_field (str) – field in a table containing the value
Returns:

List of values matching the match string
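
Conceptually, this pairs up two parallel columns of a table and keeps the values whose label matches. A minimal sketch over a plain dict follows; the function is written as a free function rather than a method, and the NoteKind_tab and NoteText_tab field names are hypothetical.

```python
def get_matching_rows(rec, match, label_field, value_field):
    """Illustrative sketch: rec maps field names to parallel lists."""
    labels = rec.get(label_field, [])
    values = rec.get(value_field, [])
    return [val for label, val in zip(labels, values) if label == match]


# Hypothetical table with one label column and one value column
rec = {
    "NoteKind_tab": ["Comments", "Label text", "Comments"],
    "NoteText_tab": ["First comment", "Quartz", "Second comment"],
}
comments = get_matching_rows(rec, "Comments", "NoteKind_tab", "NoteText_tab")
```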

get_modified_time(timezone_id='US/Eastern', mask=None)[source]

Gets datetime of last modification

get_notes(kind)[source]

Return the note matching the given kind

get_paths(rec=None, path=None, paths=None)[source]
get_reference(*args)[source]

Returns a list of values corresponding to the table rows

Parameters:*args – the path to a value in the dictionary, with one component of that path per arg.
Returns:If the last arg is a field (as opposed to a reference table), this function will return a list of values, one per row. If the last arg is a reference table, it will return a list of XMuRecords.
get_rows(*args)[source]

Returns a list of values corresponding to the table rows

Parameters:*args – the path to a value in the dictionary, with one component of that path per arg
Returns:List of values, one per row
get_table(*path)[source]

Returns the table to which the field specified in path belongs

get_url()[source]

Gets the ark link to this record

is_new(found)[source]

Checks if current module:irn exists in found

Parameters:found (dict) – marks irns already found as True
Returns:Boolean expressing if the current record has already been seen

This method can be invoked manually inside the XMu subclass when reading XML exports from a directory containing multiple, potentially overlapping record sets to prevent (a) the same record from being read twice or (b) an older version of a record from overwriting a more recent one.
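
The bookkeeping this describes can be sketched with a dict keyed on "module:irn". Whether the real method records the key itself, and whether it returns True for new records (as the name suggests) or for seen ones, is not clear from this page, so both are assumptions here.

```python
def is_new(module, irn, found):
    """Illustrative sketch: True the first time a module:irn pair is seen."""
    key = "{}:{}".format(module, irn)
    if found.get(key):
        return False
    # Assumption: the key is marked as found as a side effect
    found[key] = True
    return True


found = {}
first_time = is_new("ecatalogue", "1001", found)
second_time = is_new("ecatalogue", "1001", found)
```

If export files are read from newest to oldest, skipping records for which the check fails keeps an older copy from replacing a newer one.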

setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D[source]
simple_pull(path)[source]

Returns data from path in DeepDict

Parameters:path (mixed) – the path to an EMu field as a string or list
Returns:Value for the given path
smart_pull(*args, **kwargs)[source]

Pull data from the record, formatting the result based on the path

Parameters:*args – the path to a value in the dictionary, with one component of that path per arg. If args[0] contains one or more dots, the path will be expanded from that and ignore subsequent args.
Returns:Value for the given path, formatted as follows:
  • An atomic field returns a string
  • A reference pointing to a single field returns a string
  • A simple table returns a list of values
  • A reference table that specifies a field returns a list
  • A reference table returns a list of XMuRecord objects
  • A nested table returns a list of lists
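
The dotted-path rule for args[0] can be sketched on a plain nested dict. The expand_path and pull helpers are hypothetical, the sample record is invented, and the result formatting that smart_pull performs is not reproduced here.

```python
def expand_path(*args):
    """If the first arg contains dots, split it and ignore the other args."""
    if args and "." in args[0]:
        return args[0].split(".")
    return list(args)


def pull(rec, *args):
    """Hypothetical helper: walk a nested dict along the expanded path."""
    val = rec
    for key in expand_path(*args):
        val = val[key]
    return val


# Sample record invented for illustration
rec = {"BioEventSiteRef": {"LocCountry": "Canada"}}
dotted = pull(rec, "BioEventSiteRef.LocCountry")
by_parts = pull(rec, "BioEventSiteRef", "LocCountry")
```

Both calls reach the same value, so dotted and by_parts are equal.
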
to_refine()[source]

Maps EMu data to Google Refine

FIXME: Needs to be cleaned up and tested

unwrap()[source]

Removes outermost level of XMuRecord

This simplifies the paths needed to pull data from the record. The record will need to be wrapped again before writing to XML.

Returns:Unwrapped XMuRecord. In a typical use case, this means the paths used to retrieve data do not need to include the module name.
verify()[source]
wrap(module)[source]

Wraps the XMuRecord with name of module

Parameters:module (str) – name of module to use as key
Returns:Wrapped XMuRecord. In a typical use case, this means the paths used to retrieve data need to include the module name.
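
The relationship between wrap and unwrap can be pictured with plain dicts. These free functions are illustrative stand-ins for the methods, and the requirement that a wrapped record has exactly one top-level key is an assumption.

```python
def wrap(rec, module):
    """Illustrative sketch: nest the record under the module name."""
    return {module: rec}


def unwrap(rec):
    """Illustrative sketch: strip the single outermost key."""
    if len(rec) != 1:
        raise ValueError("Expected exactly one top-level key")
    return next(iter(rec.values()))


# Sample record invented for illustration
inner = {"irn": "1001", "MulTitle": "Quartz crystal"}
wrapped = wrap(inner, "emultimedia")
title_from_wrapped = wrapped["emultimedia"]["MulTitle"]
title_from_unwrapped = unwrap(wrapped)["MulTitle"]
```

The unwrapped path is one component shorter, which is the simplification the unwrap description refers to.
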
zip(*args)[source]

Zips the set of lists, padding each list to the max length
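
Padding to the longest list is what itertools.zip_longest does in the standard library, so the idea can be sketched as below. The zip_padded name and the empty-string fill value are assumptions; the real method may pad with a different value or return a different type.

```python
from itertools import zip_longest


def zip_padded(*lists, fillvalue=""):
    """Illustrative sketch: zip lists, padding short ones with fillvalue."""
    return list(zip_longest(*lists, fillvalue=fillvalue))


padded = zip_padded(["a", "b", "c"], ["1"])
```

Here padded is [("a", "1"), ("b", ""), ("c", "")], whereas the built-in zip would stop after the first pair.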

minsci.xmu.containers.xmurecord.standardize(val)[source]

Standardize the format of a value

Module contents

Defines module-specific containers for working with EMu data