PyBagIt 1.5 Documentation

Project Website: https://ahankinson.github.io/pybagit

Code hosting: https://github.com/ahankinson/pybagit

BagIt 0.96 Spec: http://www.digitalpreservation.gov/library/resources/tools/docs/bagitspec.pdf

This module simplifies creation and management of BagIt files. Version 1.0 of this module conforms to the BagIt v0.96 specification.

This is a "loosely monitored" bag system. The constructor simply specifies a folder to monitor. Files can be added and removed from the data directory at will, and the update() method for that bag will automatically checksum and add all the appropriate files to the manifest files.

See the examples directory in the source distribution for more information.
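The "loosely monitored" workflow described above can be sketched as follows. This is a minimal illustration, not code from the package: the helper name `add_file_to_bag` and its `subdir` parameter are assumptions, and only the documented `get_data_directory()` and `update()` methods are called on the bag.

```python
import os
import shutil


def add_file_to_bag(bag, source_path, subdir=""):
    """Copy source_path into the bag's data directory, then refresh the bag.

    `bag` is expected to be a pybagit BagIt instance; this hypothetical
    helper only relies on get_data_directory() and update().
    """
    target_dir = os.path.join(bag.get_data_directory(), subdir)
    os.makedirs(target_dir, exist_ok=True)
    # shutil.copy2 returns the path of the newly created copy.
    target = shutil.copy2(source_path, target_dir)
    # update() re-checksums the payload and rewrites the manifest files.
    bag.update()
    return target


# Typical use (requires pybagit to be installed):
#   import pybagit.bagit as b
#   bag = b.BagIt('/path/to/existing/folder')
#   add_file_to_bag(bag, '/path/to/report.pdf', subdir='reports')
```

Files removed from the data directory by hand are handled the same way: delete them, then call `update()`.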

Constructor

pybagit.bagit.BagIt(self, bag, validate=False, extended=True, fetch=False)

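A bag instance is constructed by passing in the path to a folder that does not exist yet (the bag structure is created there), the path to an existing folder, or the path to an existing compressed bag ('tgz' or 'zip').

Optional parameters are validate (validates the bag's structure; default False), extended (set to False to create a minimal bag; default True), and fetch (fetches files; default False).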

Example

    import pybagit.bagit as b
    
    # path to non-existent folder. The bag structure will be created with the name "folder".
    bag = b.BagIt('/path/to/non/existent/folder')
    
    # path to an existing folder. Running update() afterwards ensures that the folder holds a valid bag.
    bag = b.BagIt('/path/to/existing/folder')
    
    # path to an existing compressed file. 'tgz' and 'zip' are supported.
    bag = b.BagIt('/path/to/file.tgz')
    
    # optional parameters. 
    # Creates a minimal bag instance.
    bag = b.BagIt('/path/to/folder', extended=False)
    
    # Validates a bag's structure
    bag = b.BagIt('/path/to/folder', validate=True)
    
    # Fetches files
    bag = b.BagIt('/path/to/folder', fetch=True)

Public Methods

instance.is_valid()

Returns True if there are no validation errors reported.

instance.is_extended()

Returns True if the bag contains any of the optional files 'bag-info.txt', 'fetch.txt', or 'tagmanifest-(sha1|md5).txt'.

instance.get_bag_info()

Returns a dictionary of information about the bag.

instance.get_data_directory()

Returns the absolute path to the bag's data directory.

instance.get_hash_encoding()

Returns the bag's checksum encoding scheme.

instance.set_hash_encoding(algorithm)

Sets the bag's checksum hash algorithm. Must be either sha1 or md5.

instance.show_bag_info()

Prints a number of bag properties, e.g.

    Bag Version: 0.96
    Tag File Encoding: utf-8
    Bag Type: Extended

                        Manifest Contents
    CHECKSUMS                                 FILENAMES
    ------------------------------------------------------------------------------
    422936e456d17cf39341598ca377cbc5dd049364  data/subdir/subsubdir4/testfile.txt
    ab2903263a468a76615a363cea207b0bc87b0449  data/subdir/subsubdir1/testfile.txt
    05b920f51b05bb135c9e9f2d69650384216e7cdc  data/subdir/subsubdir3/testfile.txt
    0b3394596f8f1429efa019016207ad9b61d1c573  data/subdir/subsubdir2/testfile.txt

                        Tag Manifest Contents
    CHECKSUMS                                 FILENAMES
    ------------------------------------------------------------------------------
    b73a8865fbaa633ebeb283828d7986420135db47  manifest-sha1.txt
    da39a3ee5e6b4b0d3255bfef95601890afd80709  bag-info.txt
    da39a3ee5e6b4b0d3255bfef95601890afd80709  fetch.txt
    a7b95616bf7307fc398baeb04ce60a88ed370f51  bagit.txt

    Bag Errors
    --------------------------------------------------
    No Errors.
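The values in the CHECKSUMS column are hex digests of the file contents. The sketch below shows how such a line can be computed with Python's standard `hashlib` module; it is an illustration only, not the package's own implementation, and the helper name `manifest_line` is an assumption. Note that the digest listed for bag-info.txt and fetch.txt in the listing above is the SHA-1 of empty input, which suggests those two tag files were empty when the listing was produced.

```python
import hashlib


def manifest_line(payload, relative_name, algorithm="sha1"):
    """Return one 'CHECKSUM  FILENAME' line for the given bytes.

    Hypothetical helper; mirrors the two algorithms the bag supports.
    """
    if algorithm not in ("sha1", "md5"):
        raise ValueError("algorithm must be 'sha1' or 'md5'")
    digest = hashlib.new(algorithm, payload).hexdigest()
    return "{0}  {1}".format(digest, relative_name)


# An empty file hashes to the value shown for bag-info.txt in the listing.
print(manifest_line(b"", "bag-info.txt"))
# prints: da39a3ee5e6b4b0d3255bfef95601890afd80709  bag-info.txt
```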

instance.get_bag_contents()

Returns a list with the absolute paths of all the files in the data directory.
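Because the paths are absolute, it is often convenient to convert them to paths relative to the bag, which is how they appear in the manifest. A minimal sketch, assuming a helper named `relative_contents` (not part of the package) and using only the documented `get_bag_contents()` method and `bag_directory` property:

```python
import os


def relative_contents(bag):
    """Return the bag's payload paths relative to the bag directory, sorted."""
    root = bag.bag_directory
    return sorted(os.path.relpath(path, root) for path in bag.get_bag_contents())


# Typical use (requires pybagit to be installed):
#   import pybagit.bagit as b
#   bag = b.BagIt('/path/to/existing/folder')
#   for name in relative_contents(bag):
#       print(name)
```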

instance.get_bag_errors(validate=False)

Returns a list of all the bag errors. If validate=True it will run the validate() method to verify the integrity first.

instance.validate()

Runs the bag validator on the contents of the bag. Any problems found are reported through is_valid() and get_bag_errors().

instance.update()

Call this method whenever something has been added to or removed from the bag. It runs a number of sub-processes to ensure that a valid bag is produced, including checksumming the files and updating the manifest files.

instance.fetch(validate_downloads=False)

Downloads every entry in the fetch.txt file. If validate_downloads=True it will run the update() and validate() methods automatically.

instance.add_fetch_entries(fetch_entries, append=True)

Writes new entries to the fetch.txt file. fetch_entries is a list of dictionaries, each with a 'url' key holding the URL and a 'filename' key holding the file's destination path under the data directory (written with the leading data/). For example:

    entries = [
        {'url': 'http://www.example.com/path/to/file1.txt',
         'filename': 'data/path/to/file.txt'},
        {'url': 'http://www.example.com/path/to/file2.txt',
         'filename': 'data/another/path/for/file.txt'},
    ]

    bag.add_fetch_entries(entries)

If append=False, the whole fetch.txt file is overwritten with the new entries; if append=True (the default), the new entries are added to the existing ones.
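When many files share a common base URL, the entry list can be generated rather than typed out. The following is a sketch under stated assumptions: `build_fetch_entries`, `base_url` and `relative_paths` are illustrative names that do not exist in the package; only the 'url' and 'filename' keys come from the documentation above.

```python
import posixpath


def build_fetch_entries(base_url, relative_paths):
    """Build a list of {'url', 'filename'} dictionaries for add_fetch_entries().

    Hypothetical helper: each path is appended to base_url and placed
    under data/ inside the bag.
    """
    entries = []
    for rel in relative_paths:
        entries.append({
            "url": base_url.rstrip("/") + "/" + rel,
            "filename": posixpath.join("data", rel),
        })
    return entries


entries = build_fetch_entries("http://www.example.com/files", ["a.txt", "docs/b.txt"])
# With a real bag instance, the entries would then be written with, e.g.:
#   bag.add_fetch_entries(entries, append=False)  # replaces the existing fetch.txt entries
```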

instance.package(destination, method="tgz")

Compresses a bag and copies it to destination. method can be either "tgz" (default) or "zip". Note: Bags packaged with tgz compression are saved with the ".tgz" extension, not ".tar.gz".
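A common pattern is to refresh and validate a bag before packaging it, so that an invalid bag is never shipped. The sketch below is one possible way to combine the documented methods; the wrapper `package_if_valid` is an assumption and is not provided by the package.

```python
def package_if_valid(bag, destination, method="tgz"):
    """Update and validate the bag, then package it only if no errors are reported.

    Hypothetical wrapper. Returns True if package() was called, False otherwise.
    """
    if method not in ("tgz", "zip"):
        raise ValueError("method must be 'tgz' or 'zip'")
    bag.update()    # bring the manifests in line with the data directory
    bag.validate()  # run the bag validator
    if not bag.is_valid():
        return False
    bag.package(destination, method=method)
    return True


# Typical use (requires pybagit to be installed):
#   import pybagit.bagit as b
#   bag = b.BagIt('/path/to/existing/folder')
#   if not package_if_valid(bag, '/path/to/destination', method='zip'):
#       print(bag.get_bag_errors())
```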

Public Properties

Note: These properties are exposed for convenience, but you should not assign to them directly. Properties that can be changed have corresponding set_...() methods (e.g. set_hash_encoding()). All others are determined by the state of the files in the bag itself and are verified when update() is called.

instance.bag_directory

Absolute path to the bag directory.

instance.extended

True if the bag is extended; False if not.

instance.hash_encoding

Either sha1 or md5. The default is sha1.

instance.bag_major_version

Returns the major version number as declared in bagit.txt. Any files created by this package default to v0.96, for which this property returns 0.

instance.bag_minor_version

Returns the minor version number as declared in bagit.txt. Any files created by this package default to v0.96, for which this property returns 96.
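The two version properties together give the version declared in bagit.txt. A minimal sketch of combining them, assuming a helper named `bag_version` (not part of the package); string formatting is used so that it works whether the properties hold integers or strings:

```python
def bag_version(bag):
    """Return the declared BagIt version as a 'major.minor' string."""
    return "{0}.{1}".format(bag.bag_major_version, bag.bag_minor_version)


# Typical use (requires pybagit to be installed):
#   import pybagit.bagit as b
#   bag = b.BagIt('/path/to/existing/folder')
#   print(bag_version(bag))  # '0.96' for bags created by this package
```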

instance.tag_file_encoding

Returns the tag file encoding as declared in bagit.txt. Any files created by this package will default to utf-8.

instance.data_directory

Absolute path to the bag's data directory.

instance.bagit_file

Absolute path to the bagit.txt file.

instance.manifest_file

Absolute path to the manifest-(sha1|md5).txt file.

instance.tag_manifest_file

Absolute path to the tagmanifest-(sha1|md5).txt file. None if it doesn't exist.

instance.fetch_file

Absolute path to the fetch.txt file. None if it doesn't exist.

instance.baginfo_file

Absolute path to the bag-info.txt file. None if it doesn't exist.

instance.manifest_contents

A dictionary containing the manifest file contents.

instance.tag_manifest_contents

A dictionary containing the tagmanifest file contents.

instance.fetch_contents

A dictionary containing the fetch.txt file contents.

instance.baginfo_contents

A dictionary containing the bag-info.txt file contents.

instance.bag_compression

If the bag originates from a compressed file, this is set to either tgz or zip. None if the bag is not compressed. Note: This property does not determine the subsequent bag compression format. If the bag originates from a zip file and package() is called without method="zip", a .tgz file is produced.
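To keep the original format when re-packaging, the property can be passed back to package() explicitly. The following is a sketch only; `repackage_like_source` is a hypothetical helper, and the fallback to "tgz" for uncompressed bags simply mirrors the documented default.

```python
def repackage_like_source(bag, destination):
    """Package the bag in the format it was opened from, defaulting to tgz.

    Hypothetical helper. Returns the method that was passed to package().
    """
    # bag_compression is None for bags that did not come from an archive.
    method = bag.bag_compression or "tgz"
    bag.package(destination, method=method)
    return method


# Typical use (requires pybagit to be installed):
#   import pybagit.bagit as b
#   bag = b.BagIt('/path/to/file.zip')
#   repackage_like_source(bag, '/path/to/destination')  # packages as zip
```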

instance.bag_errors

A list of all bag validation errors. All errors are in tuple format, e.g.:

    ('data/path/to/file.txt', 'Incorrect filename or checksum in manifest')
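The tuples can be unpacked directly when reporting problems. A minimal sketch, assuming every error is a two-element (path, message) tuple as in the example above; `format_bag_errors` is an illustrative helper, not part of the package:

```python
def format_bag_errors(errors):
    """Turn (path, message) tuples into aligned, printable lines."""
    if not errors:
        return ["No Errors."]
    # Pad every path to the width of the longest one so messages line up.
    width = max(len(path) for path, _ in errors)
    return ["{0}  {1}".format(path.ljust(width), message) for path, message in errors]


# Typical use (requires pybagit to be installed):
#   import pybagit.bagit as b
#   bag = b.BagIt('/path/to/existing/folder')
#   for line in format_bag_errors(bag.get_bag_errors(validate=True)):
#       print(line)
```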