Project Website: https://ahankinson.github.io/pybagit
Code hosting: https://github.com/ahankinson/pybagit
BagIt 0.96 Spec: http://www.digitalpreservation.gov/library/resources/tools/docs/bagitspec.pdf
This module simplifies creation and management of BagIt files. Version 1.0 of this module conforms to the BagIt v0.96 specification.
This is a "loosely monitored" bag system. The constructor simply specifies a folder to monitor. Files can be added and removed from the data directory at will, and the update() method for that bag will automatically checksum and add all the appropriate files to the manifest files.
See the examples directory in the source distribution for more information.
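To give a sense of what the checksum-and-manifest step involves, here is a minimal standard-library sketch. It is not pybagit's own code: the function names build_manifest and write_manifest are hypothetical, and the two-space separator in the manifest lines is an assumption.

```python
import hashlib
import os
import tempfile


def build_manifest(bag_dir, algorithm="sha1"):
    """Return {relative_path: hex_digest} for every file under bag_dir/data."""
    manifest = {}
    data_dir = os.path.join(bag_dir, "data")
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            full_path = os.path.join(root, name)
            digest = hashlib.new(algorithm)
            with open(full_path, "rb") as handle:
                # Read in chunks so large payload files do not fill memory.
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            # Manifest paths are relative to the bag root and use '/'.
            rel_path = os.path.relpath(full_path, bag_dir).replace(os.sep, "/")
            manifest[rel_path] = digest.hexdigest()
    return manifest


def write_manifest(bag_dir, manifest, algorithm="sha1"):
    """Write one 'checksum path' line per file to manifest-<algorithm>.txt."""
    manifest_path = os.path.join(bag_dir, "manifest-%s.txt" % algorithm)
    with open(manifest_path, "w", encoding="utf-8") as handle:
        for rel_path in sorted(manifest):
            handle.write("%s  %s\n" % (manifest[rel_path], rel_path))
    return manifest_path


# Demonstration on a throwaway, bag-shaped directory with one empty file.
demo_bag = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_bag, "data", "subdir"))
with open(os.path.join(demo_bag, "data", "subdir", "empty.txt"), "wb"):
    pass
demo_manifest = build_manifest(demo_bag)
demo_manifest_path = write_manifest(demo_bag, demo_manifest)
```

In practice you would call update() on the bag rather than doing this by hand; the sketch only shows the kind of work that call saves you.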
pybagit.bagit.BagIt(self, bag, validate=False, extended=True, fetch=False)
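A bag instance is constructed by passing in one of the following:
a path to a non-existent folder, in which case the bag structure is created with that folder's name;
a path to an existing folder;
a path to an existing compressed file ('tgz' and 'zip' are supported).
Call the package() method on this bag once you have finished to save it to a permanent location.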
Optional parameters are:
validate: If True, will run validate on the bag to check it for errors. Note that this will run the checksum on all files, so it may take a while if there are many files or they are especially big. Default is False.
extended: If True, it will ensure that the optional 'bag-info.txt', 'fetch.txt' and 'tagmanifest-(sha1|md5).txt' files are created. Default is True.
fetch: If True, it will download all the files specified in the 'fetch.txt' file. Default is False.
import pybagit.bagit as b
# path to non-existent folder. The bag structure will be created with the name "folder".
bag = b.BagIt('/path/to/non/existent/folder')
# path to an existing folder. Upon running the update() method it will ensure that the bag is valid in the 'file'
bag = b.BagIt('/path/to/existing/folder')
# path to an existing compressed file. 'tgz' and 'zip' are supported.
bag = b.BagIt('/path/to/file.tgz')
# optional parameters.
# Creates a minimal bag instance.
bag = b.BagIt('/path/to/folder', extended=False)
# Validates a bag's structure
bag = b.BagIt('/path/to/folder', validate=True)
# Fetches files
bag = b.BagIt('/path/to/folder', fetch=True)
instance.is_valid()
Returns True if there are no validation errors reported.
instance.is_extended()
Returns True if the bag contains the optional files 'bag-info.txt', 'fetch.txt', or 'tagmanifest-(sha1|md5).txt'.
instance.get_bag_info()
Returns a dictionary containing:
version: The bag's version, as reported in 'bagit.txt'.
encoding: The bag's tag file encoding, e.g. 'utf-8'.
hash: The bag's checksum algorithm, e.g. 'sha1'.
instance.get_data_directory()
Returns the absolute path to the bag's data directory.
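The version and encoding values that get_bag_info() reports are declared in bagit.txt. The sketch below shows one way such a file could be read. The labels BagIt-Version and Tag-File-Character-Encoding come from the BagIt specification; the function name parse_bagit_txt, the returned keys 'major' and 'minor', and the lower-casing of the encoding are assumptions for illustration, not pybagit's implementation.

```python
def parse_bagit_txt(text):
    """Parse the content of bagit.txt into version and encoding fields."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        label, value = line.split(":", 1)
        fields[label.strip()] = value.strip()
    version = fields.get("BagIt-Version", "")
    major, _sep, minor = version.partition(".")
    return {
        "version": version,
        "major": int(major) if major.isdigit() else None,
        "minor": int(minor) if minor.isdigit() else None,
        # Lower-cased here only to match the 'utf-8' spelling used in this document.
        "encoding": fields.get("Tag-File-Character-Encoding", "").lower(),
    }


sample = "BagIt-Version: 0.96\nTag-File-Character-Encoding: UTF-8\n"
info = parse_bagit_txt(sample)
```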
instance.get_hash_encoding()
Returns the bag's checksum encoding scheme.
instance.set_hash_encoding(algorithm)
Sets the bag's checksum hash algorithm. Must be either sha1 or md5.
instance.show_bag_info()
Prints a number of bag properties, e.g.
Bag Version: 0.96
Tag File Encoding: utf-8
Bag Type: Extended
Manifest Contents
CHECKSUMS FILENAMES
------------------------------------------------------------------------------
422936e456d17cf39341598ca377cbc5dd049364 data/subdir/subsubdir4/testfile.txt
ab2903263a468a76615a363cea207b0bc87b0449 data/subdir/subsubdir1/testfile.txt
05b920f51b05bb135c9e9f2d69650384216e7cdc data/subdir/subsubdir3/testfile.txt
0b3394596f8f1429efa019016207ad9b61d1c573 data/subdir/subsubdir2/testfile.txt
Tag Manifest Contents
CHECKSUMS FILENAMES
------------------------------------------------------------------------------
b73a8865fbaa633ebeb283828d7986420135db47 manifest-sha1.txt
da39a3ee5e6b4b0d3255bfef95601890afd80709 bag-info.txt
da39a3ee5e6b4b0d3255bfef95601890afd80709 fetch.txt
a7b95616bf7307fc398baeb04ce60a88ed370f51 bagit.txt
Bag Errors
--------------------------------------------------
No Errors.
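The manifest lines in the output above pair a checksum with a file path. As a rough illustration, lines like these can be turned into a dictionary as follows. The function name parse_manifest is hypothetical, and keying the result by path rather than by checksum is an assumption; this document does not say how pybagit arranges its own manifest dictionaries.

```python
def parse_manifest(text):
    """Map each listed file path to its checksum."""
    contents = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the first run of whitespace: checksum first, then the path.
        checksum, path = line.split(None, 1)
        contents[path] = checksum.lower()
    return contents


sample_manifest = """\
422936e456d17cf39341598ca377cbc5dd049364  data/subdir/subsubdir4/testfile.txt
ab2903263a468a76615a363cea207b0bc87b0449  data/subdir/subsubdir1/testfile.txt
"""
parsed = parse_manifest(sample_manifest)
```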
instance.get_bag_contents()
Returns a list with the absolute paths of all the files in the data directory.
instance.get_bag_errors(validate=False)
Returns a list of all the bag errors. If validate=True, it will run the validate() method first to verify the bag's integrity.
instance.validate()
Runs the bag validator on the contents of the bag.
instance.update()
This method is used whenever something has been added to or removed from the bag. It contains many sub-processes for ensuring a valid bag is produced.
instance.fetch(validate_downloads=False)
Downloads every entry in the fetch.txt file. If validate_downloads=True, it will run the update() and validate() methods automatically.
instance.add_fetch_entries(fetch_entries, append=True)
Writes new entries to the fetch.txt file. fetch_entries is a list of dictionaries, each containing the URL ('url') and the path within the data directory ('filename') for one file. For example:
entries = [{'url':'http://www.example.com/path/to/file1.txt',
'filename':'data/path/to/file.txt'},
{'url':'http://www.example.com/path/to/file2.txt',
'filename':'data/another/path/for/file.txt'}]
bag.add_fetch_entries(entries)
If append=False, the whole fetch.txt file is overwritten with the new entries; if append=True, they are added to the existing entries.
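For orientation, the sketch below shows how entries of this shape could map onto fetch.txt lines, and how append differs from overwrite. In the BagIt specification each fetch.txt line holds a URL, a length ("-" when unknown) and a filename. The helper names format_fetch_line and write_fetch_entries are hypothetical, and always writing "-" for the length is an assumption, not a description of what pybagit writes.

```python
import os
import tempfile


def format_fetch_line(entry, length="-"):
    """Build one fetch.txt line: URL, length ('-' if unknown), filename."""
    return "%s %s %s\n" % (entry["url"], length, entry["filename"])


def write_fetch_entries(fetch_path, entries, append=True):
    """Append to fetch.txt, or overwrite it when append is False."""
    mode = "a" if append else "w"
    with open(fetch_path, mode, encoding="utf-8") as handle:
        for entry in entries:
            handle.write(format_fetch_line(entry))


entries = [
    {"url": "http://www.example.com/path/to/file1.txt",
     "filename": "data/path/to/file.txt"},
    {"url": "http://www.example.com/path/to/file2.txt",
     "filename": "data/another/path/for/file.txt"},
]

fetch_path = os.path.join(tempfile.mkdtemp(), "fetch.txt")
write_fetch_entries(fetch_path, entries[:1], append=False)  # start fresh
write_fetch_entries(fetch_path, entries[1:], append=True)   # add one more
with open(fetch_path, encoding="utf-8") as handle:
    fetch_lines = handle.read().splitlines()
```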
instance.package(destination, method="tgz")
Compresses a bag and copies it to destination. method can be either "tgz" (default) or "zip". Note: Files with tgz compression are saved with the ".tgz" extension, not a ".tar.gz" extension.
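As an illustration of the two packaging formats, the following standard-library sketch archives a bag directory as either a .tgz or a .zip file. The function name package_bag, naming the archive after the bag folder, and the layout inside the archive are assumptions; pybagit's package() may differ in these details.

```python
import os
import tarfile
import tempfile
import zipfile


def package_bag(bag_dir, destination, method="tgz"):
    """Archive bag_dir into destination and return the archive's path."""
    bag_dir = os.path.abspath(bag_dir)  # also drops any trailing separator
    bag_name = os.path.basename(bag_dir)
    if method == "tgz":
        # Gzip-compressed tar, saved with a .tgz extension.
        archive_path = os.path.join(destination, bag_name + ".tgz")
        with tarfile.open(archive_path, "w:gz") as archive:
            archive.add(bag_dir, arcname=bag_name)
    elif method == "zip":
        archive_path = os.path.join(destination, bag_name + ".zip")
        with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
            for root, _dirs, files in os.walk(bag_dir):
                for name in files:
                    full_path = os.path.join(root, name)
                    # Store paths relative to the bag's parent folder.
                    rel_path = os.path.relpath(full_path, os.path.dirname(bag_dir))
                    archive.write(full_path, rel_path)
    else:
        raise ValueError("method must be 'tgz' or 'zip'")
    return archive_path


# Demonstration with a throwaway bag directory.
work_dir = tempfile.mkdtemp()
demo_bag = os.path.join(work_dir, "mybag")
os.makedirs(os.path.join(demo_bag, "data"))
with open(os.path.join(demo_bag, "data", "file.txt"), "w") as handle:
    handle.write("payload")
out_dir = tempfile.mkdtemp()
tgz_path = package_bag(demo_bag, out_dir)
zip_path = package_bag(demo_bag, out_dir, method="zip")
```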
Note: These properties are exposed for convenience, but you should not assign to them directly. Properties that can be changed have appropriate set_...() methods (cf. set_hash_encoding()). All others are maintained by the state of the files in the bag itself and are verified when update() is called.
instance.bag_directory
Absolute path to the bag directory.
instance.extended
True if the bag is extended; False if not.
instance.hash_encoding
sha1 or md5. Default is sha1.
instance.bag_major_version
Returns the major version number as declared in bagit.txt. Any files created by this package will default to v0.96, so this property would return 0.
instance.bag_minor_version
Returns the minor version number as declared in bagit.txt. Any files created by this package will default to v0.96, so this property would return 96.
instance.tag_file_encoding
Returns the tag file encoding as declared in bagit.txt. Any files created by this package will default to utf-8.
instance.data_directory
Absolute path to the bag's data directory.
instance.bagit_file
Absolute path to the bagit.txt file.
instance.manifest_file
Absolute path to the manifest-(sha1|md5).txt file.
instance.tag_manifest_file
Absolute path to the tagmanifest-(sha1|md5).txt file. None if it doesn't exist.
instance.fetch_file
Absolute path to the fetch.txt file. None if it doesn't exist.
instance.baginfo_file
Absolute path to the bag-info.txt file. None if it doesn't exist.
instance.manifest_contents
A dictionary containing the manifest file contents.
instance.tag_manifest_contents
A dictionary containing the tagmanifest file contents.
instance.fetch_contents
A dictionary containing the fetch.txt file contents.
instance.baginfo_contents
A dictionary containing the bag-info.txt file contents.
instance.bag_compression
If the bag originates from a compressed file, this is set to either tgz or zip. None if the bag is not compressed. Note: This property does not determine the subsequent bag compression format. If the bag originates from a zip file and package() is called without method="zip", it will return a .tgz file.
instance.bag_errors
A list of all bag validation errors. All errors are in tuple format, e.g.:
('data/path/to/file.txt', 'Incorrect filename or checksum in manifest')
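To show how a list of such tuples might arise, here is a minimal sketch that checks manifest entries against the files on disk and collects (path, message) tuples. The function name find_manifest_errors is hypothetical and the message text is copied from the example above; using that same message for both missing files and checksum mismatches is an assumption, not a statement about pybagit's validator.

```python
import hashlib
import os
import tempfile

ERROR_MESSAGE = "Incorrect filename or checksum in manifest"


def find_manifest_errors(bag_dir, manifest, algorithm="sha1"):
    """Return (path, message) tuples for manifest entries that fail checks."""
    errors = []
    for rel_path, expected in sorted(manifest.items()):
        full_path = os.path.join(bag_dir, *rel_path.split("/"))
        if not os.path.isfile(full_path):
            errors.append((rel_path, ERROR_MESSAGE))
            continue
        digest = hashlib.new(algorithm)
        with open(full_path, "rb") as handle:
            # Hash in chunks so large files are not read into memory at once.
            for chunk in iter(lambda: handle.read(65536), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected.lower():
            errors.append((rel_path, ERROR_MESSAGE))
    return errors


# Demonstration: one empty file that matches, one entry with no file on disk.
demo_bag = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_bag, "data"))
with open(os.path.join(demo_bag, "data", "good.txt"), "wb"):
    pass
demo_manifest = {
    "data/good.txt": hashlib.sha1(b"").hexdigest(),
    "data/missing.txt": hashlib.sha1(b"").hexdigest(),
}
errors = find_manifest_errors(demo_bag, demo_manifest)
for path, message in errors:
    print("%s: %s" % (path, message))
```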