Project Website: https://ahankinson.github.io/pybagit
Code hosting: https://github.com/ahankinson/pybagit
BagIt 0.96 Spec: http://www.digitalpreservation.gov/library/resources/tools/docs/bagitspec.pdf
This module simplifies creation and management of BagIt files. Version 1.0 of this module conforms to the BagIt v0.96 specification.
This is a "loosely monitored" bag system. The constructor simply specifies a folder to monitor. Files can be added to and removed from the data directory at will, and the update() method for that bag will automatically checksum the files and add the appropriate entries to the manifest files.
See the examples directory in the source distribution for more information.
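The manifest bookkeeping described above can be pictured with a short standard-library sketch. This is not pybagit's implementation: the helper name build_manifest and the two-space separator between checksum and path are assumptions made for illustration only.

```python
import hashlib
import os
import tempfile


def build_manifest(bag_dir, algorithm="sha1"):
    """Return "<checksum>  <relative path>" lines for every file under data/.

    Hypothetical helper for illustration; not part of pybagit.
    """
    data_dir = os.path.join(bag_dir, "data")
    entries = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            full_path = os.path.join(root, name)
            digest = hashlib.new(algorithm)
            with open(full_path, "rb") as handle:
                # Read in chunks so large payload files do not exhaust memory.
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            # Manifest paths are relative to the bag root and use forward slashes.
            rel_path = os.path.relpath(full_path, bag_dir).replace(os.sep, "/")
            entries.append((rel_path, digest.hexdigest()))
    return ["%s  %s" % (checksum, path) for path, checksum in sorted(entries)]


# Usage: build a throwaway bag layout containing one empty file.
with tempfile.TemporaryDirectory() as bag_dir:
    os.makedirs(os.path.join(bag_dir, "data", "subdir"))
    with open(os.path.join(bag_dir, "data", "subdir", "testfile.txt"), "wb") as f:
        f.write(b"")
    manifest_lines = build_manifest(bag_dir)
    print("\n".join(manifest_lines))
```

Re-running a helper like this after files are added or removed is the idea behind a "loosely monitored" bag: the manifest is regenerated from whatever is on disk rather than tracked change by change.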
pybagit.bagit.BagIt(self, bag, validate=False, extended=True, fetch=False)
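A bag instance is constructed by passing in either a path to a non-existent folder (the bag structure will be created there), a path to an existing folder, or a path to an existing compressed bag file ('tgz' and 'zip' are supported). Call the package() method on the bag once you have finished to save it to a permanent location.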
Optional parameters are:
validate: If True, will run validate on the bag to check it for errors. Note that this will run the checksum on all files, so it may take a while if there are many files or they are especially big. Default is False.
extended: If True, it will ensure that the optional 'bag-info.txt', 'fetch.txt' and 'tagmanifest-(sha1|md5).txt' files are created. Default is True.
fetch: If True, it will download all the files specified in the 'fetch.txt' file. Default is False.
    import pybagit.bagit as b

    # Path to a non-existent folder. The bag structure will be created
    # with the name "folder".
    bag = b.BagIt('/path/to/non/existent/folder')

    # Path to an existing folder. Running the update() method will ensure
    # that the folder is a valid bag.
    bag = b.BagIt('/path/to/existing/folder')

    # Path to an existing compressed file. 'tgz' and 'zip' are supported.
    bag = b.BagIt('/path/to/file.tgz')

    # Optional parameters.
    # Creates a minimal bag instance.
    bag = b.BagIt('/path/to/folder', extended=False)

    # Validates a bag's structure.
    bag = b.BagIt('/path/to/folder', validate=True)

    # Fetches files.
    bag = b.BagIt('/path/to/folder', fetch=True)
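For orientation, the layout that the BagIt specification requires of a minimal bag (roughly what extended=False asks for) can be sketched with the standard library alone. The helper create_minimal_bag is hypothetical and is not pybagit code; the files pybagit actually writes may differ in detail.

```python
import os
import tempfile


def create_minimal_bag(bag_dir, algorithm="sha1"):
    """Create bagit.txt, an empty manifest and data/ inside a new bag_dir.

    Hypothetical helper for illustration; not part of pybagit.
    """
    os.makedirs(os.path.join(bag_dir, "data"))
    with open(os.path.join(bag_dir, "bagit.txt"), "w", encoding="utf-8") as f:
        f.write("BagIt-Version: 0.96\n")
        f.write("Tag-File-Character-Encoding: UTF-8\n")
    # An empty payload directory means an empty manifest file.
    open(os.path.join(bag_dir, "manifest-%s.txt" % algorithm), "w").close()
    return sorted(os.listdir(bag_dir))


# Usage: the target folder does not exist yet and is created by the helper.
with tempfile.TemporaryDirectory() as parent:
    created = create_minimal_bag(os.path.join(parent, "folder"))
    print(created)
```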
instance.is_valid()
Returns True if there are no validation errors reported.
instance.is_extended()
Returns True if the bag contains the optional files 'bag-info.txt', 'fetch.txt', or 'tagmanifest-(sha1|md5).txt'.
instance.get_bag_info()
Returns a dictionary containing:
version: The Bag's version, as reported in 'bagit.txt'
encoding: The Bag's tag file encoding, e.g. 'utf-8'
hash: The Bag's checksum algorithm, e.g. 'sha1'
instance.get_data_directory()
Returns the absolute path to the bag's data directory.
instance.get_hash_encoding()
Returns the bag's checksum encoding scheme.
instance.set_hash_encoding(algorithm)
Sets the bag's checksum hash algorithm. Must be either sha1 or md5.
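The restriction to sha1 or md5 can be illustrated with hashlib. This is a sketch of the idea, not pybagit's code: the names SUPPORTED_ALGORITHMS, checksum and manifest_name are hypothetical, and raising ValueError on an unsupported name is an assumption about reasonable behaviour rather than a documented one.

```python
import hashlib

# The two algorithms the documentation above names as valid.
SUPPORTED_ALGORITHMS = ("sha1", "md5")


def checksum(data, algorithm="sha1"):
    """Return the hex digest of data using one of the supported algorithms."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError("algorithm must be one of %r" % (SUPPORTED_ALGORITHMS,))
    return hashlib.new(algorithm, data).hexdigest()


def manifest_name(algorithm="sha1"):
    """Return the manifest filename that corresponds to the algorithm."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError("algorithm must be one of %r" % (SUPPORTED_ALGORITHMS,))
    return "manifest-%s.txt" % algorithm


print(checksum(b"", "sha1"))
print(manifest_name("md5"))
```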
instance.show_bag_info()
Prints a number of bag properties, e.g.
    Bag Version: 0.96
    Tag File Encoding: utf-8
    Bag Type: Extended

    Manifest Contents
    CHECKSUMS                                 FILENAMES
    ------------------------------------------------------------------------------
    422936e456d17cf39341598ca377cbc5dd049364  data/subdir/subsubdir4/testfile.txt
    ab2903263a468a76615a363cea207b0bc87b0449  data/subdir/subsubdir1/testfile.txt
    05b920f51b05bb135c9e9f2d69650384216e7cdc  data/subdir/subsubdir3/testfile.txt
    0b3394596f8f1429efa019016207ad9b61d1c573  data/subdir/subsubdir2/testfile.txt

    Tag Manifest Contents
    CHECKSUMS                                 FILENAMES
    ------------------------------------------------------------------------------
    b73a8865fbaa633ebeb283828d7986420135db47  manifest-sha1.txt
    da39a3ee5e6b4b0d3255bfef95601890afd80709  bag-info.txt
    da39a3ee5e6b4b0d3255bfef95601890afd80709  fetch.txt
    a7b95616bf7307fc398baeb04ce60a88ed370f51  bagit.txt

    Bag Errors
    --------------------------------------------------
    No Errors.
instance.get_bag_contents()
Returns a list with the absolute paths of all the files in the data directory.
instance.get_bag_errors(validate=False)
Returns a list of all the bag errors. If validate=True, it will run the validate() method to verify the integrity first.
instance.validate()
Runs the bag validator on the contents of the bag. This method:
instance.update()
This method is used whenever something has been added or removed from the bag. It contains many sub-processes for ensuring a valid bag is produced:
instance.fetch(validate_downloads=False)
Downloads every entry in the fetch.txt file. If validate_downloads=True, it will run the update() and validate() methods automatically.
instance.add_fetch_entries(fetch_entries, append=True)
Writes new entries to the fetch.txt file. fetch_entries is a list of dictionaries, each containing the URL of a file ('url') and its path relative to the bag directory ('filename'). For example:
    entries = [
        {'url': 'http://www.example.com/path/to/file1.txt',
         'filename': 'data/path/to/file.txt'},
        {'url': 'http://www.example.com/path/to/file2.txt',
         'filename': 'data/another/path/for/file.txt'},
    ]
    bag.add_fetch_entries(entries)
If append=False, the whole fetch.txt file is overwritten with the new entries; if append=True, the new entries are added to the existing ones.
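To make the append behaviour concrete, here is a sketch of how such entries could map onto fetch.txt lines. The BagIt specification lays out each line as URL, length and filename, with "-" allowed when the length is unknown. The helpers format_fetch_lines and merge_fetch_lines are hypothetical, and the exact text pybagit writes may differ.

```python
def format_fetch_lines(entries):
    """Turn [{'url': ..., 'filename': ...}, ...] into fetch.txt lines.

    Hypothetical helper for illustration; not part of pybagit.
    """
    lines = []
    for entry in entries:
        # The file length is not known here, so "-" is used in its place.
        lines.append("%s - %s" % (entry["url"], entry["filename"]))
    return lines


def merge_fetch_lines(existing_lines, entries, append=True):
    """Return the new fetch.txt contents: extended if append, else replaced."""
    new_lines = format_fetch_lines(entries)
    return list(existing_lines) + new_lines if append else new_lines


existing = ["http://www.example.com/old.txt - data/old.txt"]
entries = [
    {"url": "http://www.example.com/path/to/file1.txt",
     "filename": "data/path/to/file.txt"},
]
appended = merge_fetch_lines(existing, entries, append=True)
overwritten = merge_fetch_lines(existing, entries, append=False)
print(appended)
print(overwritten)
```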
instance.package(destination, method="tgz")
Compresses a bag and copies it to destination. method can be either "tgz" (default) or "zip". Note: Files with tgz compression are saved with the ".tgz" extension, not a ".tar.gz" extension.
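The "tgz" packaging step can be approximated with the standard tarfile module. This is a sketch under assumptions, not pybagit's implementation: the helper name package_tgz and the choice to name the archive after the bag folder are illustrative.

```python
import os
import tarfile
import tempfile


def package_tgz(bag_dir, destination):
    """Write bag_dir to <destination>/<bag name>.tgz and return the archive path.

    Hypothetical helper for illustration; not part of pybagit.
    """
    bag_name = os.path.basename(os.path.normpath(bag_dir))
    archive_path = os.path.join(destination, bag_name + ".tgz")
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname keeps the bag folder as the single top-level archive entry.
        archive.add(bag_dir, arcname=bag_name)
    return archive_path


# Usage: package a throwaway bag and list what ended up in the archive.
with tempfile.TemporaryDirectory() as workspace:
    bag_dir = os.path.join(workspace, "mybag")
    os.makedirs(os.path.join(bag_dir, "data"))
    with open(os.path.join(bag_dir, "bagit.txt"), "w", encoding="utf-8") as f:
        f.write("BagIt-Version: 0.96\nTag-File-Character-Encoding: UTF-8\n")
    destination = os.path.join(workspace, "out")
    os.makedirs(destination)
    archive_path = package_tgz(bag_dir, destination)
    with tarfile.open(archive_path, "r:gz") as archive:
        member_names = sorted(archive.getnames())
    print(os.path.basename(archive_path), member_names)
```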
Note: These properties are exposed for convenience, but you should not assign to them directly. Properties that can be changed have appropriate set_...() methods (e.g. set_hash_encoding()). All others are maintained by the state of the files in the bag itself and are verified when update() is called.
instance.bag_directory
Absolute path to the bag directory
instance.extended
True if the bag is extended; False if not.
instance.hash_encoding
sha1 or md5. Default is sha1.
instance.bag_major_version
Returns the major version number as declared in bagit.txt. Any files created by this package will default to v0.96, so this property would return 0.
instance.bag_minor_version
Returns the minor version number as declared in bagit.txt. Any files created by this package will default to v0.96, so this property would return 96.
instance.tag_file_encoding
Returns the tag file encoding as declared in bagit.txt. Any files created by this package will default to utf-8.
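The three properties above all derive from the two declarations in bagit.txt. A sketch of that parsing, assuming the "BagIt-Version" and "Tag-File-Character-Encoding" labels from the BagIt specification, is given here. The helper parse_bagit_txt is hypothetical, and returning the version parts as integers is an assumption; pybagit may expose them in another type.

```python
def parse_bagit_txt(text):
    """Extract major version, minor version and tag file encoding from bagit.txt text.

    Hypothetical helper for illustration; not part of pybagit.
    """
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    major, minor = fields["BagIt-Version"].split(".", 1)
    return {
        "major": int(major),
        "minor": int(minor),
        "encoding": fields["Tag-File-Character-Encoding"].lower(),
    }


info = parse_bagit_txt("BagIt-Version: 0.96\nTag-File-Character-Encoding: UTF-8\n")
print(info)
```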
instance.data_directory
Absolute path to the bag's data directory.
instance.bagit_file
Absolute path to the bagit.txt file.
instance.manifest_file
Absolute path to the manifest-(sha1|md5).txt file.
instance.tag_manifest_file
Absolute path to the tagmanifest-(sha1|md5).txt file. None if it doesn't exist.
instance.fetch_file
Absolute path to the fetch.txt file. None if it doesn't exist.
instance.baginfo_file
Absolute path to the bag-info.txt file. None if it doesn't exist.
instance.manifest_contents
A dictionary containing the manifest file contents.
instance.tag_manifest_contents
A dictionary containing the tagmanifest file contents.
instance.fetch_contents
A dictionary containing the fetch.txt file contents.
instance.baginfo_contents
A dictionary containing the bag-info.txt file contents.
instance.bag_compression
If the bag originates from a compressed file, this is set to either tgz or zip. None if the bag is not compressed. Note: This property does not determine the subsequent bag compression format. If the bag originates from a zip file and package() is called without method="zip", the result will be a .tgz file.
instance.bag_errors
A list of all bag validation errors. All errors are in tuple format, e.g.:
('data/path/to/file.txt', 'Incorrect filename or checksum in manifest')
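The tuple format lends itself to simple reporting. The sketch below shows one way a list of such tuples could be produced by checking files against a manifest. It is not pybagit's validator: the helper find_manifest_errors is hypothetical, and the manifest is assumed to be a dictionary mapping relative paths to checksums, which may not match the layout of manifest_contents.

```python
import hashlib
import os
import tempfile

ERROR_MESSAGE = "Incorrect filename or checksum in manifest"


def find_manifest_errors(bag_dir, manifest, algorithm="sha1"):
    """Return (path, message) tuples for missing files or checksum mismatches.

    Hypothetical helper for illustration; not part of pybagit.
    """
    errors = []
    for rel_path, expected in manifest.items():
        full_path = os.path.join(bag_dir, *rel_path.split("/"))
        if not os.path.isfile(full_path):
            errors.append((rel_path, ERROR_MESSAGE))
            continue
        with open(full_path, "rb") as handle:
            actual = hashlib.new(algorithm, handle.read()).hexdigest()
        if actual != expected:
            errors.append((rel_path, ERROR_MESSAGE))
    return errors


# Usage: one file matches its checksum, the other is listed but missing.
with tempfile.TemporaryDirectory() as bag_dir:
    os.makedirs(os.path.join(bag_dir, "data"))
    with open(os.path.join(bag_dir, "data", "good.txt"), "wb") as f:
        f.write(b"hello")
    manifest = {
        "data/good.txt": hashlib.sha1(b"hello").hexdigest(),
        "data/missing.txt": "0" * 40,
    }
    errors = find_manifest_errors(bag_dir, manifest)
    for path, message in errors:
        print("%s: %s" % (path, message))
```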