Written by Mark Tse, Software Developer, Platform Development D2L
Starting in 10.7.5, administrators can schedule differential extracts in addition to full extracts, allowing for higher-frequency reporting. These differential extracts are generated at a set interval, and contain only the data newly available since the previous differential extract was generated.
In this article, we will build upon the code sample in Brightspace Data Sets - Headless (Non-Interactive) Client Example to show how to leverage the application APIs to keep a database updated with differential extracts. The source code in this article is available on GitHub.
API Differences
To accommodate differential extracts, the code sample has been modified to leverage /d2l/api/lp/(version)/dataExport/bds, which was introduced in 10.7.5. This route returns a list of all available (full and differential) data set extracts and any corresponding metadata, including DownloadLink, which points to the location of a given data set.
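To make the route's output concrete, here is a hypothetical, heavily trimmed illustration of its JSON response. Only the fields used in this article (BrightspaceDataSets, PluginId, DownloadLink, PreviousDataSets, NextPageUrl) are shown; the URLs are placeholders, and real responses contain additional metadata:

```python
# Hypothetical, trimmed sketch of a /d2l/api/lp/(version)/dataExport/bds
# response -- field values here are placeholders for illustration only.
sample_response = {
    'BrightspaceDataSets': [
        {
            'PluginId': '533f84c8-b2ad-4688-94dc-c839952e9c4f',
            'DownloadLink': 'https://example.org/placeholder-download-link',
            'PreviousDataSets': None,
        },
    ],
    'NextPageUrl': None,  # non-None when more pages of data sets remain
}

# The pagination code later in this article consumes exactly these fields:
plugin_to_link = {
    d['PluginId']: d['DownloadLink']
    for d in sample_response['BrightspaceDataSets']
}
print(plugin_to_link)
```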
Changes Overview
To demonstrate how to process both full and differential extracts, the code sample was augmented to support the User Enrollments data set. The schema files are found in the schema folder.
The biggest change to the code sample is determining which data sets to retrieve and how to retrieve them. The code that performs an upsert (insert or update) into the database is reusable and remains unchanged.
Static Data
We added the User Enrollments full extract to the existing full data set list and a new list to store the differential extracts of interest.
```python
DataSetMetadata = collections.namedtuple('DataSetMetadata', ['plugin', 'table'])

...

FULL_DATA_SET_METADATA = [
    ...
    DataSetMetadata(
        plugin='533f84c8-b2ad-4688-94dc-c839952e9c4f',
        table='user_enrollments'
    )
]

DIFF_DATA_SET_METADATA = [
    DataSetMetadata(
        plugin='a78735f2-7210-4a57-aac1-e0f6bd714349',
        table='user_enrollments'
    )
]
```
Helper Methods
Because we will be making authenticated calls to Brightspace multiple times, the following helper function was added:
```python
def get_with_auth(endpoint, access_token):
    headers = {'Authorization': 'Bearer {}'.format(access_token)}
    response = requests.get(endpoint, headers=headers)

    if response.status_code != 200:
        logger.error('Status code: %s; content: %s', response.status_code, response.text)
        response.raise_for_status()

    return response
```
The code to extract the zip file and update the database was put in its own helper method for readability:
```python
def unzip_and_update_db(response_content, db_conn_params, table):
    with io.BytesIO(response_content) as response_stream:
        with zipfile.ZipFile(response_stream) as zipped_data_set:
            files = zipped_data_set.namelist()
            assert len(files) == 1
            csv_name = files[0]
            with zipped_data_set.open(csv_name) as csv_data:
                update_db(db_conn_params, table, csv_data)
```
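This helper can be exercised without calling Brightspace at all. The sketch below uses a stand-in function that returns the CSV bytes instead of writing to a database, and builds a fake one-file extract in memory to feed through the same io.BytesIO/zipfile pattern:

```python
import io
import zipfile

def unzip_and_read(response_content):
    # Same pattern as unzip_and_update_db, but returns the CSV bytes
    # instead of calling update_db (illustrative stand-in only).
    with io.BytesIO(response_content) as response_stream:
        with zipfile.ZipFile(response_stream) as zipped_data_set:
            files = zipped_data_set.namelist()
            assert len(files) == 1  # each extract contains a single CSV
            with zipped_data_set.open(files[0]) as csv_data:
                return csv_data.read()

# Build a fake one-file zip in memory, shaped like a downloaded extract.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as fake_extract:
    fake_extract.writestr('UserEnrollments.csv', 'UserId,OrgUnitId\n42,6606\n')

print(unzip_and_read(buffer.getvalue()).decode())  # prints the CSV content
```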
Downloading the Data Sets
First, we added an option to toggle between downloading full or differential extracts:
```python
parser = argparse.ArgumentParser(description='Script for downloading data sets.')
parser.add_argument(
    '--differential',
    action='store_true',
    help='Use differential data sets instead of full data sets'
)
args = parser.parse_args()
```
We then added a method to get a mapping of all data sets and their locations using the list data set route as mentioned above:
```python
def get_plugin_link_mapping(config, access_token):
    data_sets = []
    next_page_url = '{bspace_url}/d2l/api/lp/{lp_version}/dataExport/bds'.format(
        bspace_url=config['bspace_url'],
        lp_version=API_VERSION
    )

    while next_page_url is not None:
        list_response = get_with_auth(next_page_url, access_token)
        list_json = list_response.json()
        data_sets += list_json['BrightspaceDataSets']
        next_page_url = list_json['NextPageUrl']

    return {d['PluginId']: d['DownloadLink'] for d in data_sets}

...

plugin_to_link = get_plugin_link_mapping(config, token_response['access_token'])
```
Based on the command line argument, we added code to get the download links for all the full or differential extracts we are interested in:
```python
# args.differential is true if `--differential` is provided as an argument
data_set_metadata = DIFF_DATA_SET_METADATA if args.differential else FULL_DATA_SET_METADATA
plugin_to_link = get_plugin_link_mapping(config, token_response['access_token'])

db_conn_params = {
    'host': config['dbhost'],
    'dbname': config['dbname'],
    'user': config['dbuser'],
    'password': config['dbpassword']
}

for plugin, table in data_set_metadata:
    response = get_with_auth(
        endpoint=plugin_to_link[plugin],
        access_token=token_response['access_token']
    )
    unzip_and_update_db(response.content, db_conn_params, table)
```
The unchanged database update code will properly do an upsert (insert or update), regardless of the data source (full or differential extract).
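The actual sample targets PostgreSQL, but the upsert idea is easy to illustrate in isolation. The snippet below is a minimal sketch (not the sample's real update_db) using SQLite 3.24+ so it is self-contained; the table and columns are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('''
    CREATE TABLE user_enrollments (
        user_id INTEGER,
        org_unit_id INTEGER,
        role_name TEXT,
        PRIMARY KEY (user_id, org_unit_id)
    )
''')

# Two rows with the same key, as a full extract followed by a
# differential extract might produce -- the later row must win.
rows = [(42, 6606, 'Student'), (42, 6606, 'Instructor')]
for row in rows:
    # INSERT ... ON CONFLICT DO UPDATE: insert new rows, overwrite
    # existing ones, regardless of which kind of extract supplied them.
    conn.execute('''
        INSERT INTO user_enrollments (user_id, org_unit_id, role_name)
        VALUES (?, ?, ?)
        ON CONFLICT (user_id, org_unit_id)
        DO UPDATE SET role_name = excluded.role_name
    ''', row)

print(conn.execute('SELECT role_name FROM user_enrollments').fetchall())
# -> [('Instructor',)]
```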
Scheduling
This code sample does not include functionality to recover from a missed differential extract. Since differential extracts only contain data starting from the previous differential extract, a missed differential extract would mean a data gap in the target database, even if the next differential extract was processed successfully. For this reason, this code sample should be scheduled at the same cadence as the Brightspace Data Sets Differentials scheduled task to avoid missing a differential extract.
Recovering from missed differential extracts is currently out of scope for the code sample. However, a simple mitigation would be to target the same database with both full and differential extracts, which ensures any data gaps from missed differential extracts are filled when the next daily full extract is processed. Alternatively, the code sample could be extended to examine all the available differential extracts in the PreviousDataSets field on each run to determine whether it has missed any extracts.
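One way such gap detection could look is sketched below. This is illustrative only: the CreatedDate field name and the exact shape of PreviousDataSets entries are assumptions, not the documented schema, and the URLs are placeholders:

```python
from datetime import datetime

def find_missed_extracts(data_set, last_processed):
    # `data_set` is one entry from BrightspaceDataSets; the shape of its
    # PreviousDataSets entries (CreatedDate, DownloadLink) is assumed
    # here for illustration. Returns unprocessed extracts, oldest first.
    candidates = (data_set.get('PreviousDataSets') or []) + [data_set]
    missed = [
        d for d in candidates
        if datetime.fromisoformat(d['CreatedDate']) > last_processed
    ]
    return sorted(missed, key=lambda d: d['CreatedDate'])

data_set = {
    'CreatedDate': '2018-03-03T06:00:00',
    'DownloadLink': 'https://example.org/latest',
    'PreviousDataSets': [
        {'CreatedDate': '2018-03-01T06:00:00', 'DownloadLink': 'https://example.org/old'},
        {'CreatedDate': '2018-03-02T06:00:00', 'DownloadLink': 'https://example.org/missed'},
    ],
}

# Pretend the script last ran at noon on March 1st: the March 2nd
# differential and the latest extract both still need processing.
missed = find_missed_extracts(data_set, datetime(2018, 3, 1, 12, 0))
print([d['DownloadLink'] for d in missed])
# -> ['https://example.org/missed', 'https://example.org/latest']
```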
Final Comments
By leveraging the routes introduced in 10.7.5, and with some minor adjustments to our sample code, we have demonstrated how to programmatically keep a database updated with both full and differential extracts. Feel free to post any questions or comments below!
Code formatting by http://markup.su/highlighter/.