What is a Soft-Delete Indicator?
A soft-delete indicator is a column on a data set that indicates that the item has been deleted in the interface and no longer appears in the Brightspace UI. In the context of differentials, it marks a row as no longer in use, so the row can potentially be removed from your data store. For more information, refer to the article Introducing Soft-Deletes for the Brightspace Data Sets.
The Dilemma: Can you use only differential files when there is no soft-delete flag?
Many Brightspace Data Sets have a soft-delete flag, but some do not. Ideally, to minimize volume and processing time, you would use only differential files and skip the full files. However, differential files include only inserts and updates, so they do not include rows that have been hard deleted since the previous run. These deletions can be confirmed using full files. Even so, there are strategies that let you capture deleted or unnecessary rows using only differential files, even when the data set itself has no soft-delete indicator.
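To see why a differential-only load misses hard deletes, consider a minimal sketch of applying a differential as an upsert. The function name apply_differential, the in-memory dictionary store, and the column names in the sample rows are assumptions for illustration and are not part of any Brightspace tooling.

```python
def apply_differential(store, differential_rows, key_fields):
    """Upsert differential rows into an in-memory store.

    store: dict mapping a primary-key tuple to a row dict (hypothetical local store).
    differential_rows: iterable of row dicts read from a differential file.
    key_fields: column names forming the primary key (assumed for illustration).
    """
    for row in differential_rows:
        key = tuple(row[field] for field in key_fields)
        store[key] = row  # insert a new row or overwrite an existing one
    return store


# Sample rows; the column names are illustrative assumptions.
key_fields = ("ContentObjectId", "UserId")
store = {
    (10, 1): {"ContentObjectId": 10, "UserId": 1, "NumRealVisits": 2},
    (11, 1): {"ContentObjectId": 11, "UserId": 1, "NumRealVisits": 1},
}
differential = [
    {"ContentObjectId": 10, "UserId": 1, "NumRealVisits": 3},  # update
    {"ContentObjectId": 12, "UserId": 2, "NumRealVisits": 1},  # insert
]
apply_differential(store, differential, key_fields)

# If row (11, 1) was hard deleted in Brightspace, nothing in the differential
# says so, and the stale row stays in the store.
stale_row_still_present = (11, 1) in store
```

The update and the insert are both applied, but the store has no way to learn that a row disappeared at the source. That gap is what the strategies in this article try to close.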
For the following case studies showing the strategies available, we will use Content User Progress as our target data set. Content User Progress is often a very large data set because it holds the progress of all users against all content, and it does not include a soft-delete field. It is often unwieldy to work with as a full file and can be a massive table when stored.
Strategy 1: Inheriting the soft-delete from a dependent (base-object) data set
Often, when a data set does not have a soft-delete flag, there is a related data set that does, which you can use to identify which rows in the target data set are effectively deleted. The relationship between the two data sets is important here: the data set with soft-deletes should be a prerequisite for any updates in the other data set. In the case of Content Objects and Content User Progress below, for example, there can no longer be any progress related to content once the content object has been deleted.
When Content User Progress is used in a report, the Content Objects data set is often used as well, to get the title of the content, to group by module, or to link to Organizational Units for the course information. The Content Objects data set contains a soft-delete indicator (IsDeleted). If the content has been deleted, the historical progress is likely no longer relevant for any analysis or reporting and should be excluded. This process could also be automated as a stored procedure that runs periodically and removes any rows from Content User Progress where the corresponding Content Objects row is marked as deleted.
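The periodic cleanup described above could look something like the following sketch, which uses an in-memory SQLite database in place of your actual data store. The table names, the join on ContentObjectId, and the storage of IsDeleted as 0/1 are assumptions for illustration; adjust them to match how you load the data sets.

```python
import sqlite3


def purge_progress_for_deleted_content(conn):
    """Delete progress rows whose content object is flagged as deleted.

    Assumes tables named ContentUserProgress and ContentObjects, joined on
    ContentObjectId, with IsDeleted stored as 0/1 (illustrative assumptions).
    """
    cursor = conn.execute(
        """
        DELETE FROM ContentUserProgress
        WHERE ContentObjectId IN (
            SELECT ContentObjectId FROM ContentObjects WHERE IsDeleted = 1
        )
        """
    )
    conn.commit()
    return cursor.rowcount  # number of rows removed


# Small in-memory demonstration with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ContentObjects (ContentObjectId INTEGER PRIMARY KEY, IsDeleted INTEGER);
    CREATE TABLE ContentUserProgress (ContentObjectId INTEGER, UserId INTEGER);
    INSERT INTO ContentObjects VALUES (10, 0), (11, 1);
    INSERT INTO ContentUserProgress VALUES (10, 1), (11, 1), (11, 2);
    """
)
removed = purge_progress_for_deleted_content(conn)
remaining = conn.execute(
    "SELECT ContentObjectId, UserId FROM ContentUserProgress"
).fetchall()
```

In this sample, the two progress rows for the deleted content object 11 are removed and the row for content object 10 is kept. If you would rather not delete anything, the same subquery can be used as a filter in your reporting views instead.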
Strategy 2: Treating rows as deleted where enrollments or roles make the data unnecessary
When looking at the course level, it may be prudent to remove rows that are not the type of data you are looking to analyze. For example, Brightspace treats all users the same in the data, but you may only want to report on student actions, or specifically on those students who are currently in a course, and have no need to store other users’ actions. In this case, it may be useful to regularly remove rows corresponding to actions by individuals who are not your target. A note of caution here: if your policies allow individuals to change roles within the same course shell, or to re-enroll after being withdrawn, this may not be the correct strategy to use, as you may be removing rows you will want in the future.
When using Content User Progress, you may want to join to either Enrollments and Withdrawals or User Enrollments to get the current enrollment status for a user and their role so you will have enough information to determine if the row is worth maintaining.
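As a rough sketch of this join, the following example removes progress rows for users who do not currently hold one of the roles you want to keep in the course that owns the content. It assumes the User Enrollments variant and an in-memory SQLite store; the table names, the columns OrgUnitId, UserId, and RoleName, and the role name "Student" are illustrative assumptions rather than a confirmed schema.

```python
import sqlite3


def purge_progress_outside_roles(conn, keep_roles):
    """Delete progress rows for users without a kept role in the content's course.

    Table and column names are illustrative assumptions; keep_roles is a
    list of role names whose progress rows should be retained.
    """
    if not keep_roles:
        raise ValueError("keep_roles must contain at least one role name")
    placeholders = ",".join("?" for _ in keep_roles)
    cursor = conn.execute(
        f"""
        DELETE FROM ContentUserProgress
        WHERE NOT EXISTS (
            SELECT 1
            FROM ContentObjects co
            JOIN UserEnrollments ue ON ue.OrgUnitId = co.OrgUnitId
            WHERE co.ContentObjectId = ContentUserProgress.ContentObjectId
              AND ue.UserId = ContentUserProgress.UserId
              AND ue.RoleName IN ({placeholders})
        )
        """,
        tuple(keep_roles),
    )
    conn.commit()
    return cursor.rowcount  # number of rows removed


# Small in-memory demonstration with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ContentObjects (ContentObjectId INTEGER PRIMARY KEY, OrgUnitId INTEGER);
    CREATE TABLE UserEnrollments (OrgUnitId INTEGER, UserId INTEGER, RoleName TEXT);
    CREATE TABLE ContentUserProgress (ContentObjectId INTEGER, UserId INTEGER);
    INSERT INTO ContentObjects VALUES (10, 500);
    INSERT INTO UserEnrollments VALUES (500, 1, 'Student'), (500, 2, 'Instructor');
    INSERT INTO ContentUserProgress VALUES (10, 1), (10, 2), (10, 3);
    """
)
removed = purge_progress_outside_roles(conn, ["Student"])
remaining = conn.execute(
    "SELECT ContentObjectId, UserId FROM ContentUserProgress"
).fetchall()
```

Here user 2 (an instructor) and user 3 (no current enrollment) lose their progress rows, while the student's row is kept. Given the caution about role changes and re-enrollment, you may prefer to run the same condition as a SELECT first and review what would be removed.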
Strategy 3: Keeping the data regardless of status
In some cases, it may not be worth using the previous strategies to purge the data, or it may be simpler to continue using the full data sets. For example, the Role Details data set does not contain a soft-delete field. However, you are likely to change roles rarely, and to delete them even more rarely, so there will be few rows and few updates. Also, leaving a row in place after it has been deleted in the interface does not negatively impact any reporting logic, as that role will no longer be used in the enrollments and so will be filtered out of any resulting current-state reporting. It is definitely worth focusing on optimizing the larger data sets to minimize storage and the use of full files, but at some point the cost-benefit analysis will favor leaving the smaller data sets without deletes, or continuing to pull down the full data sets periodically to verify the deletes. It is important to know where that line exists in your organization and make decisions accordingly.
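For a small data set where you still pull the full file periodically, verifying deletes can be as simple as comparing keys. The sketch below assumes the rows are keyed by a single identifier (shown here as role IDs with made-up values); the function name is hypothetical.

```python
def find_hard_deleted_keys(stored_keys, full_file_keys):
    """Return keys held locally that no longer appear in the latest full file."""
    return set(stored_keys) - set(full_file_keys)


# Made-up identifiers for illustration.
stored_role_ids = [100, 101, 102]
full_file_role_ids = [100, 102]
deleted_role_ids = find_hard_deleted_keys(stored_role_ids, full_file_role_ids)
```

Any key returned is present in your store but absent from the full file, which suggests that the row was hard deleted at the source.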
Conclusion
If your ultimate goal is to move entirely away from the full data sets and rely only on the differentials, the soft-delete indicators get you partially there, and the strategies above can get you even closer. Minimizing the data you download, process, and store not only saves you money but also likely gets the reports your users need into their hands faster. It is also important to keep in mind that some data sets will never have a soft-delete field because they are logs, so their rows will not be deleted. With some careful thought about what data needs to be retained and why, you can optimize your data processing and minimize the storage space you need for Brightspace data without sacrificing any of your reporting capabilities.