Originally published November 13, 2019
Brightspace Data Sets come in two different extraction types: Differential and Full. If you are new to working with Brightspace Data Sets you may be confused about which type of data set you need to be working with, or even what the different use cases are for each type. In this post we aim to clarify some misconceptions and frequently asked questions around working with these different extraction types.
How are the Full and Differential data sets similar?
- Both types of data sets have the same data schema and are generated on a schedule.
- Both are available for download as zipped CSV files.
- All the fields of data (columns) and named files (e.g. User Logins) that are available in one extraction type are also available in the other.
- Both follow the same change management policy and schedule found here.
How are the Full and Differential data sets different?
Full Data Sets
- ‘Snapshots’ of data at the time the full data set is generated
- Intended to replace a previous full data set file; not used in conjunction with prior fulls
- Generated weekly by default, or daily with a paid upgrade
- Constrained to 150 million rows of the most recent data
Differential Data Sets
- ‘Record of changes’ to the data since the previous differential file was generated
- Intended to be added to (by inserting or updating) a data store containing previously extracted data
- Generated daily by default, or hourly with a paid upgrade
- Given the limited time frame within which differentials are extracted, should never reach a row count limitation
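As a rough illustration of "inserting or updating", the sketch below seeds a small SQLite table from a Full and then applies a Differential on top of it. The table name, the two-column layout, and the inline sample data are assumptions made for the example; real extracts are zipped CSV files with many more columns, and your own data store may offer a different upsert mechanism.

```python
import csv
import io
import sqlite3

# Hypothetical miniature extracts, inlined so the example is self-contained.
FULL_CSV = "UserId,FirstName\n1,Ana\n2,Ben\n"
DIFF_CSV = "UserId,FirstName\n2,Benjamin\n3,Cleo\n"


def load_rows(csv_text):
    """Parse CSV text into a list of dicts (every value is a string)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def upsert_users(conn, rows):
    """Insert new rows and overwrite existing ones, keyed on UserId."""
    conn.executemany(
        "INSERT OR REPLACE INTO users (UserId, FirstName) "
        "VALUES (:UserId, :FirstName)",
        rows,
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (UserId TEXT PRIMARY KEY, FirstName TEXT)")

upsert_users(conn, load_rows(FULL_CSV))  # initial load from a Full
upsert_users(conn, load_rows(DIFF_CSV))  # apply a Differential on top

users = dict(conn.execute("SELECT UserId, FirstName FROM users ORDER BY UserId"))
print(users)
```

After both loads, user 1 is untouched, user 2 carries the updated name, and user 3 has been added. When several Differentials are waiting, applying them oldest first keeps the most recent value in the store.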
The files are all string data, but the Core Admin Analytics Guide says they are supposed to be different data types. Do I have the right files?
Yes, you do! We provide the data to you in string/text format so that it can be used with almost any tool or system. You will need to convert each string field to the correct data type before use.
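One possible way to do that conversion is a small per-column converter map, sketched below. The column names, the boolean spelling, and the timestamp format are assumptions for the example only; check the documentation for each data set to confirm the actual types and formats before relying on anything like this.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample; the columns and value formats are illustrative assumptions.
SAMPLE = (
    "UserId,IsActive,LastAccessed\n"
    "101,True,2019-11-01T14:30:00.000Z\n"
    "102,False,\n"
)


def parse_bool(value):
    """Treat 'True' or '1' (case-insensitive) as True, anything else as False."""
    return value.strip().lower() in ("true", "1")


def parse_timestamp(value):
    """Parse an assumed ISO 8601 UTC timestamp; an empty string becomes None."""
    if not value:
        return None
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


# Map each column to its converter; unlisted columns stay as strings.
CONVERTERS = {
    "UserId": int,
    "IsActive": parse_bool,
    "LastAccessed": parse_timestamp,
}


def typed_rows(csv_text):
    """Yield one dict per CSV row with each value converted to its target type."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield {name: CONVERTERS.get(name, str)(value) for name, value in row.items()}


rows = list(typed_rows(SAMPLE))
```

Keeping the mapping in one dictionary makes it easy to adjust when a column is added or its type changes.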
If I only need to report weekly or less frequently, is there any reason to use differential files?
Generally, no - if you don’t need the frequency of differential extraction type files then you can simply work with full extraction type files. However, the exception might be if you need to turn around a report in a very short time period – e.g. by the 5th of the month for data from the previous month – and the schedule for the Full doesn’t align.
You may also find, depending on your reporting environment and the activity you have in Brightspace, that the full data sets become unwieldy to download and process and it is easier to work with the differential files.
Is the data the same in the Full Data Set and the Differential Data Set?
For most purposes the data is the same, but there are two situations where the data you have can differ between extraction types. Both stem from the delivery timing and from how the data is generated (a record of changes since the last extract versus a snapshot of the current state). First, because the Differential files are delivered more frequently than Full files, you will see fresher data in them than in the most recent Full data set, unless the two were generated at the same time. Second, because a Differential file contains all the changes that happened within a given time window, you could see a value (for example, a user’s first name) changed and then changed back to the original value in the Differential data set, while the next Full data set generated would show only the most recent value.
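The second situation can be pictured with a toy example. Assume rows for the same user are gathered from successive Differential files in the order they were generated; the column names and values here are hypothetical.

```python
# Hypothetical rows for one user, collected from successive Differentials in order.
diff_rows = [
    {"UserId": "7", "FirstName": "Samuel"},  # name changed
    {"UserId": "7", "FirstName": "Sam"},     # name changed back
]


def history_and_final(rows, key="UserId", field="FirstName"):
    """Return every value seen per key, plus the last value seen per key."""
    history = {}
    for row in rows:
        history.setdefault(row[key], []).append(row[field])
    final = {k: values[-1] for k, values in history.items()}
    return history, final


history, final = history_and_final(diff_rows)
print(history)  # every intermediate value the Differentials recorded
print(final)    # the value a later Full snapshot would be expected to show
```

If your reporting needs the intermediate values, keep the Differential rows themselves rather than only the merged result.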
Can I use the Differential files but not the Full files?
Generally, yes, with two exceptions. First, you will need an initial import of the Full data set to overlay the changes in the Differential files on. If you don’t do this, you will miss rows that have not changed. For example, if you have 1,000 users in your system but only use the Differential files, you will see the 200 new students enrolling for the first time this term and the 50 students who have had a change to their name or information since you began collecting data, but you will be missing the data of all the individuals who remain in your system from the previous term with no changes to their information.
The second exception is files that do not have a soft delete field in their structure. A soft delete field is a column within the extract that indicates whether the row has been deleted. In extracts that use hard deletes, the row is simply removed from the extract entirely. For those files, you would only know that a row has been deleted by referencing the Full extraction file: hard delete actions are not in the Differential extraction file, which includes only inserts and updates. A soft delete, by contrast, is considered an update, so it does appear in the Differential file.
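The two delete cases can be sketched as follows. The key column name `Id`, the flag column name `IsDeleted`, and the sample rows are assumptions for illustration; actual column names vary by data set.

```python
def apply_differential(store, diff_rows, key="Id"):
    """Upsert Differential rows into a dict-based store keyed on `key`."""
    for row in diff_rows:
        store[row[key]] = dict(row)
    return store


def soft_deleted_keys(store, flag="IsDeleted"):
    """Keys whose (assumed) soft delete flag column is set to 'True'."""
    return {k for k, row in store.items() if row.get(flag, "").lower() == "true"}


def missing_from_full(store, full_rows, key="Id"):
    """Keys held in the store that no longer appear in the latest Full."""
    return set(store) - {row[key] for row in full_rows}


# Case 1: an extract with a soft delete column. The delete arrives as an update.
soft_store = {
    "1": {"Id": "1", "IsDeleted": "False"},
    "2": {"Id": "2", "IsDeleted": "False"},
}
apply_differential(soft_store, [{"Id": "2", "IsDeleted": "True"}])
soft_deleted = soft_deleted_keys(soft_store)

# Case 2: an extract with hard deletes. Only the Full reveals the missing row.
hard_store = {"10": {"Id": "10"}, "11": {"Id": "11"}}
latest_full = [{"Id": "10"}]
hard_deleted = missing_from_full(hard_store, latest_full)
```

Treat a key that is missing from the Full as a candidate for deletion rather than proof of it: because a Full is constrained to 150 million rows of the most recent data, older rows in a very large data set can be absent without having been deleted.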
How do I know everything you just said above won’t change tomorrow and break my process? Do I need to keep checking all the files even the ones I’m not using just to make sure I’m not missing any data I may want?
Don’t worry! We have committed to a change management policy for the data sets that should give you plenty of time to adjust to coming changes. You can find more information here.
It is also useful to follow the Brightspace Data Sets Updates discussion on Brightspace Community so you will be notified when new versions are available.
If you have more questions about the Brightspace Data Sets extraction types that we didn’t address in this article, please comment below!
Need further help working with data? Stuck on any of the above concepts? Lacking one of the skill sets needed? D2L provides flexible levels of Data Solutions Consulting services that can assist with any or all steps in the process detailed in this post. If you are interested, please contact your D2L Customer Success representative.