Data sprawl is an issue that affects all types of companies. It takes smart people away from working on impactful projects and makes them spend time managing the data and systems that support those projects. This is an expensive and time-consuming task that can be avoided with some careful planning and a few simple best practices.
Data sprawl occurs when an organization accumulates a large number of separate data storage systems, created over time to serve many different purposes. These systems are often independent of each other and do not use consistent metadata or schemas, which makes it difficult to search for and find the information you need. It also increases the risk of outages and data loss, because a failure in one system can affect multiple data stores.
The sgpData dataset includes a set of anonymized student-instructor lookup tables. These tables provide the student identifiers and instructor numbers associated with each student’s assessment records, along with the scale scores from each student’s test records for each of the past five years.
In addition, sgpData contains a set of tables that allow users to view the SGP score for a given student. SGP is a measure of relative student growth that compares a student’s performance to the average score of their academic peers nationwide. SGP is based on a combination of the student’s previous test scores and the results of the current assessment.
SGP is reported as a percentile, so a higher SGP indicates that a student has grown as much as or more than a larger share of their academic peers, and a lower SGP indicates less growth relative to those peers. The SGP data is available in two formats: window-specific SGP and current SGP.
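To make the idea of comparing a student against academic peers concrete, here is a deliberately simplified Python sketch. It treats students with the same prior score as peers and reports the share of those peers whose current score is at or below the student's. This is an assumption made purely for illustration: it is not the method used to produce the SGP values in sgpData, and the function name and sample scores are hypothetical.

```python
def simple_growth_percentile(student, peers):
    """Percent of peers whose current score is at or below the student's.

    `student` and each entry in `peers` are (prior_score, current_score) tuples.
    Only peers with the same prior score are compared (a simplifying assumption).
    """
    prior, current = student
    peer_scores = [c for p, c in peers if p == prior]
    if not peer_scores:
        return None  # no comparable peers, so no percentile can be reported
    at_or_below = sum(1 for c in peer_scores if c <= current)
    return round(100 * at_or_below / len(peer_scores))


# Invented sample data: (prior_score, current_score)
peers = [(400, 410), (400, 420), (400, 430), (400, 440), (420, 450)]
print(simple_growth_percentile((400, 425), peers))
```

Matching peers on a single identical prior score keeps the example short. A real growth model would account for a student's fuller score history rather than requiring exact matches.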
Microsoft may copy Customer Data between regions within a Geo for redundancy and operational purposes, including backups, replication, service traces, and other similar activities. Additionally, Microsoft personnel located outside of a Geo may remotely operate Customer Data processing systems in that Geo. However, such personnel will not access Customer Data without authorization.
This vignette introduces a new data format that enables teachers and administrators to view student growth trends over time. The sgpData dataset contains an anonymized student-instructor lookup table and a set of five tables that contain the scale scores from each of the last five years. Each table is sorted by the unique student ID. The first column, ID, provides the student ID, and the remaining columns, SS_2013, SS_2014, SS_2015, SS_2016, and SS_2017, provide each student’s scale scores for those years. The sgpData dataset is available in the Data Explorer. Please consult the SGP data analysis vignette for more detailed documentation on how to use this data. If you have a topic request for a future data analysis vignette, please write to us or open an issue on GitHub.
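As a rough illustration of the column layout described above, the following Python sketch builds a tiny table with the same column names (ID and SS_2013 through SS_2017) and computes year-over-year score changes. The rows are made-up values, not records from sgpData, and the helper `yearly_changes` is hypothetical.

```python
# Hypothetical rows in the described layout: one row per student,
# one scale-score column per year. Values are invented for illustration.
YEARS = [2013, 2014, 2015, 2016, 2017]

rows = [
    {"ID": 1002, "SS_2013": 388, "SS_2014": 390, "SS_2015": None, "SS_2016": 415, "SS_2017": 421},
    {"ID": 1001, "SS_2013": 410, "SS_2014": 425, "SS_2015": 440, "SS_2016": 452, "SS_2017": 470},
]


def yearly_changes(row):
    """Return {year: change from the previous year}, skipping missing scores."""
    changes = {}
    for prev, curr in zip(YEARS, YEARS[1:]):
        before, after = row[f"SS_{prev}"], row[f"SS_{curr}"]
        if before is not None and after is not None:
            changes[curr] = after - before
    return changes


# Sort by ID, mirroring the ordering described for the tables.
for row in sorted(rows, key=lambda r: r["ID"]):
    print(row["ID"], yearly_changes(row))
```

A missing score is represented here as `None`, so any year-to-year change that touches a gap is left out rather than guessed.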