What if growth mattered even more on STAR? And demographics less?

When OSSE released the new DC STAR report card, we jumped straight to analyzing how schools performed with certain student groups and on the growth metrics. We had a strong feeling STAR results would be correlated with socioeconomics because 30 of the 95 possible points come from PARCC achievement. It turns out STAR scores are correlated with the percent of at-risk students, as OSSE showed in its STAR research brief, and, as we discussed in another post, the demographics a school serves account for between 20% and 45% of its STAR score.

That sounds like a lot, but there are a couple of positives when it comes to STAR compared to other state school accountability systems and even the prior DC system. First, the correlation between the new STAR framework and demographics is significantly lower than under DC's prior NCLB-era accountability system, which relied exclusively on math and reading DC CAS/PARCC achievement. Second, STAR includes several growth measures with limited-to-no correlation with demographics, including median growth percentile, growth to proficiency, attendance growth, and English learner growth. Most states only offer one of these metrics in their systems. Lastly, STAR calculates an at-risk student subscore for every school, with floors and targets based on low-income student performance across the city.

While we agree that STAR is a significant improvement and preferable to many other states' systems, we would still like our accountability tools to focus more on growth, especially for schools serving high-poverty populations. The last bit of good news is that OSSE made all of the STAR data publicly available, which lets us explore additional score options that focus more on growth and adjust for demographics served. The rest of this data-heavy, wonk-tastic blog examines how schools' 2018 STAR scores would change under various models that use different metric and subgroup weightings.

Two important obligatory caveats before we dig into different STAR score scenarios:

Three of the scenario models would never be acceptable under the Every Student Succeeds Act. For example, by law, state accountability systems are required to include academic achievement in math and reading, while one of our explored scenarios ignores achievement altogether. As we describe each model, we will note the degree to which it is allowed under the law and might pass the U.S. Department of Education's peer review process. We opted to explore a variety of options anyway so readers can fully understand the data interactions from all angles.
We ultimately agree that keeping achievement in an accountability system is important. EmpowerK12's ultimate vision (hopefully, one that many of our readers also share) for DC schools is to be the highest performing urban district in the U.S. as well as the first to demonstrably close the achievement gap, celebrating schools that accomplish this feat along the way. If an accountability system exclusively focused on growth, we would lose sight of these goals and the ability to identify schools helping our students meet them. Also, at the elementary school level, growth is only captured in 4th and 5th grade. Schools serving high-poverty students with great growth in pre-kindergarten through 3rd grade may not show really high growth in 4th and 5th grade, yet it is possible for their students to demonstrate higher achievement, closing the achievement gap by the time they take PARCC for the first time in 3rd grade. This is why we have advocated for the STAR accountability system to include metrics that focus on students a school has served for 3+ years, especially at schools serving at-risk students, to indirectly account for that growth.

Overview of our general thought process in determining the models we ultimately opted to explore

We want to start by reiterating that all of the results from our selected model options use only the publicly available STAR data. We also focused only on re-scoring schools on the elementary and middle school frameworks. Since one main thrust of this analysis was to examine increasing the weight of growth, we did not include high schools at this time, even though PCSB published high school median growth percentiles for the charter high schools. We will dig into this data in a separate HS growth blog post soon.

In terms of the various model options, we focus mainly on adjusting the following components of the STAR Technical Guide:

Relative weighting for growth. For the elementary and middle school STAR frameworks, the weight for PARCC growth is 40 possible points and 30 for achievement. We explore options that increase the weight of PARCC growth and/or decrease the weight of achievement. We further explore an option that gives additional weight to the other growth metrics for attendance and English learners. For comparison purposes, we also ran a model that is achievement-only.
Growth or achievement. In many respects, if a school has students meeting or exceeding achievement expectations for their grade, how much do we care about the growth data? Conversely, if a school serves students coming in below achievement expectations, then we probably care more about whether students are growing towards expectations in a timely manner. One option a couple of our models consider is giving additional weight, in terms of points possible in the overall score, to whichever of 3 metrics (PARCC 4+ achievement, median growth percentile, and growth to proficiency) the school is already earning the highest percent of points on. If, for example, School A in math earned 45% of possible points for PARCC 4+, 85% of MGP points, and 90% of growth to proficiency points, then we would award bonus possible points to the "growth to proficiency" metric for School A.
Additional weight for At-Risk student group in overall score. We explore a couple of options for increasing the number of possible points in the final score for schools serving a significantly higher portion of at-risk students. One method, reallocating the points for the "All Students" and "At-Risk" student groups (which account for a total of 80 out of 100 points in the final STAR score, 75 for all students and 5 for at-risk) based on percent at-risk served, is not likely to pass Peer Review. However, the feds have approved state plans (e.g. Wisconsin for English learners) with smaller weighting changes based on percent served, so we explore an option that weights At-Risk between 10 and 20 points, depending on a school's percent of at-risk students.
Lastly, for one model, we adjust school median growth percentiles to control for the additional summer learning loss frequently experienced by low-income students. There is plenty of research that demonstrates using spring-to-spring normed growth measures like Median Growth Percentile can put schools serving low-income students at a slight disadvantage due to summer learning loss gaps with higher-income peers. To determine how much adjustment to make, we utilized EmpowerK12 partner schools' student-level data from another national Common Core-aligned test, the NWEA Measures of Academic Progress, analyzing the impact of serving at-risk students on fall-to-spring and spring-to-spring MGPs. We find a statistically significant correlation (p<0.06) between percent at-risk served and school-level differences between the two MGPs. The more at-risk served, the lower a school's spring-to-spring MGP is compared to the fall-to-spring MGP in the same school year. This analysis confirms other summer loss research and also gives us a solid model for adjusting every DC school's spring-to-spring PARCC MGP to control for the demographic impact of summer learning loss. Publishing and utilizing an adjusted PARCC MGP is not something OSSE can likely do anytime soon. The PARCC consortium would need to produce an abundance of psychometric-level data analysis in order to pass the ED Peer Review process. However, we feel this MGP adjustment is worth further exploration in how EmpowerK12 identifies DC schools and teachers that maximize student growth over the 180 days they actually have them in class. More on this soon when we announce our next round of Bold Improvement schools!
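
Here is a minimal sketch of how such a summer learning loss adjustment could work, assuming a simple linear relationship between percent at-risk served and the gap between fall-to-spring and spring-to-spring MGP. The linear form, the function names `fit_line` and `adjusted_mgp`, and the sample numbers are all hypothetical illustrations, not EmpowerK12's actual model or partner-school data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def adjusted_mgp(spring_to_spring_mgp, pct_at_risk, intercept, slope):
    """Add the estimated summer-loss gap back to a spring-to-spring MGP."""
    estimated_gap = intercept + slope * pct_at_risk
    return spring_to_spring_mgp + estimated_gap


# Hypothetical inputs, made up for illustration only:
# share of at-risk students served (0 to 1), and each school's
# fall-to-spring MGP minus its spring-to-spring MGP.
pct_at_risk = [0.10, 0.30, 0.50, 0.70, 0.90]
mgp_gap = [0.5, 1.5, 2.5, 3.5, 4.5]

intercept, slope = fit_line(pct_at_risk, mgp_gap)

# Adjust a hypothetical school with a spring-to-spring MGP of 50
# that serves 80% at-risk students.
example = adjusted_mgp(50.0, 0.80, intercept, slope)
print(round(example, 1))
```

In this sketch the adjustment grows with the share of at-risk students served, so two schools with the same published MGP can end up with different adjusted values.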

Read on to learn about all of our model scenarios.

Just want to know our final recommendations based on scenario outcomes and explore the data yourself? Skip to the bottom for our preferred model and link to a dashboard with all of the model data!

Summary of the model options

The chart below lists the models we created for this STAR thought-experiment and their included components.

[Chart: the STAR model options and their included components]

OSSE STAR 2018: This is the original published STAR score for every school using the approved technical guide
Achievement Only: This model looks at school outcomes if the STAR framework only utilized the PARCC 3+ and PARCC 4+ metrics for math and ELA
No Achievement: This model is the opposite of Achievement Only and includes just the academic growth, ELL growth, and school environment metrics
Achievement and Growth Only: This model does not include the school environment metrics
EK12 ESSA-Compliant Recommendation: This model includes two tweaks to the STAR framework that help reduce the predictive power of demographics on the STAR ratings. They are adjustments likely to pass the U.S. Department of Education peer review process, as similar changes have already been approved in other state plans
EK12 Preferred Model: This model includes everything we think can be accomplished within the STAR Framework to minimize the impact of demographics on the ratings and focus on the quality of education students received during the academic year. We believe a couple of the changes, while totally legit, are likely to face significant challenges in the ESSA peer review process based on the ESSA regulations

Impact of demographics for each model

The table below shows two statistics that help describe the impact of demographics (i.e. at-risk, special ed, ELL, etc.) on a school's STAR score: the predictive power of the demographics in determining the score, and the correlation coefficient. Larger magnitudes in both numbers mean that the demographics a school serves matter more in the final score.

[Table: predictive power of demographics and correlation coefficient for each model]
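
For readers who want to reproduce statistics like these from the public data, here is a minimal sketch. It assumes "predictive power" means the R-squared of a regression and uses a single predictor (percent at-risk), in which case R-squared is simply the square of the Pearson correlation; the models in this post consider several demographic variables. The function name and the four sample schools are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical schools, made up for illustration only:
# share of at-risk students served (0 to 1) and overall score.
pct_at_risk = [0.2, 0.4, 0.6, 0.8]
star_scores = [80.0, 70.0, 55.0, 45.0]

r = pearson_r(pct_at_risk, star_scores)

# With one predictor, the share of score variance explained
# (R-squared) equals the squared correlation.
r_squared = r ** 2

print(f"correlation: {r:.3f}, predictive power (R^2): {r_squared:.3f}")
```

A negative correlation here means scores fall as percent at-risk rises; what matters for comparing models is the magnitude.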

The two models with the least demographic influence are "No Achievement" (the illegal model that does not include PARCC 3+ and PARCC 4+ metrics) and our "EmpowerK12 Preferred" model. Results also show that making our two proposed tweaks to the current STAR framework would reduce the influence of demographics by about the same amount as adding growth and environment reduced it compared to the prior achievement-only accountability system.

The scatter plot below shows the correlation of the original STAR score (grey) and our EK12 Preferred model (blue) with percent at-risk served by school.

[Scatter plot: original STAR score (grey) and EK12 Preferred model score (blue) versus percent at-risk served, by school]

Our final model recommendations

Ultimately, based on model data and federal ESSA regulations, we have two different recommendations. Our team favors the EK12 Preferred Model because we believe it most closely aligns with our vision for DC schools' future and does the most to isolate the variable we feel school accountability systems are supposed to help us identify: school impact on the student outcomes that we currently measure and care about the most. This model significantly lowers the predictive power of demographics served on a school's final score, attributes more points to achievement or growth depending on where students are already performing, aligns student group points possible more closely with actual demographics served, and controls for summer learning loss in estimating a school's impact on student growth.

However, since that model is unlikely to meet ESSA regulations, we also recommend OSSE explore our "EK12 ESSA-Compliant" model. It requires only two business rule changes, which lower the impact of demographics on STAR scores and are also likely to pass federal peer review. This model gives a little more weight to the At-Risk student group depending on the percent of those students enrolled and assigns more points to achievement or growth depending on where students perform best.
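
The two business rule changes can be sketched in a few lines. This is an illustration under stated assumptions, not the exact rules: the linear scale from 10 to 20 At-Risk points is an assumed mapping (this post only gives the 10-to-20 range), and the function names are hypothetical. The School A numbers are the math example from earlier in this post.

```python
def at_risk_points(pct_at_risk, low=10.0, high=20.0):
    """Possible points for the At-Risk group, scaled by share served.

    Assumes a linear scale from `low` to `high`; the real rule may differ.
    `pct_at_risk` is a fraction between 0 and 1.
    """
    pct = min(max(pct_at_risk, 0.0), 1.0)
    return low + (high - low) * pct


def metric_to_boost(pct_points_earned):
    """Name of the metric where the school earns the highest percent of points."""
    return max(pct_points_earned, key=pct_points_earned.get)


# School A's math results from the example earlier in this post.
school_a_math = {
    "PARCC 4+": 0.45,
    "median growth percentile": 0.85,
    "growth to proficiency": 0.90,
}

boosted = metric_to_boost(school_a_math)
points = at_risk_points(0.50)

print(boosted)
print(points)
```

For School A, the bonus possible points go to growth to proficiency, matching the example above. Under the assumed linear scale, a school serving 50% at-risk students would have 15 possible points for the At-Risk group.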

With either model, we still encourage OSSE to explore additional metrics of student success and parents to look beyond the overall STAR score.

Explore our Dashboard with results from all model options!

EmpowerK12