On November 30, a major child welfare publication reported on a new study, published in the respected journal Children and Youth Services Review, that tested Broward County, Florida’s child welfare decision-making model against a model derived using the new techniques of data mining and supervised machine learning. The researchers concluded that 40% of cases that were referred to court for either foster care placement or intensive services could have been handled “with less intrusive options.” A close reading of this opaquely written paper, together with conversations with two of the authors, Ira Schwartz and Peter York, reveals a pioneering effort at applying emerging data science techniques to develop a “prescriptive analytics” model that recommends the appropriate services for each child. This research is innovative and exciting, but this first attempt at deriving such a prescriptive model for child welfare has serious flaws. These very preliminary results should initiate a conversation but should not be used to support policy recommendations.
The authors began with a large database of 78,394 children with their complete case histories between 2010 and 2015. They merged datasets from the Broward County Sheriff’s Office, ChildNet (the local agency contracted to provide foster care and in-home services) and the Children’s Services Council (CSC), which represents community-based agencies serving lower-risk cases. The authors relied primarily on one year of follow-up data for each child after discharge from the system. Children without a full year of follow-up data were not included. So the authors had a large selection of hotline, investigative, and service data for the children in their database, as well as information on whether each child experienced another referral within a year.
In a nutshell, the authors applied machine learning to build a model “based on the segmentation and classification of cases at each step of the reporting, investigation, substantiation, service and outcome process.” The result was the creation of groups or clusters that have a similar combination of characteristics based on hotline and investigative data. Each stage of the modeling process produces progressively more uniform groups. The goal was to ensure that if these groups received different treatments, the difference in outcomes would be due to the treatment and not some other aspect of the children or their situations. Within each group of similar children, the researchers compared those who received different interventions, namely removal from the home or community-based prevention services. They used a technique called propensity score matching to control for differences between members of each group that might affect their outcomes. The authors used one outcome, whether a child was re-referred to the system within a year of exit, to determine whether each intervention was successful.
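To make the matching step concrete, here is a minimal sketch of one common form of propensity score matching: greedy one-to-one nearest-neighbor matching with a caliper, applied within a single cluster. It is not the authors' implementation, which the paper does not spell out. The `Case` fields, the caliper of 0.05, and the six example records are all hypothetical, and the propensity scores are assumed to have been estimated already (for instance by a logistic regression on hotline and investigative data).

```python
from dataclasses import dataclass

@dataclass
class Case:
    case_id: str
    treated: bool       # hypothetical flag, e.g. referred to court
    propensity: float   # estimated P(treated | recorded data), assumed precomputed
    re_referred: bool   # re-referred within a year of exit

def match_pairs(cases, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on the propensity score, without replacement."""
    treated = sorted((c for c in cases if c.treated), key=lambda c: c.propensity)
    controls = [c for c in cases if not c.treated]
    pairs = []
    for t in treated:
        if not controls:
            break
        best = min(controls, key=lambda c: abs(c.propensity - t.propensity))
        # Keep the pair only if the two scores are close enough to be comparable.
        if abs(best.propensity - t.propensity) <= caliper:
            pairs.append((t, best))
            controls.remove(best)
    return pairs

def rate_difference(pairs):
    """Treated minus control re-referral rate among the matched pairs."""
    if not pairs:
        return None
    treated_rate = sum(t.re_referred for t, _ in pairs) / len(pairs)
    control_rate = sum(c.re_referred for _, c in pairs) / len(pairs)
    return treated_rate - control_rate

# Hypothetical cluster of six otherwise similar cases.
cluster = [
    Case("a", True, 0.62, True),
    Case("b", True, 0.40, False),
    Case("c", True, 0.90, True),
    Case("d", False, 0.60, False),
    Case("e", False, 0.41, False),
    Case("f", False, 0.20, True),
]
pairs = match_pairs(cluster)
print([(t.case_id, c.case_id) for t, c in pairs], rate_difference(pairs))
```

Case "c" goes unmatched because no untreated case has a similar score. Any conclusion drawn from the matched pairs therefore says nothing about cases like it, and it can only adjust for differences that were recorded in the data.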
Based on this analysis, the authors concluded that many families are receiving services that are too intensive for their needs. For example, they concluded that “at least 40% of the cases that were referred to the court and to Childnet (mainly for foster care) were inappropriate based on the outcome data for children in their cluster group.” The authors then went on to claim that these “inappropriate referrals” are actually harming children. For example, cases with “inappropriate referrals” to court were 30% more likely to return to the system after the court referral than they would have been if the referral had not been made. And cases with “inappropriate referrals” to ChildNet were 175% more likely to return to the system than similar cases that did not receive such a referral.
Finally, the authors present a “prescriptive” model that addresses the question, “Which services are most likely to prevent a case from having another report of abuse and/or neglect [within a year]?” This concept of “prescriptive analytics” is a new one in child welfare, if not in human services in general. The authors devote only two paragraphs to this model, but they note that it would result in a decline in “inappropriate referrals” to court and ChildNet.
Even if we accept the machine learning process presented by the authors as a reasonable basis for estimating risk, several issues remain with the authors’ findings. The first issue is the use of one-year re-referral rates to denote intervention success. Ongoing maltreatment may not be seen or reported for months or years. The authors report that 57% of their cases that received another referral did so within one year. However, that leaves 43% that were referred after a year had passed. These cases were not counted as “failures” by their model. In addition, because the database covered only 2010 to 2015, the authors did not include any referrals that happened after 2015, including those that are yet to happen. If the authors wrongly classified some cases as not returning, this reduces the validity of their model.
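A back-of-the-envelope calculation shows how much the one-year cutoff can matter. The sketch below takes the 57% figure from the study and combines it with a one-year re-referral rate of 20%, which is purely hypothetical and chosen only for illustration. It also assumes the 57% share applies uniformly across cases. Because referrals after 2015 are missing from the data, the result should be read as a lower bound on the number of hidden failures.

```python
SHARE_WITHIN_ONE_YEAR = 0.57  # share of re-referrals occurring within a year, per the study

def hidden_failures(one_year_rate, share_within_year=SHARE_WITHIN_ONE_YEAR):
    """Return (eventual re-referral rate, share of one-year 'successes' that return later)."""
    eventual_rate = one_year_rate / share_within_year
    late_rate = eventual_rate - one_year_rate        # re-referred, but after the cutoff
    mislabeled_share = late_rate / (1 - one_year_rate)
    return eventual_rate, mislabeled_share

# 0.20 is a hypothetical one-year re-referral rate, not a figure from the paper.
eventual, mislabeled = hidden_failures(0.20)
print(f"eventual re-referral rate: {eventual:.1%}")
print(f"one-year 'successes' that return later: {mislabeled:.1%}")
```

Under these assumptions, nearly one in five cases scored as a success would in fact come back to the system after the first year.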
The second problem stems from that famous social science bugaboo: unmeasured differences between groups. The authors relied entirely on hotline and investigative data on family history and characteristics. Yet many family issues may not be reflected in the data. These could include unknown histories of criminal behavior, mental illness, violence, or drug abuse. If the authors observed that an intervention appeared to cause harm to certain children, the explanation may not be that the intervention was inappropriate. A more plausible explanation might be that the matching algorithm failed to assess risk as well as the social workers in the system did. If the cases referred to the court were in fact those that social workers correctly identified as being at higher risk (even though this was not picked up by the algorithm), one might expect higher rates of return to the system for these cases relative to cases that were matched with them by the algorithm but not referred to the courts. This possibility seems a lot more likely than the possibility that court-ordered services made parents more abusive or neglectful.
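A small simulation illustrates how this could happen. In the sketch below, every case looks identical in the recorded data, as if all belonged to one cluster, but each has a hidden risk level that only the caseworker perceives. Caseworkers refer higher-risk cases more often, and the referral itself has no effect at all on the outcome. All of the numbers, including the 0.1 and 0.4 coefficients, are assumptions invented for the illustration, not estimates from the study.

```python
import random

def simulate(n=100_000, seed=0):
    """Re-referral rates by referral status when referral has NO causal effect."""
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # referred -> [cases, re-referrals]
    for _ in range(n):
        hidden_risk = rng.random()             # seen by the caseworker, absent from the data
        referred = rng.random() < hidden_risk  # riskier cases are referred more often
        # The outcome depends only on hidden risk; `referred` does not appear here.
        re_referred = rng.random() < 0.1 + 0.4 * hidden_risk
        counts[referred][0] += 1
        counts[referred][1] += re_referred
    return {status: hits / total for status, (total, hits) in counts.items()}

rates = simulate()
print(f"referred:     {rates[True]:.3f}")
print(f"not referred: {rates[False]:.3f}")
```

Under these assumptions the expected re-referral rate is about 0.37 for referred cases and about 0.23 for the rest. An analyst who saw only the recorded data would conclude that the referral raised the risk of return by more than half, when in this simulation it did nothing.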
A third problem relates to the use of the child rather than the family as the unit of analysis. The family or household is obviously the appropriate unit of analysis here. It was the parents or caregivers who perpetrated the abuse or neglect, and they are the main recipients of services. Author Peter York agreed that using the family would be more appropriate but explained that most of the data in the system were linked to the child and not the family. Using the child as the unit of analysis means that the same parents will be counted as many times as they have children in the system. This will obviously weight larger families more heavily, with whatever biases this may introduce.
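A toy example shows how far the two units of analysis can diverge. The four families below are hypothetical, and the sketch assumes that when a family is re-referred, every one of its children is counted as a re-referred child.

```python
# Hypothetical data: (children in the system, family re-referred within a year)
families = [(1, False), (1, False), (1, False), (5, True)]

# Family as the unit: each household counts once.
family_rate = sum(returned for _, returned in families) / len(families)

# Child as the unit: each household counts once per child.
children_total = sum(n for n, _ in families)
children_returned = sum(n for n, returned in families if returned)
child_rate = children_returned / children_total

print(f"family-level re-referral rate: {family_rate:.1%}")
print(f"child-level re-referral rate:  {child_rate:.1%}")
```

Here one large family turns a failure rate of one in four families into five in eight children. The direction and size of the distortion in the actual study depend on how family size relates to risk and to the services received, which the paper does not report.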
Finally, it is concerning that the authors reported the proportion of children who were provided with too-intensive services such as foster care but not the proportion who were provided with services that were not intensive enough. We all know about the worst-case scenarios in which children die or are severely injured after the system failed to respond appropriately to a report, but there are many more cases in which allegations are not substantiated or interventions are not intensive enough, and the children return to the system later, often in worse shape. Reporting on one type of error but not its opposite invariably raises questions about bias.
The authors should not be blamed for making too much of their findings. In their article abstract, they do not mention the specific findings about over-reliance on foster care and more intensive child welfare interventions. Rather, they argue that their findings indicate that “predictive analytics and machine learning would significantly improve the accuracy and utility of the child welfare risk assessment instrument being used.” I fervently agree with that statement. But this new approach by Schwartz et al. is qualitatively different from the predictive risk modeling algorithms currently being applied and studied by jurisdictions around the country. In particular, the authors used machine learning to identify groups that had similar risks but received different treatments. Their purpose was to assess the effectiveness of distinct treatments for different subgroups. How well this approach will accomplish that purpose remains to be seen. This fascinating study is just the beginning of a conversation about the utility of this new approach, not an argument for reducing the reliance on foster care or community services.