A Right to Know: Q&A  

Given that you have a model that predicts “out of sample” to all the communities not in the data, anyone can see their predicted DFI score. How are you communicating that? And are you providing a warning that the score is just a prediction and can be wrong? 

  • Once the website is live and community members can look up their DFI, we will include information about how each index was imputed (i.e., with the IRT model or a Random Forest model), along with the community's technologies. Both of our Random Forest models had accuracy rates of around 85%, and we are confident in the DFIs derived from them.

It is unfortunate that there is no national database. Do any US states have databases of surveillance tools, so that the Atlas of Surveillance could be validated for one state?

  • For this project, we focused our analysis on publicly available, nationwide data, but this is definitely something our team plans to dig deeper into over the summer.

How can we discover the data on the technologies that police are using to conduct surveillance against citizens? Are you in contact with EFF? And do you have plans to expand data collection through FOIAs or another tool?  

  • One of our project’s next steps is to expand our data collection and grow our digital force database. Depending on how the project develops, we may have to pursue that route, but at present we do not have plans to FOIA police departments or local governments.

Can you explain again why you created two models? I didn’t quite understand the difference between these models and their output, particularly for the census data model, since it isn’t taking into account the other data (police tech types/quantity, arrest records, etc.). Or is it?

  • In combining census data with police funding and arrest records, we noticed that the police funding and arrest records data were missing for about 45% of US counties. The first Random Forest model dropped those counties, so we needed another way to generate DFIs for them. We decided to create two models: one that draws on the full dataset and one that draws on census data alone, since census data was all we had for the 45% of counties missing funding and arrest records.
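
As a rough illustration of the two-model setup described above, the routing step might look something like the sketch below. The field names (`police_funding`, `arrest_count`), the model labels, and the example counties are all hypothetical and are not taken from the project's data; the sketch only shows the idea of sending a county to the census-only model when its funding and arrest records are missing.

```python
# Hypothetical sketch: the field names and model labels below are invented
# for illustration and are not the project's actual schema.
FULL_MODEL_FIELDS = ("police_funding", "arrest_count")


def has_full_data(county):
    """Return True when every funding/arrest field is present for the county."""
    return all(county.get(field) is not None for field in FULL_MODEL_FIELDS)


def pick_model(county):
    """Choose which of the two models should produce this county's DFI."""
    return "full_dataset_model" if has_full_data(county) else "census_only_model"


# Made-up example rows: County B is missing its funding and arrest records.
counties = [
    {"name": "County A", "median_income": 52000,
     "police_funding": 1.2e6, "arrest_count": 340},
    {"name": "County B", "median_income": 47000,
     "police_funding": None, "arrest_count": None},
]

routing = {county["name"]: pick_model(county) for county in counties}
print(routing)
```

In this sketch a county is never dropped: it falls back to the census-only model whenever any of the extra fields is unavailable.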

Can you explain again how the RF model works to predict the DFI for counties/communities where there was a lack of police tech/digital force data?

  • Because we were missing police technology data for most of the country but wanted everyone to have a DFI, we decided to use a Random Forest model to predict what a community's DFI would be based on how closely it matched other communities in basic demographic makeup, police funding, arrest records, and so on. Using the Random Forest model’s predictive capabilities, we can say that communities with similar data are predicted to have a similar DFI.
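
For readers who want a concrete picture, here is a minimal sketch of that kind of prediction using scikit-learn's `RandomForestClassifier`. Everything in it is assumed for illustration: the three features, the toy values, and the "high"/"low" DFI levels are invented, and the project's actual features, labels, and model settings may differ.

```python
# Hypothetical sketch: features, values, and DFI levels are made up.
from sklearn.ensemble import RandomForestClassifier

# Communities where the DFI is already known.
# Columns: [median_income_thousands, population_thousands, police_funding_per_capita]
known_features = [
    [35, 500, 420], [38, 450, 400], [40, 520, 390], [36, 480, 410],
    [80, 20, 120], [85, 25, 110], [78, 18, 130], [82, 22, 115],
]
known_dfi = ["high", "high", "high", "high", "low", "low", "low", "low"]

# Fit the forest on communities with a known DFI.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(known_features, known_dfi)

# Communities with no police technology data: predict their DFI from
# how their features compare with the communities the model has seen.
unknown_features = [[37, 490, 405], [81, 21, 118]]
predicted_dfi = [str(label) for label in model.predict(unknown_features)]
print(predicted_dfi)
```

The first unseen community resembles the "high" group and the second resembles the "low" group, so the forest assigns them those levels. As with any prediction, such a score is an estimate and can be wrong for an individual community.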