Bots Go to Washington
The Federal Communications Commission has unwittingly provided the world with a database of 5 million names, locations, and email addresses of people who care about a particular policy issue: net neutrality. If there is one saving grace to the FCC’s public disclosure problem, it’s that its system is so bad that it’s hard to know which comments are real and which are machine-generated.
The FCC’s filing system makes no effort to verify that a human is submitting each comment, whether through user accounts or “I am not a robot” CAPTCHA technology. The tools for bulk submissions are also vulnerable to abuse: any programmer can set up automated submissions with a few lines of code. The system can receive unlimited autofill entries with comments for or against a particular policy. Unfortunately, Regulations.gov, the platform used by many federal agencies, similarly lacks technology to prevent bot comments.
In response to a recent malicious service attack, the FCC may now have plans to upgrade its comment system. The agency should make reasonable efforts to ensure that its comment system collects coherent arguments from actual people and organizations, not machine-generated mass-mailings. At the same time, it should avoid revealing personally identifiable information to the public.
The FCC could strike this balance by using CAPTCHA technology or by requiring individuals to create user accounts before submitting comments, as the White House’s We the People petition site does. Moreover, the FCC can address its data disclosure problem by, first, ensuring that it collects only the personally identifying information it needs and, second, de-identifying the data it makes public. If the FCC had adhered to the Federal Trade Commission’s best practices for privacy and security in its comment system, as other federal agencies do, it could have avoided the public release of millions of email addresses.
The FCC could address the data protection issue by following the FTC’s recommendations. In a 2012 report on protecting consumer privacy, the FTC advised companies to “incorporate substantive privacy protections into their practices, such as data security, reasonable collection limits, sound retention practices, and data accuracy.” The FTC applies these practices when collecting comments on its own site, which notes that “As a matter of discretion, the FTC makes every effort to remove home contact information for individuals from the public comments it receives before placing those comments on the FTC Web site.” The FTC displays each commenter’s name and home state but redacts more specific contact information.
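To make the de-identification step concrete, here is a minimal sketch in Python of how a comment record might be reduced to a public version before publication. It follows the FTC practice described above (publish name and home state, withhold more specific contact details). The Comment fields, the PUBLIC_FIELDS list, and the deidentify function are hypothetical names chosen for illustration; they do not describe the FCC’s or the FTC’s actual systems, and the email pattern is a rough approximation rather than a complete validator.

```python
import re
from dataclasses import dataclass

# Hypothetical record layout; a real comment system's fields may differ.
@dataclass(frozen=True)
class Comment:
    name: str
    email: str
    street_address: str
    city: str
    state: str
    text: str

# Assumption: only these fields are published, mirroring the FTC's
# practice of showing name and home state alongside the comment itself.
PUBLIC_FIELDS = ("name", "state", "text")

# Rough pattern for email addresses that commenters typed into the body.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def deidentify(comment: Comment) -> dict:
    """Return the public version of a comment with contact details removed."""
    public = {field: getattr(comment, field) for field in PUBLIC_FIELDS}
    # Scrub addresses embedded in the free-text comment as well.
    public["text"] = EMAIL_PATTERN.sub("[redacted]", public["text"])
    return public
```

Publishing from an explicit list of allowed fields, rather than deleting known sensitive ones, means that any field added to the record later stays private until someone deliberately decides to release it.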
The FCC’s comment system includes no such privacy safeguards. It may be collecting more personally identifiable data than it needs, and it’s making far more of these data available to the public than necessary. The highly publicized nature of the net neutrality-Title II proceedings simply serves to increase the exposure risk.
It is conceivable that the government could find unforeseen, beneficial uses of such data to develop better ways to provide services. However, government data collection poses more risk than private data collection, which is why strict rules exist on what data the federal government can collect. Because federal databases have records on individuals’ health, disability, criminal activity, taxes, employment, and political donations, agencies are restricted from merging federal datasets. This keeps data that may be helpful for a single proceeding today from being combined with personally identifiable data from other federal records and then being used in unforeseen, possibly detrimental ways.
As it stands, the FCC’s electronic comment system is too open to bots and humans, even for an open Internet proceeding. To ensure the integrity of administrative processes, the FCC must address these data and privacy problems immediately.
Sarah Oh is a Research Fellow and Brandon Silberstein is a Research Associate at the Technology Policy Institute.