NOTE: This article is a work in progress. The information is useful as is, but it is not yet finished. Please check with your OIT consultant for more details on the data security aspect of your research project.
Background
Researchers at Central Michigan University who plan to conduct studies involving human subjects must work with the University’s Institutional Review Board (IRB) to review their project proposals and obtain approval before beginning work with human subjects. Generally speaking, the primary goal of the IRB is to ensure the safety of the human subjects and to minimize any possible risks. Mishandling or unintended disclosure of data about persons involved in research often poses the most likely “risk” to the research subjects. Research proposals can take many forms, involving various methods of collecting, analyzing, storing and reporting data. Exactly how best to secure and maintain the confidentiality of data throughout a research life cycle is not common knowledge, and many researchers are not initially well versed in University-level policies classifying data and prescribing specific data security controls and standards. Therefore, the IRB will often require a data security consultation with a member of OIT who knows both the IRB’s review processes and expectations and the University-level data security policies and how to implement them.
Why a Data Security Consult with OIT is Called For
Research Proposals Are Evaluated According to Established IRB Principles
The CMU IRB is committed to ensuring that research involving human subjects adheres to the ethical principles laid out in The Belmont Report (1979). The first of the three ethical principles is “Respect for Persons”, summarized in the link above as:
“Respect for Persons, which is ensured by obtaining informed consent, consideration of privacy, confidentiality, and additional protections for vulnerable populations.”
Building off of those ethical principles, research proposals that come to IRB are evaluated according to specific criteria (known as “The 111 Criteria” from federal regulations for the protection of human subjects at 45 CFR 46.111) all focused on protecting human subjects. Criterion number seven is most relevant to data security, with the key words of privacy and confidentiality defined for clarity:
“Criterion 7. Participants' Privacy Will Be Protected and Confidentiality of Data Will Be Maintained”
“Privacy: Having control over the extent, timing, and circumstances of sharing oneself (physically, behaviorally, or intellectually) with others.”
“Confidentiality: Methods used to ensure that information obtained by researchers about their subjects is not improperly divulged.”
Put simply, examining how data security measures maintain the privacy and confidentiality of research subjects is an integral part of an IRB review.
University Policies Governing Data Handling and Security Exist
Central Michigan University has at least two policies pertinent to data security in research. Both apply to all University faculty, staff, students, student employees and contractors, with the signing authority being the University President. The policies, Data Stewardship (policy number 3-30) and Secure Configurations Policy – Workstations (policy number 3-49), can be found on this page listing University policies.
Data Stewardship Policy
CMU’s Data Stewardship Policy has an all-encompassing scope and explicitly discusses research data, so it certainly applies to the construction of a data security plan for research involving human subjects.
The scope is established in the first paragraph under the heading of “Purpose”:
“The Data Stewardship Policy applies to All University faculty, staff and students. This policy encompasses the safekeeping of the University’s information in whatever physical form (such as printed, audio, video and electronic) it may exist, now or in the future”
The policy defines roles and responsibilities, then goes on to establish a tiered data classification model ordered from least-to-most controlled based upon risk. Those classification tiers are called “Public”, “Protected” and “Restricted”, with “Restricted” data requiring the most control and security rigor. By the definitions within the policy, most research data would be categorized as “Restricted”, especially if personally identifiable information (PII) is collected, stored or analyzed.
Most of pages 3-4 of the policy are devoted to describing what “Restricted” data is and how to handle it. Researchers should be familiar with the whole policy, but the following quotes from underneath the “Restricted” heading should prove instructive when constructing data security plans.
“Restricted data includes personally identifiable information (PII) that is required to be protected through contractual and/or legal specifications, as mandated by CMU’s Institutional Review Board (IRB) and/or specified in state or federal law.”
“The types of data included in the category are, but not limited to…HIPAA-protected health data…personal intellectual property that might be housed for academic reasons on University computing resources…research data including data and consent from research subjects.” (Some examples omitted here as indicated by the ellipses.)
“If Restricted data needs to be sent outside CMU systems, it must be encrypted. Restricted data should also be encrypted at rest whenever possible.”
“Loss or release of restricted data may require additional steps to mitigate for any harmful effects… For these reasons, restricted data must not be stored on personally-owned devices (personal laptops or desktops) or in online services (like Google Drive or Amazon Web Services) not specifically contracted and intended by CMU for the storage of restricted data”.
The Data Stewardship Policy is five single-spaced pages and quite detailed; in summary, researchers should take away the following:
- The policy applies to all CMU faculty, staff, students, student employees and contractors.
- Most research data, but especially research data containing personally identifiable information would be classified as “Restricted” – the classification that calls for the most safeguards and care.
- Restricted data must not be stored on personally owned devices or in online services that have not been specifically contracted and intended by CMU for the storage of restricted data.
- Email is almost inherently insecure and inappropriate for the transmission of restricted data; if email must be used to communicate about restricted data, the restricted data should not be transmitted in the email itself. Instead, a link should be provided to a system designed to house the restricted data (SAP, EPIC, Office 365, etc.).
- Encryption must be employed if restricted data needs to be sent outside of CMU systems, and restricted data should be encrypted at rest whenever possible.
Secure Configurations Workstations Policy
Similar to the Data Stewardship Policy, the “Secure Configurations Policy – Workstations” applies to all faculty, staff, students, student employees and contractors and is signed by the President. Essentially, the policy describes the baseline (minimal) level of security for a “workstation” (any computer below the classification of “server”) by prescribing a series of best practices and the implementation of standard controls. Even before the “standard controls” are listed and defined in the policy, there is the following callout of how “restricted data” (the classification for most research data, especially research dealing with PII) should be handled:
“Workstations used for access, processing, or storage of Restricted data (see Data Stewardship Policy) may require additional, specialized protection and must have those protections installed, activated, managed and kept up-to-date. Users and their IT support personnel are responsible for knowing and maintaining additional, applicable controls.”
The policy then goes on to describe thirteen different “standard controls” and how they should be implemented:
- Administrative Access (for OIT)
- Asset Tracking (CMU Property tags and network registration)
- Configuration Management (Automated and actively managed config systems)
- Malware Protection
- Password Protection
- Patch Management (Automated software updates etc)
- Personal Firewall
- Physical Protection
- Proper Disposal (Procedure to dispose of a system wherein data is securely destroyed/erased)
- Removable Media Protections
- Session Time-Out or Screen Saver (system locks and prompts for password after being idle)
- User Data Backups
- Whole Disk Encryption
For researchers who read through the Secure Configurations Policy – Workstations document, the key takeaway should be that, from CMU’s stated policy perspective, even systems that handle mundane (low-risk) computing tasks must be configured to a standard that includes security controls requiring the involvement of CMU’s Office of Information Technology to properly implement, verify and maintain. Even if the Data Stewardship Policy did not explicitly forbid placing restricted data on personally owned systems (which it does), it should be clear after reading the Secure Configurations Policy – Workstations that it is not practical, and probably not even possible, to meet that standard of system security on a personally owned device. In addition, systems used to work with or store restricted data (most research data) are expected not only to meet, but often to exceed, the secure configurations for workstations standard through the application of additional security controls as appropriate.
What an OIT Consultant Will Look For in a Data Security Plan for Research Involving Human Subjects
The aforementioned policies will act as a framework from which to build a data security plan. Of utmost importance though is minimizing any potential risk to research participants by maximizing the level of confidentiality provided to them through the data security measures. Therefore, most consultations will start by zeroing in first on personally identifiable information (PII) within the research proposal.
A Focus on the Handling of Personally Identifiable Information (PII)
In research involving human subjects, the data collected about those subjects can generally only present a risk to individuals if their contributions can be attributed to them. Personally identifiable information (PII) is any piece of information or set of data points (taken alone or in combination) that can lead to the identification of an individual.
Examples of PII data points include, but are not limited to:
- Names
- Phone numbers
- Postal addresses
- Email addresses
- Birth dates
- Social security numbers
- Medical record numbers
- Account numbers
- Driver’s license or ID numbers
- Internet Protocol (IP) addresses
- Social media account names or identifiers
- Biometric identifiers (fingerprints, retinal scans, DNA)
- Images or video of an individual’s face
- Recordings of an individual’s voice
It is not uncommon for research involving human subjects to collect some personally identifiable information – in fact, it is often necessary. In some research projects, PII such as names or contact information may be collected on informed consent documents if researchers or the IRB feel that such information may be necessary to ensure the safety of their participants. In other projects, researchers may offer some form of compensation to participants, and need to collect some PII (such as an email address) simply for the purposes of tracking participation and awarding the compensation. That being said, it is rarely necessary to collect, store or analyze PII directly in-line with the other data collected from and about participants.
After identifying where and why PII is collected in a research proposal, an OIT consultant will often help researchers devise data handling strategies to keep PII secured and segregated from the rest of the data about research participants.
As an example, consider a research proposal in which participants repeatedly fill out a survey over time, with compensation offered at completion. A common data handling mistake would be to have participants provide a piece of PII, such as their email address, each time they fill out the survey. Doing so creates a data set that unnecessarily intermingles PII with research data points and could put the confidentiality of participant data at risk.
Figure 1 – Avoid creating a data set like this

**Qualtrics Survey Results**

| Email Address | Research Answer 1 | Research Answer 2 | Research Answer 3 |
|---|---|---|---|
| Smith1za@cmich.edu | “High Anxiety” | “Low Familial Contact” | “Often Angry” |
| Paul.Weston@dow.com | “Moderate Anxiety” | “High Familial Contact” | “Rarely Angry” |
In the example above, the researchers have created a data set in which personally identifiable information (the email address) is collected and stored in a way that makes the research contributions of participants attributable to their identity. Also, according to CMU’s Data Stewardship Policy, since this data set is research data that includes PII, it would be given the data classification of “Restricted”, which places the greatest level of constraints on where the data can reside, which systems can be used to analyze it, and so on. The researchers likely did not construct the survey this way because they wanted to analyze participants’ responses right next to their identifiable information, but because they wanted a way to measure completion and award compensation.
A better approach is to collect and store the PII separately from the other research data, using separate surveys and a non-identifiable participant ID value alongside the actual research data.
Figure 2 – Separate the PII from the research answers

**Qualtrics Survey 1 – Collect PII for enrollment, informed consent etc., assign participant ID**

| Email Address | Assigned Participant ID |
|---|---|
| Smith1za@cmich.edu | 89X2C |
| Paul.Weston@dow.com | 63W4A |

**Qualtrics Survey 2 – Collect actual research data**

| Assigned Participant ID | Research Answer 1 | Research Answer 2 | Research Answer 3 |
|---|---|---|---|
| 89X2C | “High Anxiety” | “Low Familial Contact” | “Often Angry” |
| 63W4A | “Moderate Anxiety” | “High Familial Contact” | “Rarely Angry” |
The approach in Figure 2 (above) has several advantages. First, it avoids creating a table or dataset that directly intermingles personally identifiable information with research data about the human subjects. That alone makes it far less likely that “restricted data” (research data that includes PII) will later be accidentally downloaded, stored or analyzed on a storage platform or system not approved for use with restricted data. Second, the clean separation of PII from the research data clarifies where additional technical controls and data security measures are needed.
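For researchers who script their data handling, the separation pattern in Figure 2 can be sketched in a few lines of Python. This is a minimal illustration only, not a prescribed CMU workflow; the file names, ID format, and column names are hypothetical. A random, non-identifiable participant ID is generated at enrollment, the PII-to-ID key is written to one file (which, as restricted data, must live only on a CMU-managed system approved for it), and the research responses are stored in a second file containing no PII.

```python
import csv
import secrets
import string

def new_participant_id(length=5):
    """Generate a random, non-identifiable participant ID (e.g. '89X2C')."""
    alphabet = string.ascii_uppercase + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

# Hypothetical enrollment records containing PII (email addresses).
enrollments = ["Smith1za@cmich.edu", "Paul.Weston@dow.com"]

# Survey 1 equivalent: the PII-to-ID key. This file is restricted data
# and must be stored only on a CMU-managed system approved for it.
id_key = {email: new_participant_id() for email in enrollments}
with open("pii_key.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Email Address", "Assigned Participant ID"])
    for email, pid in id_key.items():
        writer.writerow([email, pid])

# Survey 2 equivalent: research responses keyed only by participant ID.
# This file contains no PII, so it is far safer to store and analyze.
responses = [
    {"Assigned Participant ID": id_key["Smith1za@cmich.edu"],
     "Research Answer 1": "High Anxiety"},
    {"Assigned Participant ID": id_key["Paul.Weston@dow.com"],
     "Research Answer 1": "Moderate Anxiety"},
]
with open("research_data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["Assigned Participant ID",
                                           "Research Answer 1"])
    writer.writeheader()
    writer.writerows(responses)
```

The design point is simply that only the key file ever links identities to responses, so protecting confidentiality reduces to protecting that one small file.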
If PII Is Collected, Understand Why, Where and How
After an OIT consultant has reviewed your research proposal documents (IRB application documents, informed consent language, recruitment materials, etc.), if the project includes personally identifiable information (PII), it will be important to answer some basic questions about that aspect of the research so that a data security plan can be constructed accordingly.
Why is PII Collected?
Be prepared to help the OIT consultant understand why the collection of PII is necessary in this research project. Is it to help ensure participant safety? Is PII collected merely to demonstrate a signed informed consent? Is some PII being collected strictly to facilitate compensation to participants? If the personally identifiable data points are not strictly needed to ensure the safety of participants, to meet some other obligation to them, or for the success of the study itself, then researchers should consider not collecting them. If, however, collecting the PII is determined to be necessary, the reasons for it should be understood, and the next steps will be to figure out where and how to store and analyze the PII in a way that maximizes confidentiality, minimizes risk to participants, and complies with existing University policies.
Where Will the PII Be Collected, Stored and Analyzed?
Recall that CMU’s Data Stewardship Policy defines research data that includes personally identifiable information (PII) as “Restricted”, the highest/most sensitive level of data classification at CMU. That same policy defines where restricted data should live:
“Loss or release of restricted data may require additional steps to mitigate for any harmful effects… For these reasons, restricted data must not be stored on personally-owned devices (personal laptops or desktops) or in online services (like Google Drive or Amazon Web Services) not specifically contracted and intended by CMU for the storage of restricted data”.
The policy states that restricted data (which is what research data that includes PII would be classified as) must not be stored on personally owned devices or in online services not specifically contracted and intended by CMU.
In practice, for many research projects this means that research data including PII data points must be collected, stored and analyzed using CMU-owned systems or CMU-managed platforms.
Examples of plans that run afoul of the where aspect of handling restricted/PII data include:
Recording participants using a personally owned smart phone or tablet.
Audio and video recordings collect voice prints and/or images of an individual’s face, both of which are considered personally identifiable data points. Research data that includes PII should not be collected or stored on personally owned devices.
Gathering survey responses that include any PII data points (such as email addresses) using Google Forms.
Google Forms or Google Drive are not currently specifically contracted and intended by CMU for the storage of restricted data. A better choice would be CMU’s Qualtrics instance, or Microsoft Forms, using CMU credentials and CMU’s Microsoft Office 365 platform.
Allowing anyone (most commonly students) to collect, store or analyze research data that includes PII on a personally owned device.
It is not common for CMU students to be assigned a physical computing device that is owned and actively managed by CMU and OIT, but it is common for students (especially graduate students) to participate in research projects and analyze data. It is best to construct the research project’s workflow so that personally identifiable data points are collected and stored on CMU-managed or CMU-intended systems, and only de-identified data sets are analyzed by students. If that isn’t possible, students will also have access to the CMU Virtual Lab or possibly CMU-managed physical computers in lab areas, as well as CMU-managed storage such as the UDrive, CAB/MCAB shares, or OneDrive. Your OIT consultant can help you choose the appropriate platforms to use.
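Producing a de-identified data set for student analysis can be as simple as copying the survey export while dropping any columns that contain PII. The sketch below is a hedged illustration, not an OIT-prescribed tool; the file names and the PII column names are hypothetical and should be adjusted to match the actual survey export.

```python
import csv

# Hypothetical PII column names; adjust to match the actual survey export.
PII_COLUMNS = {"Email Address", "Name", "Phone Number"}

def deidentify(in_path, out_path, pii_columns=PII_COLUMNS):
    """Copy a survey export, dropping any columns that contain PII,
    so the resulting file is safer for students to analyze."""
    with open(in_path, newline="") as src, \
         open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        # Keep only the non-PII columns from the original header.
        kept = [c for c in reader.fieldnames if c not in pii_columns]
        writer = csv.DictWriter(dst, fieldnames=kept)
        writer.writeheader()
        for row in reader:
            writer.writerow({c: row[c] for c in kept})

# Example usage (file names are hypothetical):
# deidentify("full_export.csv", "student_copy.csv")
```

The original export containing PII stays on the CMU-managed system; only the de-identified copy is handed to students for analysis.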