Access to Data

Access to data is typically managed by reviewing and controlling the following factors:

  • Who has access to the data?
  • What part of the data?
  • During what time period?
  • For how long?
  • For what type of use?
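
One way to make these five questions concrete is to record each as a field of a single access grant. The following is a minimal sketch in Python; the `AccessGrant` class, its field names, and the example values are hypothetical and do not describe any University system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AccessGrant:
    """One hypothetical record answering the five questions above."""
    who: str             # who has access to the data
    subset: str          # what part of the data
    starts: datetime     # during what time period (start of the window)
    duration: timedelta  # for how long
    purpose: str         # for what type of use

    def is_active(self, at: datetime) -> bool:
        """Return True if `at` falls inside the granted time window."""
        return self.starts <= at < self.starts + self.duration


# Illustrative values only.
grant = AccessGrant(
    who="collaborator@example.edu",
    subset="survey/wave1",
    starts=datetime(2024, 1, 1),
    duration=timedelta(days=90),
    purpose="analysis",
)
```

Calling `grant.is_active(datetime(2024, 2, 1))` would report whether the grant is in force on that date; a real policy would also need to record who approved the grant and under what conditions.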

Data access control applies to electronic data just as it does to things like lab notebooks, patient intake forms, and survey forms. Data access control is managed both technically (the digital equivalent of IDs, locks, guards, etc.) and by policy (conditions of access to the data).

Use of IS&T services for storage, backup and archiving will generally allow for the kind of access control (read/write/execute for owner/group/world) that would typically be required in a research setting.
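
The read/write/execute for owner/group/world model mentioned above is the one used by POSIX file systems. As a rough illustration, the sketch below uses Python's standard `stat` module to build and inspect such a permission mode. The policy shown (owner reads and writes, group reads, world has no access) is an assumed example, not a description of how any particular IS&T service is configured.

```python
import stat

# Assumed example policy: the owner may read and write, the group may
# read, and the world has no access (octal 640 on a regular file).
mode = stat.S_IFREG | stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP


def describe(mode: int) -> dict:
    """Return which of read/write/execute each class of user holds."""
    bits = {
        "owner": (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
        "group": (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
        "world": (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
    }
    names = ("read", "write", "execute")
    return {
        who: [name for name, bit in zip(names, flags) if mode & bit]
        for who, flags in bits.items()
    }


print(stat.filemode(mode))  # -rw-r-----  (the form shown by `ls -l`)
print(describe(mode))
```

On a real file, the same mode could be applied with `os.chmod(path, 0o640)`.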

Many researchers will have a simple access policy: they manage the data on their own local systems using the commodity security software bundled with their operating system and then, at the time of publication, place the data in an institution-supplied repository. However, a number of concerns call for more complicated access control. These include:

  • Subject/patient privacy
  • National security / export controls
  • Legal / liability
  • Intellectual property
  • Competitive advantage
  • Accuracy and correctness
  • Funder or industrial sponsor control mandates

Considerations for data access may come from:

  • Researchers themselves: Researchers may restrict access to keep a competitive advantage, by analyzing data before it is available to others; to ensure the accuracy of data and analyses before making either public; for ethical reasons concerning the use of the data; or to meet funder or institutional requirements. Conversely, researchers may want open access in order to advertise successes and share results.
  • Funding agencies: The purpose of funding is often for the public good, so funders want to make results available to as large a research community as possible.
  • Government interest: The government may encourage or discourage sharing of data in the national interest. Restrictions in the form of secret classifications or export controls reserve data for the use of one nation. Releasing information can be intended to spur economic growth and maintain leadership.
  • Legislative powers: Researchers must follow all laws, particularly those that protect the privacy and safety of subjects, patients, researchers, and employees.
  • The University: Boston University must protect its interest in intellectual property, and protect itself from liability claims. Therefore the University will generally have broad powers to require access to data as it needs, and control the release of data that it considers sensitive.

Elements of access control include authentication, authorization, administration, and audit:

  • Authentication is the process of determining the identity of a requestor. This requires a system for collecting and verifying credentials such as passwords, captchas, etc.
  • Authorization refers to the relationship between a requestor and a set of data, specifying which parts of the data, if any, the requestor may access. Authorizations can change over time.
  • Administration is the process of managing access. Some set of people will have direct control over the policies and mechanics of access control. Usually at least the PI and a University employee will have such control.
  • Auditing is the monitoring and recording of actions of requestors. Auditing may be used to ensure compliance with access policy, as well as provide information about use of the research data for reporting and predictive purposes.
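
The four elements can be seen working together in a small in-memory sketch. Everything here is an assumption made for illustration: the `AccessController` class, its method names, and the use of salted PBKDF2 password hashes stand in for whatever credential system an institution actually runs.

```python
import hashlib
import hmac
import os
from datetime import datetime, timezone


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a salted hash so that plain passwords are never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class AccessController:
    """Hypothetical controller covering the four elements listed above."""

    def __init__(self) -> None:
        self._credentials = {}     # user -> (salt, password hash)
        self._authorizations = {}  # user -> set of dataset names
        self.audit_log = []        # (time, user, dataset, allowed)

    # Administration: managing users and their access.
    def add_user(self, user: str, password: str) -> None:
        salt = os.urandom(16)
        self._credentials[user] = (salt, hash_password(password, salt))
        self._authorizations[user] = set()

    def grant(self, user: str, dataset: str) -> None:
        self._authorizations[user].add(dataset)

    def revoke(self, user: str, dataset: str) -> None:
        self._authorizations[user].discard(dataset)

    # Authentication: is the requestor who they claim to be?
    def authenticate(self, user: str, password: str) -> bool:
        if user not in self._credentials:
            return False
        salt, expected = self._credentials[user]
        return hmac.compare_digest(expected, hash_password(password, salt))

    # Authorization: may this requestor access this data?
    def authorized(self, user: str, dataset: str) -> bool:
        return dataset in self._authorizations.get(user, set())

    # Audit: every request is recorded, whether or not it succeeds.
    def request(self, user: str, password: str, dataset: str) -> bool:
        allowed = self.authenticate(user, password) and self.authorized(user, dataset)
        self.audit_log.append((datetime.now(timezone.utc), user, dataset, allowed))
        return allowed
```

Because authorizations can change over time, `revoke` takes effect on the next call to `request`, and the audit log retains both the earlier successful requests and the later refusals.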

The types of access (or “permission”) are generally classified as:

  • Read permission
    • See what objects are available (e.g., list contents of a folder/directory)
    • Download/look at contents of an object
  • Write permission
    • Create objects
    • Edit/modify objects
    • Delete objects
  • Execute permission
    • Run programs
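
The classification above can be sketched as a mapping from concrete actions to the permission each one requires. The `Permission` enum and the action names below are illustrative assumptions, not an established API.

```python
from enum import Flag, auto


class Permission(Flag):
    """The three permission types listed above."""
    READ = auto()
    WRITE = auto()
    EXECUTE = auto()


# Hypothetical action names, each mapped to the permission it requires.
REQUIRED = {
    "list": Permission.READ,       # see what objects are available
    "download": Permission.READ,   # look at the contents of an object
    "create": Permission.WRITE,
    "edit": Permission.WRITE,
    "delete": Permission.WRITE,
    "run": Permission.EXECUTE,
}


def allowed(granted: Permission, action: str) -> bool:
    """Return True if `granted` includes the permission `action` needs."""
    return bool(granted & REQUIRED[action])


read_only = Permission.READ
editor = Permission.READ | Permission.WRITE
```

With these definitions, `allowed(read_only, "list")` is true while `allowed(read_only, "delete")` is false. Real systems often split these further, for example by separating deletion from modification.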

These permissions may be granted on an individual basis or to sets of people:

  • Owner (there is almost always one person or entity designated as the owner)
  • Specific users (which requires that lists, or groups, of individuals be maintained)
  • Institution (where the credential supplied for access is proof of membership in the organization, or the domain in which the accessing machine resides)
  • World (specifies that the data is open to anyone asking for it)
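
As a rough sketch, the four scopes can be checked from most specific to least specific. The `Dataset` class is hypothetical, and treating an email domain as proof of institutional membership is a simplifying assumption; a real system would rely on a verified credential.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical access settings for one data set."""
    owner: str                                # the designated owner
    users: set = field(default_factory=set)   # specific named users
    institution_domain: str = ""              # assumed proof of membership
    world_readable: bool = False              # open to anyone who asks


def may_read(ds: Dataset, requestor: str) -> bool:
    """Check owner, then specific users, then institution, then world."""
    if requestor == ds.owner:
        return True
    if requestor in ds.users:
        return True
    if ds.institution_domain and requestor.endswith("@" + ds.institution_domain):
        return True
    return ds.world_readable


# Illustrative values only.
ds = Dataset(
    owner="pi@example.edu",
    users={"partner@example.org"},
    institution_domain="example.edu",
)
```

Here `may_read(ds, "student@example.edu")` succeeds through institutional membership, while an unrelated outside address is refused unless `world_readable` is set.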

It is possible to define access in the negative, i.e., to specify who is denied access. The most common case of this is based on national security (for instance, in the form of export controls).
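
A negative rule is usually evaluated before any positive grant, so that a denial cannot be overridden by membership in an allowed group. The sketch below assumes that precedence; the function name and the placeholder country codes are hypothetical and do not reflect any actual export control list.

```python
def may_access(requestor_country: str,
               has_grant: bool,
               denied_countries: frozenset) -> bool:
    """Apply the deny rule first, then fall back to the positive grant."""
    if requestor_country in denied_countries:
        return False
    return has_grant


# Placeholder codes for illustration only.
DENIED = frozenset({"XX", "YY"})
```

With this ordering, `may_access("XX", True, DENIED)` is refused even though the requestor holds a grant, and a requestor from an unlisted country is admitted only if a grant exists.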

Below are some references related to access control:

  • From Michigan State University: somewhat old, but a succinct statement of policy for access to data. “Following are a set of ‘best practices,’ approved by the University Research Council in February 2001, developed to assure that research data are appropriately recorded, archived for a reasonable period of time, and available for review under the appropriate circumstances. …”
  • From MIT, on Ethical and Legal Issues: “When publishing data, it is vital to consider the rights and responsibilities you have with regard to issues of confidentiality and intellectual property. …”
  • NIH supplies a number of references on data sharing on one web page. Others give specific guidelines for data access: “Data Sharing Guidance” and “Access to Research”
  • A 2006 report from the Council on Governmental Relations is relevant to the topic of access control. “Access to and Retention of Research Data: Rights and Responsibilities focuses on effective practices for the management of research data. The text describes an aspect of management, e.g., ownership, restrictions, sharing, and is followed in many instances by a case scenario or study.”