12 May 2016
When bright lines blur: when is personal information "de-identified" and why does it matter?
by Steven Klimt, David Kreltszheim, Talia Lirosi
Even if you de-identify personal information, you must minimise or manage the risk of others putting the pieces together and identifying individuals.
In general, regulation under the Australian privacy laws is triggered by the handling of "personal information". That's a clear bright line, right?
The challenge for organisations is that "personal information" extends to information that can reasonably identify an individual (not just information that, on its face, identifies an individual). So an organisation handling information that may reasonably be used to identify an individual needs to comply with the privacy laws unless it takes effective steps to "de-identify" that information ‒ and to stop others from putting the pieces back together.
What’s in a name? Quasi-identifiers as pieces in the identity puzzle
Information that can reasonably be used to identify an individual is also personal information, even though the information may not on its face refer to the individual.
For example, a motoring association has various pieces of information about every owner of a particular model of vintage car. If it uploads the name of an owner's hometown and one or more of the day, month and year of birth (but not the owner's name) to its public website, it could have disclosed a car owner's personal information.
This is because dates of birth (and sometimes years and months of birth), towns of residence and the fact that an individual owns a particular type of vintage car can be "quasi-identifiers". Quasi-identifiers may permit an individual to be re-identified (or their identity strongly inferred) depending on the nature and number of the identifiers and the number of individuals who share the same characteristics.
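To make the "number of individuals who share the same characteristics" point concrete, here is a minimal sketch in Python. It counts how many records in a data set share each combination of quasi-identifiers. The records, field names and function names are hypothetical and chosen for illustration only. They are not drawn from any real data set or from any regulator's guidance.

```python
from collections import Counter

# Hypothetical records: no names, only quasi-identifiers.
records = [
    {"town": "Ballarat", "birth_year": 1950, "car": "Model A"},
    {"town": "Ballarat", "birth_year": 1950, "car": "Model A"},
    {"town": "Ballarat", "birth_year": 1962, "car": "Model A"},
    {"town": "Bendigo", "birth_year": 1950, "car": "Model A"},
]

# Assumed set of quasi-identifiers for this illustration.
QUASI_IDENTIFIERS = ("town", "birth_year", "car")


def group_sizes(rows, keys=QUASI_IDENTIFIERS):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_combinations(rows, keys=QUASI_IDENTIFIERS):
    """Return the combinations held by exactly one record."""
    return [combo for combo, n in group_sizes(rows, keys).items() if n == 1]


sizes = group_sizes(records)
singletons = unique_combinations(records)
```

In this toy data set, two of the four records have a combination of town, birth year and car model that nobody else shares. Those are the records most exposed to re-identification, even though no name appears anywhere.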
Big data: big opportunities, big privacy risks
Certain business practices involve the handling of information that a business does not intend to be personal information, such as:
- online banner advertisements that are displayed to an Internet user based on the user's browsing habits, without reference to the name or other information identifying the individual who is browsing the Internet;
- data analytics for market research purposes, where it is important to know what X's supermarket shopping habits are (for example), but the actual identity of X is irrelevant as the organisation is not seeking to market any products directly to X as a result of that research; and
- disclosure of "de-identified" information by an organisation to third parties for commercial purposes (for example, results of market research or information for investors).
The problem is that technical advances in data mining and analytics, combined with increases in computing power and data storage capacity, have significantly expanded the categories and volume of the raw information available to private and public sector organisations.
This "big data" capacity magnifies the opportunities and risks (including privacy risks) for organisations, particularly the risk that ostensibly de-identified information can be re-identified by cross-matching with other (identified) information. Notable examples of this include:
- journalists demonstrating in 2006 how it was possible to re-identify selected individuals (and their Internet search histories) following AOL's public release of 20 million AOL search queries not linked to any identified person; and
- academics demonstrating in 2008 how it was possible to re-identify selected individuals (and their movie ratings and preferences) following Netflix's public release of a database of 100 million movie ratings by Netflix subscribers not linked to any identified person.
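The general idea of cross-matching can be sketched in a few lines of Python. This is a simplified illustration under assumed data, not a description of the methods actually used in the AOL or Netflix examples. All records, field names and the `link_records` function are hypothetical.

```python
def link_records(deidentified, identified, keys):
    """Attach a name to each de-identified row that matches exactly one
    identified row on the given keys. Ambiguous matches are skipped."""
    index = {}
    for person in identified:
        index.setdefault(tuple(person[k] for k in keys), []).append(person)

    matches = []
    for row in deidentified:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # only an unambiguous match re-identifies
            matches.append({**row, "name": candidates[0]["name"]})
    return matches


# Hypothetical "de-identified" release: no names.
released = [
    {"postcode": "3000", "birth_year": 1980, "rating": "Film X: 5 stars"},
    {"postcode": "3001", "birth_year": 1975, "rating": "Film Y: 2 stars"},
]

# Hypothetical identified information available elsewhere.
public = [
    {"name": "Person A", "postcode": "3000", "birth_year": 1980},
    {"name": "Person B", "postcode": "3001", "birth_year": 1975},
    {"name": "Person C", "postcode": "3001", "birth_year": 1975},
]

reidentified = link_records(released, public, ("postcode", "birth_year"))
```

Here the first released row is re-identified because only one person in the identified data shares its postcode and birth year. The second row is not, because two people share those values. The more identified information that becomes available for matching, the fewer rows stay ambiguous.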
If ostensibly de-identified information can be linked to a particular individual, the organisation holding that information must comply with applicable privacy laws in handling that information. For example, in May 2015, the Privacy Commissioner found that certain metadata that Telstra held about an individual's phone communications (like time and location data concerning incoming and outgoing calls and texts from his phone) was "personal information". This was so even though matching such metadata to a particular individual would take four days of full-time work to retrieve one week's worth of data, and could be done only with specialist help. As a result, Telstra had to comply with the individual's access request for that metadata under the Privacy Act.
Telstra successfully challenged this decision in the Commonwealth Administrative Appeals Tribunal in December 2015. The AAT reasoned that the relevant metadata was not personal information because it was not information "about" the individual at all. On the AAT's reasoning, the individual's privacy claim therefore did not get past first base, so it was not necessary for the AAT to consider the further question of whether the metadata could be linked with other information so as to identify the individual. The Privacy Commissioner has appealed against the AAT's decision. The appeal will be heard in the Federal Court in August 2016. Watch this space.
In any case, under the data retention law that came into effect from October 2015, specified metadata that "relates to an individual or to a communication to which the individual is a party" is taken to be personal information about the individual for the purposes of the Privacy Act. As a result, there is now a different "bright line" that requires telecommunications service providers to handle certain metadata as if it is personal information, even if it wouldn't otherwise be covered as personal information under the Privacy Act.
More generally, if a re-identification risk causes ostensibly de-identified information to be treated as "personal information", this impairs the ability of an organisation to collect, use or disclose that information (and leads to privacy compliance costs and consumer concerns). So it would make commercial sense for an organisation to minimise re-identification risk when handling commercially valuable de-identified information.
So how do you de-identify information and minimise re-identification risk?
The Office of the Australian Information Commissioner (OAIC) in its Privacy Business Resource 4 of April 2014 provides guidance to private sector organisations on techniques that can be used to de-identify information and minimise the risks of re-identification. Corresponding guidance for Commonwealth public sector agencies is in Information Policy Agency Resource 1 of April 2014.
The OAIC notes that removing personal identifiers such as an individual's name, address and date of birth should be the first step when de-identifying information.
However, that alone may not suffice to minimise or manage the risk of information being re-identified, for the reasons we've outlined. Other techniques to minimise that risk include:
- removing or altering quasi-identifiers that are unique to an individual in a data set, such as income or profession;
- grouping information into categories to disable identification (such as age group of 25-30 rather than individual ages);
- swapping identifying information between individuals to mask the distinctiveness of certain information;
- suppressing data by not releasing particular information or deleting the information from the data set; and
- limiting access to a de-identified dataset.
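Some of these techniques can be sketched in Python. The example below removes a direct identifier, groups ages into bands and suppresses records that remain in a group smaller than a chosen threshold. It is a minimal sketch under assumptions of our own: the field names, the five-year band width and the `min_group_size` threshold are hypothetical and are not prescribed by the OAIC guidance.

```python
from collections import Counter


def age_band(age, width=5):
    """Generalise an exact age into a band such as '25-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(rows, min_group_size=2):
    """Drop names, band ages, then suppress rows left in small groups."""
    # Only the generalised fields are copied across, so the name is removed.
    generalised = [
        {"age_band": age_band(r["age"]), "town": r["town"]} for r in rows
    ]
    counts = Counter((g["age_band"], g["town"]) for g in generalised)
    # Suppress any row whose (age band, town) group is below the threshold.
    return [
        g for g in generalised
        if counts[(g["age_band"], g["town"])] >= min_group_size
    ]


# Hypothetical source records.
rows = [
    {"name": "Person A", "age": 26, "town": "Geelong"},
    {"name": "Person B", "age": 29, "town": "Geelong"},
    {"name": "Person C", "age": 41, "town": "Geelong"},
]

released = deidentify(rows)
```

In this example the two records in the 25-29 band are released without names, while the single record in the 40-44 band is suppressed because nobody else shares its band and town. A sketch like this does not by itself establish that the re-identification risk is low. That still depends on what other information is available for matching.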
Other steps for an organisation to manage and minimise re-identification risk include:
- entering into contracts with their data receivers to limit the receivers' use and distribution of the data (including restraints on re-identification); and
- enforcing the terms of those contracts.
What should your organisation do?
- Identify cases where you collect, use or disclose information that may be at risk of being linked to an identifiable individual ‒ the At Risk Information.
- Assess the re-identification risk, taking into account the cost, difficulty, practicality and likelihood of re-identification of the At Risk Information.
- In some cases, you might have to commission a statistical or scientific assessment of the At Risk Information to ensure that the re-identification risk is low.
- You should reassess the risk periodically, as analytical technologies develop or more information becomes available either to you or the organisations to which you disclose the At Risk Information.
- If you receive At Risk Information, make contractual undertakings that you will not seek to re-identify the At Risk Information.
- If you disclose At Risk Information to a third party, obtain undertakings that the other party:
  - will secure the At Risk Information from misuse, interference or loss, or unauthorised access, modification or disclosure; and
  - will not re-identify the At Risk Information.