Earlier this week I blogged about the growing evidence of governments opening up their public data at both a national and local level. While this in itself represents a great leap forward, it brings with it a new set of challenges that we will need to address. One in particular stands out: the very real challenges we’re going to face around privacy in a Web/Gov 2.0 world.
Earlier this month I was chatting to Stuart Aston (one of our security advisors – you know the type, smarter than your average bear and very switched on to the evolution of the security principles we will face in an increasingly connected world) and he introduced me to the concept of “Differential Privacy”. He left me with a few white papers and a smile. A few hours later, with my head pounding and eyes bleeding (trust me, you want to try and read this stuff), I finally got my head around the concept and what it’s going to mean to us as citizens.
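Differential privacy is, essentially, a response to an uncomfortable problem: someone can draw very specific conclusions (with incredible accuracy) about the identity of an individual when provided with two disparate sets of anonymised data on a similar topic. The idea behind it is to put a mathematical limit on how much any published data can reveal about one person, whatever else an attacker already knows.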
The example given uses Netflix’s recent competition to improve their recommendation system as the backdrop…
Netflix published an anonymised data set of around 500,000 records in order to help developers come up with a solution to improve their recommendation system. Some bright sparks took this data and a similar export from IMDb and, by applying some fairly hairy maths, were able to identify specific individuals with a shocking 96% accuracy rate.
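To make the idea of cross-referencing two data sets a little more concrete, here is a minimal sketch in Python. It is not the researchers' actual method, and it involves no hairy maths: it simply counts how many films a named public profile and an anonymised profile share with similar star ratings and nearby dates, and only claims a match when one candidate clearly stands out. All of the names, films, ratings, thresholds and function names below are made up for illustration.

```python
from datetime import date

# Hypothetical "anonymised" release: opaque ids, film -> (stars, date rated).
anonymised = {
    "user_001": {"Film A": (5, date(2005, 3, 1)), "Film B": (2, date(2005, 3, 9)),
                 "Film C": (4, date(2005, 4, 2)), "Film D": (1, date(2005, 5, 20))},
    "user_002": {"Film A": (4, date(2005, 6, 11)), "Film E": (3, date(2005, 6, 12)),
                 "Film F": (5, date(2005, 7, 1))},
    "user_003": {"Film B": (5, date(2005, 1, 15)), "Film C": (1, date(2005, 2, 3)),
                 "Film G": (4, date(2005, 2, 20))},
}

# Hypothetical public reviews posted under real names on another site.
public = {
    "Alice Example": {"Film A": (5, date(2005, 3, 3)), "Film C": (4, date(2005, 4, 1)),
                      "Film D": (2, date(2005, 5, 25))},
    "Bob Example": {"Film A": (4, date(2005, 6, 10))},
}


def similarity(anon_ratings, public_ratings, max_star_gap=1, max_day_gap=14):
    """Count films present in both profiles with close stars and close dates."""
    matches = 0
    for film, (stars, when) in public_ratings.items():
        if film not in anon_ratings:
            continue
        anon_stars, anon_when = anon_ratings[film]
        close_stars = abs(anon_stars - stars) <= max_star_gap
        close_dates = abs((anon_when - when).days) <= max_day_gap
        if close_stars and close_dates:
            matches += 1
    return matches


def link(public_ratings, anonymised_data, min_margin=2):
    """Return the best-matching anonymised id, or None if no candidate stands out."""
    scored = sorted(
        ((similarity(ratings, public_ratings), uid)
         for uid, ratings in anonymised_data.items()),
        reverse=True,
    )
    best_score, best_uid = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0
    # Only claim a match when the best candidate beats the runner-up clearly.
    return best_uid if best_score - runner_up >= min_margin else None


for name, ratings in public.items():
    print(name, "->", link(ratings, anonymised))
```

In this toy data, three loosely matching ratings are enough to single out one "anonymous" record, while a single rating is not. The uncomfortable part is how little overlap is needed once ratings and dates are combined.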
This is mind-blowing, not just because of the maths involved, but because of what it means in a world of growing public data: the old bastions of privacy that we have relied upon thus far may no longer be enough.
Governments and organisations are going to need to take this seriously, as it raises some difficult questions about liability and the duty of care to keep their citizens’ and customers’ identities and data private.
In particular, think about the duty of care element. As an organisation, you have a legal requirement to look after the privacy of the data you hold on an individual or organisation. Once anonymised data sets can be cross-referenced like this, how far does that duty of care extend? If you keep your data anonymised but others can compromise that privacy (albeit with hairy maths and more public data), who is actually liable or legally responsible for the breach?
There are some tough answers to be found here and undoubtedly some more legislation will be required. In the meantime, though, it’s a concept we need to understand better so that we can build appropriate responses that don’t restrict the overall movement towards making public data more readily accessible. We cannot afford to let this (and other similar issues) stop the democratisation of data, but we do need to go into this with our eyes open.