Big data can be very useful, but we are understandably nervous about how it is collected and shared. To be useful, it has to be derived from real people in the real world; to be shared, it has to be transformed into generalities that leave the identities of those people behind. If information about individuals can still be recovered from the abstracted data, privacy has been breached.
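Clever people (known as adversaries) can work their way through the statistics and find a path back to information about individuals. One way to do this is to compare the answer to a query about a whole group with the answer to the same query about that group minus one person. The way to prevent this is to introduce slight errors, or ‘noise’, into the data or into the statistics released from it. The noise should not be so great that the overall statistics are altered markedly, but it should be great enough to upset the maths, so that the data cannot reliably be manipulated to move from generalities to particulars. Privacy comes from ensuring that any published result looks almost the same whether or not a given individual’s data is included, so an adversary gains little from the difference between them. Hence the name differential privacy.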
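To make the attack and the defence concrete, here is a minimal sketch assuming the Laplace mechanism, one common way of adding noise, applied to a simple counting query. The dataset, the function names (`exact_count`, `noisy_count`, `differencing_attack`) and the choice of `epsilon = 0.5` are all hypothetical and chosen for illustration. They are not taken from any particular library. The smaller `epsilon` is, the more noise is added and the stronger the privacy protection.

```python
import random

# Hypothetical toy dataset for illustration: (name, has_condition) pairs.
RECORDS = [
    ("Alice", True),
    ("Bob", False),
    ("Carol", True),
    ("Dave", False),
    ("Eve", True),
]


def exact_count(records):
    """Number of records with the sensitive attribute (no protection)."""
    return sum(1 for _, flag in records if flag)


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(records, epsilon, rng):
    """Counting query protected by the Laplace mechanism.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy for a single query.
    """
    return exact_count(records) + laplace_noise(1 / epsilon, rng)


def differencing_attack(query, records, target):
    """Infer the target's attribute by asking the same query with and without them."""
    without_target = [r for r in records if r[0] != target]
    return query(records) - query(without_target)


if __name__ == "__main__":
    rng = random.Random(0)

    # Exact answers leak: the difference is exactly 1 if Eve has the condition, 0 if not.
    print(differencing_attack(exact_count, RECORDS, "Eve"))  # 1

    # Noisy answers: each query gets fresh noise, so the difference no longer pins Eve down.
    def noisy(rs):
        return noisy_count(rs, epsilon=0.5, rng=rng)

    print(differencing_attack(noisy, RECORDS, "Eve"))
```

Each noisy answer is still close to the true count on average, so group-level statistics stay usable. One caveat is that asking the same noisy question many times and averaging the answers would gradually wash the noise out. For this reason, practical systems track the total `epsilon` spent across queries as a privacy budget.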
Often the information gleaned about people is not particularly startling, although it is useful to marketers. The fact that a large percentage of the community likes chocolate and a small percentage doesn’t is neither here nor there, but learning that a particular person has a particular disease is a far more significant breach of privacy.