Saturday, 28 February 2009

PDF documents and metadata - some examples

Before I do a deeper dive into what metadata a PDF document contains, let's take a look at what must have been the main headline-hitting example in 2008 of sensitive information being discovered within PDF metadata.

I am referring to the situation Google found themselves in with a submission they made, supposedly anonymously, to the Australian Competition and Consumer Commission regarding eBay and their proposal to force their users to use PayPal. After speculation on many blogs about the author of the anonymous submission, one Dave Bromage took a look at the metadata in the PDF document and let the world know who it was. Although the submission was replaced with a new version without the revealing metadata, the word was out. I won’t comment on the reasons why this was at least embarrassing to Google (this is one report that gives the details as well as showing the metadata contents), but will add that there was an additional chuckle in the techie community, The Register being one example, that the metadata also showed that the document had not been created using Google’s own word processing app. My main comment is that this unintentional leakage of information involved a regulator and caused embarrassment, at the very least, to the originator (author and company).


The submission had also masked what would have been visible text about the submitter within the document. However, the PDF did not have any security applied to it, so it was very easy to copy that area of the document and paste it into another text processor to see the underlying information. Facebook/ConnectU have just this month fallen foul of the same problem. There are numerous other examples in this area, GE and the US Justice Department being a couple from 2008. If you want to mask visible text, at the very least add security settings to the PDFs that you generate to disallow copying and pasting of text. Also look at redaction software, which fully removes the text and masks it whilst retaining the layout of the PDF document.

I am sure it is pure coincidence that one of the other headlines in 2008 around information garnered from PDF metadata also involved Google, but from the other side of the fence. As reported here, metadata in a PDF version of a lobbying letter from the Corn Farmers to Congress linked, albeit tentatively, the author back to some of Google’s political adversaries.

The lesson from these examples is that you should not assume that converting and sending/publishing a PDF removes metadata that could contain sensitive information.
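
To make the point concrete, here is a minimal sketch of how little effort it takes to look for this kind of metadata. It assumes the PDF stores its document information (Author, Creator, Producer and so on) as plain literal strings in an uncompressed, unencrypted Info dictionary; the function name scan_pdf_info and the sample bytes are my own illustrations, not taken from any of the documents mentioned above. A real PDF may also carry XMP metadata or keep these entries in compressed object streams, which this quick scan will not see.

```python
import re

# Standard keys of the PDF document information dictionary.
INFO_KEYS = (
    "Title", "Author", "Subject", "Keywords",
    "Creator", "Producer", "CreationDate", "ModDate",
)


def scan_pdf_info(data: bytes) -> dict:
    """Heuristically pull literal-string Info entries out of raw PDF bytes.

    Only matches simple entries such as b"/Author (Jane Example)".
    Hex strings, escaped or nested parentheses, encrypted files and
    compressed object streams are deliberately ignored in this sketch.
    """
    found = {}
    for key in INFO_KEYS:
        pattern = rb"/" + key.encode("ascii") + rb"\s*\(([^()\\]*)\)"
        match = re.search(pattern, data)
        if match:
            found[key] = match.group(1).decode("latin-1")
    return found


# Hypothetical fragment of a PDF file, used purely for illustration.
sample = (
    b"1 0 obj\n"
    b"<< /Author (Jane Example) /Creator (Some Word Processor) "
    b"/Producer (Some PDF Converter) >>\n"
    b"endobj\n"
)

print(scan_pdf_info(sample))
```

To try it on a real file, read the file in binary mode and pass the bytes to scan_pdf_info. An empty result does not prove the document is clean; it only means this simple scan found nothing.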


Friday, 31 October 2008

It might have been quiet on this blog for a while but elsewhere...

I know, I know, it has been a long while since I last posted to this blog! Thank you to all of you who have been checking in regularly.

It has been a busy six months both in terms of data loss instances and also for 3BView. In the case of the latter we have gained great new customers and partners in the intervening time ... you'll be able to find out more about some of them on our website - a new improved version of which is going live next week.

On the former: well watch this space. Many things to blog about, and I will be doing just that over the coming weeks.

Tuesday, 18 March 2008

Good eWeek article on DLP

eWeek has an interesting article comparing Database Activity Monitoring (DAM) with Data Leak Prevention (DLP).

In the article, Paul Proctor, a Gartner analyst who’s tracked this area for a while, says: “Most every security monitoring technology would benefit from DLP content awareness, which is the ability to recognize sensitive content on the fly.” Yep, I’d agree with that.

Thursday, 28 February 2008

California Bar Journal reviews legal metadata position

The California Bar Journal, in this article, presents an excellent round-up of the problems for lawyers, including the myth that PDF documents are safe from metadata leaks, and the latest legal position in the US. Worth reading.

Monday, 18 February 2008

Eli Lilly’s lawyers accidentally email confidential info to New York Times

We’ve been here before, but this is a corker. All the pieces of a classic ILP mistake: the $1bn lawsuit, the external law firm accidentally emailing confidential information to the wrong person, and the fact that the wrong person happened to be a New York Times reporter. Oops.

Law firms, get yourselves some ILP tools now, before it’s you!

Wednesday, 30 January 2008

Scottish council caught out by tracked changes

It’s that old classic: sending out a Word document with information you really, really don’t want to reveal left in tracked changes.

This time the metadata culprit is Aberdeenshire County Council, which managed to send out a report on waste management, containing incriminating details of problems in tracked changes that hadn’t made it into the final report.

Even worse than the information revealed is the inference that the council had covered up the information it didn’t like on the problems – and the press has certainly taken this line.
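
For anyone who wants to check a document before it goes out, here is a rough sketch of how tracked changes can be detected. It assumes the newer .docx format (a zip archive of XML files) rather than the older binary .doc format, and it only looks at the main document body, not headers, footers, comments or moved text. The function name count_tracked_changes and the tiny in-memory test file are my own illustrations.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as w:ins and w:del.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_tracked_changes(docx_file) -> dict:
    """Count tracked insertions and deletions in the body of a .docx file.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return {
        "insertions": sum(1 for _ in root.iter(f"{{{W_NS}}}ins")),
        "deletions": sum(1 for _ in root.iter(f"{{{W_NS}}}del")),
    }


def make_demo_docx() -> io.BytesIO:
    """Build a stripped-down archive holding only word/document.xml.

    This is not a complete Word file; it is just enough for the check above.
    """
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
        '<w:ins w:id="1" w:author="A. Reviewer">'
        "<w:r><w:t>added text</w:t></w:r></w:ins>"
        '<w:del w:id="2" w:author="A. Reviewer">'
        "<w:r><w:delText>removed text</w:delText></w:r></w:del>"
        "</w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    buffer.seek(0)
    return buffer


print(count_tracked_changes(make_demo_docx()))
```

Any count above zero means the document still carries revision history that a recipient can read. A check like this is no substitute for accepting or rejecting all changes before sending, but it is a cheap safety net.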

Saturday, 19 January 2008

That Jeremy Clarkson story

I know I’m coming a little late to this story and there’s been a lot of debate about it. In case you’ve not read about this: the UK TV presenter Jeremy Clarkson published his bank details in a newspaper column, in which he claimed the furore about lost personal details from the HMRC was a fuss about nothing. Of course, a kind soul promptly used the details to set up a direct debit payment from Clarkson’s account to a charity.

On reflection, you could argue that in fact the system works – the UK’s direct debit scheme provides safeguards to protect the consumer, and to refund any disputed money. In this kind of situation, no doubt Clarkson is covered financially.

But you could imagine a consumer being less than happy if, say, the money taken out of their account meant they went overdrawn, other payments bounced, and they then had to sort out the unholy mess.

And Clarkson himself says he only discovered the loss when he read his bank statement – how many people do that every month? And would they notice the loss if it was £50 not £500?

For me, it does highlight two important issues: firstly, the context in which personal data is used is important. As many commentators have said, Clarkson only divulged information that we give to anyone whenever we give them a cheque. But, he did so in a highly public way. “Security by obscurity” has long been a facet of protecting data, and shouldn’t be forgotten when risk is being assessed.

The second key point is that it’s much, much easier to not leak data in the first place, than to deal with the consequences even if there is no nominal financial risk. As I mentioned, the UK’s banks guarantee to refund any money that a consumer loses due to a mistake with a direct debit. In practice, I imagine it’s still a difficult process to go through, and can cause much inconvenience. It’s the same with any company’s data – you might theoretically not have any negative consequences of a leak, but managing the process when information goes missing can be time-consuming and costly.