One of the issues about security that I have always found puzzling is security seen as “us” vs “them”.
Look around you: in most cases where you have to enter passwords, codes, PINs, etc., the security is mainly there to filter out the undesired.
But, once you are inside… it is assumed that you belong.
Between 2003 and 2005, I published an e-zine online (originally in Italian and English), and one of the issues was about considering outsourcing as a structural choice, not just a cost-control opportunity.
In July 2009 I posted a short article about the usual questions that I keep asking SaaS (software-as-a-service) providers.
If you do not know what I am talking about: basically the idea is that, instead of buying a software license, you use the software when you need it, and pay accordingly.
How does it work? Well, do you have a gmail account? Do you use gdocs? You are using a kind of SaaS.
In the end, any security system is only as secure as the way you use it, and only as long as there is not enough incentive to bother circumventing it.
Otherwise, time and/or resources will make yesterday’s super-secure system look not so secure.
1. Security and data
A few months ago I said that I was going to re-develop a security framework that I had created over a decade ago and then used in online and offline applications to test keeping data confidential while using low-cost online hosting.
The idea: the way you structure and classify your information sometimes tells much more about how your organization makes decisions, or how you process information, than you would like others to know.
The solution: store your data online, but embed the logic into the information, while encrypting the application and keeping part of the information needed to read or modify the data outside the website.
Before discussing the technical side, I will discuss the logic.
2. Not just the data
I still remember that, in my first projects, PCs weren’t available (yes, late 1980s), and, even when they became available, we were required to have a signature and authorization on each page that we took outside the customers’ premises.
And it was just a segment of accounting information for business customers.
Late 1980s: add security software to the picture, and start adding PCs (pre-Windows: can you imagine it?).
A few more years, and add databases moving from the massive mainframe to widespread use on individual PCs first, then networked PCs, and finally on the web.
Welcome to a new world, where data are not necessarily located on computers under your own control, but on shared facilities whose physical location you do not even know.
But databases are just containers: and, as on a paper directory, you need a logic to make sense of the data you put inside the database.
3. “How” tells more than “what”
Making sense of the data requires a few items: the data, a structure, and a way to add more information and retrieve it, plus a basic infrastructure to support those items.
That infrastructure goes from the database itself, to the computers it will run on, to the computers needed to access the database, to the network facilities connecting the two sets of computers, be it just the Internet or a dedicated network.
If you are an information processing professional- sorry for oversimplifying.
But, whatever you call it, you just cannot dump data into a database and then expect to be able to retrieve it as if by magic.
You need to decide “how” the data is organized inside the database.
And you need to create some logic (programs) to be able to ask the database for information when, how, and if needed: to ask the right question and obtain the information that answers it.
This “how” often tells more about the thinking and organizational patterns of whoever produces the data than the data itself.
It is as if the data were the ingredients, and the logic to store and retrieve were your recipes.
The data can change, sometimes so fast that you cannot even “watch” the data: you have to “freeze” the information (as in a report) to be able to understand it.
But your logic will change much more slowly than each individual bit of information that you provide.
4. Give to Caesar what belongs to Caesar
So, let’s split the roles: somebody has to manage the building blocks, the infrastructure, and somebody else has to take care of the way the information is stored and retrieved.
And, of course, somebody has to produce the information, so that somebody (not necessarily the same people) can actually retrieve it, if, when, and how needed.
While within the walls of your castle it makes sense to have just these four roles, and sometimes you can safely assume that those managing the information within your company will keep it confidential, would you allow every employee to access the information about how much every other employee is paid?
Or about the actual cash available in your company? Probably not.
Therefore, I keep being puzzled at how all these safeguards seem to be lost when people (and organizations) go on the Internet.
Sometimes I have seen grown-up people answering questions online that they would never answer if their employer or anybody else were to ask them the same questions.
Actually, they would consider most of those questions excessively intrusive.
The funny part is: this is at least information that they are willingly giving to an unknown third party who tells them explicitly (but nobody reads all those online caveats) that they will do whatever they see fit with their information.
And, interestingly, almost none of the online companies state what will happen to the data you store with them should their company go bankrupt or change ownership.
5. How do you relate?
We do not see the data as individual points.
To make sense, even when we talk, we always see the context.
In the database world, your context is often defined not just by how you link data, but also by the sequence in which you look at information.
I think that few people would search an address by looking at, say, all the streets that have a number 55 address available, then move onto the street type, then the street name, then the town, and so on.
And the same applies to data.
Your “patterns” in accessing your information, how you relate data with other data, and which data you access first tell a lot about your decision-making processes.
Actually, when in the late 1980s I was designing decision support models, I cared first about the results expected from the decision process, and then went back and forth on the available data, to finally identify which information was really needed.
Thinking in the abstract about your decision process without considering the data available, or thinking about the data without applying some boundaries to your search, would only produce one result: a waste of resources.
Therefore, relating each data item with other items, or collecting specific data items in “blocks” (call them tables, records, whatever you want), and then connecting these blocks together allows you to frame your data within what is meaningful in your context.
It also limits what you can and cannot do with your data.
Another example: if I collect all the bills and store them by date, without keeping a separate record of the type of expense, producing a report by category is still feasible, but I need to read all the bills every time to produce summaries by type of expense, and I need at least to assign an expense type to each bill.
If I leave the expense type outside my database- there is no way that I can produce a report on expense types.
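A minimal sketch of the bills example, using Python and its built-in sqlite3 module. The table name, the column names and the sample bills are invented for illustration; the point is only that the report by type is possible because each bill carries an expense type, and that every bill has to be read to produce it.

```python
import sqlite3

def report_by_type(rows):
    """Summarize bills by expense type; each row is (bill_date, amount, expense_type)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bills (bill_date TEXT, amount REAL, expense_type TEXT)"
    )
    conn.executemany("INSERT INTO bills VALUES (?, ?, ?)", rows)
    # The summary scans all the bills and groups them by the expense type
    # stored on each one; drop that column and this query cannot be written.
    result = conn.execute(
        "SELECT expense_type, SUM(amount) FROM bills "
        "GROUP BY expense_type ORDER BY expense_type"
    ).fetchall()
    conn.close()
    return result

# Hypothetical sample data
bills = [
    ("2010-01-05", 40.0, "travel"),
    ("2010-01-09", 15.5, "office"),
    ("2010-02-01", 60.0, "travel"),
]
summary = report_by_type(bills)
```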
To give you a real example.
In the early 1990s, a customer told me that a government agency managing pension-related payroll information asked for a report that, in order to be produced, required more data than companies were legally required to keep.
Most companies grudgingly sent the report: they had the information anyway, as they needed it to provide employees with the appropriate severance pay when they eventually retired.
So, in most cases, instead of storing just what was needed, companies kept all the payroll information.
The agency called back and admitted the truth: they had lost tapes. Could the companies kindly provide again the raw information that they had used to produce the report?
6. Turn your data into Lego™ bricks
My approach to storing information online without giving access to the way I organized or correlated information (the “data model”), or to the reasoning path that I used to access it, was at first an intellectual curiosity.
But I was appalled when I saw (for myself and partners) that some hosting companies simply “got inspiration” from their customers’ databases and applications hosted online.
So, you invest in creating your own (probably) unique way to connect a data bit with another data bit, and to retrieve what you need, describing your unique way of doing business, and any technician at your hosting company can walk away with your way of doing business.
Not really what you bargained for, when you accepted that it was safer to use the computers of a larger organization, probably able to ensure that your data are accessible 24/7.
What did I do? I analysed data- and identified what made sense (to me).
By the late 1990s, I had already created plenty of data models and business processes supported by data, both on paper and on databases, and therefore I decided to simply consider data as “Lego™ bricks”.
A brick is useful for certain purposes- but it needs interfaces, so that it can be connected to other bricks.
If you are the only one who knows what a brick contains, i.e. how data inside the brick are organized, and you have a limited number of possible ways to connect bricks, anybody taking your collection of bricks and connections will be able to use it, but not to understand what the inside of each brick means to you.
And if you “obfuscate” the way you get the information, the bricks are useless without your unique knowledge of “how” bricks make sense in your business process.
I simply considered that every brick has a key (a way to find it), one or more classification items (the additional keys to find it), a “content”, and some ancillary information (who updated or created it, and when).
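The brick described above could be sketched in Python roughly as follows. The class name, the field names and the sample values are my assumptions for illustration, not the layout of the original framework.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Brick:
    key: str            # the way to find the brick
    classifiers: tuple  # classification items: the additional keys to find it
    content: str        # opaque payload; its meaning is known only to the owner
    updated_by: str     # ancillary information: who
    updated_at: datetime = field(  # ancillary information: when
        default_factory=lambda: datetime.now(timezone.utc)
    )

def find(bricks, classifier):
    """Return the keys of all bricks carrying the given classification item."""
    return [b.key for b in bricks if classifier in b.classifiers]

# Hypothetical bricks: the classifiers say how bricks connect,
# not what their content means.
bricks = [
    Brick("b1", ("c7", "c2"), "<opaque>", "author"),
    Brick("b2", ("c2",), "<opaque>", "author"),
    Brick("b3", ("c5",), "<opaque>", "author"),
]
matches = find(bricks, "c2")
```

Anybody holding the collection can follow the classifiers from brick to brick, which is the "use it, but not understand it" property: nothing in the structure explains what "c2" or the content stand for.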
7. Make it simple
It now gets a little bit technical.
At a later stage, I will post an example on cwcommunity.org.
For the time being, if you are into technology, I am confident that you can apply the logic without any need for an example (unless you are really lazy).
If not, you can at least transmit the logic to somebody else who can.
How did it work?
I split the logic to store and retrieve information into a “network” of mini-logic points, scattered across multiple mini-files, then processed everything through a program that “obfuscated” the code.
It sounds more complex than it really is: it is just as if you called “xyz23BAt0” what is really “read the address”, and “jklru420u029” what is really “address”.
It still works, but, as everything is scattered around, without the logic, only the stupid computer that followed the endless detour can make sense of it all.
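To make the renaming idea concrete, here is a toy sketch in Python. It only renames identifiers inside a source string, using the two names quoted above; the function obfuscate and the sample snippet are hypothetical, and a real obfuscator (including the one used for the framework) does considerably more than this.

```python
import re

def obfuscate(source, mapping):
    """Replace whole-word identifiers in `source` according to `mapping`."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], source)

# "read the address" and "address" become meaningless names
mapping = {"read_address": "xyz23BAt0", "address": "jklru420u029"}

clear = (
    "def read_address(record):\n"
    "    address = record['street']\n"
    "    return address\n"
)
hidden = obfuscate(clear, mapping)

# The computer does not care about names: the renamed code still runs.
namespace = {}
exec(hidden, namespace)
result = namespace["xyz23BAt0"]({"street": "Main Street"})
```

The mapping is the part to keep to yourself: without it, a reader of the renamed code has to work out what each name means from how it is used.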
As for the data, I used the “brick” approach described above, by simply identifying in my data model(s) how many “connectors” were needed.
Why? Because each connector was encrypted, but had its own position (“field”), and therefore it could be still retrieved fast.
I created a few applications using the logic: one was a mini-community online, ComshareNoMore.com, with about 150 members; another was an application to design organizational manuals based upon the distribution of roles within a company, to ensure that any overlapping between different parts of the organization was explicit.
The second deserves a short explanation: when you design an organization chart, you assign roles within business processes to different parts of the organization, and, in most cases, you will have some processes that straddle the boundaries of two parts of the organization.
It is all fine when you are creating it. But when you start updating… more than once I had to intervene on processes where the evolution had lost the original link between two parts of the organization, and some conflicting roles had been assigned on what seemed to be a different activity, while originally it was just a different aspect of the same activity.
The logic: converting an organization chart into a “network” of activities, roles, etc., then assigning the “building blocks”, shuffling when needed, and automatically producing cross-checking tables (based on the decision tables concept).
So, I had various levels of “connectors”.
By storing using the “brick” approach, I had a list of the “categories” of bricks, just seven categories, and a list of the possible “content types” inside each one of the seven categories.
What was visible online was just a series of seven categories- each data item was encrypted and decrypted using a key that was supposed to be provided online after having established a secure connection using a once-only access code.
So, even if you copied the database, it would have taken some time to maybe, eventually, decrypt it.
But you would still be left with useless information, as the “content” part was actually the encoding of an XML structure that was specific to each “content type”.
Yes, if you decrypted the database, then listed all the logic, then tried to understand what everything did, you could eventually reverse-engineer it.
In my case, I kept my key with me, and the encryption for the online communities (it was just a test) was based on a user identification, and then a “book encryption” using a text that I extracted from an e-book on Gutenberg.
The idea was to eventually let two or more parties select a Gutenberg book as their own “team” reference.
Anyway, you could increase your security by changing the encryption method, or by using a different encoding approach involving more than one of the fields.
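For readers who have never seen a “book encryption”, here is a toy version in Python. It is a generic textbook-style book cipher, not the scheme actually used in the framework: it ignores the user identification step, the function names are mine, and the short string stands in for a full e-book text. It is not secure enough for real use.

```python
def book_encrypt(message, book):
    """Encode each character as the index of an occurrence of it in `book`."""
    positions = []
    start = 0
    for ch in message:
        idx = book.find(ch, start)
        if idx == -1:
            idx = book.find(ch)  # wrap around to the beginning of the book
        if idx == -1:
            raise ValueError(f"character {ch!r} does not occur in the book")
        positions.append(idx)
        start = idx + 1  # move forward so repeated letters get different indexes
    return positions

def book_decrypt(positions, book):
    """Read the characters back from the same book."""
    return "".join(book[i] for i in positions)

# Stand-in for the shared e-book text that both parties would hold
book = "it was the best of times, it was the worst of times"
secret = book_encrypt("west", book)
recovered = book_decrypt(secret, book)
```

The stored value is just a list of numbers; without knowing which book (and which edition of the text) was used, the numbers point nowhere.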
The interesting part was when I also added some further obfuscation inside the database, by leaving part of the logic inside the database, but stored as any other data, with the number of connectors (in this case, the parameters) required to execute that part of the logic.
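A rough Python sketch of logic stored as ordinary data: each step is a row like any other brick, with the number of parameters it needs, and only an opaque name in its content. The table layout, the opaque names and the OPERATIONS mapping kept outside the database are hypothetical choices made for this illustration.

```python
import sqlite3

# Kept outside the database: what each opaque name really does (hypothetical).
OPERATIONS = {
    "q81": lambda a, b: a + b,
    "z03": lambda a: a * 2,
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bricks (brick_key TEXT, category TEXT, connectors INTEGER, content TEXT)"
)
# Logic steps look exactly like data rows: a key, a category,
# a number of connectors (here, parameters) and a content.
conn.executemany(
    "INSERT INTO bricks VALUES (?, ?, ?, ?)",
    [("s1", "c4", 2, "q81"), ("s2", "c4", 1, "z03")],
)

def run_step(brick_key, *args):
    """Fetch a logic step stored as a plain row and execute it with `args`."""
    row = conn.execute(
        "SELECT connectors, content FROM bricks WHERE brick_key = ?", (brick_key,)
    ).fetchone()
    if row is None:
        raise KeyError(brick_key)
    expected, opaque_name = row
    if len(args) != expected:
        raise ValueError(f"step {brick_key} expects {expected} parameters")
    return OPERATIONS[opaque_name](*args)

total = run_step("s1", 3, 4)
doubled = run_step("s2", total)
```

Whoever copies the table cannot tell the logic rows from the data rows, and even after spotting them only learns that step "s1" takes two parameters.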
Finally, this logic is database-transparent.
Even if a database does not support encrypting data, stored procedures, or automatically keeping track of the “links” (foreign keys, etc.) between records, this logic can be easily implemented, provided that the database supports the standard SQL datatypes.
I plan to post a full example later this summer, time permitting, but on cwcommunity.org, as it will really be a piece of software.
Conditions for use? Well- use it if you find it useful, and just state and link the source (if you want- creative commons for attribution but without all the legal mumbo-jumbo).
The main aims? Making it useful, sharing expertise, and preventing some smart parasite from finding a way to legally protect it without stating the source, and maybe asking (it happened before) a license fee for something that I developed and they copied.