The Shape of the Data

data model

The Offshore Leaks Database presents, in structured form, the contents of the more than 3 million database files inside the Panama Papers leak and the client databases from the Offshore Leaks investigation.

The Offshore Leaks Database was imported into Neo4j so that journalists and researchers can take advantage of the connections in the data. To the left is a simplified diagram of the basic "property graph" data model. Each data record is called a "node" and represents an entity, intermediary, officer or address. The nodes are connected to form a "graph" that reveals a complex web of relationships.

These are the types of nodes that you will encounter in the data:

  • Entity - The offshore legal entity. This could be a company, trust, foundation, or other legal entity created in a low-tax jurisdiction.

  • Officer - A person or company who plays a role in an offshore entity, such as beneficiary, director, or shareholder. The relationships shown in the diagram are just a sample of all the existing ones.

  • Intermediary - A go-between for someone seeking an offshore corporation and an offshore service provider, usually a law firm or a middleman that asks an offshore service provider to create an offshore firm.

  • Address - The registered address as it appears in the original databases obtained by ICIJ.

Cypher Introduction

Graph Patterns

Neo4j’s query language, Cypher, is centered around graph patterns, which represent entities with parentheses, for example (e:Entity), and connections with arrows, for example -[:INTERMEDIARY_OF]->.

:Entity is the type (label) of the entity and :INTERMEDIARY_OF is the type of the connection.

Here is an example pattern: (:Intermediary)-[:INTERMEDIARY_OF]->(:Entity). These patterns may be found with the MATCH clause.
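As a minimal sketch of how such a pattern is used in a full query, the example below binds both ends of the pattern to variables and returns their names. The variable names i and e and the column aliases are chosen here for illustration only; the name property is the one listed for Intermediary and Entity nodes later in this guide.

```cypher
// Match every Intermediary connected to an Entity via INTERMEDIARY_OF
// and return the two names side by side.
MATCH (i:Intermediary)-[:INTERMEDIARY_OF]->(e:Entity)
RETURN i.name AS intermediary, e.name AS entity
LIMIT 10;
```

The LIMIT keeps the result small; without it the query would return one row per matching relationship in the database.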

Other Clauses

The following clauses may follow a MATCH clause. They work with the properties stored on the nodes and relationships that the pattern matched.

  • filter - WHERE intermediary.name CONTAINS 'MOSSACK'

  • aggregate - WITH e.jurisdiction AS country, COUNT(*) AS frequency

  • return - RETURN country, frequency

  • order - ORDER BY frequency DESC

  • limit - LIMIT 20;

Jurisdiction distribution of Mossack Fonseca Clients
MATCH (intermediary:Intermediary)-[:INTERMEDIARY_OF]->(e:Entity)
WHERE intermediary.name CONTAINS 'MOSSACK'
RETURN e.jurisdiction AS country, COUNT(*) AS frequency
ORDER BY frequency DESC LIMIT 20;

Click on the block to put the query in the topmost window on the query editor. Hit the triangular button or press Ctrl+Enter to run it and see the resulting visualization.
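The same query can also be written with an explicit WITH step, which makes it possible to filter on the aggregated value before returning it. This is only a sketch: the threshold of 100 is an arbitrary value picked for illustration, not a figure taken from the data.

```cypher
// Aggregate per jurisdiction first, then keep only the frequent ones.
MATCH (intermediary:Intermediary)-[:INTERMEDIARY_OF]->(e:Entity)
WHERE intermediary.name CONTAINS 'MOSSACK'
WITH e.jurisdiction AS country, COUNT(*) AS frequency
WHERE frequency > 100   // illustrative threshold, adjust as needed
RETURN country, frequency
ORDER BY frequency DESC LIMIT 20;
```

A WHERE placed directly after WITH filters on the aggregated rows, whereas the first WHERE filters the matched nodes before aggregation.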

Nodes

Nodes are the entities in the graph. These are the types of nodes that you will encounter in the Panama Papers data:

  • Entity - The offshore legal entity. This could be a company, trust, foundation, or other legal entity.

  • Officer - Either the beneficiary, director, or shareholder of the offshore legal entity.

  • Intermediary - A go-between for someone seeking an offshore corporation and an offshore service provider, usually a law firm or a middleman that asks an offshore service provider to create an offshore firm for a client.

  • Address - The registered address according to the information on file.
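To see how many nodes of each type the database holds, a query along the following lines should work. It is a sketch that assumes nothing beyond the built-in labels() function; the column aliases are illustrative.

```cypher
// Count nodes grouped by their label set.
MATCH (n)
RETURN labels(n) AS nodeLabels, count(*) AS frequency
ORDER BY frequency DESC;
```

Because labels(n) returns a list, a node that carries more than one label appears under its full combination of labels rather than once per label.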

Relationships

Relationships connect the nodes in the graph.

The following relationships appear in the data model:

  • (:Intermediary)-[:INTERMEDIARY_OF]->(:Entity)

  • (:Officer|Intermediary)-[:UNDERLYING]->(:Intermediary|Officer)

  • (:Intermediary|Officer|Entity)-[:REGISTERED_ADDRESS]->(:Address)

  • (:Officer|Intermediary)-[:OFFICER_OF]->(:Entity)

All relationship types in the graph
MATCH (n)-[r]->(m)
RETURN labels(n) AS fromLabel, type(r) AS relType, collect(DISTINCT labels(m)) AS toLabels, count(*) AS frequency
ORDER BY frequency DESC;

Next, we will walk through each node type and see which properties are available for it.

Intermediary

Each Intermediary node represents a go-between for someone seeking an offshore corporation and an offshore service provider, usually a law firm or a middleman that asks an offshore service provider to create an offshore firm for a client.

Properties

Each Intermediary node has the following properties:

  • name - The name of the intermediary.

  • address - The address of the intermediary.

  • sourceID - Offshore Leaks or Panama Papers depending on the data’s source

  • status

  • valid_until

  • country_codes, countries

MATCH (i:Intermediary) RETURN i LIMIT 25;
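To check which of these properties are actually populated, the sketch below samples some Intermediary nodes and counts how often each property key occurs. The sample size of 1000 is an arbitrary choice for illustration, and the aliases are hypothetical.

```cypher
// Sample Intermediary nodes and tally the property keys they carry.
MATCH (i:Intermediary)
WITH i LIMIT 1000          // illustrative sample size
UNWIND keys(i) AS property
RETURN property, count(*) AS nodesWithProperty
ORDER BY nodesWithProperty DESC;
```

Swapping :Intermediary for another label gives the same overview for the other node types.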

Relationships

  • (:Intermediary)-[:INTERMEDIARY_OF]->(:Entity)

MATCH p=(:Intermediary)-[:INTERMEDIARY_OF]->(:Entity) RETURN p LIMIT 25;

  • (:Intermediary|Officer|Entity)-[:REGISTERED_ADDRESS]->(:Address)

MATCH p=(:Intermediary)-[:REGISTERED_ADDRESS]->(:Address) RETURN p LIMIT 25;

  • (:Officer|Intermediary)-[:OFFICER_OF]->(:Entity)

MATCH p=(:Intermediary)-[:OFFICER_OF]->(:Entity) RETURN p LIMIT 25;
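Beyond visualizing paths, the INTERMEDIARY_OF relationship can be aggregated. The following sketch ranks intermediaries by the number of entities they are connected to; the aliases are illustrative, and grouping by name is an assumption that merges distinct intermediaries sharing an identical name.

```cypher
// Rank intermediaries by how many entities they are linked to.
MATCH (i:Intermediary)-[:INTERMEDIARY_OF]->(e:Entity)
RETURN i.name AS intermediary, count(e) AS entities
ORDER BY entities DESC
LIMIT 10;
```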

Entity

Each Entity is a company, trust or fund created in a low-tax, offshore jurisdiction by an agent.

Properties

Each Entity node has the following properties:

  • name - The name of the legal entity.

  • sourceID - Offshore Leaks or Panama Papers depending on the data’s source

  • address - This field holds the registered address of the entity only when it is the same as the intermediary's address. Otherwise, the registered address is stored in the Address node connected to this Entity node through a REGISTERED_ADDRESS relationship.

  • former_name, original_name

  • company_type

  • status

  • incorporation_date, inactivation_date, struck_off_date, dorm_date - dates for events in the company's development

  • service_provider

  • ibcRUC

  • valid_until

  • jurisdiction, jurisdiction_description

  • country_codes, countries

MATCH (e:Entity) RETURN e LIMIT 25;
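Because an entity's address may sit either in its own address property or in a connected Address node, a query can look in both places. The sketch below assumes that the address property is absent (null) when it is not used, rather than an empty string; that assumption should be checked against the data.

```cypher
// Prefer the entity's own address property, fall back to the Address node.
MATCH (e:Entity)
OPTIONAL MATCH (e)-[:REGISTERED_ADDRESS]->(a:Address)
RETURN e.name AS entity, coalesce(e.address, a.address) AS address
LIMIT 25;
```

OPTIONAL MATCH keeps entities that have no Address node in the result, and an entity with several Address nodes appears once per address.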

Relationships

  • (:Intermediary)-[:INTERMEDIARY_OF]->(:Entity) - The relationship showing the intermediary that oversaw the creation of the entity.

MATCH p=(:Intermediary)-[:INTERMEDIARY_OF]->(:Entity) RETURN p LIMIT 25;

  • (:Intermediary|Officer|Entity)-[:REGISTERED_ADDRESS]->(:Address) - The registered address of the entity.

MATCH p=(:Entity)-[:REGISTERED_ADDRESS]->(:Address) RETURN p LIMIT 25;

  • (:Entity)-[:RELATED_ENTITY]->(:Entity) - Entities that were connected to each other in the leaked documents.

MATCH p=(:Entity)-[:RELATED_ENTITY]->(:Entity) RETURN p LIMIT 25;

Officer

Each Officer node represents a person or company who plays a role in an offshore entity, such as beneficiary, director, or shareholder.

Properties

Officer nodes have the following properties:

  • name

  • valid_until

  • sourceID - Offshore Leaks or Panama Papers depending on the data’s source

  • country_codes, countries

MATCH (o:Officer) RETURN o LIMIT 25;

Relationships

  • (:Intermediary|Officer|Entity)-[:REGISTERED_ADDRESS]->(:Address)

MATCH p=(:Officer)-[:REGISTERED_ADDRESS]->(:Address) RETURN p LIMIT 25;

  • (:Officer)-[:OFFICER_OF]->(:Entity)

MATCH p=(:Officer)-[:OFFICER_OF]->(:Entity) RETURN p LIMIT 25;

  • (:Officer|Intermediary)-[:UNDERLYING]->(:Intermediary|Officer) - These are relationships such as NOMINEE_DIRECTOR_OF, representing people who act as nominees of others. This applies to all relationship types beginning with NOMINEE_.

MATCH p=(o:Officer)-[:NOMINEE_DIRECTOR_OF]->(:Intermediary) RETURN p LIMIT 25;
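To list every relationship type beginning with NOMINEE_ rather than naming one explicitly, the type can be filtered by prefix. This is a sketch; which NOMINEE_ types exist, and how often they occur, depends on the version of the database that is loaded.

```cypher
// Count outgoing Officer relationships whose type starts with NOMINEE_.
MATCH (:Officer)-[r]->()
WHERE type(r) STARTS WITH 'NOMINEE_'
RETURN type(r) AS relType, count(*) AS frequency
ORDER BY frequency DESC;
```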

Address

The Address node represents the address as found on file for the Intermediary, Officer, or Entity.

Properties

Address nodes have the following properties:

  • address - the address as it appears in the records

  • sourceID - Offshore Leaks or Panama Papers depending on the data’s source

  • valid_until

  • country_codes, countries

MATCH (a:Address) RETURN a LIMIT 25;

Relationships

  • (:Intermediary|Officer|Entity)-[:REGISTERED_ADDRESS]->(:Address)

MATCH p=()-[:REGISTERED_ADDRESS]->(:Address) RETURN p LIMIT 25;
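Addresses become interesting when many nodes share one. The following sketch ranks addresses by the number of nodes registered there; the aliases are illustrative, and grouping by the address text is an assumption that treats Address nodes with identical text as one address.

```cypher
// Rank addresses by how many nodes point at them.
MATCH (a:Address)<-[:REGISTERED_ADDRESS]-(n)
RETURN a.address AS address, count(n) AS registrants
ORDER BY registrants DESC
LIMIT 10;
```

Adding labels(n) to the RETURN clause would break the counts down by whether the registrant is an Intermediary, Officer, or Entity.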