The following is an example showing how to query multiple nested arrays in JSON data in a PostgreSQL database.
Given
CREATE TABLE sample_json (
id serial PRIMARY KEY,
name varchar(64),
json_data json
);
INSERT INTO sample_json (name, json_data)
VALUES
(
'NA',
'
{
"location": "US",
"topLevelArray": [
{
"id": 1,
"secondLevelArray": [
{
"key": "someKey",
"operator": "=",
"value": 10
},
{
"key": "foo",
"operator": ">=",
"value": 5
},
{
"key": "someOtherKey",
"operator": ">",
"value": 647
}
]
},
{
"id": 2,
"secondLevelArray": [
{
"key": "blah",
"operator": "<",
"value": 7
}
]
}
]
}
'
),
(
'EU',
'
{
"location": "poland",
"topLevelArray": [
{
"id": 2,
"secondLevelArray": [
{
"key": "bar",
"operator": "<",
"value": 10
},
{
"key": "moo",
"operator": ">=",
"value": 16
},
{
"key": "baz",
"operator": "!=",
"value": 9
}
]
}
]
}
'
)
;
With the aforementioned data, let’s say we want to know the id of the rows that have an object in the secondLevelArray
with operator
equal to >=
and the value of the key
field.
The concept to understand to be able to search for all of the key
s in the secondLevelArray
where the operator is >=
is the lateral join. The TLDR; is that a subquery appearing in the FROM
clause can reference columns provided by preceding items. Or, you can write clauses in the FROM clause that read from the result of previous FROM clauses.
Let’s run some queries and go through them, line-by-line. First we will just select everything in the table.
SELECT * FROM sample_json;
|id |name|json_data |
|---|----|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|1 |NA |"
{
\"location\": \"US\",
\"topLevelArray\": [
{
\"id\": 1,
\"secondLevelArray\": [
{
\"key\": \"someKey\",
\"operator\": \"=\",
\"value\": 10
},
{
\"key\": \"foo\",
\"operator\": \">=\",
\"value\": 5
},
{
\"key\": \"someOtherKey\",
\"operator\": \">\",
\"value\": 647
}
]
},
{
\"id\": 2,
\"secondLevelArray\": [
{
\"key\": \"blah\",
\"operator\": \"<\",
\"value\": 7
}
]
}
]
}
"|
|2 |EU |"
{
\"location\": \"poland\",
\"topLevelArray\": [
{
\"id\": 2,
\"secondLevelArray\": [
{
\"key\": \"bar\",
\"operator\": \"<\",
\"value\": 10
},
{
\"key\": \"moo\",
\"operator\": \">=\",
\"value\": 16
},
{
\"key\": \"baz\",
\"operator\": \"!=\",
\"value\": 9
}
]
}
]
}
" |
As expected, we just get back everything.
Now, let’s start to drill down into the JSON object. First we will select the row id and just the data from the topLevelArray
.
SELECT
sj.id
topLevelArray
FROM
sample_json sj,
json_array_elements(json_data -> 'topLevelArray') topLevelArray
;
|id |toplevelarray |
|---|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|1 |"{
\"id\": 1,
\"secondLevelArray\": [
{
\"key\": \"someKey\",
\"operator\": \"=\",
\"value\": 10
},
{
\"key\": \"foo\",
\"operator\": \">=\",
\"value\": 5
},
{
\"key\": \"someOtherKey\",
\"operator\": \">\",
\"value\": 647
}
]
}"|
|1 |"{
\"id\": 2,
\"secondLevelArray\": [
{
\"key\": \"blah\",
\"operator\": \"<\",
\"value\": 7
}
]
}" |
|2 |"{
\"id\": 2,
\"secondLevelArray\": [
{
\"key\": \"bar\",
\"operator\": \"<\",
\"value\": 10
},
{
\"key\": \"moo\",
\"operator\": \">=\",
\"value\": 16
},
{
\"key\": \"baz\",
\"operator\": \"!=\",
\"value\": 9
}
]
}" |
The “columns” selected are the id
and topLevelArray
. The id
is straightforward. The topLevelArray
is a lateral join clause statement and we use the json_array_elements()
function to select the contents of the json_data.topLevelArray
key.
We can continue to traverse deeper into the JSON object with another lateral join clause that accesses the next nested array from the topLevelArray
key.
SELECT
sj.id,
secondLevelElements
FROM
sample_json sj,
json_array_elements(json_data -> 'topLevelArray') topLevelArray,
json_array_elements(topLevelArray -> 'secondLevelArray') secondLevelElements
;
|id |secondlevelelements |
|---|----------------------------------------------------------------------------------------------------------------------------------------------------|
|1 |"{
\"key\": \"someKey\",
\"operator\": \"=\",
\"value\": 10
}" |
|1 |"{
\"key\": \"foo\",
\"operator\": \">=\",
\"value\": 5
}" |
|1 |"{
\"key\": \"someOtherKey\",
\"operator\": \">\",
\"value\": 647
}"|
|1 |"{
\"key\": \"blah\",
\"operator\": \"<\",
\"value\": 7
}" |
|2 |"{
\"key\": \"bar\",
\"operator\": \"<\",
\"value\": 10
}" |
|2 |"{
\"key\": \"moo\",
\"operator\": \">=\",
\"value\": 16
}" |
|2 |"{
\"key\": \"baz\",
\"operator\": \"!=\",
\"value\": 9
}" |
We add an additional FROM clause. This one referencing the result of the previous FROM clause.
json_array_elements(json_data -> 'topLevelArray') topLevelArray,
json_array_elements(topLevelArray -> 'secondLevelArray') secondLevelElements
Now we have access to the elements in the secondLevelArray
and we can add a WHERE
clause to select only what we want from that nested array.
select
sj.id,
secondLevelElements ->> 'key'
from
sample_json sj,
json_array_elements(json_data -> 'topLevelArray') topLevelArray,
json_array_elements(topLevelArray -> 'secondLevelArray') secondLevelElements
where
secondLevelElements ->> 'operator' = '>='
;
|id |?column?|
|---|--------|
|1 |foo |
|2 |moo |
The result being the row id, and the value of the key
field in inner most nested object.