Technique Translation
The technique for solving a problem in one RDBMS doesn't always translate well into another.
I've had performance problems with similar code to that which was in the article. Thus, my comment "This technique will kill performance in Oracle".
I don't see CAST(MULTISET()) (Oracle's version of ARRAY_AGG) being used too much. As such, the code might be unmaintainable by your replacement after you get a promotion.
Translating Terms
Translation goes like this
STRUCT in BigData is TYPE in Oracle
ARRAY_AGG in BigData is CAST(MULTISET()) in Oracle
UNNEST in BigData is TABLE() in Oracle
UNNEST of XML Data is XMLTABLE() in Oracle
UNNEST of JSON Data is JSON_TABLE() in Oracle (12c+)
You can't create the structure on the fly. You have to define it ahead of time.
EXAMPLES
Creating a structure of a single row:
create type emp_t as object (
EMPNO NUMBER(4),
ENAME VARCHAR2(10) ,
JOB VARCHAR2(9) ,
MGR NUMBER(4) ,
HIREDATE DATE ,
SAL NUMBER(7,2) ,
COMM NUMBER(7,2) ,
DEPTNO NUMBER(2)
);
/
Creating a structure of a set of rows based on above: ( nested table )
create type emp_tt as table of emp_t;
/
Using CAST(MULTISET())
select d.deptno
,cast(multiset(
select * from scott.emp e where e.deptno=d.deptno
) as emp_tt) as emp_table
from dept d
;
Using TABLE()
with data as (
select d.deptno
,cast(multiset(
select * from scott.emp e where e.deptno=d.deptno
) as emp_tt) AS emp_table
from dept d
)
select b.*
from data a, table( a.emp_table ) b
order by empno;
Oracle's Solution
Business Requirement: I want the first time a hurricane reached its maximum category along with its position.
Other, inferred, identifiers to "GROUP BY" would be season, basin, subbasin in addition to hurricane name.
One method to solve this in Oracle is to use Analytics.
Since I don't have access to the huricane data, I'll have to improvise.
select deptno as hurricane_name
,2017 as season
,'NA' as basin
,'WP' as subbasin
,sal as category
,ename as position
,rownum as time
from scott.emp
First thing we do is use the analytic function RANK() to rank all of the rows by category (descending) and time (ascending) but partition the rankings by season, basin, subbasin, and hurricane_name
select h.*
,RANK() over (partition by season, basin, subbasin, huricane_name
order by category desc, time)
as rank_score
from huricane_data h
Finally, we'll pick only the best (rank_score=1)
select *
from analyized_data
where rank_score=1
order by season, basin, subbasin, hurricane_name, time
Putting it all together
with hurricane_data as (
select deptno as hurricane_name
,2017 as season
,'NA' as basin
,'WP' as subbasin
,sal as category
,ename as position
,rownum as time
from scott.emp
), analyized_data as (
select h.*
,RANK() over (partition by season, basin, subbasin, hurricane_name
order by category desc, time) rank_score
from hurricane_data h
)
select *
from analyized_data
where rank_score=1
-- place season/basin/subbasin filters here
order by season, basin, subbasin, hurricane_name, time;