I'm new here so I'm not entirely sure what tags to use so please notify/edit if this should be changed, thanks!
Background
I have a graph with ~4 billion nodes and ~1 trillion edges that I want to store in a database (like sqlite) as in this way data can be inserted on the hard disk rather than in RAM as many other graph data structures require.
Data format
This graph format looks like:
accession node position orientation
23101.1 1 1 plus
23101.1 100 2 plus
...
23101.1 100 1 min
...
~1trillion
The plan
I think of creating TABLES for accession, node, and orientation that I will refer to using foreign keys. So then the "main" table will be
accession node position orientation
<FK1> <FK2> 1 <FK3>
(P.s. not sure if I should do if for the position too, but this will simply be 1 till graph-path-length).
The question
I only have little knowledge of SQL but based on code like this I would have to execute something like the code below for each line in the graph file:
INSERT INTO main (accession, node, position, orientation) VALUES
( SELECT id from accessions WHERE accession=23101.1,
SELECT id from nodes WHERE node=1,
1,
1
);
and also catch an error when the node/accession does not exist and insert it. However, I wonder whether obtaining the foreign keys with the SELECT .. WHERE will not get really slow when they are billions of rows in the table? So overall, what would be a proper way to store this information in the database (if at all)?