
Let's assume I have access to two REST endpoints, /persons and /products. If I need to retrieve 100K persons with their respective products, what is the best way to do that?

How can I join them efficiently? By loading them into a temporary table in the DB, or by joining them in memory? If I do it in memory, I find myself doing the RDBMS's work. That may be acceptable when the join condition is a simple ID key, but with more complex join conditions, doing the join over web services seems very inefficient.

Join with WS:

  1. HTTP GET 100K persons in JSON format
  2. HTTP GET all the products filtered by the person ID
  3. Create some array structure of products indexed with the person ID
  4. Loop over persons then adding into person their products when person.id = products.person_id
  5. Use the data
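Steps 3 and 4 above can be sketched roughly like this (the `persons` and `products` lists stand in for the JSON payloads returned by the two hypothetical HTTP GETs):

```python
from collections import defaultdict

def join_in_memory(persons, products):
    """Index products by person_id (step 3), then attach them to each person (step 4)."""
    products_by_person = defaultdict(list)
    for product in products:
        products_by_person[product["person_id"]].append(product)

    for person in persons:
        person["products"] = products_by_person.get(person["id"], [])
    return persons
```

The index makes the join roughly linear in the number of records, but both full datasets still have to sit in memory at once.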

Join with some table dump:

  1. FTP get persons.sql
  2. FTP get (linked) products.sql
  3. Load them into temporary tables in the DB (doing the same with JSON or XML would require a transformation step)
  4. Run SQL query
  5. Use the data

So in the simple WS scenario, the disadvantages are the heavy memory use and the complexity of reimplementing RDBMS work (the join, not to mention ordering and grouping if needed), and this is with only two entities.

(The idea behind this reflection is to study the possibility of completely removing EAI from the enterprise in favour of an SOA/ESB architecture.)

Laiv
Michael

1 Answer


So 100k persons, each with many products, to be brought down over a web service seems like a lot. But you have to remember that web service calls can be made asynchronously.

If the data is held in a scalable NoSQL store, the task is amenable to a different approach than the standard "query the DB and loop through" one.

We can split the task up amongst many async workers, each perhaps taking a single person, getting their list of products, applying a chunk of business logic, and writing back to some aggregate.
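That worker-per-person split might look something like the sketch below, using a thread pool; `fetch_products` and `apply_logic` are hypothetical stand-ins for the web service call and the business logic:

```python
from concurrent.futures import ThreadPoolExecutor

def process_person(person, fetch_products, apply_logic):
    """One worker: fetch this person's products and apply the business logic."""
    products = fetch_products(person["id"])
    return apply_logic(person, products)

def process_all(persons, fetch_products, apply_logic, max_workers=32):
    # Each person is handled independently, so the I/O-bound calls can overlap.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process_person, p, fetch_products, apply_logic)
                   for p in persons]
        return [f.result() for f in futures]
```

Because each worker only holds one person and their products at a time, memory stays bounded regardless of the total row count.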

Sure, it's not going to be as quick as running an SQL query joining the tables directly... at 100k rows. But when you get to 100 million rows it might be faster, or at least run to completion without locking up the rest of your application.

Ewan