
I have an SQL database in Azure that is ~4.5 TB / 150 million rows worth of data. Users will need to query this database for large sets of data. Right now, I am having users submit their requests through a web application hosted on an Azure VM.

How can I serve the rows pulled from the Azure SQL database directly to the user? It seems slow for the server to have to stream the data from Azure SQL and then for the client to stream it again from the VM.

I wish the user could directly query the Azure SQL db from the client side, but the problem is that the user's request has to be preprocessed and manipulated against a reference set of over a gigabyte of data; the request to the SQL db needs to happen server-side. Any help appreciated!

1 Answer


No, you should NOT expose the database publicly. This is very difficult to do safely, and a raw database connection is difficult to integrate with an HTTP-based API.

So what you're doing – having a web app backend that sits between the user and the database – is the typical architecture. This can deliver perfectly acceptable performance, particularly if the backend can scale horizontally. But how performant it is will be more a function of the networking hardware and network architecture than of the software architecture. And in most cases, the user's internet connection is the limiting factor, not your server.
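One thing the backend can do to avoid feeling slow is stream result rows to the client as they arrive from the database, rather than buffering the whole result set in memory first. Here is a minimal sketch of that idea using FastAPI and pyodbc; the connection string, table, and column names are made up for the example, and CSV quoting is omitted for brevity:

```python
import pyodbc
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

# Placeholder connection string; in practice, load it from configuration.
CONN_STR = "Driver={ODBC Driver 18 for SQL Server};Server=...;Database=...;..."

def stream_rows(query: str, params: tuple):
    """Yield CSV lines one row at a time so nothing is buffered in full."""
    conn = pyodbc.connect(CONN_STR)
    try:
        cursor = conn.cursor()
        cursor.execute(query, params)
        # Header row built from the cursor description.
        yield ",".join(col[0] for col in cursor.description) + "\n"
        for row in cursor:
            yield ",".join(str(v) for v in row) + "\n"
    finally:
        conn.close()

@app.get("/export")
def export(region: str):
    # The query is parameterized; user input never goes into the SQL string.
    query = "SELECT id, name, value FROM measurements WHERE region = ?"
    return StreamingResponse(stream_rows(query, (region,)), media_type="text/csv")
```

With this pattern, the first bytes reach the client as soon as the first rows come back from Azure SQL, so the two hops overlap instead of happening one after the other.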

What you can do is consider whether a SQL database has to store all of this data. If you have columns that contain large blobs (multiple MB), then you might want to consider storing that data externally. Object storage such as Azure Blob Storage or other S3-like services could be more appropriate. Since every object has a URL, the user can fetch it directly without going through your backend. Access control can be implemented with signed URLs that are only valid for a finite time. Your backend would then get a list of object IDs from the SQL DB and generate signed URLs for the user.
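On Azure specifically, signed URLs correspond to shared access signatures (SAS) on Blob Storage. A sketch using the azure-storage-blob SDK, with placeholder account, container, and blob names:

```python
from datetime import datetime, timedelta, timezone

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

# Placeholder credentials; in practice these come from configuration / Key Vault.
ACCOUNT_NAME = "myaccount"
ACCOUNT_KEY = "..."
CONTAINER = "exports"

def signed_url(blob_name: str, minutes: int = 15) -> str:
    """Return a read-only URL for one blob that expires after `minutes`."""
    sas = generate_blob_sas(
        account_name=ACCOUNT_NAME,
        container_name=CONTAINER,
        blob_name=blob_name,
        account_key=ACCOUNT_KEY,
        permission=BlobSasPermissions(read=True),
        expiry=datetime.now(timezone.utc) + timedelta(minutes=minutes),
    )
    return f"https://{ACCOUNT_NAME}.blob.core.windows.net/{CONTAINER}/{blob_name}?{sas}"

# The backend looks up blob IDs in SQL, then hands the client a list of URLs:
# urls = [signed_url(blob_id) for blob_id in ids_from_sql]
```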

Note that cloud egress charges can make such a design much more expensive than doing extensive processing and filtering within your backend. Sometimes the user will truly need access to the full data, but often they just need a summary or report of the data.
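For example, when a summary is all the user needs, the reduction can happen inside the database so only a handful of rows ever leave Azure; the table and columns below are again invented for illustration:

```python
import pyodbc

# Placeholder connection string, as in the earlier sketch.
CONN_STR = "Driver={ODBC Driver 18 for SQL Server};Server=...;Database=...;..."

# The database does the aggregation; only a few summary rows cross the network.
SUMMARY_SQL = """
    SELECT region, COUNT(*) AS row_count, AVG(value) AS mean_value
    FROM measurements
    GROUP BY region
"""

conn = pyodbc.connect(CONN_STR)
for region, row_count, mean_value in conn.execute(SUMMARY_SQL):
    print(region, row_count, mean_value)
conn.close()
```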

In this scenario it seems that you merely have a lot of data, not large blobs (the average row size is only about 30 kB). So moving parts of the DB to external storage is unlikely to help much, though it could be an option for the reference data set.

amon