4

I am currently using SQL Server 2016 SP Standard Edition and want to install R Services. The documentation says that a standalone installation (Machine Learning Server (Standalone)) is allowed only on Enterprise Edition.

I also found a post claiming that on Standard Edition I am not able to use parallel operations and that there are memory limitations.


Could anyone tell what are the exact memory/processors limitations?

I need these details because I am going to run heavy R calculations, and my tests so far show that the Standard Edition implementation is not performing as I had hoped.

gotqn
  • 4,348
  • 11
  • 52
  • 91

2 Answers

6

Check out the "Feature availability across editions of SQL Server Machine Learning Services" page of Books Online:

The difference between basic and advanced integration is scale. Advanced integration can use all available cores for parallel processing of data sets at any size your computer can accommodate. Basic integration is limited to 2 cores and to data sets fitting in memory.

Books Online goes into more detail on that page about deployment restrictions as well, such as:

Only a (Standalone) server offers the operationalization features that are included in a Microsoft (non-SQL-branded) R Server or Machine Learning Server installation. Operationalization includes web service deployment and hosting capabilities.

For an (In-Database) installation, the equivalent approach to operationalizing solutions is leveraging the capabilities of the database engine, when you convert code to a function that can run in a stored procedure.

Brent Ozar
  • 43,325
  • 51
  • 233
  • 390
4

As Brent has quoted there, one of the key differences is scale.

If your calculation is embarrassingly parallel, i.e. each row can be computed independently of the others, then calling sp_execute_external_script with @parallel = 1 lets you run your R calculations in parallel easily. Note that it is still possible to bite off more than you can chew and run out of memory, in which case you can add @params = N'@r_rowsPerRead INT', @r_rowsPerRead = N, where N is the maximum number of rows you want to read in one batch.
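To make that concrete, here is a minimal sketch of such a call. The table and column names (dbo.Readings, Value) and the batch size are illustrative assumptions, not from the original post:

```sql
-- Hypothetical example: apply a per-row R calculation in parallel.
-- dbo.Readings and its Value column are placeholder names.
EXEC sp_execute_external_script
    @language      = N'R',
    @script        = N'
        # InputDataSet / OutputDataSet are the default data-frame bindings
        OutputDataSet <- data.frame(Result = sqrt(InputDataSet$Value))
    ',
    @input_data_1  = N'SELECT Value FROM dbo.Readings',
    @parallel      = 1,                    -- request parallel plan (full scale on Enterprise)
    @params        = N'@r_rowsPerRead INT',-- enable streaming in fixed-size batches
    @r_rowsPerRead = 500000                -- cap rows read per batch to limit memory use
WITH RESULT SETS ((Result FLOAT));
```

Setting @r_rowsPerRead trades a little throughput for a bounded memory footprint, so it is worth tuning the batch size against your available RAM.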

Using the above I have been able to run an R calculation against 2 billion rows of data in less than 10 minutes (obviously your mileage may vary).

If you need more control over the batching, such as ensuring related rows are grouped together, then I think you can achieve this by calling the RevoScaleR functions directly from within your R script, but I have not toyed with this.

If you absolutely have to use Standard Edition, you can get creative and batch up the data yourself, but when I tried this the code got quite complex and was also not as quick as taking advantage of Enterprise Edition streaming.

More info here: https://learn.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-execute-external-script-transact-sql?view=sql-server-2017

Note that SSDT does not support GRANT EXECUTE ANY EXTERNAL SCRIPT, so if you are using it for deployments then things become painful. I've raised this as an issue on UserVoice: https://feedback.azure.com/forums/908035-sql-server/suggestions/32896864-grant-execute-any-external-script-not-recognised-b.

CasualFisher
  • 359
  • 1
  • 5