4

I run on a daily basis a set of VBA-rich Excel files. Most of them include MS Office application cross-talk, but also employ third-party applications and MySQL. Due to the fact of running those files in a specific order and at a specific time of day I set up an xlsm-based scheduler to run those files and control outputs. Since my 'application' is growing I'm facing a problem of resource usage and typical garbage collector errors in VBA (not mentioning lack of active VBA support from MS, poor IDE and debugging etc) therefore I'm looking for other ways of automation.

Processes I'm running are typical ETL from 3rd party sources and CRMs & RDBMSs, integration of the data to edible by non-DataScience users and tools (must cover Excel 64- and 32-bit architecture). Most of it runs along data validation after imports, type testing before further analysis, sometimes picture2data replacement, applying business logic to data, pushing outlook notifications, SharePoint data I/O etc.

My question is whether PowerShell-based scheduler running those VBA and VB scripts may be more effective when it comes to memory usage or it will only as effective as the VBA/VB code is?

It is worth noting that due to the given policy I'm strongly discouraged using excellent R or Python libraries for working with data, so I need to stick with MS tools.

Looking forward for any advice, hint or even a loose tip on the topic.

Oskar_U
  • 151

1 Answers1

3

Powershell is based on the .NET framework infrastructure and runtime, which is pretty efficient and to my own experience often more effective in memory management and CPU utilization than the old COM-based VBA infrastructure. To be precise - it bears more potential of being more effective.

Said that, we cannot tell you if a non-existant PowerShell-based scheduler will be more effective than your current solution, since this would require a crystal ball - we do not know your old solution, and even you do not know your new solution. Writing very inefficient programs is possible in any language and environment, Powershell is no exception from this.

Moreover, it looks a little bit strange to me that in the kind of process you described the scheduler is the actual bottleneck. Usually, I would expect the data processing tasks to be the ones which require most of the performance and memory, whilst the scheduler should be a relatively lightweight program which only orchestrates the other processes.

So if your current scheduler has issues, maybe you should look for the root cause first, which is probably not that it is written in VBA?

Doc Brown
  • 218,378