
I am writing an application that will crawl web pages and analyze data, and I'm wondering if I can host this as a cloud-based service.

The main reason I'm looking at cloud hosting is that users may process thousands of pages at once, and I need to return data as quickly as possible. The user will have software installed on their computer that will communicate with this application.

If, for example, each page takes 10 seconds to process, is it possible to set up auto scaling so that a user can process 1, 10 or 10,000 pages in the same amount of time (i.e. 10 seconds) by simply adding and removing instances as needed? Note: Each process requires little RAM/CPU, so it will be possible to process multiple pages on a single instance, but for this example I've simplified it to 1 instance per process.
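To make the arithmetic concrete, here is a minimal sketch of how the instance count grows with the batch size. The function name `instances_needed` and its parameters are made up for illustration, and it assumes pages are independent, that an instance handles `pages_per_instance` pages at once with no slowdown, and that instance start-up time is zero (which it will not be in practice).

```python
import math

def instances_needed(pages: int, seconds_per_page: float,
                     deadline_seconds: float, pages_per_instance: int = 1) -> int:
    """Estimate how many instances finish `pages` within the deadline.

    Assumptions (hypothetical model): pages are independent, each instance
    runs `pages_per_instance` pages concurrently at full speed, and
    instances are already running when the batch arrives.
    """
    if pages <= 0:
        return 0
    # How many pages one concurrent slot can process back to back in time.
    rounds = math.floor(deadline_seconds / seconds_per_page)
    if rounds < 1:
        raise ValueError("deadline is shorter than the time to process one page")
    # Total concurrent slots required, then slots packed onto instances.
    slots = math.ceil(pages / rounds)
    return math.ceil(slots / pages_per_instance)

if __name__ == "__main__":
    for n in (1, 10, 10_000):
        print(n, "pages ->", instances_needed(n, 10, 10), "instances")
```

With 1 page per instance and a 10-second deadline, the instance count equals the page count, so packing many pages onto each instance is what keeps the fleet size manageable.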

If so, which cloud-based service is best suited for this task?

1 Answer


The big difference you are going to find between the platforms is how each vendor abstracts the environment. That determines how you are going to develop your application or services. You can think of it as an API of sorts for the application. Which services you use within each vendor's context is up to you.

AWS doesn't really abstract much at the platform level. You basically get a machine where you install your software. It is up to you to write all of the supporting code. The services include storage, etc.
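As an illustration of the kind of supporting code you would end up writing yourself on a plain machine, here is a minimal sketch of a worker that processes several pages at once on one instance using a thread pool. `process_page` is a hypothetical stand-in for your real fetch-and-analyze step (it only sleeps here), and `process_batch` and `max_workers=20` are illustrative choices, not anything a cloud platform provides.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def process_page(url: str) -> dict:
    """Hypothetical placeholder for fetching and analyzing one page."""
    time.sleep(0.01)  # simulates I/O-bound work such as a network fetch
    return {"url": url, "url_length": len(url)}

def process_batch(urls, max_workers: int = 20) -> list:
    """Process many pages concurrently on a single machine.

    Threads suit this workload because each page spends most of its time
    waiting on I/O rather than using the CPU.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input URLs.
        return list(pool.map(process_page, urls))

if __name__ == "__main__":
    pages = [f"http://example.com/page/{i}" for i in range(100)]
    results = process_batch(pages)
    print(len(results), "pages processed")
```

On top of this you would still need something to hand batches to each machine and to start or stop machines as the load changes.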

Azure provides a framework for you to run your code as a background or web service. It takes over and scales out from that perspective. You do not need to create the supporting OS software to run your application. The services include storage, etc.

I have not personally used GAE (I have used AWS and Azure), but it looks like Google provides another framework and does not allow AWS-like machine-level access. Their environment appears to be more like the one hosted by Azure. They do have an interesting URLFetch service, and they include standard storage services as well.

If so, which cloud-based service is best suited for this task?

That would depend on the level of control you would like to have. AWS provides the most control but would require more work. Azure and GAE provide more infrastructure, which comes with limitations such as the choice of development language. Azure is .NET, while GAE has Java and Python.

I would suggest designing the application first and then choosing an appropriate language for development. That should give you a direction. If you are going to mix and match toolsets, then it appears that AWS would be the way to go.

Ken Brittain