
I need to make some GET/POST requests to a website for which I have login credentials. I plan to do this with Ruby and Net::HTTP. As I'm new to this, I'm struggling with the fact that the login page requires robot verification (the check-box kind), which means I can't automate the login phase. Besides that, the server keeps the session alive for some time until it detects that there has been no activity, after which it requests the login page again. The website is built with PHP and JS (mostly JS), and after login it requires the user to enter a "restrict-area" browser mode.

It would be no problem to log in manually and then run an operation (a few requests) each time I need it. But I don't know how to pass credential information from the browser, such as the session ID, to my script. I need some conceptual ideas about this.

Additional information:

  • There is no public API.
  • The "restrict-area" browser mode is a browser window without some buttons (back and forward in the page history), and it doesn't permit changing the URL. That is all I know.
  • I need this to automate some manual tasks that take hours to do.
  • The website uses Ajax.

If additional information is needed I can add it, just ask in the comments.

Thanks in advance!

EDIT

My intention isn't to crawl random websites, but to learn how to make specific HTTP requests to a specific website that requires credentials.

1 Answer


For JS-intensive websites, it might be much more convenient to use a "headless browser" approach, such as the capybara-webkit gem, which allows automation on top of a popular browser engine (WebKit, used in Safari and the ancestor of the Blink engine in Chrome and Opera). I'm not sure if it's good enough to get past the robot verification (leaving the moral aspect aside), but at least it beats Net::HTTP in cases like fetching Google search results.

Also, have a look at PhantomJS, a browser automation tool scripted in JavaScript (just as capybara-webkit is scripted in Ruby), which gives the additional convenience of working with in-page elements in the same language that controls the browser.
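
If you would rather stay with Net::HTTP and log in by hand, one possible way to hand the session over to a script is to copy the session cookie from the browser's developer tools and send it in a `Cookie` header. Below is a minimal sketch, not a tested solution for your site: the host `example.com`, the cookie name `PHPSESSID` (PHP's default, which your site may not use), the `X-Requested-With` header and the helper names `build_get`, `build_post` and `perform` are all assumptions for illustration. The server may also tie the session to other things (IP address, user agent, extra tokens), in which case this alone won't be enough.

```ruby
require 'net/http'
require 'uri'

# Hypothetical values: copy the real cookie name and value from the browser's
# developer tools after logging in manually (PHP sites often use PHPSESSID).
SESSION_COOKIE = 'PHPSESSID=replace-with-value-from-browser'
BASE_URL = 'https://example.com' # placeholder host, not the real site

# Build a GET request that carries the browser's session cookie.
def build_get(path, cookie)
  uri = URI.join(BASE_URL, path)
  request = Net::HTTP::Get.new(uri)
  request['Cookie'] = cookie
  # Some Ajax endpoints check this header; whether this site does is an assumption.
  request['X-Requested-With'] = 'XMLHttpRequest'
  request
end

# Build a form-encoded POST request that carries the same cookie.
def build_post(path, cookie, params)
  uri = URI.join(BASE_URL, path)
  request = Net::HTTP::Post.new(uri)
  request['Cookie'] = cookie
  request.set_form_data(params)
  request
end

# Send a prepared request and return the Net::HTTPResponse.
def perform(request)
  uri = request.uri
  Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end

# Usage (performs a real network call, so it is left commented out):
# response = perform(build_get('/area/data.php', SESSION_COOKIE))
# puts response.code
```

Once the session times out, the server will presumably answer with the login page (or a redirect to it) again, so the script should check the response and ask you to log in manually and paste a fresh cookie value.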