Today I was unfortunate enough to discover that one of the drives in my FreeNAS box had failed. I replaced the drive and wanted to watch the progress of the rebuild. If you log into the FreeNAS web management console, there is a section that shows the number of sectors synchronized and the percent complete. But that’s only useful if you stare at it. I want to know if the rebuild has locked up, which means grabbing this value periodically and sending an email alert if it hasn’t changed after a certain period.
But before I can do any of that, I need to start with the basics and figure out how to pull the actual HTML from the website so I can parse it and build things like that alert on top of it.
The code below does the following:
- Programmatically authenticates against PHP-based (and possibly other) form-based login pages
- Connects to a specific URL and pulls down all of the raw HTML for that page into a variable for further manipulation
This is certainly a handy snippet to keep in your back pocket!
# This is the URL that when visited with a web browser contains the username and password fields to fill in
$LoginURL = "http://yourwebsite.com/login.php"
# This is the URL of the page you actually want to pull content from. If accessed directly, it will normally just redirect you to the login page above
$ContentURL = "http://yourwebsite.com/someothercontentthatfirstrequiresauthentication.php"
# The username and password used to authenticate with the site above
$Username = "hero"
$Password = "superman"
# Request the login page, which contains the username and password fields, and start a session
# so that any cookies set by the login page carry over to the requests that follow.
# Note: the Forms and ParsedHtml properties used below are available in Windows PowerShell, not in PowerShell 6 and later
$website = Invoke-WebRequest -Uri $LoginURL -SessionVariable WebSession
# Work with the first form on the page; adjust the index if the login form is not the first one
$form = $website.Forms[0]
# Note the "username" and "password" field names used here may be different on your site.
# Verify by checking the contents of $form.Fields
$form.Fields["username"] = $Username
$form.Fields["password"] = $Password
# Send the filled-in fields to the login URL as a POST, reusing the session created above
Invoke-WebRequest -Uri $LoginURL -WebSession $WebSession -Body $form.Fields -Method Post | Out-Null
# Now that we're authenticated, connect to the actual URL you want and pass in the session object you created above
$data = Invoke-WebRequest -Uri $ContentURL -WebSession $WebSession
# There is a lot of other metadata returned that you most likely don't care about.
# If you just want the raw HTML so you can pull out specific content, use the "outerHTML" property as shown below
$HTMLOutput = $data.ParsedHtml.IHTMLDocument3_documentElement.outerHTML
# Display the results on the screen. This will be the raw HTML returned by the site. You can now do whatever you'd like with it.
$HTMLOutput
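As a rough sketch of where this could go next, the two helpers below pull a percent-complete figure out of an HTML string such as $HTMLOutput and compare two samples to decide whether the rebuild looks stalled. The function names, the regular expression, and the sample markup are all hypothetical; the real FreeNAS page will almost certainly need a different pattern, and the email alert itself is left out.

```powershell
# Hypothetical helper: return the first "NN%" or "NN.N%" figure found in a block of HTML.
# The pattern is an assumption; adjust it to the markup your page actually returns.
function Get-PercentComplete {
    param([string]$Html)
    if ($Html -match '(\d{1,3}(?:\.\d+)?)\s*%') {
        return [double]$Matches[1]
    }
    # No percentage found in the text
    return $null
}

# Hypothetical helper: treat the rebuild as stalled when the value has not
# increased between two samples and the rebuild has not yet reached 100%.
function Test-RebuildStalled {
    param($Previous, $Current)
    # Without two readings there is nothing to compare
    if ($null -eq $Previous -or $null -eq $Current) { return $false }
    # A finished rebuild is not a stalled one
    if ($Current -ge 100) { return $false }
    return ($Current -le $Previous)
}

# Example with made-up markup standing in for two fetches of the status page
$first  = Get-PercentComplete -Html '<td>Synchronized: 42.5%</td>'
$second = Get-PercentComplete -Html '<td>Synchronized: 42.5%</td>'
if (Test-RebuildStalled -Previous $first -Current $second) {
    Write-Warning "Rebuild progress has not changed since the last check"
}
```

In practice you would fetch the page twice with a pause in between (Start-Sleep, for example), pass each $HTMLOutput to Get-PercentComplete, and replace the warning with whatever alert you prefer.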