Today I had the misfortune of discovering that one of the drives in my FreeNAS box had failed. I replaced the drive and wanted to watch the progress of the rebuild. If you log into the FreeNAS web management console, there is a section that shows the number of sectors synchronized and the percent complete. But that’s only useful if you stare at it. What I really want to know is whether the rebuild has locked up. That means grabbing this value periodically and, if it hasn’t changed after a certain period, sending an email alert.
But before I can do any of that, I need to start with the basics and figure out how to pull the actual HTML from the website so I can parse it and build the interesting parts on top.
The code below has the following capabilities:
- Programmatically authenticates against form-based logins (tested with PHP, and it possibly works with other authentication mechanisms)
- Connects to a specific URL and pulls down all of the raw HTML for that page into a variable for further manipulation
This is certainly a handy snippet to keep in your back pocket!
# This is the URL that, when visited with a web browser, contains the username and password fields to fill in
$LoginURL = "http://yourwebsite.com/login.php"

# This is the URL of the page you actually want to pull content from,
# but which normally just redirects you to the login page above if accessed directly
$ContentURL = "http://yourwebsite.com/someothercontentthatfirstrequiresauthentication.php"

# The username and password used to authenticate with the site above
$Username = "hero"
$Password = "superman"

# Pull the HTML from the login page, including the username and password fields,
# and start a session so any cookies set here are carried through to the later requests
$website = Invoke-WebRequest -Uri $LoginURL -SessionVariable WebSession

# Note the "username" and "password" field names used here may be different on your site.
# Verify by checking the contents of $website.Forms[0].Fields
$form = $website.Forms[0]
$form.Fields["username"] = $Username
$form.Fields["password"] = $Password

# Connect to the login URL and send the filled-in form fields as a POST, reusing the session created above
Invoke-WebRequest -Uri $LoginURL -WebSession $WebSession -Method Post -Body $form.Fields | Out-Null

# Now that we're authenticated, connect to the actual URL you want and pass in the same session object
$data = Invoke-WebRequest -Uri $ContentURL -WebSession $WebSession

# There is a ton of other metadata returned that you most likely don't care about.
# If you just want the raw HTML to pull some specified content, try using the "outerHTML" property as shown below
$HTMLOutput = $data |
    Select-Object -ExpandProperty ParsedHtml |
    Select-Object -ExpandProperty IHTMLDocument3_documentElement |
    Select-Object -ExpandProperty outerHTML

# Display the results on the screen. This will be the raw HTML returned by the site. You can now do whatever you'd like with it.
$HTMLOutput
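
As a rough sketch of where this could go next, here is one way the "has the rebuild stalled?" check from the start of this post might look once you have HTML in hand. The function names Get-PercentComplete and Test-RebuildStalled are made up for illustration, and the sample HTML is canned. I'm assuming the page shows the progress as a number followed by a percent sign, so check what your own page returns and adjust the pattern.

```powershell
# Hypothetical helper: returns the first "NN%" or "NN.N%" figure found in a block of HTML,
# or $null if no percentage is present. The pattern is an assumption about the page layout.
function Get-PercentComplete {
    param([string]$Html)
    if ($Html -match '(\d+(?:\.\d+)?)\s*%') {
        return [double]($Matches[1])
    }
    return $null
}

# Hypothetical helper: treats the rebuild as stalled when two consecutive samples
# both contain a value and that value has not changed.
function Test-RebuildStalled {
    param($Previous, $Current)
    return ($null -ne $Previous) -and ($null -ne $Current) -and ($Previous -eq $Current)
}

# Canned samples standing in for two fetches of the status page taken some time apart
$first  = Get-PercentComplete -Html '<td>Synchronized: 42.5%</td>'
$second = Get-PercentComplete -Html '<td>Synchronized: 42.5%</td>'

if (Test-RebuildStalled -Previous $first -Current $second) {
    Write-Output "Rebuild appears stalled at $first%"
}
```

In a real script, the two samples would come from $HTMLOutput fetched a few minutes apart (Start-Sleep between the requests), and the Write-Output line is where an email alert would go, for example with Send-MailMessage.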