If you’ve ever had to write a Microsoft certification exam, you know that the exam syllabus is available from a URL that looks like this:
https://www.microsoft.com/en-us/learning/exam-##-###.aspx
where ##-### is the number of the exam you are writing. You likely also know that the way Microsoft presents this information is not ideal as a study guide. It’s broken into categories, and within each category it is a jumble of words padded with “filler” phrases like “plan and configure” or “configure and troubleshoot”.
I am writing a new certification exam and I wanted a simple means to know what specific concepts to study and play with in my lab. I realized that I should be able to write a PowerShell script that would download the HTML from the URL above, extract the exam syllabus text and then cut out all of the unnecessary words. This would leave me with a clean “checklist” of keywords I need to Google and understand.
So I wrote that.
The PowerShell code below requires the URL for the exam you wish to extract the checklist from. Once it runs, it saves the checklist to the clipboard, at which point you can paste it into your favorite application. In my case, I use Excel. For the 70-345 exam, it takes this:
and produces a list that looks like this:
Note: The script produced a CSV with all of those items in one column. I then went through and categorized them by priority based on my own comfort level with the various concepts. This is a preparation exercise in itself, though, as it gets you familiar with the focus of the exam, so as you begin studying you’ll be able to zero in on important concepts as you encounter them.
# Connect to Microsoft Website and download syllabus for Microsoft Certification exam, extract all relevant keywords to study and export to the clipboard
# Microsoft; Exam; Certification; Parse; Extract

# Will present GUI prompt to approve cookies. Only way to disable it is to use -UseBasicParsing with Invoke-WebRequest,
# but doing so loses access to the ParsedHtml property, so we just have to accept it
$ExamURL = 'https://www.microsoft.com/en-us/learning/exam-70-347.aspx'
$Webpage = Invoke-WebRequest $ExamURL

# Leverage the parsed HTML property to extract the text under the <DL> tag as all exam content seems to be kept under this tag
$Raw = ($Webpage.ParsedHtml.getElementsByTagName('dl')) | Select-Object -First 1 | Select-Object -ExpandProperty innerHTML

$Raw = $Raw -replace "`n",""        # Remove line feeds so text that currently splits over two lines is joined properly for later processing
$Raw = $Raw -replace '\s+', ' '     # Collapse any run of whitespace into a single space
$Raw = $Raw -replace ">",">`r`n"    # With the HTML cleaned up, insert a line break after each closing '>' of an HTML tag to make the results easier to parse
$Raw = $Raw -split "`r`n"           # Convert the text into an array so that each line is its own element
$Raw = $Raw -replace '<[^>]+>',''   # Remove all HTML tags
$Raw = $Raw -split ";"              # Place each unique concept on its own line
$Raw = $Raw | Where-Object {$_}     # Remove blank lines

# Microsoft typically uses a semicolon to separate ideas but sometimes they have really long strings of concepts separated by commas.
# Our logic says if the string is 120 characters or longer, separate out each item
$Lines = @()
$Raw | ForEach-Object {
    if ($_.Length -ge 120) {
        $Lines += $_ -split ","
    }
    else {
        $Lines += $_
    }
}
$Lines = ($Lines | Sort-Object -Unique).Trim()

# Any of the text below will be detected at the beginning of each string and if found, removed. This should leave us with just the relevant keywords
# You want to generally sort by length with the longest, most specific phrases at the top and work your way down from there
# Each set of terms will likely be somewhat unique for each exam but you can add the bulk of the new ones here as necessary
$TermsToRemove = @'
Plan, deploy, manage, and troubleshoot
Plan, Deploy, Manage and Troubleshoot
plan, create, configure, and deploy
Plan, configure, and manage a
create, configure, and manage
Plan, deploy, and troubleshoot
Plan, configure, and perform
Plan, configure, and manage
plan, create, and configure
Plan, create and configure
plan, deploy and configure
plan, deploy, and configure
Plan, deploy, and manage a
Plan, create, and manage
Monitor and troubleshoot
troubleshoot and monitor
Plan, deploy, and manage
Plan, deploy and manage
plan and configure for
Plan, manage, and use
plan and configure a
create and configure
configure and manage
plan and configure
plan and delegate
plan and create
plan and deploy
plan and manage
troubleshoot a
and configure
configure
troubleshoot
plan for
plan an
plan a
plan
and
the
'@ -split '\r?\n'   # Split on either CRLF or LF so the list works regardless of how the script file was saved

$FinalOutput = @()

# Loop through each unique line that should now represent a single concept we need to study for
ForEach ($Line in $Lines) {
    $TempLine = $null

    # Loop through each line and compare it to our list of terms to remove. If they are found, remove them
    ForEach ($Term in $TermsToRemove) {
        # If the line starts with one of the terms we wish to remove, strip it from the output variable and stop processing.
        # StartsWith is case sensitive by default so we have to tell it to be case insensitive
        if ($Line.StartsWith($Term, [System.StringComparison]::CurrentCultureIgnoreCase)) {
            # Drop only the leading term; later occurrences of the same words in the line are kept
            $TempLine = $Line.Substring($Term.Length).Trim()
            break
        }
    }

    if (-not $TempLine) {
        $TempLine = $Line
    }
    $FinalOutput += $TempLine
}

# Remove any strings that contain a % as these should pretty much only be the major headings of each category which we don't care about in this scenario
$FinalOutput | Where-Object {$_ -notmatch '%'} | Sort-Object -Unique | clip
Write-Host "Your data has been copied to the clipboard. Paste it elsewhere for review" -ForegroundColor Green
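
If you want to tune the list of filler terms without hitting the website each time, the prefix-stripping step can be pulled out into a small function and run against a few sample lines. The sketch below is only an illustration: the function name Remove-FillerPrefix, the sample terms and the sample lines are all made up for this example and are not part of the script above. It also assumes you want a term to match only when it is followed by a space, so that a short term like "plan" does not chew into a word like "Planning".

```powershell
# Hypothetical helper (an assumption for illustration, not part of the original script):
# strips the first matching filler phrase from the start of a syllabus line.
function Remove-FillerPrefix {
    param(
        [Parameter(Mandatory)][string]$Line,
        [Parameter(Mandatory)][string[]]$Terms
    )

    # Try the longest phrases first so "plan and configure" wins over "plan"
    foreach ($Term in ($Terms | Sort-Object -Property Length -Descending)) {
        # Require a space after the phrase so "plan" does not match "Planning"
        if ($Line.StartsWith("$Term ", [System.StringComparison]::CurrentCultureIgnoreCase)) {
            # Remove only the leading phrase and tidy up the remaining text
            return $Line.Substring($Term.Length).Trim()
        }
    }

    # No filler phrase found at the start: return the line unchanged
    return $Line
}

# Made-up sample data for experimenting
$SampleTerms = 'plan', 'plan and configure', 'configure and troubleshoot'
$SampleLines = 'Plan and configure mail flow',
               'Configure and troubleshoot DNS records',
               'Planning for hybrid coexistence'

$Cleaned = $SampleLines | ForEach-Object { Remove-FillerPrefix -Line $_ -Terms $SampleTerms }
$Cleaned
```

With these samples, the first two lines lose their filler prefix and the third is left alone because "Planning" is not the term "plan" followed by a space. Sorting the terms by length inside the function means you no longer have to keep the list in longest-first order by hand.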