HOWTO: Microsoft Certification Exam Preparation Generator

If you’ve ever had to write a Microsoft certification exam, you know that the exam syllabus is available from a URL that looks like this:

https://www.microsoft.com/en-us/learning/exam-##-###.aspx

Where ##-### is the number of the specific exam you are writing. You likely also know that the way Microsoft presents this information is not ideal from a study guide perspective. It’s broken into categories, and within each category it is just a jumble of words, including many “filler” phrases like “plan and configure” or “configure and troubleshoot”.
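
Since only the exam number changes, the URL can be built rather than typed. A minimal sketch, assuming a helper named Get-ExamUrl (a hypothetical name, not part of the script later in this post) and assuming every exam number follows the two-digit, three-digit shape shown above:

```powershell
# Hypothetical helper: build the syllabus URL from an exam number such as 70-345.
function Get-ExamUrl {
    param(
        [Parameter(Mandatory)]
        [ValidatePattern('^\d{2}-\d{3}$')]   # assumed ##-### shape; rejects anything else
        [string]$ExamNumber
    )

    # The format operator drops the exam number into the URL pattern
    'https://www.microsoft.com/en-us/learning/exam-{0}.aspx' -f $ExamNumber
}

Get-ExamUrl -ExamNumber '70-345'
```

The validation attribute is only there to catch typos early; loosen the pattern if an exam number ever deviates from that shape.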

I am writing a new certification exam and I wanted a simple means to know what specific concepts to study and play with in my lab.  I realized that I should be able to write a PowerShell script that would download the HTML from the URL above, extract the exam syllabus text and then cut out all of the unnecessary words.  This would leave me with a clean “checklist” of keywords I need to Google and understand.

So I wrote that.

The PowerShell code below requires the URL for the exam you wish to extract the checklist from. Once it runs, it saves the checklist to the clipboard, at which point you can paste it into your favorite application. In my case, I use Excel. For the 70-345 exam, it takes this:

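[Screenshot: the 70-345 syllabus as it appears on the Microsoft site]

and produces a list that looks like this:

[Screenshot: the resulting checklist]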

Note: The script's output was a single column containing all of those items. I then went through and categorized them by priority based on my own comfort level with the various concepts. This is a preparation exercise in itself, though, as it gets you familiar with the focus of the exam, so that as you begin studying you’ll be able to zero in on important concepts as you encounter them.
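
If you would rather start from a file than from clipboard contents, the keyword list can be written to a CSV with an empty priority column ready to fill in. This is only a sketch: the $Keywords values are made-up placeholders standing in for the script's output, and the column names and output path are assumptions.

```powershell
# Placeholder data: stands in for the keyword list the script produces.
$Keywords = 'widget storage', 'widget replication', 'widget monitoring'

# Wrap each keyword in an object so Export-Csv emits two named columns.
$Checklist = $Keywords | ForEach-Object {
    [pscustomobject]@{ Keyword = $_; Priority = '' }
}

# Hypothetical output location; change it to wherever you keep study notes.
$CsvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'exam-checklist.csv'
$Checklist | Export-Csv -Path $CsvPath -NoTypeInformation
```

Opening that file in Excel gives one keyword per row with a blank Priority column next to it.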

 


# Connect to the Microsoft website, download the syllabus for a Microsoft certification exam, extract all relevant keywords to study, and export them to the clipboard
# Microsoft; Exam; Certification; Parse; Extract

# Will present a GUI prompt to approve cookies.  The only way to disable it is to use -UseBasicParsing with Invoke-WebRequest, but doing so loses access to the ParsedHtml property, so we just have to accept it
$ExamURL = 'https://www.microsoft.com/en-us/learning/exam-70-347.aspx'
$Webpage = Invoke-WebRequest $ExamURL
# Leverage the ParsedHtml property to extract the text under the <DL> tag, as all exam content seems to be kept under this tag
$Raw = ($webpage.ParsedHtml.getelementsbytagname('dl')) | select -first 1 | select -ExpandProperty innerhtml


$Raw = $Raw -replace "`n","" # Remove line feeds so text that is currently split over two lines is joined properly for later processing
$Raw = $Raw -replace '\s+', ' ' # Collapse any run of whitespace into a single space
$Raw = $Raw -replace ">",">`r`n" # With the HTML cleaned up, insert a line break after each closing '>' of an HTML tag to make the results easier to parse
$Raw = $Raw -split "`r`n" # Convert the text into an array so that each line is its own element
$Raw = $Raw -replace '<[^>]+>','' # Remove all HTML tags
$Raw = $Raw -split ";" # Put each concept on its own line
$Raw = $Raw | where {$_} # Remove blank lines

# Microsoft typically uses a semicolon to separate ideas, but sometimes they have really long strings of concepts separated by commas.  Our logic says that if the string is 120 characters or longer, separate out each item
$Lines = @(); $Raw | ForEach-Object { if( ($_).length -ge 120) { $Lines += $_ -split "," } else { $Lines += $_ }   }
$Lines = ($Lines | sort -Unique).trim()

# Any of the text below will be detected at the beginning of each string and, if found, removed.  This should leave us with just the relevant keywords
# You want to generally sort by length with the longest, most specific phrases at the top and work your way down from there
# Each set of terms will likely be somewhat unique for each exam but you can add the bulk of the new ones here as necessary
$TermsToRemove = @'
Plan, deploy, manage, and troubleshoot 
Plan, Deploy, Manage and Troubleshoot 
plan, create, configure, and deploy 
Plan, configure, and manage a 
create, configure, and manage 
Plan, deploy, and troubleshoot
Plan, configure, and perform 
Plan, configure, and manage 
plan, create, and configure 
Plan, create and configure 
plan, deploy and configure 
plan, deploy, and configure
Plan, deploy, and manage a 
Plan, create, and manage 
Monitor and troubleshoot 
troubleshoot and monitor
Plan, deploy, and manage
Plan, deploy and manage 
plan and configure for
Plan, manage, and use 
plan and configure a 
create and configure
configure and manage 
plan and configure
plan and delegate
plan and create
plan and deploy
plan and manage
troubleshoot a
and configure
configure 
troubleshoot 
plan for
plan an
plan a
plan
and 
the 
'@ -split "`r?`n" # Split on CRLF or LF so the list works however the script file was saved

$FinalOutput = @()

# Loop through each unique line that should now represent a single concept we need to study for
ForEach($Line in $Lines)
{
    $TempLine = $null
    # Compare the line against each term in our removal list.  If one is found at the start of the line, remove it
    ForEach($Term in $TermsToRemove)
    {
        # If the line starts with one of the terms we wish to remove, strip it and stop processing.  StartsWith is case-sensitive by default, so we have to tell it to ignore case.  Substring removes only the leading match, whereas -replace would also strip later occurrences of the term
        if($Line.StartsWith($Term, [System.StringComparison]::CurrentCultureIgnoreCase)) { $TempLine = $Line.Substring($Term.Length).Trim(); break }
    }
    if(-not $TempLine) { $TempLine = $Line }
    $FinalOutput += $TempLine
} 

# Remove any strings that contain a % as these should pretty much only be the major headings of each category which we don't care about in this scenario
$FinalOutput | where {$_ -notmatch '%'} | sort -Unique | clip

Write-host "Your data has been copied to the clipboard.  Paste it elsewhere for review" -ForegroundColor Green
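
The ParsedHtml property relies on the Internet Explorer engine and is only available in Windows PowerShell. Where it is missing, a rough alternative is to pull the first <dl> block out of the raw HTML with a regular expression. The sketch below is not the method used in the script above: Get-SyllabusLine is a hypothetical helper, the sample markup is made up, and it assumes the <dl> block contains no nested <dl> elements.

```powershell
# Hypothetical helper: extract the first <dl> block from raw HTML and
# return one cleaned concept per output line.
function Get-SyllabusLine {
    param([Parameter(Mandatory)][string]$Html)

    # (?s) lets '.' span line breaks; the lazy quantifier stops at the first </dl>
    if ($Html -notmatch '(?s)<dl\b[^>]*>(.*?)</dl>') { return }
    $Text = $Matches[1]

    $Text = $Text -replace '<[^>]+>', ';'               # treat every tag as a separator
    $Text = [System.Net.WebUtility]::HtmlDecode($Text)  # turn entities such as &amp; into characters
    $Text = $Text -replace '\s+', ' '                   # collapse whitespace, including line breaks

    # Split on semicolons, trim each piece, and drop the empty ones
    $Text -split ';' | ForEach-Object { $_.Trim() } | Where-Object { $_ }
}

# Made-up sample markup, for illustration only
$Sample = @'
<dl><dt>Plan and configure widgets (20-25%)</dt>
<dd><ul><li>Plan and configure widget storage; configure
widget replication</li></ul></dd></dl>
'@

Get-SyllabusLine -Html $Sample
```

With a live page, the same function could be fed the Content property of an Invoke-WebRequest -UseBasicParsing response, which should also avoid the cookie prompt. The resulting lines can then go through the same term-removal loop as before.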
