
Jan 10 2016

HOWTO: Real-world use case for ConvertFrom-String

This HOWTO covers a real-world example of how to use ConvertFrom-String, which was introduced in PowerShell 5.  As a reminder, this is the cmdlet that lets you parse text data and convert it into structured PowerShell objects by defining “templates” that describe how the data is laid out and what information you want to extract.

The largest mall in my city includes literally hundreds of stores. I needed to shop for a particular kind of thing and wanted to know what stores I might want to check out.  I started by visiting the website for the mall.  This ended up giving me output that looked like this:

image

It’s certainly not bad, but I wanted to apply some filters to the dataset.  In reality, I could have figured out what I wanted from the website itself, but I realized this would be a great opportunity to see if I could make ConvertFrom-String and FlashExtract do something useful.  Could I make a PowerShell object out of this data?

The first thing I did was download the webpage in PowerShell using the Invoke-WebRequest cmdlet.  The response includes a “ParsedHtml” object that tries to break the webpage down into its component parts and return the results as nested objects.  I looked at the output and discovered that all of the data I was interested in was stored in ParsedHtml.documentElement.outerText.  This gave me the following results:

image
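As a rough sketch of the download step, the function below wraps Invoke-WebRequest and returns the page text. It assumes Windows PowerShell 5.x, where the response exposes a `ParsedHtml` property; the function name `Get-MallPageText` and the example URL are hypothetical and not taken from the original script.

```powershell
# Sketch only: Get-MallPageText is a hypothetical helper name.
# Assumes Windows PowerShell 5.x, where the response has a ParsedHtml property.
function Get-MallPageText {
    param(
        [Parameter(Mandatory)]
        [string]$Uri
    )

    # Download the page and let PowerShell parse the HTML.
    $response = Invoke-WebRequest -Uri $Uri

    # Return the text of the whole document, without the markup.
    $response.ParsedHtml.documentElement.outerText
}

# Example call (placeholder URL, not the real mall site):
# $text = Get-MallPageText -Uri 'https://example.com/store-directory'
```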

That is what I was looking for!  You’ll note that once the data starts, each entry uses the following format:

[Storename]
[Category] Telephone  [Phone Number]

This is exactly the kind of thing FlashExtract was designed for.  The problem, though, was the alphabet headers (i.e. “A A”, “B B”, “C C”, etc.) as well as the word “Close” in each entry.  I needed to remove those.  The “Close” was pretty simple, as that was just a find and replace.  The headers were more interesting.  Ultimately I solved this with a regular expression, specifically “^[A-Z] [A-Z]”, which says the string must start with a capital letter, a space and another capital letter.  That worked great.

The next challenge was the “Telephone” keyword that was included in each entry.  FlashExtract was able to handle this fine… until some of the categories started including the word “Telephone” as well.  I fought with this for a while but ultimately realized that the best way to work with FlashExtract is to make sure the source data you give it is as clean as possible.  I noted that the “Telephone” label always had two spaces after it.  I used this to tell the label apart from the category name and removed the label entirely.
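The cleanup steps described above can be sketched roughly as follows. The sample text is made up for illustration (the real page layout may differ), and the trailing `$` on the header pattern is my own tightening so that only lines consisting solely of a header are dropped.

```powershell
# Hypothetical sample of the raw page text; store names and numbers are invented.
$raw = @'
A A
Acme Shoes
Footwear Telephone  555-0100 Close
B B
Best Telephone Shop
Telephone Accessories Telephone  555-0101 Close
'@

$clean = $raw -split '\r?\n' |
    # Drop the alphabet headers such as "A A" (case-sensitive match).
    Where-Object { $_ -cnotmatch '^[A-Z] [A-Z]$' } |
    ForEach-Object {
        # Remove the word "Close", then the "Telephone" label, which is
        # identified by the two spaces that always follow it.
        ($_ -creplace '\bClose\b', '' -creplace 'Telephone  ', '').Trim()
    }

$clean
```

Because the label is matched together with its two trailing spaces, a category such as “Telephone Accessories” keeps its name while the label after it is removed.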

At this point I was ready to define my template.  This took a lot of trial and error, as I had to add a template entry for each kind of entry that appeared in the data.  The complications came from entries like the following:

BEBE-(Coming Soon) (Includes special characters such as dashes, brackets, commas, etc)
Weekend MaxMara (Includes camel case spelling)
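To give an idea of what such a template can look like, here is a minimal sketch. The field names (`StoreName`, `Category`, `Phone`), the categories and the phone numbers are assumptions for illustration, not the template from the original script. ConvertFrom-String ships with Windows PowerShell 5.x, so the call is guarded to keep the snippet harmless elsewhere. FlashExtract learns from the examples, so the result depends on how representative they are.

```powershell
# Hypothetical cleaned input: store name on one line, category and phone on the next.
$clean = @'
Acme Shoes
Footwear 555-0100
BEBE-(Coming Soon)
Women's Apparel 555-0101
Weekend MaxMara
Women's Apparel 555-0102
Zed Books
Books & Gifts 555-0103
'@

# Template: tag a few representative entries, including the awkward ones.
# The * after StoreName marks the field that starts a new record.
$template = @'
{StoreName*:Acme Shoes}
{Category:Footwear} {Phone:555-0100}
{StoreName*:BEBE-(Coming Soon)}
{Category:Women's Apparel} {Phone:555-0101}
{StoreName*:Weekend MaxMara}
{Category:Women's Apparel} {Phone:555-0102}
'@

# Only run the conversion where the cmdlet is available (Windows PowerShell 5.x).
if (Get-Command ConvertFrom-String -ErrorAction SilentlyContinue) {
    $stores = $clean | ConvertFrom-String -TemplateContent $template
    $stores | Format-Table StoreName, Category, Phone
}
```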

In order to figure this out, I stumbled upon an absolutely amazing script/tool called ConvertFrom-String Buddy. That’s available here:

http://dougfinke.com/blog/powershell-v5-0-convertfrom-string-buddy/

What this basically does is provide you with a real-time IDE to test your templates against your actual dataset.  The killer feature here is that as you make changes to the template or the source data, the results are updated instantly, making it incredibly fast to iterate on your template designs and identify issues.  Check out the screenshot below to see how that works in this example:

image

The end result is a standard PowerShell object including store names, categories and phone numbers that you can now filter to your heart’s content.

For example, below is a filter that shows any entries whose category includes the word ‘apparel’ (i.e. to include Men’s, Women’s and Unisex) and that are not marked as ‘Coming Soon’.

image
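A filter along those lines might look like the sketch below. The objects are built by hand so the snippet runs on its own; the property names and sample stores are assumptions, and it also assumes the ‘Coming Soon’ marker appears in the store name, as in the BEBE entry above.

```powershell
# Hypothetical stand-in for the objects produced by ConvertFrom-String.
$stores = @(
    [pscustomobject]@{ StoreName = 'Acme Menswear';      Category = "Men's Apparel";   Phone = '555-0100' }
    [pscustomobject]@{ StoreName = 'BEBE-(Coming Soon)'; Category = "Women's Apparel"; Phone = '555-0101' }
    [pscustomobject]@{ StoreName = 'Weekend MaxMara';    Category = "Women's Apparel"; Phone = '555-0102' }
    [pscustomobject]@{ StoreName = 'Zed Books';          Category = 'Books & Gifts';   Phone = '555-0103' }
)

# Keep apparel stores that are not marked as coming soon.
# -like and -notlike are case-insensitive by default.
$apparel = $stores | Where-Object {
    $_.Category -like '*apparel*' -and $_.StoreName -notlike '*Coming Soon*'
}

$apparel | Format-Table StoreName, Category, Phone
```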

ConvertFrom-String is going to prove to be an invaluable tool in my toolkit going forward.  Thanks PowerShell team!

Alternatively if you’d like a short version of the code without any comments, here you go:
