You've FOUND the EA Blog

How We Keep Data Fresh

By Tom Montpool

Every year, Environics Analytics updates virtually all of our main data products, from the core offerings that everyone sees—like PRIZMC2 and Demographic Estimates and Projections—to products with a limited user base or ones that are used mostly behind the scenes—such as Generations and the Postal Code Conversion File. Our goal is to prepare all the new data to coincide with the functional updates to ENVISION and, especially, to give our customers access to the newest data as soon as possible—usually the end of February. We distribute the data primarily through ENVISION but we also ship these files to many clients and in a wide variety of formats for project work as well as to many partners for integrating into software. Some of our partners’ software, like Alteryx’s Allocate, DemographicsNow Canada and Esri’s Business Analyst, features data built by us in their format, while Tetrad’s PCensus is built by their experts from our standard data releases.

If you’ve seen our Products & Services Catalogue, you know the extent of the data we offer: datasets on everything from wealth and household spending to shopping mall locations and mobile phone usage. You name it and, more than likely, we’ve got it.

A few years ago, updating and producing consistent datasets in the format that our users wanted could take days or even weeks because of the inconsistencies and limitations of technology: some software had spatial functions, other software had calculations and statistics and still other software had conversion functions. And each one also had its own native format and its own limitations in number of fields, size of tables or ability to export to other formats. Creating consistent products was a real tear-your-hair-out kind of puzzle, and one that needed to be solved quickly if our customers were to have the best data available.  

Today, I owe my sanity to Alteryx. As you may know, Alteryx is the business intelligence platform that offers flexible solutions through the rapid execution of calculations, formulas and methodologies. For me, it is a lifesaver because it combines the ability to do spatial processing, advanced data processing, conversions and exports in one package that is quick and flexible. Alteryx lets me work with data in whatever format it is sent to me—comma-delimited text, fixed-field, shapefile—perform virtually any manipulation I could want and then export the results to whatever format I need—SQL Server, MS-Access, or shapefile. Even better, once I’ve created an Alteryx module for one task, it can often be modified for a similar task (or re-used the next year) with very little additional effort. In many cases, a process that used to take four days can now take as little as an hour.

Where do all of our data products come from? Many originate with external suppliers such as infoCanada, TomTom and the Print Measurement Bureau. To make them useful for our clients, some datasets need only simple processing, like geocoding, while others take more work. The PRIZMC2Links—which link our PRIZMC2 segments to media and marketing databases like PMB, BBM RTS Canada, NADbank and AskingCanadians™—are some of the more complex products. Our own research department also creates a significant portion of our data, such as Demographic Estimates and Projections (DEP) and the yearly PRIZM updates, and also performs much of the pre-processing of data from suppliers. When products come directly to me from suppliers, I typically check the data, manipulate it to the final structure and export it in various formats at the national level and for our six standard regional cuts (Atlantic, Quebec, Ontario, Prairies, Alberta & BC).

Maintaining consistency within the product, across our various products, and over time is especially important. Whenever possible, I try to make data tables consistent with the previous year’s version in order to facilitate scripting and automated processes at the user end. This means table names, field names, field formats and content for a product should be consistent from year-to-year. This past release, our demographics team acquired a new data source and created a Citizenship by Age category in DEP. I could have added the new variables to the existing table structure, but that could have caused automated import scripts to fail. Instead, the new variables are in a new table.

Of course the data are paramount but, without documentation and support, the data updates can be confusing or meaningless. So, as part of the refresh process I compile drafts of the variables lists that accompany the data products, modify release notes to identify changes and create metadata that goes to the ENVISION team. Our able marketing and fulfillment team then uses this information to create support documentation that is shipped with every data product in Environics Analytics’ Products & Services Catalogue, PRIZMC2 Marketer’s Handbook and the Environics Analytics website.

While the “big update” of our data products is certainly a big deal, much of my time is spent with customized data requests. These typically involve dealing with strange formats like text delimited by the tilde character (you know, the squiggly ~ line) instead of commas, really customized spatial limitations like the dissemination areas in the Ontario part of the Ottawa-Gatineau CMA, or a diverse combination of variables from different data products like the DEP dwelling types, the HouseholdSpend maintenance variables and the WealthScapes Real Estate category (for the Ontario part of the Ottawa-Gatineau CMA in tilde-delimited format J). The most unusual request I ever received was for a pipe-delimited (you know, the vertical bar character) file containing dissemination area-level percentages of all of the DEP and HouseholdSpend variables joined onto the Postal Code Conversion file. The table we produced had about 850,000 rows and nearly 2,200 columns.

One of the most rewarding parts of my job is being a go-to data and technology resource for my colleagues. Sometimes I think they’re just playing “Stump the Geek” with questions like how many postal codes are in the Atlantic provinces or how do you create a database of publicly funded schools in Ontario (which can be generated through a web-scraping and parsing process using Alteryx tools). But I know all these seemingly arcane questions are important to helping our customers use our products. And to be honest, I get a little kick out of being able to answer the tough ones.

Currently, the research team is well into their preparations for DEP 2013—the big change being the move from the 2006 Census geographic base to 2011, and using the Census 2011 data in the DEP recipe. But even before the February 2013 big update, we’ll be releasing a new edition of WealthScapes in June and a new Postal Code Conversion File toward the end of the summer. And somewhere along the way, I hope to do some updating of my own: the front walkway to my home needs renovating. But that’s a subject for another blog…

###


Tom Montpool is Director of the Standard Data team at Environics Analytics. With nearly a decade of experience in the geodemographic industry, he is a recognized expert in data management and modelling for complex site selection, database marketing, consumer behaviour and target marketing.

Advertisements

Written by Environics Analytics

June 14, 2012 at 2:06 pm

%d bloggers like this: