Variable Mapping Visualization, An Example

This visualization can help you clarifying and improving the program flow. Excel is not the best software to visualize your mapping, but it could be a first place to start with using the link between input and output variables. Here is an example with two mapped datasets.

Visualizing Two Datasets

In the following example, we have the mapping of a program which generates two datasets:
  • EVENT_ALL (first few rows in blue) and
  • its derived dataset EVENT_ACTUAL (the rows below in purple).
The name of these datasets is given in the first column. The second column lists the variables in the dataset (or drop, for intermediate variables).

Visualizing Input and Output Variables

For the other columns, we have two types of background color:

  • The dark background colors are used to highlight output variables.
  • The light background colors are used for the input variables.

Let’s start by zooming in on the top left corner in order to understand how the mapping is specified.

1. First Dataset, Basic Visualization Concept

Unchanged Variables

In the first dataset, many variables are unchanged. These are the first ones on the list (sparte, untertitel_1, etc.).

Updating an Existing Variable

The variable produktion is the first modified variable. Special apostrophes with standard apostrophes. But to know the kind of transformation, you would need to look at the code. It is not mentionned in the visualization.

Updating a Variable using Another Variable

Next, the titel variable is modified. In addition to apostrophes changes, hard coding is necessary for specific events. These events are identified with their number (nummer variable). Basically, terms such as “Premiere” that were forgotten in the database are added, in order to have all the details in the final report.

Creating a New Variable

Finally, the exclude variable from the updated produktion and titel variables and the veranstalter (organizer), spielstaette (venue), status, and datum (date) variables.

In Summary

This initial overview gives you a rough idea of the method used to present the variable mapping.

And now let’s have a look is the first part of the mapping for the second dataset.

2. Second Dataset - Using one Data Step

This part of the mapping is done in a single data steps.

Here are a few examples:

  • The variable exclude is only used to select specifc rows from the input dataset. It is dropped from the output dataset.
  • The variable nummer is renamed to eventno.
  • Two variables are derived from the preiskategorie variable.
  • The variable showtype is created using five input variables.
  • The derived variable, city, created from three variables, is used to create the region variable.

3. Second Data Step - Merging using Intermediate Datasets

And finally, here is the second part of the mapping for the second dataset.

Merging Two Datasets to the Main Dataset to Add One Variable

First notice the different colors, blue and green. Two intermediate datasets were created from the variables in the main dataset, each having a new variable.

The variable _show_tm in this first intermediate dataset (blue) requires the variables eventno (event number), event_da (event date), company, event_tm, and event_timeofday to be created. Similarly, the second variable _show_tm in this second intermediate dataset (green) use multiple input variables.

The main dataset (purple) is then updated with the two intermediate datasets to create the show_tm variable.

Merging one Dataset to the Main Dataset to Add Two Variables

Now let’s move on to the orange background. This third intermediate dataset, created from the main dataset, has two new variables, eventno_list and pricecat_list.

The main dataset (purple) and this third intermediate dataset (orange) are merged in order to add the variables eventno_list and pricecat_list to the main dataset.

The new variable eventtype needs the variable eventno_list to be created, but it is not the case of the the other variables (square, facility, and seatplan). They could have been added much earlier in the program, even before working with the intermediate datasets.

Conclusion

This kind of mapping can be useful to have a quick understanding of the program. It can help improving the program flow and support the validation process.

Althgouth I’ve used a SAS program to illustrate the purpose, it can be implemented whatever the programming language used.

However, as you mentionned above, it was done manually. Any change in the program has to be implemented manually in the visualization. A tool able to generate such kind of output automatically would be needed to generalise its use in a department. Do you have any suggestion?

About the Author

Leave a Reply

Your email address will not be published. Required fields are marked *

You may also like these