You would like to have a KPI (Key Performance Indicator) dashboard to pilot your business and have access to common key statistics with your managers.
You’re starting your KPI Project from zero.
You have multiple data sources. You will have centralise those data in a database in order to build our KPI dashboard.
Data were maintained for specific business needs. In other words, some of the fields could be left empty or non maintained in a standard way, because they had no impact at the original goal. Those fields may be relevant for your future database and KPI dashboard.
Start small. Start using a limited number of statistics for your dashbaord anda limited number of sources.
What is the main source for your main statistics? What is the secondary source?
They are usually multiple ways to access to data. Some provide an access to all your data, some to a subset.
List the know access possibilities.
Identify the easiest, most relevant access in order to investigate your data. It could be some CSV files. CSV files are plain text files which can be opened in Excel. This is usually a relevant type of file. Consider any PDF and Word documents as non machine readable.
Test this access method and list its limitations, e.g.:
- Too many files to download manually
- Manual editing
- header (variable names) missing in the first row
- remove the merge on given celles
- open and save the files in order to be machine readable
- Files extracted for reports, not from tables
Identify costs for alternative methods, whenever possible e.g.
- API access with monthly amount per user
Knowing your data
How many observations/lines in your table?
What variables are available?
What are the possible values for those variables?
Is there any variables with only missing values?
Is there any pair of variables?
- value and unit
- city and region
List possible value combinations.
How many occurrence of each value/value combinations are there?
What values were possible but didn’t occur?
Can some of the variables be derived from others? If so, compare actual values with derived values? e.g.
- Based on DateTime values, it is morning, noon, afternoon, evening, night
- Based on DateTime values, it is Monday, Tuesday….
Which variables should be kept to start with? i.e. which variables could be relevant for our KPI dashboard?