A Power Query Demo to Automate Repetitive Tasks in Excel

What is a Power Query?

Microsoft provides a decent summary of what Power Query is:

With Power Query (called Get & Transform Data in previous Excel versions), you can import or connect to external data, and then shape that data, for example remove a column, change a data type, or merge tables, in ways that meet your needs. Then, you can load your query into Excel to create charts and reports. Periodically, you can refresh the data to make it up-to-date.

You’re probably thinking along the same lines as me, that this tool sounds incredibly underwhelming. Fortunately, I had a couple years worth of hands-on tutorials courtesy of EY, and of course the help of various Internet tutorials to wrap my head around this.

Power Query is a tool that lets you pull data from one or many sources, and manipulate it in such a way that you can display it however you want. One of the key aspects is that it can be used time and time again, for example if you continually have reports automatically generated in Excel format, Power Query is a great way to summarize everything. Furthermore, Power Query makes no changes to the underlying documents, but is linked and can be refreshed for changes in those documents.

Necessity is the Mother of Invention

But it’s really hard to just learn something, if you don’t have anything to use it on.

Fortunately, I help out a professor with marking of midterms and exams. Due to COVID-19 and the remote nature of study, all the students complete their exams in Excel and submit them online. A “marking sheet” is added as a tab to the front of each Excel, and each exam is marked with the marks updated in the marking sheet. Each marking sheet from anywhere between 50-100 exams is entered into the learning management software that the professor uses.

Designing a New Tool

Opening 50-100 individual Excel files is an incredibly slow way to manually input marks, exacerbated by the fact that checking to make sure everything was input correctly is even more difficult, since every file would have to be reopened.

Ideally, a summary spreadsheet would display all the information in a row for each exam:

  • First Name
  • Last Name
  • Total Mark
  • Student’s Mark for Each Question

The Tool

The tool and example information can be found here. It is bundled in a .zip file with all the fictional midterm exams contained in a folder, which are intended to be stored there.

Feel free to play around with the source files - updates to marks in any source Excel file will be reflected in the Summary file once you hit Refresh Data. Note, that the Refresh Data button is the button specifically located in the Power Query section.

Cracking it Open

If you have never used Power Query before, it may not be visible, in the interest of time I will leave it to you to figure out how to enable it depending on what version of Excel you have.

The Source File

A Screenshot of a Sample Source File

The source file (e.g. the marked midterms) are all saved in the ‘Midterm Exams’ folder. Don’t worry - it’s all sanitized, the the marking sheet by itself was used with fictional marks entered into each one.

The ‘Total Marks’ tab in each Excel was unintentionally designed to be difficult to manipulate. It reads like a summary page itself, the relevant data is in a column that would normally contain header information, and the title for the information (e.g. Q1, Q2) are actually on the right side of the table, in Column D.

Furthermore, there is a lot of extraneous information, such as a total score shown using a text concatenate function, and a gap between the total scores and the individual question information. There’s even a bonus question for added complexity!

Building the Power Query

The Power Query goes through a number of steps, which can broadly be defined as

  1. Pulling all the files in the target folder (note that the file uses relative referencing based off the filepath shown in Cell K2, which should automatically update when you allow it to recalculate).

  2. Filtering out non-relevant files in the folder (e.g. Word files, hidden Excel files that exist when someone is editing an Excel file).

  3. Importing the contents of the Total Marks table into Power Query and expanding the table. A Screenshot of the Power Query editor with the tables expanded

  4. Cleanup of data by removing extraneous rows and columns

A Screenshot of the Power Query editor after the expanded tables are cleaned up

  1. Pivoted the table to move the Question numbers into the column headings

A Screenshot of the Power Query editor after pivoting

  1. Grouped all the rows for each student, so there was only one row per student, with all the info. A Screenshot of the Power Query editor after grouping all the rows

  2. Parsed the student’s name from the file names

  3. Added a summation column to show a student’s total marks. A Screenshot of the Power Query editor after cleaning up the output

  4. Added a summation column to show a student’s total marks.

Conclusion

And there you have it, now anytime the Power Query is updated, the table is updated with all the marks from each Excel. No need to manually pull anymore numbers, no risk of accidentally overwriting student marks with haphazardly written VBA coding, easily update for any last minute changes to the source files and no need to reinvent the wheel next time, as the same Power Query can be used every time.

A Screenshot of the Power Query editor after cleaning up the output

Eric Fong, CPA, CA, CPA (Illinois)
Eric Fong, CPA, CA, CPA (Illinois)
Tax Manager

I solve complex problems.

Related