Microsoft Excel provides various ways to query and use external data sources in workbooks. Information can be loaded from different Excel workbooks or imported from other data formats. It is also possible to query data from databases or web services.
We will look at the following different ways to import external data and review their usage
- copy-pasting from other formats
- importing from other XLSX files
- importing from csv files
- importing from XML files using Power Query
Copy-pasting from external sources
Any data that is tab-separated can be copy-pasted into an Excel sheet using the clipboard. It works both on Windows and Mac Office and it is the quickest way to transfer data into Excel. Depending on where the data originally comes from, there may be formatting options (font types, text-decoration, colors) that should be discarded. This can be done by pasting data as text instead of pressing CTRL (cmd) + V.
Most HTML tables from the web or any other source or text file can be easily pasted into Excel this way.
Using “Text to Columns” to format data
Excel is capable of converting data that is already pasted into a workbook but somehow ended up in one column instead of proper formatting into different ones. The option is called “Text to Columns” and it can be found in the data menu or ribbon.
After selecting the right region and activating it, a wizard appears that converts data into columns. It is possible to set any delimiter (comma in the above example) to use as one. It is also possible to split cells at a fixed length, for example, every 4th character by setting it to “fixed-width” then dragging the slider on the bottom to adjust columns.
External cell references
Different Excel files can reference others’ cells by simply using them in functions just like cells from different sheets in the same file, including structured references. You can read more about them here: Excel structured references explained.
To create such a reference, type an equal sign into a cell then use the mouse to click on any other cell even in different files. This will automatically put the right reference into the original cell. These references have the following format:
[Workbook]Sheet!Cell
Excel has a tendency to create absolute references to workbooks, so once the file is saved it’s likely that these external references will have filenames in them, for example, the reference
[Book5]Sheet1!$B$4
was automatically replaced to
'C:\Users\techtipbits\Documents\[Book5.xlsx]Sheet1'!$B$4
Excel tries to keep these links relative to the original file’s location when possible, even when it shows an absolute file path. It’s an important detail because it means that if both files live in the same folder and they are copied elsewhere together, references will automatically use the right file. In some cases (network paths, mapped drives..) these references may break and will need to be updated manually.
By default, Excel doesn’t load data from external references and displays a warning when opening a file for security reasons. Clicking [Enable Content] will automatically load the right references.
Importing data from CSV files
CSV is a human-readable text-based format that can be opened or created with a text editor, even the venerable Notepad works. It stores rows of information in separate lines, each text column is separated by a “comma” – the file format is an abbreviation for “Comma Separated Values”. There are different variations of the CSV format, sometimes cell values are enclosed in quotes, and storing new lines in cells can be problematic unless they are properly escaped (by using \n instead of a new line or having a backslash before the new line or using quotes around cell values)
To import files from a CSV data source, select Data > Get Data > From File > From Text / CSV or Data > From Text / CSV. In older Excel versions (or the current one on a Mac) a Data Wizard pops up similar to the “Text to Columns” functionality, on the recent (Windows) ones a Power Query style window shows.
The role of character sets in text files
Traditionally, text files don’t carry any information on how to store letters that are outside the usual basic set of letters, numbers and various punctuation marks. As long as the imported data contains characters from the English (Latin) alphabet only, pretty much any character set works.
Special characters with accents or basically anything outside the usual letters used in English are stored differently depending on the character set or the encoding of the text file. Excel tries to guess this character set by looking at a sample of data to be imported but it doesn’t always get it right. One sure sign of a misplaced character set import is that random non-latin characters are broken and display gibberish instead of the proper ones. The most common ones are
- 1252 (Western European Windows)
- 20127 US ASCII
- 65001 (Unicode UTF-8)
The source of the data normally indicates what character set the data is encoded in. Recently everyone’s using UTF-8 so that should be the first guess.
Loading real time external data with Power Query
Power Query provides a tool to query and load data from various sources and databases. In our example, we will use the daily updated data feed of COVID related information from https://ourworldindata.org.
Unfortunately Power Query for Mac isn’t available yet, the Mac Office supports opening and refreshing data in Excel sheets that contain external data sources but it can’t create them.
The example data set we’re going to use is a data feed in JSON format from this URL:
https://covid.ourworldindata.org/data/owid-covid-data.json
To load these into Excel, select Data > Get Data > From Other Sources > From Web then simply paste the above URL into it. A Power Query window will show up, showing the basic structure of available data. It’s possible to extract columns (use the double arrow next to the column header, where available), rearrange columns, filter data then finally insert it into an Excel sheet by clicking the “Close & Load” button.