Aarhus University Seal

JSON Extraction to CSV or Excel

The JSON data format is widely used in web applications and data handling, wherefore it is very likely at some point to get data in this format when harvesting, collecting or accessessing web data. For example, several of the results that may be retrieved with the service YouTube Data Tools from Digital Methods Initiative will be in the JSON format.

JSON stands for JavaScript Object Notation. The JSON data format is coded in a way that includes relations between data, e.g. arrays or sequences.

Although the format is designed to "be easy for humans to read and write" there are very few application that can readily read or analyse this data format for research purposes.

So if or when you have retrieved a dataset in the JSON format, you will very likely benefit from transforming the data to another format: CSV. Or if possible, to Excel or other spreadsheet format which can then be cleaned and edited, and then converted to CSV if needed.

When one goes about to extract JSON data to other formats such as CSV or Excel, first of all remember:

Always store your original dataset for as long as it may be needed for your research!
The JSON file(s) you have originally harvested are a more complex and detailed data format than a CSV or Excel file, and should be stored. If errors occur, or if a specific category of information what should be available has not been translated correctly into another format, the original dataset may still be needed.

Recommended solution:

The application OpenRefine is the best all-round tool for working with and converting JSON files.

Depending on your amount of data and on the specific JSOn file other solutions may prove easier, or necessary.
But in most cases with larger data sets OpenRefine is the best solution, both in respect of handling the JSON files (including subtypes like JSONL (JSON in a line-based format)), and in respect of extracting the data to other formats.

CDMM Instruction Video: Converting JSON to Excel in OpenRefine

Service: OpenRefine

Converting from JSON to Excel on Windows

Some versions of Excel on Windows can import JSON files directly. This is a good solution if accessible, but this will depend on the Excel version you are running if you are a Windows user. The same functionality is unavailable in any version of Excel on Mac.

If you have access to a Windows machine with MS Office, you may check if this conversion can be done by following these steps:

1) Create an Excel Spreadsheet with an appropriate name.

2) In the empty spreadsheet go to the data tab, and select “From Text”.

3) In the path finder window browsing for the file to import, change the search from “Text Files (*.pm; *.txt; *.csv)” to “All Files (*.*). If the feature is available in your Excel version, this will now allow you to point to the JSON (or JSONL) file that you wish to import. If not, the you will have to proceed with a JSOn to CSV conversion, as described in the section below.

4) Select your data file and click “Import”.

5) The Text Import Wizard now opens. “Delimited” (by commas, etc.) should be marked by default.

6) Tick the line saying “My data has headers.”

7) In “File Origin” (which will likely state “OEM: Unites States”), scroll down and select “65001: Unicode (UTF-8). (This is to avoid unnecessary character mismatches in the import, as discussed in the last chapter, “Basic Data Cleaning for Character Endocing Conflicts”).

8) Press “Next” which will take you to “Text Import Wizard – Step 2 of 3”.

9) Deselect the tick-off in “Tab” and tick off “Comma” instead. (Note what happens in the scrollable preview window if in doubt).

9) Press “Next” to go to the last Import Wizard window. The default settings here (“General”) should work, so press “Finish” and the import starts. (With large datasets Excel will have to work a while. A progress bar will be visible at the bottom).

Converting from JSON to CSV

There are several ways to convert JSON files to CSV files.

Unfortunately, most are either rather technical and will require further installations, and further use of command lines – versus those that are easier to use, but which cannot handle datasets of all sizes - notably very large datasets.

An online JSON to CSV converter which is recommended by Twitter is listed below. Provided that your dataset does not exceed the limitations of the converter, this will be a rather easy and comfortable way to get a CSV version of you dataset, so it is recommended to try. (The exact limitation is not specified).

For example, the online converter mentioned in Twitters tutorial (see below) will have trouble with large files, and it also seems that it – and other online converters tested – cannot recognise the “.jsonl” format harvested with twarc2. (However, since a JSONL file is a JSON file with the "L" specyfing the internal data format, JSONL files may be renamed to simply JSON. Just remember to rename it to JSONL (or keep a backup with the original file extension) because other programs expect the "L").

Finally, if a dataset proves too large to convert by any of the above means, you may have to look into command line programs for the conversion. This kind of solution will be powerful, but it will also demand high IT skills and/or patience for at steep learning curve.

For good measure, if for example you want to experiment with the more advanced options, please refer to Twitters guide on various methods for JSON to CSV conversion:

https://developer.twitter.com/en/docs/tutorials/five-ways-to-convert-a-json-object-to-csv

Online JSON to CSV converter by Eric Mill: https://konklone.io/json/

JSONKit (a command line application): https://github.com/johnezang/JSONKit