Parsing JSON Embedded in Markdown

A Step-by-Step Tutorial

Markdown is a widely used markup language for formatting text in a simple, readable way. It's especially popular for documentation, blogging, and note-taking. Sometimes we need to embed JSON data within markdown text, for example to configure settings, demonstrate examples, or carry structured data. This blog post walks you through a Python solution for parsing JSON data embedded within markdown text.

Understanding the Problem

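Consider a markdown file that contains a JSON object inside a fenced code block: an opening line of three backticks followed by the tag json, the JSON object itself, and a closing line of three backticks. Our goal is to extract this object and parse it into a Python dictionary. Here's an example of what such a file might contain (the fence lines around the JSON object are not shown):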

Here is an example of JSON data:

{
  "name": "John Doe",
  "age": 30,
  "email": "john.doe@example.com"
}

We need a function that can:

  1. Identify the JSON block within the markdown.
  2. Extract the JSON string.
  3. Parse the JSON string into a Python dictionary.
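
The three steps above can be sketched in a few lines before looking at the full functions. This is a minimal illustration that uses a regular expression rather than the string searches used later in this post; the helper name extract_json_block and the pattern are assumptions made for the example, not part of the final solution.

```python
import json
import re

TICKS = "`" * 3  # three backticks, built here so the snippet stays fence-safe

# An opening fence tagged "json", then the shortest body up to a closing fence.
FENCE_RE = re.compile(TICKS + r"json\s*(.*?)" + TICKS, re.DOTALL)


def extract_json_block(markdown: str) -> dict:
    """Hypothetical helper: parse the first json-tagged fenced block."""
    match = FENCE_RE.search(markdown)        # 1. identify the JSON block
    if match is None:
        raise ValueError("no json code fence found")
    json_text = match.group(1).strip()       # 2. extract the JSON string
    return json.loads(json_text)             # 3. parse it into a dictionary


sample = (
    "Here is an example of JSON data:\n\n"
    + TICKS + "json\n"
    + '{"name": "John Doe", "age": 30}\n'
    + TICKS + "\n"
)
print(extract_json_block(sample))
```

The regex version is compact, but it does not cover unfenced JSON or a truncated closing fence, which the functions below handle explicitly.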

The Solution

We'll create two functions that will help us achieve this goal:

  1. parse_json_markdown - This function will extract and parse the JSON from the markdown text.
  2. parse_and_check_json_markdown - This function will parse the JSON and verify that it contains certain expected keys.

Step 1: Read the Markdown File

Read the content of the markdown file in your Python script.

with open('example.md', 'r') as file:
    markdown_content = file.read()
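
If you don't have a sample file yet, one way to create it is sketched below. It assumes the example content shown earlier, wrapped in a json-tagged fence as the parser expects, and it writes to a temporary directory purely as an assumption so the snippet has no side effects; in the tutorial itself the file is example.md in the working directory.

```python
from pathlib import Path
import tempfile

TICKS = "`" * 3  # three backticks, built here so the snippet stays fence-safe

# The sample markdown: an intro line followed by a json-tagged fenced block.
sample_markdown = "\n".join([
    "Here is an example of JSON data:",
    "",
    TICKS + "json",
    "{",
    '  "name": "John Doe",',
    '  "age": 30,',
    '  "email": "john.doe@example.com"',
    "}",
    TICKS,
    "",
])

# Assumption: a throwaway directory instead of the current working directory.
workdir = Path(tempfile.mkdtemp())
example_path = workdir / "example.md"
example_path.write_text(sample_markdown, encoding="utf-8")

# Read it back the same way Step 1 does, with an explicit encoding.
markdown_content = example_path.read_text(encoding="utf-8")
print(markdown_content)
```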

Step 2: Parse the JSON

Use parse_and_check_json_markdown to parse the JSON and check for expected keys.

expected_keys = ["name", "age", "email"]

try:
    parsed_json = parse_and_check_json_markdown(markdown_content, expected_keys)
    print("Parsed JSON:", parsed_json)
except Exception as e:
    print("Error:", e)

Step 3: Run the Script

Run your script to see the parsed JSON output.

python parse_json_from_markdown.py

If everything is set up correctly, you should see the following output:

Parsed JSON: {'name': 'John Doe', 'age': 30, 'email': 'john.doe@example.com'}

The Complete Code

The following code snippets provide the complete implementation of the two functions for parsing JSON embedded in markdown.

Function 1: parse_json_markdown

This function removes the markdown formatting and parses the JSON.

import json

def parse_json_markdown(json_string: str) -> dict:
    # Remove the triple backticks if present
    json_string = json_string.strip()
    start_index = json_string.find("```json")
    end_index = json_string.find("```", start_index + len("```json"))

    if start_index != -1 and end_index != -1:
        extracted_content = json_string[start_index + len("```json"):end_index].strip()
        
        # Parse the JSON string into a Python dictionary
        parsed = json.loads(extracted_content)
    elif start_index != -1 and end_index == -1 and json_string.endswith("``"):
        end_index = json_string.find("``", start_index + len("```json"))
        extracted_content = json_string[start_index + len("```json"):end_index].strip()
        
        # Parse the JSON string into a Python dictionary
        parsed = json.loads(extracted_content)
    elif json_string.startswith("{"):
        # Parse the JSON string into a Python dictionary
        parsed = json.loads(json_string)
    else:
        raise Exception("Could not find JSON block in the output.")

    return parsed
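
The function accepts three input shapes: a complete json-tagged fence, a fence whose closing marker was cut down to two backticks, and bare JSON that starts with a brace. The sketch below is a condensed variant written only to make those three paths easy to see and try; the name parse_json_markdown_compact is hypothetical, and raising ValueError instead of a plain Exception is an assumption of this sketch, not the behavior of the function above.

```python
import json

TICKS = "`" * 3          # full fence marker, built here to stay fence-safe
SHORT_TICKS = "`" * 2    # truncated closing marker
FENCE_OPEN = TICKS + "json"


def parse_json_markdown_compact(text: str) -> dict:
    """Hypothetical condensed variant of parse_json_markdown."""
    text = text.strip()
    start = text.find(FENCE_OPEN)
    if start != -1:
        body_start = start + len(FENCE_OPEN)
        # Prefer a full closing fence; otherwise accept a truncated one
        # only when the text ends with it.
        end = text.find(TICKS, body_start)
        if end == -1 and text.endswith(SHORT_TICKS):
            end = text.find(SHORT_TICKS, body_start)
        if end != -1:
            return json.loads(text[body_start:end].strip())
    if text.startswith("{"):
        # No usable fence: treat the whole text as JSON.
        return json.loads(text)
    raise ValueError("Could not find JSON block in the output.")


fenced = "Intro\n" + FENCE_OPEN + '\n{"name": "John Doe"}\n' + TICKS + "\nOutro"
truncated = FENCE_OPEN + '\n{"name": "John Doe"}\n' + SHORT_TICKS
bare = '{"name": "John Doe"}'

for candidate in (fenced, truncated, bare):
    print(parse_json_markdown_compact(candidate))
```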

Function 2: parse_and_check_json_markdown

This function uses parse_json_markdown to parse the JSON and then checks for the presence of expected keys.

import json

def parse_and_check_json_markdown(text: str, expected_keys: list[str]) -> dict:
    try:
        json_obj = parse_json_markdown(text)
    except json.JSONDecodeError as e:
        # Chain the original error so the traceback keeps the decode details
        raise Exception(f"Got invalid JSON object. Error: {e}") from e
    for key in expected_keys:
        if key not in json_obj:
            raise Exception(
                f"Got invalid return object. Expected key `{key}` "
                f"to be present, but got {json_obj}"
            )
    return json_obj
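
The key check above stops at the first missing key. If you would rather report every missing key in one pass, a small helper along these lines could be used; find_missing_keys is a hypothetical name introduced for this sketch and is not part of the functions above.

```python
def find_missing_keys(obj: dict, expected_keys: list[str]) -> list[str]:
    """Hypothetical helper: return every expected key absent from obj, in order."""
    return [key for key in expected_keys if key not in obj]


record = {"name": "John Doe", "age": 30}
missing = find_missing_keys(record, ["name", "age", "email"])
if missing:
    print(f"Missing keys: {missing}")
```

Collecting all missing keys first tends to give a more useful error message when several fields are absent at once.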

Conclusion

Parsing JSON data embedded in markdown can be useful for various applications. With the two functions above, you can extract and parse JSON from markdown files in your Python projects. The parser raises an error when the JSON is missing or malformed, and the key check raises one when an expected key is absent, so problems surface early instead of propagating through your code.