Introduction to Data Storage
In the world of technology, nearly every interaction involves data. How we store that data is vital to what we are able to do with it. When you think of data storage, many things come to mind, most of them likely valid. There is no one “correct” way to store data; this article specifically covers JavaScript Object Notation (JSON) and its structure validation tool, JSON schema.
JSON
JSON is simply a format, or notation, for object-based and array-based storage. What this means is that it is designed to hold data in a specific, repeatable format that can be both easily edited and read. JSON is frequently used to transfer data between disparate applications or environments because it is language-independent, meaning it can be used in many different programming languages. JSON is versatile, yet simple, with the two primary methods of data denotation described below:
Object storage
Key-value pairs denoted by curly braces ‘{ }’
{
  "Name": "Ian Gappinger"
}
Array storage
A single key containing multiple ordered values denoted by brackets ‘[ ]’
{
  "Articles": [
    "Test 1",
    "Test 2",
    "Test 3"
  ]
}
Combining both of these data structures gives us a simple yet capable nested data structure that is easy to work with.
{
  "Name": "Ian Gappinger",
  "Articles": [
    "Test 1",
    10.1,
    {
      "Test 3": "Awesome value!",
      "Test 4": true
    },
    {
      "Test 5": 10
    }
  ]
}
Note a few important things in the above JSON. First, items in JSON are separated by commas, whether they are array elements or key-value pairs. Next, JSON supports six data types for values:
- Strings
- Numbers (integers or decimals)
- Booleans
- Objects
- Arrays
- Null
I described objects and arrays above. Null is a valid value in its own right that represents the deliberate absence of a value. Finally, this example shows mixing and matching in full effect: the array contains objects with one or more key-value pairs as elements, and array elements do not need to be of the same type.
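To see how these types surface in a program, here is a minimal sketch that parses a trimmed copy of the example with Python's standard json module and prints the type each value maps to. Python is only an illustrative choice, and the null element is added here purely for demonstration.

```python
import json

# A trimmed copy of the combined example, with a null added for illustration.
raw = '''
{
  "Name": "Ian Gappinger",
  "Articles": ["Test 1", 10.1, {"Test 4": true}, null]
}
'''

data = json.loads(raw)

# JSON objects become dicts and JSON arrays become lists.
print(type(data).__name__)
print(type(data["Articles"]).__name__)

# Array elements keep their individual types, so one list can mix them.
for element in data["Articles"]:
    print(repr(element), "->", type(element).__name__)
```

The loop shows a string, a number, an object, and a null living side by side in the same array, which is exactly the flexibility a schema later has to pin down.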
This is a very brief introduction to JSON, so not all facets of the notation are covered here. However, having this context of what is “allowed” in JSON is very important for the main topic of this article.
JSON Schema
What is a Schema?
We have already discussed the first half of this topic, JSON, so now it’s time to cover the schema part of the term.
Simply put, a schema is a static, expected formula which you can compare something to. In the case of JSON, a JSON schema is a layout which describes what we can expect our JSON data to look like.
What Does a Schema Look Like?
My previous example had many different data types strewn all over the place. What if we wanted to reasonably rely on the fact that the JSON we receive always matches the same types of data, but not necessarily the exact values? Using a JSON schema, we can describe every value by its name and expected type, along with several other specifications which I will describe later. The JSON schema for the previous example of JSON would look like this:
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "Name": {
      "type": "string"
    },
    "Articles": {
      "type": "array",
      "items": [
        {
          "type": "string"
        },
        {
          "type": "number"
        },
        {
          "type": "object",
          "properties": {
            "Test 3": {
              "type": "string"
            },
            "Test 4": {
              "type": "boolean"
            }
          },
          "required": [
            "Test 3",
            "Test 4"
          ]
        },
        {
          "type": "object",
          "properties": {
            "Test 5": {
              "type": "integer"
            }
          },
          "required": [
            "Test 5"
          ]
        }
      ]
    }
  },
  "required": [
    "Name",
    "Articles"
  ]
}
JSON Schema Structure
This structure may look imposing at first, but realize that each grouping of attributes represents something already seen in our data. The “$schema” keyword identifies the version of the JSON Schema draft that the schema is written against. Several draft versions exist, including draft-04, draft-06, draft-07, 2019-09, and 2020-12, among others. While the version picked is not the most important decision, you should generally use the newest version your tooling supports. However, you may need to use an older draft if your project has particular limitations, such as an integration with a legacy system.
Next, the “type” keyword denotes the data type a piece of data must have. There are multiple options for this, but generally you will have an overarching “object” whose “properties” describe the pieces that make up your actual data. Each property is listed by its “key”, or name, followed by its declared type. If the value in question is an array, it has an “items” keyword; in the list form used here, each entry describes the array element at the same position and its type.
Finally, each particular data piece can have multiple other identifiers which vary depending on the intended schema. In our case, the “required” identifier is used to denote what data we must have regardless of other included data. Any fields not marked as required can be omitted without causing the data to fail against the schema validation.
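To make the roles of “type”, “properties”, “required”, and list-form “items” concrete, here is a minimal sketch of a checker written in Python. It is an illustration of the idea only, not a compliant JSON Schema validator: it handles just those four keywords, it treats a value like 1.0 as a non-integer (the specification is more lenient), and the function and variable names are made up for this example. For real work you would reach for a maintained validator library instead.

```python
import json

# Maps JSON Schema type names to Python types (only the types used in this article).
TYPE_MAP = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
    "null": type(None),
}


def validate(value, schema, path="$"):
    """Return a list of error strings; an empty list means the value passed."""
    errors = []
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(value, bool) and expected in ("number", "integer")
        if bool_as_number or not isinstance(value, TYPE_MAP[expected]):
            return [f"{path}: expected {expected}"]
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required key '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in value:
                errors += validate(value[key], subschema, f"{path}.{key}")
    if isinstance(value, list) and isinstance(schema.get("items"), list):
        # List-form "items" is positional: the first schema checks the first element.
        for index, (element, subschema) in enumerate(zip(value, schema["items"])):
            errors += validate(element, subschema, f"{path}[{index}]")
    return errors


# A shortened version of the article's schema, kept small for readability.
schema = json.loads('''
{
  "type": "object",
  "properties": {
    "Name": {"type": "string"},
    "Articles": {
      "type": "array",
      "items": [{"type": "string"}, {"type": "number"}]
    }
  },
  "required": ["Name", "Articles"]
}
''')

good = {"Name": "Ian Gappinger", "Articles": ["Test 1", 10.1]}
bad = {"Articles": [42, "oops"]}

print(validate(good, schema))  # no errors expected
for error in validate(bad, schema):
    print(error)
```

The second document fails in three places: the required “Name” key is missing, and both array elements have the wrong type for their position.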
Why are Schemas so Important?
Imagine you are looking at images of sea vessels with your friend and you ask: “Hey, what kind of ship is that?”, wanting to know the specific designation for that size and shape of ship. Your friend replies with “Gold”. While the response was not necessarily wrong, it was not useful in answering your question. Had you specified “Hey, is that a carrier or a frigate?” you would have received an answer which satisfied your request and given you the information you needed.
We can apply this same concept to sending and receiving JSON data. When sending data between two different systems, for example, a web application and a PowerShell script, you may wish to act on the data in a way that must be repeatable.
A JSON schema enables you to dictate an acceptable format of data such that you can act on it in whatever way you desire without issue. Much in the same way, having your question answered in the right way lets you act on the answer as you originally intended. If you receive data that does not adhere to your schema specification, you can reject it, as it will not allow you to act on it as you had planned.
JSON schemas are an error-preventing, consistent method of exchanging and automatically validating data which can be made actionable by other systems. While there are many applications for such a tool, actual implementations I will save for their own articles, as it is too much for this post.
How do Schemas Enable Automation?
Automation can be implemented in many different ways. This article will focus on automation workflows which engage in cross-system integrations and data transfers. Not all workflows will leverage cross-system integrations, but all workflows should validate the data flowing through them.
One example of an automation tool, which I cover in a different article (WIP), is the CrowdStrike Fusion SOAR platform. Security Orchestration, Automation, and Response (SOAR) is a technology which enables autonomous action by allowing a system to perform tasks when certain conditions or triggers are met. SOAR often relies on integrations between multiple systems to perform actions. Other examples of automation platforms (not necessarily SOAR platforms) include GitHub Actions, Zapier, Jira, and more. All of these platforms allow for actions that can depend on, or influence, other systems. To facilitate the transfer of information between these systems, data is often stored using JSON, but unless that JSON is validated, it may be invalid and cause your workflows to fail.
The purpose of the JSON schema in an automation is to ensure that the data one system sends to another matches the required format for the actions you plan to take on that data. If you require a valid IP address to perform IP enrichment and lookup, you must ensure the data type and format match those of an IP address. If you somehow sent data that was not an IP address, validating it against the schema would detect the invalid data, and the workflow could then act on that result. Depending on the configuration of the automation, the workflow may terminate, notify you of a validation failure, or take some other specified action.
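As a rough sketch of that gatekeeping step, the Python snippet below checks an incoming payload before any enrichment would run. It uses the standard ipaddress module rather than a schema validator, so treat it as a stand-in for what a schema keyword such as “format”: “ipv4” expresses declaratively. The “ip” field name and the function name are hypothetical.

```python
import ipaddress
import json


def check_enrichment_input(raw_payload):
    """Return (ok, message) for a payload expected to look like {"ip": "<address>"}."""
    try:
        payload = json.loads(raw_payload)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON: {exc.msg}"
    if not isinstance(payload, dict):
        return False, "payload must be a JSON object"
    ip_value = payload.get("ip")
    if not isinstance(ip_value, str):
        return False, "'ip' must be a string"
    try:
        ipaddress.ip_address(ip_value)  # accepts IPv4 and IPv6 text forms
    except ValueError:
        return False, f"'{ip_value}' is not an IP address"
    return True, "ok"


for raw in ('{"ip": "203.0.113.7"}', '{"ip": "not-an-ip"}', '{"ip": 12345}'):
    ok, message = check_enrichment_input(raw)
    # A real workflow might stop, notify someone, or branch here instead of printing.
    print("continue" if ok else "reject", "-", message)
```

Only the first payload would be allowed through to the lookup; the other two are rejected before they can cause a failure further down the workflow.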
Security Relevance
The validation process prevents errors that, if not properly handled, may go unnoticed until they cause damage. These errors may even open the door to vulnerabilities, such as injection or overflow attacks, making validation a very real security concern. JSON validation could technically be performed manually through methods like individual field validation in your code, but JSON schemas largely eliminate the need for this.
By implementing proper data validation in your workflows, you alleviate some of these security concerns in your applications. A common point of compromise for many different types of applications is injection attacks, which rely on data not being validated so that malicious data can be fed to the application. Injection is a broad risk category, but one that is repeatedly shown to be extremely dangerous, and it ranks as number 3 in the 2021 edition of the OWASP Top 10 for application security risks.
Tips for Using JSON Schema
Even as I write this, I am no pro at JSON schema. When it comes time to actually make my schema, I typically start with sample data that I know will (generally) reflect what I expect my data to look like in my workflow. You can feed this sample JSON to a conversion website such as Transform Tools or Liquid Technologies to generate a starting schema. Some platforms also have built-in schema builders, including Microsoft Visual Studio and CrowdStrike Fusion SOAR, among others.
Additionally, the best resource I can point anyone to is the original documentation on the JSON Schema website (json-schema.org). The website has a wonderful reference library, and even an introduction to JSON, JSON schemas, and guides on building your first schemas.
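To give a feel for what such a sample-to-schema conversion does, here is a small Python sketch that infers a draft-07 style schema from one sample document. This is my own simplified illustration, not how any of the tools named above actually work, and it makes a loud assumption: every key seen in the sample is marked as required.

```python
import json


def infer_schema(value):
    """Build a draft-07 style schema fragment describing one sample value."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        # Positional "items", mirroring the list form used in this article.
        return {"type": "array", "items": [infer_schema(v) for v in value]}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer_schema(v) for k, v in value.items()},
            # Assumption: every key seen in the sample is treated as required.
            "required": list(value),
        }
    raise TypeError(f"unsupported value: {value!r}")


sample = json.loads('{"Name": "Ian Gappinger", "Articles": ["Test 1", 10.1]}')
print(json.dumps(infer_schema(sample), indent=2))
```

A generated schema like this is only a starting point. A single sample cannot tell the generator which fields are truly optional or which values are merely typical, so review the result before relying on it.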
Finally, it may be apt to review your available keywords for your schema definitions, such as “description”, “pattern”, and so on. Tons of keywords exist to help make sure your schema accounts for all necessary use cases.
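As one example of what those keywords buy you, the Python sketch below pairs “description” with “pattern” on a hypothetical “hostname” field and applies just the string and pattern checks by hand. The field, the regular expression, and the helper function are assumptions made for illustration. Note also that Python's regular expression dialect differs slightly from the ECMA 262 dialect that JSON Schema recommends, so a real validator may behave differently at the edges.

```python
import re

# Hypothetical schema fragment for a "hostname" field.
hostname_schema = {
    "type": "string",
    "description": "Host to look up; letters, digits, dots, and hyphens only",
    "pattern": "^[A-Za-z0-9.-]+$",
}


def matches_pattern(value, schema):
    """Apply only the string type check and the "pattern" check from the fragment."""
    if not isinstance(value, str):
        return False
    # JSON Schema patterns are not anchored implicitly, so search() is the closest match.
    return re.search(schema["pattern"], value) is not None


print(matches_pattern("server01.example.com", hostname_schema))
print(matches_pattern("server01; rm -rf /", hostname_schema))
```

The first value passes and the second is rejected because of the space, semicolon, and slash. Because patterns are unanchored by default, leaving out the leading ^ and trailing $ would let the second value through, which is an easy mistake to make.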
When it comes time to actually make a schema, you don’t need to be an expert. For most of the work I’ve done involving JSON schemas, I’ve simply spent a few minutes researching what exactly I needed to get the job done, then moved on, focusing more on the core logic of my workflows, and the actual benefit I got out of creating them.
Closing Notes
JSON schema is a very useful tool, and will be a valuable part of any automation engineer’s toolbelt. Not everyone who automates needs to be an automation engineer, though. Personally, automation is not my core specialization, but it is a handy skill to have as a security practitioner, alongside other abilities such as scripting. If you are working in cyber/infosec, or want to, and don’t know any scripting languages, I highly recommend that you learn at least one. That will be your first step into the world of security automation, and from there, knowledge of JSON schema will become a necessity.
This article did not really go into specifics about any real use cases that this can be applied to, and that was to limit the length of this article. The information in this article does not fully cover the capabilities of JSON schema for the same reason, which is why I highly encourage you to explore it more as you come across specific use cases.
I plan to create several articles on use cases related to JSON schemas and security automations in the near future, so stay tuned for those. If you found this article helpful let me know! If there is anywhere you feel it could have been improved, or if I missed some crucial piece of information then go ahead and yell at me in the comments below.
Thanks for reading!

