Introduction to datasets

In the Berkeley Studio you can make use of datasets. Generally speaking, a dataset is a collection of structured data. Most people are familiar with datasets in some form, for example as they appear in spreadsheet software such as Microsoft Excel. Consider the following spreadsheet in Excel:

An employee spreadsheet in Microsoft Excel

This dataset consists of columns and rows that present data on each employee. Let’s imagine that we want to use this data in a Berkeley Studio model. In the following sections, we’ll explain how datasets work in the Berkeley Studio, how they are made and how we can use them. Datasets are slightly more difficult to grasp than most other elements of the Studio, so we advise beginners to complete the guided tour first and to read about model structure and input types.

The structure of a dataset

When using datasets, it is important to distinguish between the structure of the dataset and the actual data. Recall the Excel spreadsheet from before: the columns give structure to the data, because they determine where the employee names go, where the salaries go, and so on. The rows then present the actual data: the employees themselves. The picture below illustrates this:

Data structure in Excel

Datasets in the Berkeley Studio work the same way: first we create a structure for the dataset, and then we put the data into it. However, in the Berkeley Studio the structure is not created with columns but with a graph. So to create a dataset, we first create a graph (the ‘structure’) and then fill it with information (the ‘actual data’). You can fill a dataset right away (from an Excel file, for example), or you can let the end user of your model fill it. First, though, let’s look at what datasets can be used for.
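The idea of defining a structure first and filling it with data afterwards is not specific to the Studio. As a rough, tool-independent illustration, the Python sketch below models the structure as a tiny graph (an `employee` node with `name` and `salary` child nodes) and then adds records that must fit that structure. The `Node` class, the `attributes` and `add_record` helpers and the sample values are all hypothetical; they do not describe the Studio's internals or any Studio API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in the structure graph (hypothetical, for illustration only)."""
    name: str
    children: list["Node"] = field(default_factory=list)


# The structure: an "employee" node whose child nodes play the role
# that column headers play in a spreadsheet.
structure = Node("employee", [Node("name"), Node("salary")])


def attributes(node: Node) -> set[str]:
    """Return the names of the node's direct children."""
    return {child.name for child in node.children}


def add_record(dataset: list[dict], node: Node, **values) -> None:
    """Append a record only if it fills exactly the attributes the structure defines."""
    expected = attributes(node)
    if set(values) != expected:
        raise ValueError(f"record must have exactly {sorted(expected)}, got {sorted(values)}")
    dataset.append(dict(values))


# The actual data: records filled into the structure (sample values are made up).
employees: list[dict] = []
add_record(employees, structure, name="Alice", salary=3200)
add_record(employees, structure, name="Bob", salary=2900)

print(len(employees))  # 2
```

Keeping the structure separate from the records means a record that does not fit (for example, one without a salary) is rejected instead of silently stored.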

Why use datasets?

Datasets have a wide range of uses, for example providing input for questions, presenting tables to users and accessing data from outside your model. More importantly, you can filter a dataset, select parts of it and discard the parts you don’t need.
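To make filtering, selecting and discarding more concrete, here is a generic Python sketch on a plain list of employee records. It shows one way to read these three operations (filtering rows, selecting columns, dropping a column) and is not the Studio's own mechanism; the `department` field and all sample values are hypothetical.

```python
# A small employee dataset as a list of records (sample values are made up;
# the "department" field is hypothetical).
employees = [
    {"name": "Alice", "department": "Sales", "salary": 3200},
    {"name": "Bob", "department": "Support", "salary": 2900},
    {"name": "Carol", "department": "Sales", "salary": 3500},
]

# Filter: keep only the records (rows) that meet a condition.
sales = [e for e in employees if e["department"] == "Sales"]

# Select: keep only some attributes (columns) of every record.
names_and_salaries = [{k: e[k] for k in ("name", "salary")} for e in employees]

# Discard: throw away one attribute and keep everything else.
without_salary = [{k: v for k, v in e.items() if k != "salary"} for e in employees]

print([e["name"] for e in sales])  # ['Alice', 'Carol']
```

Each operation builds a new list, so the original dataset stays intact and different views of it can coexist.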

We will continue with the Excel example. In the next sections, we will build a model that lets the user view different parts of the dataset and edit it.