10 ways to plot data
1 Excel
The chart is contained in the Excel file and must be exported as an image file from Excel.
Excel does not leave a lot of room for customization.
It has a setting for bubble charts, but does not provide legends for them. It allows you to set axis boundaries, but it doesn't let you set the gridlines and axis labels independently of the chart's bounds. It is also inconvenient to split the data into separate series: I had to make a new column for each manufacturer, in which the other manufacturers' entries are removed (this would make it especially hard to add more manufacturers dynamically).
While it is technically possible to make a chart with axes that do not start at a multiple of the major gridlines, or to add a legend for the bubbles' sizes, these would need to be added manually and might be difficult to adjust for changing data (though that could be automated). Instead, I left out the bubble size legend and used a simpler technique to hide where the axes extend beyond the expected lower bound.
Excel is a good tool for quickly dealing with large amounts of data and generating a limited set of visualizations, but it falls short when it comes to customizing those visualizations (unless you want to use VBA).
2 Gnuplot
To generate the image, run gnuplot/cars.gnuplot from the project directory.
Gnuplot excels at creating simple charts, but it doesn't work as well when you want to manipulate your data before plotting it, such as to separate the manufacturers.
The usual way to prepare data for Gnuplot is to use external tools, so I wrote a Python script that separates the manufacturers so Gnuplot can plot them separately.
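A minimal sketch of that kind of preprocessing, not the actual script from this project. It assumes a CSV with a header row; the column names Manufacturer, MPG, and Weight are hypothetical. Each manufacturer becomes its own block, and blocks are separated by two blank lines, which Gnuplot treats as separate data sets.

```python
import csv
import io
from collections import defaultdict


def split_by_manufacturer(csv_text, key="Manufacturer"):
    """Group CSV rows by the value in the `key` column, in first-seen order."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[key]].append(row)
    return dict(groups)


def to_gnuplot_blocks(groups, columns):
    """Render each group as a whitespace-separated data block.

    Blocks are separated by two blank lines so that Gnuplot can address
    them individually. The leading "# name" line is a comment to Gnuplot.
    """
    blocks = []
    for name, rows in groups.items():
        lines = [f"# {name}"]
        lines += [" ".join(row[c] for c in columns) for row in rows]
        blocks.append("\n".join(lines))
    return "\n\n\n".join(blocks) + "\n"
```

With the output written to a file, each manufacturer can then be plotted as its own series by selecting a block with Gnuplot's index keyword (index 0, index 1, and so on).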
While Gnuplot has extensive documentation, it is far from intuitive to use, which isn't helped by the abbreviated commands that are often used.
Gnuplot is good for creating simple visualizations, but its more advanced features aren't easy to learn.
3 Octave
To generate the image, run octave/cars.m from the project directory.
Octave is capable of being used for plots, but plotting data is clearly not its primary function. The documentation isn't very clear for the more advanced options, nor is it clear about all of its capabilities, which makes it a suboptimal tool for creating complex charts.
It also has some limitations, such as an inability to change the color of the grid lines (the gridcolor property is documented as unused), which I overcame by drawing lines on the chart where the grid should be. I was unable to find a simple way to make the circles transparent; Octave does not appear to allow scatter plot markers to be transparent.
Octave is a math language, not a visualization language. While it can be used to visualize data, it is clear that the standard libraries were not written with that as a priority.
4 Mathematica
To generate the image, run wolframscript -f mathematica/cars.wls from the project directory.
Mathematica is a capable graphing tool, and has extensive documentation, but getting used to the language takes time. Many of the features can be combined, but I didn't have time to figure out how. However, one additional feature is enabled by default: when viewing the chart in the notebook (.nb) file, hovering over a point displays the corresponding weight value.
It has extensive visualization capabilities, and using them is just a matter of knowing how to use the language and reading the documentation.
Given the choice, I might use Mathematica over any of the previous tools (and Matlab) to generate a static chart, but I would need time to become familiar with it first, and I doubt it surpasses d3 for dynamic content.
5 Matlab
To generate the image, run matlab/cars.sh from the project directory.
Matlab is very similar to Octave, but better supports graph customization. Certain modifications are available in Matlab that are not in Octave (such as setting transparency and customizing the gridlines' color), and others are more accessible (such as setting the legend's title). The only issues I ran into were that all gridlines, axes, and tick marks in a graph share their line width, and hiding the axes also hides the background.
Additionally, Matlab's more active community makes it easier to find answers to questions about Matlab (such as finding which attributes must be set to customize part of a chart).
As an aside, Matlab also supports categorical data to an extent, which could allow more semantically-accurate manipulation of data sets (though using categorical data for the cars' manufacturers added little).
6 Plotly online editor
I made this chart in Plotly's online editor. The chart can be viewed online here. It is also saved as plotly/cars.html, and an image of it is saved as ./img/plotly.png.
Plotly is specialized for making charts, and was the easiest and fastest of the tools I've used so far. It doesn't offer the level of customization available in some of the other tools, but more than makes up for it with ease of use.
Notably, Plotly is the first tool I've used that had a simple way to use textual data (the manufacturer column) to differentiate the series to plot. It then allows individual customization of each series.
Plotly also allows a number of interactive options by default, but there are few options to customize them in the online editor.
Plotly also provides libraries for Python, R, and JavaScript, which allow more advanced customization.
7 Google Sheets
The Google sheet used to create this image can be viewed here, and the image is saved as ./img/google-sheets.png.
Google Sheets is generally similar to Excel, but the chart-editing menus feel more like Plotly. It's simple to navigate and perform basic customization, but many options are unavailable. The chart legend can't be moved easily, all series share a background color, and there is no option to add ticks to the axes.
Like Plotly, Google Sheets allows series to be determined by a column of the data. In a spreadsheet application, this is especially useful, because it allows new series (manufacturers) to be added just by adding new data to the sheet. However, the only per-series customization it allows is color.
The greatest failing of Google Sheets is that it doesn't allow the bubbles' size range to be adjusted, instead using an absolute minimum and maximum size. I circumvented this by adding a dummy data point of larger size to the chart, but this only allows the maximum bubble size to be decreased, and would have to be updated if the range of sizes in the data changed.
8 d3.js
The d3 chart can be generated by entering the d3 directory and running npm run build. This will generate the file as d3/dist/cars.html.
D3 was fun to use. As a purpose-built library, it doesn't have the failings of some of the other systems I used. By building on the formats that support the internet as we know it (i.e. HTML, CSS, and JavaScript), d3 is able to take advantage of the customization they allow.
Not every option is obvious, but the documentation is relatively clear, and there are few unexpected behaviors. The only customization I looked for and didn't find was the ability to set tick marks by start and step.
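For reference, the "start and step" behavior I was looking for is simple to compute by hand. The sketch below is written in Python to match the project's helper scripts; ticks_from is a hypothetical helper, not part of d3. A list like this could then be passed to whatever option accepts explicit tick values (in d3, the axis tickValues method).

```python
def ticks_from(start, step, upper):
    """Return tick positions start, start + step, ... that do not exceed upper."""
    if step <= 0:
        raise ValueError("step must be positive")
    count = int((upper - start) // step) + 1
    # Multiply rather than accumulate so rounding error does not build up.
    return [start + i * step for i in range(max(count, 0))]
```

For example, ticks_from(2000, 500, 4000) yields ticks at 2000, 2500, 3000, 3500, and 4000, regardless of where the axis itself begins.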
Where it felt like I was wrestling with some of the other languages to make charts, d3 was pleasant and straightforward.
9 Flourish
I made this chart using Flourish's free online editor. An interactive version can be viewed here. An image of the chart is saved as ./img/plotly.png.
Flourish is relatively easy to use, but it has limited customization. It doesn't allow the axis labels to be set explicitly, nor does it allow for minor gridlines.
It also sets a very large minimum size for the chart's dots, which serves poorly for this chart's large range of values.
However, Flourish was the only tool I've used that provided a simple, built-in mechanism for customizable hover text, which I took advantage of to show additional information in the chart when hovering.
10 Kids' Zone
This chart was generated using this Kids' Zone Create a Graph tool and some Python.
To generate the data entry table, run kidszone/reformat.py in the project directory. Copy the output and run it in your web browser after setting the number of points (50) and groups (5) in the Kids' Zone graph tool (create an XY Bubble chart and go to the data tab to set those values). Then turn off the labels and view or export the chart.
Kids' Zone isn't a particularly useful visualization tool, as it is meant for children's use. The user interface doesn't allow data to be uploaded, and it has extremely limited customization options (the grid lines always include the borders, for instance). But by using some Python to transcribe the data to an HTML table and copying and pasting it into my browser's web inspector, I was able to upload almost all of the data and even make the bubbles transparent.
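The transcription step can be sketched roughly as follows. This is an illustration only, not the contents of kidszone/reformat.py: the bare table, tr, and td markup is an assumption, and the markup the Kids' Zone data tab actually expects may differ.

```python
from html import escape


def rows_to_html_table(rows):
    """Render a list of row tuples as a bare HTML table string.

    Cell values are escaped so that text such as "a&b" stays valid HTML.
    """
    body = "\n".join(
        "  <tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table>\n{body}\n</table>"
```

The resulting string is what would be pasted into the page through the web inspector.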
The tool's most important limitation is the maximum of 50 data points per series, which left one data point (the first) out of the chart. This could be remedied by repeating one of the series as a sixth column, but this does not extend to other datasets (and would repeat that series in the legend).
The bubbles also appear to use a relative scale, so I needed to add a dummy point to scale the bubbles to a reasonable size. However, unlike some other tools (e.g. Google Sheets), this also scales the smaller bubbles down, rather than scaling between a fixed maximum and minimum size.
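The difference between the two behaviors can be illustrated with a small sketch. Neither tool documents its formula as far as I know, so both functions are assumptions: relative_radii makes each radius proportional to its value, and fixed_range_radii maps the smallest and largest values onto a fixed minimum and maximum radius (as Google Sheets appears to).

```python
def relative_radii(values, max_radius):
    """Scale radii in proportion to value; the largest value gets max_radius."""
    largest = max(values)
    return [max_radius * v / largest for v in values]


def fixed_range_radii(values, min_radius, max_radius):
    """Map the smallest value to min_radius and the largest to max_radius."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [max_radius for _ in values]
    span = max_radius - min_radius
    return [min_radius + (v - lo) / (hi - lo) * span for v in values]


data = [1, 4]
with_dummy = data + [8]  # 8 is the oversized dummy point

print(relative_radii(data, 20))            # [5.0, 20.0]
print(relative_radii(with_dummy, 20))      # [2.5, 10.0, 20.0]
print(fixed_range_radii(with_dummy, 5, 20)[0])  # 5.0
```

Under the relative scheme the dummy point halves every real bubble, while under the fixed-range scheme the smallest bubble stays at the minimum radius and only the larger ones shrink.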
While it isn't a particularly useful tool for data visualization, it is more kid-friendly than some of the more advanced tools, though I wouldn't recommend it for any child old enough for Google Sheets.
11 Other tools
I attempted to create a chart in Amcharts' online editor, but I couldn't figure out how to use the data in the graph. This tool appears to have quite a few customization options, but the basic setup does not seem to be intuitive.
12 Technical achievements
- The colors in the charts for Mathematica, Octave, Gnuplot, and Matlab can be customized by modifying values in the file colors.conf and running make.sh.
- This document, when exported from notes.org as HTML, embeds the interactive charts and SVGs for better viewing and interaction, and displays them as PNGs when exported to other formats.
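As a rough illustration of the shared color configuration idea, the sketch below reads a simple name = value file into a dictionary. The format shown is an assumption made for the example; the real colors.conf and make.sh may work differently.

```python
def parse_colors(text):
    """Parse lines of the form "name = value" into a dict.

    Blank lines and lines without "=" are skipped. Values are kept as
    strings (for example "#1b9e77") so each plotting tool can use them as-is.
    """
    colors = {}
    for raw in text.splitlines():
        name, sep, value = raw.strip().partition("=")
        if not sep:
            continue
        colors[name.strip()] = value.strip()
    return colors
```

A build script could then substitute these values into each tool's plotting script before running it.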
13 Design achievements
- The default color scheme used by the plots that read the custom colors is based on the color scheme found here (after a bit of research), and should be readable by people with most forms of color blindness. The colors were also assigned to manufacturers to minimize similarity between the manufacturers that overlap the most.
- In the Octave, Matlab, and Kids' Zone charts, the manufacturers with the fewest data points are rendered above the others to make them more visible. Plotly did this incidentally, and most of the remaining charts (all but Excel) do not draw the manufacturers separately.
- This document, when viewed online, embeds the interactive charts so they may be used by the viewer.
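The draw-order rule from the second item above can be sketched as follows, assuming the plotting tool draws later series over earlier ones. The function name draw_order is hypothetical and does not come from the project scripts.

```python
from collections import Counter


def draw_order(manufacturers):
    """Return manufacturer names ordered from most to fewest data points.

    Plotting series in this order leaves the smallest series on top,
    provided later series are drawn over earlier ones.
    """
    counts = Counter(manufacturers)
    # Sort by descending count; break ties by name for a repeatable order.
    return sorted(counts, key=lambda name: (-counts[name], name))
```

Here the input is one manufacturer name per data point, for example the manufacturer column of the data set.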