Making Manhattan plots as vector graphics in R with ggplot2

Manhattan plots are a widely used tool in statistical genetics to visualise the results of genome-wide association studies (GWAS). While they are simply scatter plots, the number of points to display is often in the millions and this makes them impractical to render as vector graphics in formats such as PDF or SVG. This is unfortunate as these formats are the gold standard for technical plots and are often requested by academic journals when submitting an article for publication.

This repo contains an R function to generate Manhattan plots with ggplot2 that can quickly be exported into a moderately-sized PDF file with ggplot2::ggsave (it can also be exported to SVG, although the resulting file is larger; I haven’t tested other vector formats). It is based on Holtz Yan’s excellent Manhattan plot function, produced for the R Graph Gallery, which I extended by merging overlapping points into single shapes to simplify the resulting output.

The key idea is to use software for processing and plotting geographic features (specifically, the sf R package) to convert each data point into a circle (a polygon, in simple features language), merge overlapping circles into a single shape (a union operation) and finally plot these simplified shapes.
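The three steps can be sketched with sf on toy data. This is a minimal illustration and not the code in fn-ggmanh_vec.R: the data, the column names `x` and `y` and the circle radius are assumptions chosen for the example.

```r
library(sf)
library(ggplot2)

# Toy data on an arbitrary plane (hypothetical, not GWAS results)
set.seed(1)
pts <- data.frame(x = runif(2000, 0, 100), y = rexp(2000, rate = 0.5))

# Step 1: convert each point into an sf POINT geometry
# (no CRS is set, so coordinates are treated as plain Cartesian)
pts_sf <- st_as_sf(pts, coords = c("x", "y"))

# Step 2: turn every point into a circle, approximated by a polygon
# (the radius of 0.5 is an arbitrary choice for this sketch)
circles <- st_buffer(pts_sf, dist = 0.5)

# Step 3: merge all overlapping circles into one (multi)polygon
merged <- st_union(circles)

# Plot the merged shape instead of the individual points
p <- ggplot() +
  geom_sf(data = merged, fill = "steelblue", colour = NA) +
  theme_minimal()
```

Note that geom_sf draws both axes on the same scale, so in a real Manhattan plot the x and y coordinates would presumably need rescaling before buffering for the circles to look round.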

The function to generate these plots is provided in the fn-ggmanh_vec.R script and is named ggmanh_vec. Please see the beginning of this file for a description of the different options available.

To illustrate how to use the function, I provide a simple reproducible example in the script make-manhattan-vec.R. I start by downloading summary statistics from a GWAS of standing height in European-ancestry samples from the UK Biobank, covering 12 million variants, published by Watanabe et al. (2019) (the article’s preprint is available on bioRxiv and the results can be downloaded from the GWAS Atlas). I then call ggmanh_vec to make a basic Manhattan plot and export it to PDF with ggplot2::ggsave, obtaining the file ukb-height-gwas.pdf, whose size is 751kB.
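The export step can be sketched as follows. A placeholder scatter plot stands in for the plot produced by ggmanh_vec (its arguments are documented in fn-ggmanh_vec.R and are not reproduced here); the toy data, output file name and page dimensions are assumptions.

```r
library(ggplot2)

# Placeholder plot standing in for the Manhattan plot made by ggmanh_vec
set.seed(1)
toy <- data.frame(pos = 1:100, logp = -log10(runif(100)))
p <- ggplot(toy, aes(pos, logp)) + geom_point()

# ggsave picks the PDF device from the ".pdf" file extension
out <- file.path(tempdir(), "toy-manhattan.pdf")
ggsave(out, plot = p, width = 10, height = 5, units = "in")

# Size of the resulting file in kB
file.size(out) / 1024
```

Changing the extension to ".svg" selects an SVG device instead, which in ggsave requires the svglite package to be installed.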

Manhattan plot of standing height in the UK Biobank (European-ancestry population)

Runtime

Making the example Manhattan plot mentioned above takes approximately 24min using four Intel Skylake 2.6GHz processors, each with 16GB of RAM. A simplified version in which we only plot points with p-value lower than or equal to 0.01 (approximately 1.2 million points) takes only 3min12s with the same resources and produces the file ukb-height-gwas-pv2.pdf. For comparison, a standard rasterised PNG plot takes 2min14s to make with a slightly modified (and not parallelised) version of Holtz Yan’s original function.
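The sketch below combines the p-value filter with one plausible way of spreading the union step over several cores using foreach and doParallel. Splitting the work by chromosome is an assumption and may not match what ggmanh_vec does internally; the column names `CHR`, `BP` and `P`, the toy data, the radius and the number of workers are also illustrative.

```r
library(dplyr)
library(sf)
library(foreach)
library(doParallel)

# Toy summary statistics (hypothetical, not the UK Biobank data)
set.seed(1)
gwas <- data.frame(
  CHR = sample(1:4, 4000, replace = TRUE),
  BP  = runif(4000, 0, 100),
  P   = runif(4000)^3
)

# Keep only variants with p-value <= 0.01 and compute -log10(p)
gwas_pv2 <- gwas %>%
  filter(P <= 0.01) %>%
  mutate(logp = -log10(P))

# Start two workers (the runtimes in the text were obtained with four processors)
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Build and merge the circles of each chromosome on a separate worker
shapes <- foreach(d = split(gwas_pv2, gwas_pv2$CHR), .packages = "sf") %dopar% {
  pts <- st_as_sf(d, coords = c("BP", "logp"))
  st_union(st_buffer(pts, dist = 0.5))
}

parallel::stopCluster(cl)

# Combine the per-chromosome shapes into a single geometry column
merged <- do.call(c, shapes)
```

Filtering first reduces the number of circles that have to be built and merged, which is consistent with the shorter runtime reported for the filtered plot.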

Dependencies

The following R packages are required: doParallel, dplyr, foreach, ggplot2 and sf.

In addition, the sf R package requires that the GDAL library be available on the system.

This code has been tested in R 3.6.2 (on CentOS 7.8.2003 with GDAL 3.0.2 installed) with doParallel 1.0.16, dplyr 1.0.6, foreach 1.5.1, ggplot2 3.3.3 and sf 1.0-0.
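A quick way to check a local setup against this list is sketched below. The package vector is taken from the versions listed above; the variable names are arbitrary, and the check only reports what is installed without enforcing any minimum version.

```r
# Packages listed in this section
pkgs <- c("doParallel", "dplyr", "foreach", "ggplot2", "sf")

# TRUE for each package that can be loaded, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

not_found <- pkgs[!installed]
if (length(not_found) > 0) {
  message("Missing packages: ", paste(not_found, collapse = ", "))
}

# Report the installed version of each available package
for (pkg in pkgs[installed]) {
  message(pkg, " ", as.character(packageVersion(pkg)))
}

# If sf is available, report the GDAL version it was linked against
if (installed[["sf"]]) {
  print(sf::sf_extSoftVersion()["GDAL"])
}
```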