---
title: "Web facilities"
output:
  bookdown::html_document2:
    toc: true
    toc_float: true
    number_sections: true
    self_contained: false
fontsize: 11pt
bibliography: '`r system.file("REFERENCES.bib", package="gaawr2")`'
csl: nature-genetics.csl
pkgdown:
  as_is: true
vignette: >
  %\VignetteIndexEntry{Web facilities}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
set.seed(0)
knitr::opts_chunk$set(
  out.extra = 'style="display:block; margin: auto"',
  fig.align = "center",
  fig.path = "plumber/",
  collapse = TRUE,
  comment = "#>",
  dev = "png")
```

```{r, echo=FALSE, message=FALSE, warning=FALSE}
pkgs <- c("httr", "httpuv", "jsonlite", "plumber", "seqminer")
for (p in pkgs) if (length(grep(paste0("^package:", p, "$"), search())) == 0) {
  if (!requireNamespace(p)) warning(paste0("This vignette needs package `", p, "'; please install"))
}
invisible(suppressMessages(lapply(pkgs, require, character.only = TRUE)))
```

# Hello, world!

## Python

Perhaps the simplest way to start from the Linux command line is with Python,

```python
python3 -m http.server 8000
firefox http://127.0.0.1:8000
```

which serves the current directory on port 8000; this is also the default port for `python3 -m http.server` when it is available.

## R

Our focus here is R, with the following script.

```r
httpuv::startServer("0.0.0.0", 8000,
  list(
    call = function(req) {
      list(
        status = 200L,
        headers = list("Content-Type" = "text/plain"),
        body = "Hello, world!"
      )
    }
  )
)
```

Upon accessing `http://127.0.0.1:8000`, we see the message "Hello, world!". Below, we assume that our working directory is the source package.

# Data source

We envisage a server from which data can be obtained; two R packages, **plumber** and **httpuv**, are considered here.
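Before turning to the R packages, the idea can be sketched with Python's standard library alone. The following toy stand-in for the envisaged server (all names, routes, and records below are made up for illustration) answers a GET request carrying a `region` query parameter with the matching rows as JSON, much as the plumber endpoint later in this vignette will.

```python
# A toy stand-in for a region-query data server; records are hypothetical.
import http.server
import json
import threading
import urllib.request
from urllib.parse import urlparse, parse_qs

ROWS = [  # (chromosome, position, marker) -- made-up records
    ("2", 102700138, "chr2:102700138_A_G"),
    ("2", 103100000, "chr2:103100000_C_T"),
    ("3", 500000, "chr3:500000_G_A"),
]

class Query(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # parse "region=CHR:START-END" from the query string
        q = parse_qs(urlparse(self.path).query)
        chrom, span = q["region"][0].split(":")
        start, end = map(int, span.split("-"))
        hits = [r for r in ROWS if r[0] == chrom and start <= r[1] <= end]
        body = json.dumps(hits).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the output quiet
        pass

srv = http.server.HTTPServer(("127.0.0.1", 0), Query)  # port 0: any free port
threading.Thread(target=srv.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{srv.server_port}/tests?region=2:102700000-103800000"
with urllib.request.urlopen(url) as r:
    hits = json.loads(r.read())
srv.shutdown()
print(len(hits))  # 2 (both chromosome-2 records fall inside the region)
```

The R servers below follow the same request/filter/serialise pattern, with the filtering delegated to tabix-indexed files.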
## A meta-analysis

The compressed data `IL.18R1-1.tbl.gz` is based on METAL @willer10, involving study-[1|2|3] as follows,

```
SEPARATOR TAB
COLUMNCOUNTING STRICT
CHROMOSOMELABEL CHR
POSITIONLABEL POS
CUSTOMVARIABLE N
LABEL N as N
TRACKPOSITIONS ON
AVERAGEFREQ ON
MINMAXFREQ ON
ADDFILTER AF1 >= 0.001
ADDFILTER AF1 <= 0.999
MARKERLABEL SNP
ALLELELABELS A1 A2
EFFECTLABEL BETA
PVALUELABEL P
WEIGHTLABEL N
FREQLABEL AF1
STDERRLABEL SE
SCHEME STDERR
EFFECT_PRINT_PRECISION 8
STDERR_PRINT_PRECISION 8
GENOMICCONTROL OFF
LOGPVALUE ON
OUTFILE IL18R1B_dr- .tbl
PROCESS study-1-IL18R1B.fastGWA.gz
PROCESS study-2-IL18R1B.fastGWA.gz
PROCESS study-3-IL18R1B.fastGWA.gz
PROCESS study-1-IL18R1B-chrX.fastGWA.gz
PROCESS study-2-IL18R1B-chrX.fastGWA.gz
PROCESS study-3-IL18R1B-chrX.fastGWA.gz
ANALYZE HETEROGENEITY
CLEAR
```

Assuming the script is named `IL.18R1.metal`, `IL.18R1-1.tbl.gz` is generated with **htslib** as follows,

```bash
export protein=IL.18R1
metal ${protein}.metal 2>&1 | \
tee ${protein}-1.tbl.log
cat <(head -1 ${protein}-1.tbl) \
    <(sed '1d' ${protein}-1.tbl | \
      sort -k1,1n -k2,2n) | \
bgzip -f > ${protein}-1.tbl.gz
tabix -S1 -s1 -b2 -e2 -f ${protein}-1.tbl.gz
rm ${protein}-1.tbl
```

The bgzipped data allows for access by genomic region as shown below.

## plumber

The software library `libsodium` is required for its installation. **plumber** is "an API generator in R", which has been tested as follows.
```r
get_data <- function(filename, region)
{
  query_result <- seqminer::tabix.read(filename, region)
  hdr <- c("Chromosome", "Position", "MarkerName", "Allele1", "Allele2",
           "Freq1", "FreqSE", "MinFreq", "MaxFreq", "Effect", "StdErr",
           "logP", "Direction", "HetISq", "HetChiSq", "HetDf", "logHetP", "N")
  df <- read.table(text = paste(query_result, collapse = "\n"),
                   sep = "\t", col.names = hdr)
  return(df)
}

plbr <- plumber::Plumber$new()
plbr$handle("GET", "/tests", function(req, res) {
  protein <- req$args$protein
  region <- req$args$region
  if (is.null(protein) || is.null(region)) {
    res$status <- 400
    return(list(error = "Both 'protein' and 'region' must be provided"))
  }
  filename <- file.path("tests", paste0(protein, "-1.tbl.gz"))
  print(filename)
  if (!file.exists(filename)) {
    res$status <- 404
    return(list(error = paste("File for", protein, "not found")))
  }
  data <- get_data(filename, region)
  json_data <- jsonlite::toJSON(data, dataframe = "rows", na = "null")
  res$setHeader("Content-Type", "application/json")
  return(json_data)
})

options(width = 200)
filename <- file.path("tests", "IL.18R1-1.tbl.gz")
region <- "2:102700000-103800000"
data <- get_data(filename, region)
head(data, 1)
plbr$run(port = 8001)
```

Indeed we can see the first line of data,

```
  Chromosome  Position         MarkerName Allele1 Allele2 Freq1 FreqSE MinFreq MaxFreq  Effect StdErr  logP   Direction HetISq HetChiSq HetDf logHetP     N
1          2 102700138 chr2:102700138_A_G       a       g 0.087 0.0207  0.0641  0.1376 -0.0566 0.0239 -1.75 -?-+n-?--n+   78.2   36.757     8  -4.894 12799
```

and

```
Running plumber API at http://127.0.0.1:8001
Running swagger Docs at http://127.0.0.1:8001/__docs__/
```

## Data access

### Browser/console

So we get query results in JSON format from

- **browser**: `http://localhost:8001/tests?protein=IL.18R1&region=2:102700000-103800000`
- **command line interface**: `curl "http://localhost:8001/tests?protein=IL.18R1&region=2:102700000-103800000"`

Additional work is required to turn the `curl` output into tab-delimited data,

```bash
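# Note on the payload shape (an assumption worth checking against your own
# output): the handler already returns a serialised JSON string, and plumber's
# default serialiser encodes it once more, so the body arrives as a one-element
# array holding a JSON string, e.g.
#   ["[{\"Chromosome\":2,\"Position\":102700138, ... }]"]
# hence the `.[0] | fromjson` below to unwrap and re-parse it before iterating.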
curl "http://localhost:8001/tests?protein=IL.18R1&region=2:102700000-103800000" | \
jq -r '.[0] | fromjson | .[] |
       [
         .Chromosome, .Position, .MarkerName, .Allele1, .Allele2, .Freq1,
         .Effect, .StdErr, .logP, .Direction,
         .HetISq, .HetChiSq, .HetDf, .logHetP, .N
       ] | @tsv'
```

where

1. **.[0]**: access the first element of the outer array (the string containing the JSON).
2. **fromjson**: parse that string into a JSON object.
3. **.[]**: iterate over the array inside the parsed JSON.
4. **[ ... ]**: build an array of the required values, each corresponding to a column in the TSV output.
5. **@tsv**: convert the array into tab-separated values.

Note also that only selected columns (as in 4) are kept. The simplest way to add the header is to prepend it manually,

```bash
(
  echo "Chromosome|Position|MarkerName|Allele1|Allele2|Freq1|Effect|StdErr|logP|Direction|HetISq|HetChiSq|HetDf|logHetP|N" | \
  sed 's/|/\t/g'
  # curl command as above
)
```

### R

The query above is easily furnished with **curl**:

```r
tmp <- tempfile()
curl::curl_download("http://localhost:8001/tests?protein=IL.18R1&region=2:102700000-103800000", tmp)
df <- jsonlite::fromJSON(readLines(tmp)) |>
      jsonlite::fromJSON(flatten = TRUE) |>
      as.data.frame()
dim(df)
```

giving

```
[1] 4779   18
```

## httpuv

The package gives a somewhat more involved version as follows,

```r
dir.create("content/assets", recursive = TRUE)
dir.create("content/lib", recursive = TRUE)
s <- httpuv::startServer(
  host = "0.0.0.0",
  port = 5000,
  app = list(
    call = function(req) {
      list(
        status = 200L,
        headers = list(
          'Content-Type' = 'text/html',
          'Access-Control-Allow-Origin' = '*',
          'Access-Control-Allow-Methods' = 'GET, POST, OPTIONS',
          'Access-Control-Allow-Headers' = 'Content-Type'
        ),
        body = "Hello world!"
      )
    },
    staticPaths = list(
      "/assets" = "content/assets/",  # static assets
      "/lib" = httpuv::staticPath("content/lib", indexhtml = FALSE),
      "/lib/dynamic" = httpuv::excludeStaticPath()
    ),
    staticPathOptions = httpuv::staticPathOptions(indexhtml = TRUE)
  )
)
cat("Server running at http://0.0.0.0:5000\n")
s$stop()
```

so mappings are created from `content/[assets, lib]` to `/assets` and `/lib`, while `httpuv::excludeStaticPath()` indicates that requests to `/lib/dynamic` will not be served as static files but can instead be handled dynamically by the app logic. The `Access-Control-Allow-*` headers permit cross-origin (CORS) requests, so pages served from other origins may also query this server.

# References