RWsearch stands for « Search in R packages, task views, CRAN and in the Web ».
This vignette introduces the following features cited in the README file:
.1. Provide a simple non-standard evaluation instruction to read and evaluate non-standard content, mainly character vectors.
.2. Download the files that list all available packages, archived packages, check_results and task views available on CRAN at a given date and rearrange them in convenient formats. The downloaded files are (size on disk in June 2021):
- crandb_down() => *crandb.rda* (4.9 Mo),
- archivedb_down() => *CRAN-archive.html* (3.3 Mo),
- checkdb_down() => *check_results.rds* (7.6 Mo),
- tvdb_down() => *tvdb.rda* (31 ko).
.3. List the packages that has been added, updated and removed from CRAN between two downloads or two dates.
.4. Search for packages that match one or several keywords in any column of the crandb data.frame and by default (with mode or, and, relax) in the Package name, Title, Description, Author and Maintainer columns (can be selected individually).
RWsearch has its own instruction to read and
evaluate non-standard content. cnsc()
can
replace c()
in most places. Quoted
characters and objects that exist in .GlobalEnv are evaluated.
Unquoted characters that do not represent any objects in
.GlobalEnv are transformed into quoted characters. This saves a
lot of typing.
Since R 3.4.0, CRAN updates every day the file /web/packages/packages.rds that lists all packages available for download at this date (archived packages do not appear in this list). This file is of a high value since, once downloaded, all items exposed in the DESCRIPTION files of every package plus some additional information can be explored locally. This way, the most relevant packages that match some keywords can be quickly found with an efficient search instruction.
The following instruction downloads from your local CRAN the file /web/packages/packages.rds, applies some cleaning treatments, saves it as crandb.rda in the local directory and loads a data.frame named crandb in the .GlobalEnv:
crandb_down()
# $newfile
# crandb.rda saved and loaded. 13745 packages listed between 2006-03-15 and 2019-02-21
ls()
# [1] "crandb"
The behaviour is slightly different if an older file crandb.rda downloaded a few days earlier exists in the directory. In this case, crandb_down() overwrites the old file with the new file and displays a comprehensive comparison:
crandb_down()
# $newfile
# [1] "crandb.rda saved and loaded. 13745 packages listed between 2006-03-15 and 2019-02-21"
# [2] "0 removed, 4 new, 52 refreshed, 56 uploaded packages."
# $oldfile
# [1] "crandb.rda 13741 packages listed between 2006-03-15 and 2019-02-20"
# $removed_packages
# character(0)
# $new_packages
# [1] "dang" "music" "Rpolyhedra" "Scalelink"
# $uploaded_packages
# [1] "BeSS" "blocksdesign" "BTLLasso" "cbsodataR"
# ...
# [53] "tmle" "vtreat" "WVPlots" "xfun"
In recent years, CRAN has been growing up very fast but has also changed a lot. From the inception of RWsearch to its first public release, the number of packages listed in CRAN has increased from 12,937 on August 18, 2018 to 13,745 packages on February 21, 2019 as per the following table.
Date | Packages | Comments |
---|---|---|
2018-08-18 | 12,937 | |
2018-08-31 | 13,001 | |
2018-09-30 | 13,101 | |
2018-10-16 | 13,204 | |
2018-11-18 | 13,409 | |
2018-12-04 | 13,517 | 28 new packages in one single day |
2018-12-29 | 13,600 | |
2018-01-19 | 13,709 | |
2018-01-29 | 13,624 | 85 packages transferred to archive |
2019-02-11 | 13,700 | |
2019-02-21 | 13,745 |
This difference of 808 packages hides the real numbers: 1010 new packages were added to CRAN and 202 packages were archived during the same period. RWsearch easily reveals these numbers with the instruction
lst <- crandb_comp(filename = "crandb-2019-0221.rda", oldfile = "crandb-2018-0818.rda")
lst$newfile
# [1] "crandb-2019-0221.rda 13745 packages listed between 2006-03-15 and 2019-02-21"
# [2] "202 removed, 1010 new, 2853 refreshed, 3863 uploaded packages."
Here are some more recent values. 5234 new packages were added to CRAN and 1307 packages were removed from CRAN between February 21, 2019 and June 2, 2021, a period of 832 days. The apparent increase of 4.72 package per day is indeed the difference between 6.3 new packages added and 1.57 packages removed per day.
Date | Packages | Comments |
---|---|---|
2019-02-21 | 13,745 | |
2019-10-02 | 15,000 | |
2020-07-04 | 16,003 | |
2021-01-22 | 17,006 | 41 new packages in one single day |
2021-06-02 | 17,672 |
lst2 <- crandb_comp(filename = "crandb-2021-0602.rda", oldfile = "crandb-2019-0221.rda")
lst2$oldfile
# [1] "crandb-2019-0221.rda 13745 packages listed between 2006-03-15 and 2019-02-21"
lst2$newfile
# [1] "crandb-2021-0602.rda 17672 packages listed between 2006-03-15 and 2021-06-02"
# [2] "1307 removed, 5234 new, 6176 refreshed, 11410 uploaded packages."
Extracting the packages that have been recently uploaded in
CRAN is of a great interest. For a given
crandb data.frame loaded in
.GlobalEnv, the function
crandb_fromto()
allows to search betwen
two dates or by a number of days before a certain date. Here, the result
is calculated between two calendar dates whereas the above item
$uploaded_packages
returns the difference between
two files (which are not saved exactly at midnight).
crandb_fromto(from = -1, to = "2019-06-02")
# [1] "additive" "affinity" "analogsea" "audiometry"
# ...
# [89] "trekcolors" "trekfont" "udpipe" "vegclust"
# [93] "wordpressr" "xROI"
>
The function
crandb_fromto(from = "2021-01-01", to = "2021-06-02")
returns 4600 packages and suggests that 26 % of the published CRAN
packages have been refreshed in the last 6 months.
Five instructions are available to search for
keywords in crandb and extract the
packages that match these keywords:
s_crandb()
,
s_crandb_list()
,
s_crandb_PTD()
,
s_crandb_AM()
,
s_crandb_tvdb()
. Arguments
select, mode, sensitive, fixed
refine the search.
Argument select can be any (combination of) column(s) in crandb. A few shortcuts are “P”, “T”, “D”, “PT”, “PD”, “TD”, “PTD”, “A”, “M”, “AM” for the Package name, Title, Description, Author and Maintainer.
s_crandb() accepts one or several keywords and
displays the results in a flat format. By default, the search is
conducted over the Package name, the package Title and the Description
(“PTD”) and with mode = "or"
. It can be refined to package
Title or even Package name. mode = "and"
requires the two
keywords to appear in all selected packages.
s_crandb(Gini, Theil, select = "PTD")
# [1] "adabag" "binequality" "boottol" "copBasic"
# [5] "CORElearn" "cquad" "deming" "educineq"
# [9] "genie" "GiniWegNeg" "IATscores" "IC2"
# [13] "lctools" "mblm" "migration.indices" "npcp"
# [17] "rkt" "rpartScore" "RSAlgaeR" "rsgcc"
# [21] "scorecardModelUtils" "segregation" "SpatialVS" "Survgini"
# [25] "tangram" "valottery" "vardpoor"
s_crandb_list() splits the results by keywords.
Here, argument select = "P"
returns 28 and 22 packages
whereas argument select = "PT"
would have returned 151 and
106 packages and argument select = "PTD"
would have
returned 952 and 491 packages. This refining option is one of the most
interesting features of RWsearch.
s_crandb_list(search, find, select = "P")
# $search
# [1] "ACEsearch" "AutoSEARCH" "bsearchtools" "CRANsearcher"
# [5] "discoverableresearch" "doebioresearch" "dosearch" "elasticsearchr"
# [9] "ExhaustiveSearch" "fabisearch" "FBFsearch" "ForwardSearch"
# [13] "lavaSearch2" "pdfsearch" "pkgsearch" "randomsearch"
# [17] "ResearchAssociate" "rpcdsearch" "RWsearch" "searchConsoleR"
# [21] "searcher" "SearchTrees" "shinySearchbar" "tabuSearch"
# [25] "TreeSearch" "uptasticsearch" "VetResearchLMM" "websearchr"
# $find
# [1] "colorfindr" "DoseFinding" "echo.find" "EcotoneFinder"
# [5] "featurefinder" "FindAllRoots" "findInFiles" "FindIt"
# [9] "findpython" "findR" "findviews" "geneSignatureFinder"
# [13] "LncFinder" "MinEDfind" "mosaic.find" "packagefinder"
# [17] "pathfindR" "pathfindR.data" "PetfindeR" "TCIApathfinder"
# [21] "UnifiedDoseFinding" "wfindr"
select | search | find |
---|---|---|
“PTD” | 952 | 491 |
“D” | 913 | 446 |
“PT” | 151 | 106 |
“T” | 143 | 98 |
“P” | 28 | 22 |
s_crandb_PTD() splits the results by Package name, package Title and Description.
s_crandb_PTD(kriging)
# $Package
# [1] "constrainedKriging" "DiceKriging" "kriging" "MuFiCokriging"
# [5] "OmicKriging" "quantkriging"
# $Title
# [1] "ARCokrig" "atakrig" "autoFRK" "constrainedKriging"
# [5] "DiceKriging" "DiceOptim" "fanovaGraph" "FRK"
# [9] "geoFKF" "intkrige" "krige" "kriging"
# [13] "KrigInv" "LatticeKrig" "ltsk" "moko"
# [17] "MuFiCokriging" "quantkriging" "SK" "spatial"
# $Description
# [1] "ARCokrig" "atakrig" "autoFRK" "blackbox"
# [5] "CensSpatial" "constrainedKriging" "convoSPAT" "DiceEval"
# [9] "DiceKriging" "diffMeshGP" "EnvExpInd" "fanovaGraph"
# [13] "fields" "FRK" "GauPro" "geofd"
# [17] "geoFKF" "georob" "geostats" "GPareto"
# [21] "gstat" "intkrige" "krige" "kriging"
# [25] "LatticeKrig" "LSDsensitivity" "ltsk" "moko"
# [29] "MuFiCokriging" "OmicKriging" "phylin" "profExtrema"
# [33] "psgp" "quantkriging" "sgeostat" "SK"
# [37] "spatial" "SpatialTools" "sptotal" "stilt"
s_crandb_AM() splits the results by package Author and package Maintainer.
s_crandb_AM(Kiener, Dutang)
# $Kiener
# $Kiener$Author
# [1] "DiceDesign" "FatTailsR" "incase" "NNbenchmark" "RWsearch"
#
# $Kiener$Maintainer
# [1] "FatTailsR" "NNbenchmark" "RWsearch"
#
# $Dutang
# $Dutang$Author
# [1] "actuar" "biglmm" "ChainLadder" "expm"
# [5] "fitdistrplus" "GNE" "gumbel" "kyotil"
# [9] "lifecontingencies" "mbbefd" "NNbenchmark" "OneStep"
# [13] "plotrix" "POT" "randtoolbox" "rhosp"
# [17] "rngWELL" "RTDE" "tsallisqexp"
#
# $Dutang$Maintainer
# [1] "GNE" "gumbel" "mbbefd" "OneStep" "POT"
# [6] "randtoolbox" "rhosp" "rngWELL" "RTDE" "tsallisqexp"
s_crandb_tvdb() is an instruction for task view maintenance. Please, read the corresponding vignette.
s_sos() is a wrapper of the
sos::findFn()
function provided by the
excellent package sos. It goes deeper
than s_crandb()
as it searchs for keywords
inside all R functions of all R packages (assuming 30 functions per
package, about 17,672 x 30 = 530,160 pages). The query is sent to the
University of Pennsylvania and the result is displayed as an html page
in the browser. The server has recenty encountered some problems and
might be down.
The result can be converted to a data.frame. Here, the browser is launched only at line 3.
The search is global and should be conducted with one word preferably. Searching for ordinary keywords like search or find is not recommanded as infinite values are returned most of the time.