1 Introduction

The aim of TKCat (Tailored Knowledge Catalog) is to facilitate the management of data from knowledge resources that are frequently used alone or together in research environments. In TKCat, knowledge resources are manipulated as modeled database (MDB) objects. These objects provide access to the data tables along with a general description of the resource and a detailed data model, generated with ReDaMoR, documenting the tables, their fields and their relationships. These MDBs are then gathered in catalogs that can be easily explored and shared. TKCat provides tools to easily subset, filter and combine MDBs and to create new catalogs suited for specific needs.

The TKCat R package is licensed under GPL-3.

This vignette describes how to create and administer a TKCat ClickHouse instance. Another vignette focuses on how TKCat can be used with a ClickHouse database. Users should also refer to the general TKCat user guide.

2 Instantiating the ClickHouse database

2.1 Install ClickHouse, initialize and configure the TKCat instance

The ClickHouse docker container supporting TKCat, its initialization and its configuration procedures are implemented here: S01-install-and-init.R

This script should be adapted according to requirements and needs.

The data are stored in the TKCAT_HOME folder.

2.2 Cleaning and removing a TKCat instance

Stop and remove the docker container.

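# In shell
docker stop ucb_tbn_tkcat
docker rm ucb_tbn_tkcat
# Caution: this removes every unused Docker volume on the host, not only those of this container
docker volume prune -f
# Remove the folder with all the data ($TKCAT_HOME, here ~/Documents/Projects/TKCat_UCB_TBN)
sudo rm -rf ~/Documents/Projects/TKCat_UCB_TBN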

3 User management

User management requires admin rights on the database.

3.1 Creation

k <- chTKCat(user="pgodard")
create_chTKCat_user(k, login="lfrancois", contact=NA, admin=FALSE)

The function will ask for a password to be set for the new user.

3.2 Drop

drop_chTKCat_user(k, login="lfrancois")

4 chMDB management

4.1 chMDB Creation

Before MDB data can be uploaded, the database should be created.

create_chMDB(k, "CHEMBL", public=FALSE)

By default, a chMDB is not public. This can be changed through the public parameter when creating the chMDB, or with the set_chMDB_access() function afterward.

set_chMDB_access(k, "CHEMBL", public=TRUE)

Users who should have access to the chMDB are then registered, with or without admin rights on it. Admin rights allow the user to update the chMDB data.

add_chMDB_user(k, "CHEMBL", "lfrancois", admin=TRUE)
# remove_chMDB_user(k, "CHEMBL", "lfrancois")
list_chMDB_users(k, "CHEMBL")
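
When several users need access to the same chMDB, the call above can be repeated in a loop. The helper below is a minimal sketch: grant_chMDB_access is a hypothetical name, not part of TKCat, and it relies only on add_chMDB_user() and list_chMDB_users() as used above. The logins in the usage line are placeholders.

```r
# Hypothetical helper (not part of TKCat): register several users on one chMDB.
# k: a chTKCat connection with admin rights; mdb: chMDB name; logins: user names.
grant_chMDB_access <- function(k, mdb, logins, admin = FALSE) {
   stopifnot(
      is.character(mdb), length(mdb) == 1,
      is.character(logins), length(logins) > 0
   )
   for (login in logins) {
      TKCat::add_chMDB_user(k, mdb, login, admin = admin)
   }
   # Return the resulting user list so that the change can be checked
   TKCat::list_chMDB_users(k, mdb)
}

# Usage (requires a running chTKCat instance):
# grant_chMDB_access(k, "CHEMBL", c("lfrancois", "pgodard"), admin = FALSE)
```

The same admin value is applied to every login; users needing different rights should be handled in separate calls.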

4.2 Populating chMDB

Each chMDB can be populated individually using the as_chMDB() function. The code chunk below shows how to scan a directory for all the fileMDBs it contains. The as_memoMDB() function loads all the data in memory and checks that all the model constraints are fulfilled (this step is optional). The overwrite parameter of the as_chMDB() function allows updating the data in the database.

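lc <- scan_fileMDBs("fileMDB_directory")
## The commented line below allows the exploration of the data models in lc.
# explore_MDBs(lc)
## Names of the MDBs to upload: here all those found in lc (an assumption;
## restrict this vector as needed).
toFeed <- names(lc)
for(r in toFeed){
   message(r)
   ## Load the data in memory and check the model constraints (optional)
   lr <- as_memoMDB(lc[[r]])
   ## Upload to ClickHouse, replacing existing data
   cr <- as_chMDB(lr, k, overwrite=TRUE)
}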
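
If one resource fails to load, the loop above stops at that resource. The sketch below wraps each upload in tryCatch() so that the remaining MDBs are still processed. feed_chMDBs is a hypothetical helper name, not part of TKCat, and reporting failures as FALSE instead of stopping is an assumption about the desired behavior.

```r
# Hypothetical helper (not part of TKCat): upload several MDBs, continuing on error.
# k: a chTKCat connection; lc: a catalog such as the one returned by scan_fileMDBs();
# to_feed: names of the MDBs in lc to upload.
feed_chMDBs <- function(k, lc, to_feed, overwrite = TRUE) {
   ok <- logical(0)
   for (r in to_feed) {
      message(r)
      ok[r] <- tryCatch({
         lr <- TKCat::as_memoMDB(lc[[r]])   # load in memory and check the model
         TKCat::as_chMDB(lr, k, overwrite = overwrite)
         TRUE
      }, error = function(e) {
         message("  failed: ", conditionMessage(e))
         FALSE
      })
   }
   ok   # named logical vector: TRUE when the upload succeeded
}

# Usage (requires a running chTKCat instance):
# status <- feed_chMDBs(k, lc, c("CHEMBL"))
# names(status)[!status]   # resources that could not be uploaded
```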

4.3 Deleting a chMDB

Any admin user of a chMDB can delete the corresponding data.

empty_chMDB(k, "CHEMBL")

Only a system admin can delete the chMDB from the ClickHouse database.

drop_chMDB(k, "CHEMBL")

5 Collection management

The aim of collections is described in the general user guide.

Collections need to be added to a chTKCat instance in order to support collection members of the different chMDBs. They can be taken from the TKCat package environment, from a JSON file or directly from a JSON text variable. Additional functions are available to list and remove chTKCat collections.

add_chTKCat_collection(k, "BE")
list_chTKCat_collections(k)
remove_chTKCat_collection(k, "BE")
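
A collection defined in a JSON file can be added in a similar way. The sketch below assumes that the second argument of add_chTKCat_collection() also accepts a path to a JSON file, as the paragraph above suggests. add_collection_from_file is a hypothetical wrapper, not part of TKCat, and the file name in the usage line is a placeholder.

```r
# Hypothetical wrapper (not part of TKCat): add a collection from a JSON file.
# k: a chTKCat connection with admin rights; path: path to the JSON definition.
add_collection_from_file <- function(k, path) {
   stopifnot(is.character(path), length(path) == 1, file.exists(path))
   # Assumption: the second argument can be a path to a JSON file
   TKCat::add_chTKCat_collection(k, path)
   # Return the collections now available in the instance
   TKCat::list_chTKCat_collections(k)
}

# Usage (requires a running chTKCat instance and an existing file):
# add_collection_from_file(k, "my_collection.json")
```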

6 Implementation

6.1 Data models

6.1.1 Default database

The default database stores information about the chTKCat instance, users and user access.

6.1.2 Modeled databases

Each modeled database (MDB) is stored in a dedicated database in chTKCat. Its data model is provided in dedicated tables described below.

7 Acknowledgments

This work was entirely supported by UCB Pharma (Early Solutions department).