
03.09.2023 - Thank you
I would like to thank FHNW for the opportunity to give a talk on graph databases as part of the Summer School Life Science.

As promised, I will shed further light on Cypher and list more links that make it easier to work with. As discussed in the presentation, I will also introduce PubChem's data access with a code example.
Step-by-step guide - 10.09.2023
- Inspecting the JSON response (work ongoing)
- Creating the first node (work ahead)

Intro - Loading data from PubChem into Neo4j
PubChem is a free-to-access repository of chemical molecules and their activities against biological assays. It's part of the National Center for Biotechnology Information (NCBI), which falls under the United States National Library of Medicine, a branch of the National Institutes of Health (NIH).
The primary purpose of PubChem is to serve as a resource for the global scientific community by providing a comprehensive and easily accessible database of chemical substances. This includes their chemical structures, properties, and bioactivity data. Researchers, students, and educators alike can tap into PubChem to find detailed information on chemical compounds, their interactions, and their roles in various biological processes.
Furthermore, PubChem aids in advancing the fields of cheminformatics and molecular biology by offering tools for researchers to analyze and visualize complex datasets, helping to drive innovative discoveries in medicine, biology, and chemistry.
Accessing Data using the PubChem REST API
The PubChem RESTful API (often referred to as "PUG REST", for "Power User Gateway REST") provides methods to query for compounds, substances, assays, and much more. Users can fetch data in various formats, such as XML, JSON, and CSV, among others.
Through the RESTful API, one can perform tasks like:
- Retrieving compound information based on specific criteria.
- Downloading structures of compounds in different formats.
- Searching for compounds based on specific assays or biological activities.
- And many more advanced queries and tasks.
Let's take a look at 2-Acetoxybenzoic acid, also known as Aspirin. Its PubChem CID is 2244.
The HTTP API request can be composed according to your own needs. The following instruction page contains all the keywords.

Testing the API call in your Browser
The HTTP URL is composed of the following elements:
________________________________________________________________________________________________________________________________________
https://pubchem.ncbi.nlm.nih.gov/rest/pug/<input specification>/<operation specification>/[<output specification>][?<operation_options>]
________________________________________________________________________________________________________________________________________
API REST call             : https://pubchem.ncbi.nlm.nih.gov/rest/pug
<input specification>     : <domain>/<namespace>/<identifiers>
                            /compound/cid/2244
<operation specification> : /record
<output specification>    : /JSON
________________________________________________________________________________________________________________________________________
Thus, the final URL is:
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/record/JSON
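
The same composition can be sketched in a few lines of Python. This is only an illustration and not part of the Neo4j walkthrough: the helper names build_pug_rest_url and fetch_json are hypothetical, and nothing beyond the Python standard library is assumed.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base address of the PUG REST service, as given above.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_pug_rest_url(domain, namespace, identifier, operation, output):
    """Join <domain>/<namespace>/<identifiers>/<operation>/<output> onto the base URL."""
    parts = [domain, namespace, str(identifier), operation, output]
    # quote() escapes characters that are not safe inside a URL path segment.
    return PUG_REST_BASE + "/" + "/".join(quote(part, safe="") for part in parts)


def fetch_json(url, timeout=30):
    """Download the URL and parse the body as JSON (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Compose the request for Aspirin (CID 2244) without sending it.
url = build_pug_rest_url("compound", "cid", 2244, "record", "JSON")
print(url)
```

Calling fetch_json(url) should return the same JSON object that the browser shows for this address.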
Preparing the local Neo4j database
The next step is to call this URL from Cypher and create a node from the JSON response. To do so, we create and execute a Cypher statement in an empty graph database. The following steps help you to install Neo4j and create an empty local graph database instance:
1. Download Neo4j Desktop here
2. Click NEW to create a new project (e.g. "FHNW Demo")
3. Click ADD to create an empty database (e.g. "Demo API")
4. Click START to start the new database
5. Click Demo API and, on the Plugin tab, choose APOC + Install and Restart
(APOC is a Neo4j library and is needed for our external API request)
6. After restarting, click OPEN and start the Neo4j Browser

Neo4j Browser starts and is ready to interact with our empty database :-)

Calling the HTTP API within the Neo4j Browser

Press the blue arrow to execute the Cypher command.
You should see the following response. The api_response object is exactly the same as the JSON response that you received when you called it from the URL in your browser.
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/record/JSON
We now have a successful connection in Neo4j Browser that returns a JSON object.
The next step is to create the first node in our empty graph database.

Inspecting the JSON file
To understand the structure of JSON files, please use a tool like Visual Studio Code and install the Visual Studio Code extension "JSON Editor" by Nick DeMajo.
The JSON Editor helps you to examine the structure of the PubChem response file.
In the "JSON Editor" on the right side you can see that the response for compound 2244 consists of one object holding one compound object. The compound object <PC_Compounds> has an array index of <0> since we requested only one compound. All the data of the compound is listed under it.
- <id> Holds only the CID -> 2244 value
- <atoms> Holds further information about the atoms (more later)
- <bonds> Holds further information about the bonds (more later)
- <coords> Holds further information about the coordinates (more later)
- <charge> Holds further information about the charge (more later)
- <props> Contains 22 properties describing the compound
- <count> Summary of the compound
In the following example we are only interested in the properties part <props>.
PubChem JSON
JSON Editor
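
The same structure can also be explored outside the editor. The sketch below uses a hand-written, heavily trimmed stand-in for the response instead of the real file. The key names PC_Compounds and coords and the nesting of the CID are assumptions based on the record format described above, and the helper name first_compound is hypothetical.

```python
# Hand-written, heavily trimmed stand-in for the PubChem record of CID 2244.
# Only the nesting is meant to mirror the real response; most values are
# empty placeholders and the real file holds far more data.
sample_record = {
    "PC_Compounds": [
        {
            "id": {"id": {"cid": 2244}},
            "atoms": {},
            "bonds": {},
            "coords": [],
            "charge": 0,
            "props": [
                {"urn": {"label": "Molecular Formula"}, "value": {"sval": "C9H8O4"}},
            ],
            "count": {},
        }
    ]
}


def first_compound(record):
    """Return the compound at array index 0 of the top-level list."""
    return record["PC_Compounds"][0]


compound = first_compound(sample_record)
cid = compound["id"]["id"]["cid"]
sections = sorted(compound.keys())

print("CID:", cid)
print("Sections:", sections)
print("Number of properties in this sample:", len(compound["props"]))
```

With the real response loaded in place of sample_record, the last line should report the 22 properties mentioned above.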

To create our first compound in the database we will need the <id>, <props> and summary <count> parts of the JSON file.
In the following sample we instruct Cypher to take only the <props> part of the file.
So we UNWIND (a Cypher command) the whole value but specify that only the props part is taken.

When this statement is entered in the Neo4j Browser, the following response is returned:
Note that 22 records were streamed. One of the properties is the FINGERPRINT of the compound.
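
To illustrate what unwinding the props list means, here is a rough Python analogue that turns each entry of props into one row. It is not the Cypher statement itself. The sample data is hand-written and trimmed to three entries, the urn/value layout of each property is an assumption about the PubChem record format, and unwind_props is a hypothetical helper name.

```python
# Hand-written stand-in with three properties; the real response has 22.
# The fingerprint value is a placeholder, not real data.
sample_record = {
    "PC_Compounds": [
        {
            "id": {"id": {"cid": 2244}},
            "props": [
                {
                    "urn": {"label": "Fingerprint", "name": "SubStructure Keys"},
                    "value": {"binary": "<hex string placeholder>"},
                },
                {
                    "urn": {"label": "Molecular Formula"},
                    "value": {"sval": "C9H8O4"},
                },
                {
                    "urn": {"label": "Count", "name": "Hydrogen Bond Acceptor"},
                    "value": {"ival": 4},
                },
            ],
        }
    ]
}


def unwind_props(record):
    """Yield one (label, name, value) row per entry in props."""
    for compound in record["PC_Compounds"]:
        for prop in compound["props"]:
            urn = prop["urn"]
            # Each value object is assumed to hold exactly one typed field
            # (for example sval, ival, fval or binary); take whichever is present.
            value = next(iter(prop["value"].values()))
            yield urn["label"], urn.get("name"), value


rows = list(unwind_props(sample_record))
for label, name, value in rows:
    print(label, "|", name, "|", value)
print(len(rows), "records streamed")
```

Each row corresponds to one streamed record in the Neo4j Browser output.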

Creating a Node
This section is work in progress. Stay tuned!


Standard IUPAC International Chemical Identifier (InChI)
PROPERTY
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/CID/2244/property/MolecularFormula/JSON
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/CID/2244/property/SubstanceSynonym/JSON
SUBSTANCE_SYNONYM
https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/2244/JSON
PubChem Data Specification:
https://pubchem.ncbi.nlm.nih.gov/docs/data-specification
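
As a small preview of the property requests listed above, the sketch below builds such a URL and reads a single property from a response. The response layout (PropertyTable holding a Properties list) is an assumption about the PUG REST output, the sample response is hand-written, and the helper names are hypothetical.

```python
# URL pattern taken from the MolecularFormula link above.
PROPERTY_URL = (
    "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/CID/{cid}/property/{names}/JSON"
)


def build_property_url(cid, property_names):
    """Fill the pattern; several property names are joined with commas."""
    return PROPERTY_URL.format(cid=cid, names=",".join(property_names))


def read_property(response, name):
    """Map each CID in the response to the requested property value."""
    rows = response["PropertyTable"]["Properties"]
    return {row["CID"]: row.get(name) for row in rows}


# Hand-written stand-in for the response of the MolecularFormula request.
sample_response = {
    "PropertyTable": {
        "Properties": [{"CID": 2244, "MolecularFormula": "C9H8O4"}]
    }
}

print(build_property_url(2244, ["MolecularFormula"]))
print(read_property(sample_response, "MolecularFormula"))
```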