2.2 PUG-RESTY Activity


Getting Molecular Properties through PubChem’s PUG REST Web Interface

PubChem’s Power User Gateway (PUG) REST interface is a web-based Application Programming Interface (API) that allows programmatic access to PubChem’s chemical database. Specifically, it is a RESTful API, meaning it follows the Representational State Transfer (REST) architectural style. REST uses standard web protocols (like HTTP) to let users retrieve information with simple URL-based queries.

As a REST API, each request for data sent to the server is independent and contains all the information required to fulfill the request. The server does not retain any information from previous requests or even other user sessions. Therefore, it can be easily automated programmatically since connections or sessions do not need to be maintained.

In this notebook, we will explore the API and retrieve chemical data from PubChem programmatically.

Overview

Objectives

Learn the basic approach to getting data from PubChem through PUG-REST.

Retrieve a single property of a single compound.

Retrieve a single property of multiple compounds.

Retrieve multiple properties of multiple compounds.

Write a for loop to make the same kind of requests.

Process a large amount of data by splitting it into smaller chunks.

1. The Shortest Code to Get PubChem Data

Let’s suppose that we want to get the molecular formula of prednisone (a synthetic steroid that reduces inflammation). You can get this by searching for prednisone at PubChem, or you can get the data directly in your web browser (Chrome, Safari, Internet Explorer, etc.) by typing in the following URL:

https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/prednisone/property/MolecularFormula/txt

Getting the same data using a computer program is not very difficult. This task can be done with three lines of Python code.

Line 1: First, we import the requests library (https://3.python-requests.org/), which is a collection of pre-written Python code that makes it easy to access information from the web. This library allows us to send requests for information to PubChem and handle the response in our Python code.

import requests

Line 2: Second, we need to ask the server for data. We use the get() function from the requests library. This kind of request is called a GET request, which is a common way to ask a web service for information. The PUG-REST request URL (enclosed within a pair of quotes ' ' or " " ) is provided as the argument inside the parentheses. The result will be stored in a variable called res which is short for “result” or “response”. This creates a Response object, which contains all the data sent back from the server, including the actual information we asked for.

res = requests.get('https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/prednisone/property/MolecularFormula/txt')

Line 3: The res variable contains not only the requested data but also some additional information about the request itself. To view the actual data returned by the server, we need to access the text of the response. This is done using res.text, which gives us the response as a string. The text is a property of the Response object we created in line 2. To view the response data, we can use the print() function to display the result in the notebook.

print(res.text)

C21H26O5
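The text returned for a request like this usually ends with a newline character, and a name that matches more than one record may produce several lines. The following is a minimal sketch of tidying such a response, assuming res.text looks like the hard-coded sample string below. The names parse_txt and raw are hypothetical and are not part of requests or PUG-REST.

```python
def parse_txt(raw):
    """Split a plain-text response into a list of values, one per non-empty line."""
    return [line.strip() for line in raw.splitlines() if line.strip()]

# Stand-in for res.text from the prednisone request above
# (assumed here to end with a newline character).
raw = "C21H26O5\n"

values = parse_txt(raw)
print(values)
```

Working with a list keeps later code the same whether the server returns one value or several.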

As another example, the following code retrieves the number of heavy (non-hydrogen) atoms of butadiene. (Note: we don’t need to import the requests library here as we have done so already. All we need to do is make the request and print out the result)

res = requests.get('https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/butadiene/property/HeavyAtomCount/txt')
print(res.text)

4

While many of the properties you can retrieve from PubChem are fairly intuitive, you must use the exact property names that PubChem recognizes. For example, we have used HeavyAtomCount and MolecularFormula. A list of available properties and their exact spelling is given in the PubChem PUG-REST documentation.

Exercise 1a:
Retrieve the molecular weight of ethanol in a “text” format.

# Write your code in this cell:

Exercise 1b:
Retrieve the number of hydrogen-bond acceptors of aspirin in a “text” format.

# Write your code in this cell:

2. Formulating PUG-REST request URLs using variables

In the earlier examples, we passed the PUG-REST request URL directly into the requests.get() function by typing the full URL inside the parentheses. However, it’s also possible to build the URL using variables. The example below shows how you can create a request URL by combining (concatenating) smaller variables into a complete URL, which is then passed to requests.get(). This approach makes it easier to change parts of the request, such as the compound name or the property, without rewriting the whole URL.

A PUG-REST request URL encodes three pieces of information (input, operation, output), preceded by the prologue common to all requests. In the following code cell, these pieces of information are stored in four different variables (pugrest, pugin, pugoper, pugout) and combined into a new variable url.

pugrest = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
pugin = "compound/name/water"
pugoper = "property/MolecularFormula"
pugout = "txt"

url = pugrest + '/' + pugin + '/' + pugoper + '/' + pugout
print(url)

https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/water/property/MolecularFormula/txt

Notice in the code cell above that we built the URL by concatenating variables and manually adding ‘/’ between each part. An alternative is Python’s join() string method, which combines a list of strings into a single string, inserting the separator it is called on between each element. This can make your code cleaner and more flexible when working with multiple pieces of a URL.

url = "/".join([pugrest, pugin, pugoper, pugout])
print(url)

https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/water/property/MolecularFormula/txt

Here, the strings stored in the four variables are joined with the / character as a separator. Notice that the four variables are placed inside square brackets [ ], which creates a list containing those strings. The list is passed to the join() method, which combines the elements into a single URL with / inserted between each part.

Now we can pass the url to requests.get().

res = requests.get(url)
print(res.text)

H2O
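The same pieces can be wrapped in a small helper so that only the compound name and property change between calls. This is a sketch under assumptions: build_url and PUGREST are hypothetical names, and the percent-encoding step with urllib.parse.quote (useful for names containing spaces) is an addition that the cells above do not use.

```python
from urllib.parse import quote

PUGREST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def build_url(name, prop, fmt="txt"):
    """Assemble a PUG-REST property URL for a compound name."""
    # quote() percent-encodes characters such as spaces; safe="" also encodes "/".
    pugin = "compound/name/" + quote(name, safe="")
    pugoper = "property/" + prop
    return "/".join([PUGREST, pugin, pugoper, fmt])

print(build_url("water", "MolecularFormula"))
print(build_url("acetic acid", "MolecularWeight"))
```

The first call reproduces the URL printed above; the second shows a space being encoded as %20.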

Warning
Avoid using in or input as a variable name in Python. The word in is a reserved keyword, and input is the name of a built-in function. See Python Reserved Words for a list of reserved words. In the example above, the variables are prefixed with pug to avoid these naming conflicts.
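If you are unsure whether a name is safe, Python can tell you. The sketch below uses the standard keyword and builtins modules; the helper name_problem is a hypothetical name introduced only for illustration.

```python
import builtins
import keyword

def name_problem(name):
    """Return a short reason why `name` is a poor variable name, or None if it is fine."""
    if keyword.iskeyword(name):
        return "reserved keyword"
    if hasattr(builtins, name):
        return "shadows a built-in"
    return None

for candidate in ["in", "input", "pugin"]:
    print(candidate, "->", name_problem(candidate))
```

A reserved keyword cannot be assigned to at all, while reusing a built-in name is allowed but hides the original function for the rest of the session.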

Exercise 2a:
Retrieve the rotatable bond count of phenylalanine in a “text” format using variables to generate the URL.

# Write your code in this cell:

Exercise 2b:
Retrieve the charge of phosphate in a “text” format using variables to generate the URL.

# Write your code in this cell:

3. Making multiple requests using a for loop

It might seem unnecessarily complicated to use variables to build a request URL compared to typing one full URL directly into requests.get(). If you are only making one request, it definitely is easier to write the one full URL.

However, if you are making many similar requests, manually typing each URL would be time-consuming and very error-prone. In this case, it is much more efficient to store the common parts of the url as variables and use a loop to generate full URLs for the part that changes. For example, suppose you want to retrieve the SMILES strings for five different molecules. This approach would save a lot of repetitive typing and make your code easier to modify and reuse.

Note: In PubChem, the ConnectivitySMILES property contains the connectivity layer only and does NOT include stereochemical or isotopic information, while the SMILES property includes both.

# create a list of compound names that we would like SMILES for
names = ['cytosine', 'benzene', 'motrin', 'aspirin', 'zolpidem']

Now the chemical names are stored in a list called names.

Using a for loop, we can go through each chemical name in the list, create a request URL by updating the name in the appropriate spot, and then retrieve the desired data for each molecule. This allows us to automate multiple requests efficiently, as shown in the example below.

pugrest = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
pugoper = "property/ConnectivitySMILES"
pugout = "txt"

for myname in names:    # loop over each element in the "names" list
    pugin = "compound/name/" + myname
    url = "/".join([pugrest, pugin, pugoper, pugout])
    res = requests.get(url)
    print(myname, ":", res.text)

cytosine : C1=C(NC(=O)N=C1)N
benzene : C1=CC=CC=C1
motrin : CC(C)CC1=CC=C(C=C1)C(C)C(=O)O
aspirin : CC(=O)OC1=CC=CC=C1C(=O)O
zolpidem : CC1=CC=C(C=C1)C2=C(N3C=C(C=CC3=N2)C)CC(=O)N(C)C

Warning
When you make many programmatic requests in a loop, you should limit your request rate to five requests per second or fewer. Please read the following document to learn more about PubChem’s usage policies: https://pubchemdocs.ncbi.nlm.nih.gov/programmatic-access$_RequestVolumeLimitations

Violation of usage policies may result in the user being temporarily blocked from accessing PubChem (or NCBI) resources.

In the for loop example above, we are only processing five chemical names, so it’s unlikely that we’ll exceed PubChem’s limit of five requests per second. However, it is much more common to work with hundreds or thousands of chemical names and it is very likely that the code will exceed this limit.

To prevent overloading the server and to follow PubChem’s usage guidelines, it is a good idea to slow down the request rate by adding a short delay after each request. This can be done using the sleep() function from Python’s time module.

For example, let’s suppose you have 12 chemical names to process (though in a real project, you will have many more). Rather than sending these requests immediately one after another, we will add a short pause to ensure we stay under the limit and be a good API user.

names = ['water', 'benzene', 'methanol', 'ethene', 'ethanol',
         'propene', '1-propanol', '2-propanol', 'butadiene', '1-butanol',
         '2-butanol', 'tert-butanol']

import time

pugrest = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
pugoper = "property/ConnectivitySMILES"
pugout = "txt"

for i in range(len(names)):    # loop over each index (position) in the "names" list
    pugin = "compound/name/" + names[i]    # names[i] = the ith element in the names list
    url = "/".join([pugrest, pugin, pugoper, pugout])
    res = requests.get(url)
    print(names[i], ":", res.text)
    time.sleep(0.2)    # pause for 0.2 seconds between requests to be polite to the server

water : O
benzene : C1=CC=CC=C1
methanol : CO
ethene : C=C
ethanol : CCO
propene : CC=C
1-propanol : CCCO
2-propanol : CC(C)O
butadiene : C=CC=C
1-butanol : CCCCO
2-butanol : CCC(C)O
tert-butanol : CC(C)(C)O

There are three noteworthy items in the above example:

First, the for loop iterates from 0 to len(names) − 1, that is, over [0, 1, 2, 3, …, 11]. (Recall that Python starts counting at 0.)

The variable i is used (in names[i]) to generate the input part (pugin) of the PUG-REST request URL.

After each request we pause for 0.2 seconds, so that we make at most 5 requests per second (5 requests × 0.2 s = 1 s).

The following is another method for spacing out the requests. In this case, after every 5 requests, we pause the program for 1 second to ensure we don’t overload the server.
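A fixed time.sleep(0.2) adds its delay on top of however long each request already took. One possible refinement, sketched below, sleeps only for whatever remains of the 0.2-second interval. The class name Pacer and its wait() method are hypothetical, and no request is actually sent here; the comment marks where one would go.

```python
import time

class Pacer:
    """Sleep just long enough to keep calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval=0.2):
        self.min_interval = min_interval
        self._last = None  # time of the previous call; None before the first one

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()

pacer = Pacer(min_interval=0.2)
start = time.monotonic()
for name in ["water", "benzene", "methanol"]:
    pacer.wait()    # a call such as requests.get(url) would follow here
elapsed = time.monotonic() - start
print(f"3 paced iterations took about {elapsed:.2f} s")
```

The first iteration runs immediately and the next two each wait, so the loop takes roughly 0.4 seconds in total.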

import time

pugrest = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
pugoper = "property/ConnectivitySMILES"
pugout = "txt"

for i in range(len(names)):    # loop over each index (position) in the "names" list
    pugin = "compound/name/" + names[i]    # names[i] = the ith element in the names list
    url = "/".join([pugrest, pugin, pugoper, pugout])
    res = requests.get(url)
    print(names[i], ":", res.text)
    if i % 5 == 4:    # % is the modulo operator and returns the remainder of a division (true if i = 4, 9, ...)
        time.sleep(1)

water : O
benzene : C1=CC=CC=C1
methanol : CO
ethene : C=C
ethanol : CCO
propene : CC=C
1-propanol : CCCO
2-propanol : CC(C)O
butadiene : C=CC=C
1-butanol : CCCCO
2-butanol : CCC(C)O
tert-butanol : CC(C)(C)O

In this version of the code, we still loop through the list of molecule names and update the pugin variable each time. The loop index i also tells us how many requests have been made so far (i + 1). Using an if statement, we check whether i % 5 equals 4, which is true for i = 4, 9, 14, …, that is, after every fifth request. When it is, we pause the program for one second using time.sleep(1). This method also helps us stay within PubChem’s limit of 5 requests per second by introducing a short break after every 5 requests.
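The condition i % 5 == 4 can be checked on its own, without sending any requests. The short sketch below lists the loop indices after which the pause would occur for a list of 12 names; the variable names are chosen for illustration only.

```python
n_requests = 12  # same length as the names list above

# Indices after which the loop pauses: i % 5 == 4 is true for i = 4, 9, 14, ...
pause_after = [i for i in range(n_requests) if i % 5 == 4]
print(pause_after)

# Equivalent test on the running count of requests (i + 1).
same = [i for i in range(n_requests) if (i + 1) % 5 == 0]
print(same == pause_after)
```

With 12 names the pause happens twice, after the fifth and tenth requests.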

It’s important to note that PubChem’s request volume limit can be lowered during times of high server load. This process is called dynamic request throttling. When throttling is active, the server includes information in the HTTP response headers indicating the current system load and any temporary per-user limits. Based on this information, users are expected to adjust the speed at which they send requests to avoid overwhelming the system. We’ll explore how to interpret and respond to this throttling information later in the course.
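As a preview, the sketch below parses a throttling status string into a dictionary. It rests on assumptions that should be checked against PubChem’s documentation: that the information arrives in a response header named X-Throttling-Control, and that its value follows the pattern of the sample string, which is illustrative and was not captured from a live response. The function name parse_throttling is hypothetical.

```python
import re

def parse_throttling(header_value):
    """Turn a throttling status string into {status name: (colour, percent)}."""
    pattern = r"([A-Za-z ]+?) status:\s*(\w+)\s*\((\d+)%\)"
    return {
        name.strip(): (colour, int(percent))
        for name, colour, percent in re.findall(pattern, header_value)
    }

# Illustrative sample only (assumed format), not taken from a real response.
sample = ("Request Count status: Green (0%), "
          "Request Time status: Green (0%), "
          "Service status: Green (20%)")

status = parse_throttling(sample)
print(status)
```

With a real Response object, the string would be read with something like res.headers.get("X-Throttling-Control"), again assuming that header name.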

Exercise 3a:
Retrieve the XLogP values of linear alkanes with 1 to 12 carbons.

Use the chemical names as inputs.

Use a for loop to retrieve the XLogP value for each alkane.

Use the sleep() function to stop the program for one second for every five requests.

# Write your code in this cell:

Exercise 3b:
Retrieve the stereochemical SMILES of the 20 common amino acids.

Use the chemical names as inputs. Because the 20 common amino acids in living organisms predominantly exist in one chiral form (the L-form), the names should be prefixed with “L-” (e.g., “L-alanine”, rather than “alanine”), except for “glycine” (which does not have a chiral center).

Use a for loop to retrieve the stereochemical SMILES for each amino acid.

Use the sleep() function to stop the program for one second for every five requests.

# Write your code in this cell

4. Getting multiple molecular properties

So far, all the examples in this notebook show how to retrieve a single molecular property for a single compound. We were able to automate this by using a for loop to repeat the process for a series of compounds.

In this example, we are going to request four properties: the hydrogen-bond donor count, the hydrogen-bond acceptor count, the XLogP, and the topological polar surface area (TPSA) for 5 compounds. Rather than requesting by name, we specify the compounds by their PubChem Compound IDs (CIDs), given as a comma-separated list within the URL. The property names are comma-separated in the same way, and the output format is set to CSV.

pugrest = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
pugin = "compound/cid/4485,4499,5026,5734,8082"
pugoper = "property/HBondDonorCount,HBondAcceptorCount,XLogP,TPSA"
pugout = "csv"

url = "/".join([pugrest, pugin, pugoper, pugout])    # Construct the URL
print(url)
print("-" * 55)    # Print "-" 55 times (to print a line for readability)

res = requests.get(url)
print(res.text)

https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/4485,4499,5026,5734,8082/property/HBondDonorCount,HBondAcceptorCount,XLogP,TPSA/csv
-------------------------------------------------------
"CID","HBondDonorCount","HBondAcceptorCount","XLogP","TPSA"
4485,1,7,2.200,110.0
4499,1,7,3.300,110.0
5026,1,8,4.300,123.0
5734,1,5,0.2,94.6
8082,1,1,0.800,12.0
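When the CIDs and property names already live in Python lists, the comma-separated parts of the URL can be generated rather than typed. A minimal sketch, in which cid_list and properties are hypothetical variable names and the four properties are the ones named in the paragraph above:

```python
PUGREST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

cid_list = [4485, 4499, 5026, 5734, 8082]
properties = ["HBondDonorCount", "HBondAcceptorCount", "XLogP", "TPSA"]

# CIDs are integers, so convert each one to a string before joining with commas.
pugin = "compound/cid/" + ",".join(str(cid) for cid in cid_list)
pugoper = "property/" + ",".join(properties)

url = "/".join([PUGREST, pugin, pugoper, "csv"])
print(url)
```

Keeping the property names in a list makes it harder to repeat one by accident and easier to add or remove a column later.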

PubChem enforces a standard time limit of 30 seconds per request. If you try to retrieve too many properties for too many compounds all at once, the request might take longer than this limit, resulting in a timeout error.

To avoid this, it’s a good idea to split your list of compounds into smaller chunks and process each chunk separately. This keeps each request fast enough to complete within the time limit and helps ensure successful data retrieval.

cids = [443422, 72301, 8082, 4485, 5353740, 5282230, 5282138, 1547484, 941361, 5734,
        5494, 5422, 5417, 5290, 5245, 5026, 4746, 4507, 4499, 4497,
        4494, 4474, 4418, 4386, 4009, 4008, 3949, 3926, 3878, 3784,
        3698, 3547, 3546, 3336, 3333, 3236, 3076, 2585, 2520, 2351,
        2312, 2162, 1236, 1234, 292331, 275182, 235244, 108144, 104972, 77157,
        5942250, 5311217, 4564402, 4715169, 5311501]

# We will break the list of CIDs into chunks of 10 CIDs each
chunk_size = 10

if len(cids) % chunk_size == 0:    # check if total number of cids is divisible by 10 with no remainder
    num_chunks = len(cids) // chunk_size    # sets number of chunks
else:    # if dividing by 10 leaves a remainder
    num_chunks = len(cids) // chunk_size + 1    # add one more chunk

print("# Number of CIDs:", len(cids))
print("# Number of chunks:", num_chunks)

# Number of CIDs: 55
# Number of chunks: 6
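The if/else above computes a ceiling division: the number of chunks is the list length divided by the chunk size, rounded up. A compact equivalent is sketched below; count_chunks is a hypothetical helper name.

```python
import math

def count_chunks(n_items, chunk_size):
    """Number of chunks needed to hold n_items, allowing a shorter final chunk."""
    return math.ceil(n_items / chunk_size)

print(count_chunks(55, 10))   # 55 CIDs in chunks of 10
print(count_chunks(50, 10))   # an exact multiple needs no extra chunk
print(-(-55 // 10))           # same result as the first line, using integer arithmetic only
```

For 55 CIDs this gives 6 chunks, matching the output above, and for 50 CIDs it gives 5.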

pugrest = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
pugoper = "property/HBondDonorCount,HBondAcceptorCount,XLogP,TPSA"
pugout = "csv"

csv = ""    # sets a variable called csv to save the comma-separated output

for i in range(num_chunks):    # sets number of requests to number of data chunks as determined above
    idx1 = chunk_size * i          # start of the moving window of cids for this chunk
    idx2 = chunk_size * (i + 1)    # end of the moving window of cids for this chunk

    pugin = "compound/cid/" + ",".join([str(x) for x in cids[idx1:idx2]])    # build pug input for this chunk
    url = "/".join([pugrest, pugin, pugoper, pugout])    # Construct the URL

    res = requests.get(url)

    if i == 0:    # if this is the first request, store the result in the empty csv variable
        csv = res.text
    else:    # for a subsequent request, drop the header line and append the remaining lines
        csv = csv + "\n".join(res.text.split()[1:]) + "\n"

    if i % 5 == 4:
        time.sleep(1)

print(csv)
print(len(csv.splitlines()))    # count the number of lines in the output

"CID","HBondDonorCount","HBondAcceptorCount","XLogP","TPSA"
443422,0,5,3.1,40.2
72301,0,5,3.2,40.2
8082,1,1,0.800,12.0
4485,1,7,2.200,110.0
5353740,2,5,3.5,76.0
5282230,2,5,3.2,84.9
5282138,1,8,4.400,120.0
1547484,0,2,5.800,6.5
941361,0,4,6.000,6.5
5734,1,5,0.2,94.6
5494,0,6,5.0,57.2
5422,0,8,6.4,61.9
5417,0,5,3.2,40.2
5290,2,5,2.6,62.2
5245,5,8,-3.1,148.0
5026,1,8,4.300,123.0
4746,1,1,6.8,12.0
4507,1,7,2.900,110.0
4499,1,7,3.300,110.0
4497,1,8,3.100,120.0
4494,1,8,2.900,134.0
4474,1,8,3.800,114.0
4418,1,5,4.100,45.2
4386,2,3,4.400,49.3
4009,2,5,3.5,76.0
4008,1,9,5.600,117.0
3949,0,7,4.9,34.2
3926,1,5,6.0,35.6
3878,2,5,1.4,90.7
3784,1,8,4.300,104.0
3698,2,3,-0.2,68.0
3547,1,5,1.0,70.7
3546,3,5,-0.5,132.0
3336,1,1,5.5,12.0
3333,1,5,3.900,64.6
3236,0,2,3.8,20.3
3076,0,6,3.1,84.4
2585,3,5,4.200,75.7
2520,0,6,3.800,64.0
2351,0,3,5.3,15.7
2312,0,2,4.6,12.5
2162,2,7,3.000,99.9
1236,1,8,6.800,114.0
1234,0,7,3.800,73.2
292331,2,3,3.900,49.3
275182,1,8,6.1,72.9
235244,1,8,6.7,72.9
108144,2,5,3.9,117.0
104972,1,6,3.300,72.7
77157,1,4,3.2,49.8
5942250,2,5,3.5,76.0
5311217,1,7,4.500,90.9
4564402,0,4,4.1,45.5
4715169,2,3,-1.6,63.3
5311501,0,4,4.4,43.7

56

# Alternative: step through the list in strides of chunk_size, so num_chunks is not needed
pugrest = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
pugoper = "property/HBondDonorCount,HBondAcceptorCount,XLogP,TPSA"
pugout = "csv"
chunk_size = 10

csv2 = ""

for start in range(0, len(cids), chunk_size):
    chunk = cids[start : start + chunk_size]    # last chunk is shorter; slicing is safe
    pugin = "compound/cid/" + ",".join(map(str, chunk))
    url = "/".join([pugrest, pugin, pugoper, pugout])

    res = requests.get(url)
    lines = res.text.splitlines()

    if start == 0:    # first chunk: keep the header line
        csv2 = res.text
    else:             # later chunks: skip the header line
        csv2 += "\n".join(lines[1:]) + "\n"

    # optional throttling
    if (start // chunk_size) % 5 == 4:
        time.sleep(1)

print(csv2)
print(len(csv2.splitlines()))    # count the number of lines in the output
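Once the chunks are merged into one block of text, Python’s standard csv module can turn it into rows that are easier to work with. The sketch below is a hedged illustration: the three data lines are copied from the output shown earlier and stand in for the full merged text, and the variable is called csv_text because a variable named csv would hide the csv module. The XLogP filter is only an example of what can be done with the parsed rows.

```python
import csv
import io

# A few lines copied from the output above, standing in for the merged text.
csv_text = '''"CID","HBondDonorCount","HBondAcceptorCount","XLogP","TPSA"
443422,0,5,3.1,40.2
5245,5,8,-3.1,148.0
4746,1,1,6.8,12.0
'''

# DictReader uses the header line as keys; all values are read as strings.
rows = list(csv.DictReader(io.StringIO(csv_text)))
for row in rows:
    print(row["CID"], float(row["XLogP"]), float(row["TPSA"]))

# Example filter: CIDs of compounds with XLogP above 3.
lipophilic = [row["CID"] for row in rows if float(row["XLogP"]) > 3]
print(lipophilic)
```

Because every field is read as a string, numeric columns need an explicit float() or int() conversion before any comparison.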

Homework

Below is the list of CIDs of known anti-inflammatory agents (obtained from PubChem via the URL: https://www.ncbi.nlm.nih.gov/pccompound?LinkName=mesh_pccompound&from_uid=68000893). Download the following properties of those compounds in a comma-separated format: heavy atom count, rotatable bond count, molecular weight, XLogP, hydrogen bond donor count, hydrogen bond acceptor count, TPSA, and stereochemical SMILES.

Split the input CID list into smaller chunks (with a chunk size of 100 CIDs).

Process one chunk at a time using a for loop.

Do not forget to add sleep() to comply with the usage policy.

cids = [ 471, 1981, 2005, 2097, 2151, 2198, 2206, 2214, 2244, 2307, 2308, 2313, 2355, 2396, 2449, 2462, 2466, 2581, 2662, 2794, 2863, 3000, 3003, 3033, 3056, 3059, 3111, 3177, 3194, 3230, 3242, 3282, 3308, 3332, 3335, 3342, 3360, 3371, 3379, 3382, 3384, 3394, 3495, 3553, 3612, 3672, 3715, 3716, 3718, 3778, 3824, 3825, 3826, 3935, 3946, 3965, 4009, 4037, 4038, 4044, 4075, 4159, 4237, 4386, 4409, 4413, 4487, 4488, 4495, 4534, 4553, 4614, 4641, 4671, 4692, 4781, 4888, 4895, 4921, 5059, 5090, 5147, 5161, 5208, 5228, 5339, 5352, 5359, 5362, 5468, 5469, 5475, 5480, 5509, 5733, 5743, 5744, 5745, 5753, 5754, 5755, 5834, 5865, 5875, 5876, 5877, 6094, 6213, 6215, 6247, 6436, 6741, 7090, 7497, 8522, 9053, 9231, 9642, 9782, 9878, 10114, 10154, 10170, 10185, 10206, 12555, 12938, 13802, 14982, 15209, 16490, 16533, 16623, 16639, 16752, 16923, 17198, 19161, 20469, 21102, 21700, 21800, 21826, 21975, 22419, 23205, 26098, 26248, 26318, 28718, 28871, 30869, 30870, 30951, 31307, 31378, 31508, 31635, 31799, 31800, 32153, 32327, 32798, 33958, 35375, 35455, 35935, 36833, 37425, 38081, 38503, 39212, 39941, 40000, 40632, 41643, 43261, 44219, 47462, 47795, 50294, 50295, 51717, 54445, 54585, 57782, 59757, 60164, 60490, 60542, 60712, 60726, 60864, 61486, 62074, 62924, 63006, 63019, 64704, 64738, 64746, 64747, 64927, 64945, 64971, 64982, 65394, 65464, 65655, 65679, 65762, 66249, 67417, 68700, 68704, 68706, 68731, 68749, 68819, 68865, 68869, 68917, 71246, 71354, 71364, 71386, 71398, 71414, 71415, 71771, 72158, 72300, 73400, 82153, 84003, 84429, 90763, 91626, 91670, 100472, 102011, 104762, 104943, 107641, 107738, 107793, 108068, 108130, 114753, 114840, 114917, 114999, 115239, 119032, 119286, 119365, 119607, 119828, 119871, 121928, 121957, 122139, 122179, 122182, 123619, 123673, 123723, 124978, 128191, 128229, 128571, 133021, 134896, 146364, 151075, 151166, 152165, 155354, 155761, 156391, 158103, 159557, 162666, 164676, 167928, 168928, 174093, 174277, 176155, 177976, 180604, 183088, 189821, 
192156, 196122, 196840, 196841, 200674, 201587, 219121, 222786, 229860, 235244, 236702, 259846, 263373, 275182, 292331, 425990, 439503, 439533, 441335, 441336, 442534, 442993, 443943, 443949, 443967, 444036, 445154, 445858, 446925, 479503, 485711, 490428, 501254, 522325, 546807, 578771, 584547, 610479, 633091, 633097, 636374, 636398, 656604, 656656, 656852, 657238, 667550, 927704, 969510, 969516, 1548887, 1548910, 2737488, 3033890, 3033980, 3045402, 3051696, 3055172, 4129359, 4306515, 4483645, 5018304, 5185849, 5280802, 5280914, 5280915, 5281004, 5281071, 5281515, 5281522, 5281792, 5282183, 5282193, 5282230, 5282387, 5282402, 5282492, 5283542, 5283734, 5284538, 5284539, 5311051, 5311052, 5311066, 5311067, 5311093, 5311101, 5311108, 5311169, 5311180, 5318517, 5320420, 5322111, 5352624, 5353725, 5353726, 5353740, 5353864, 5354499, 5377381, 5420804, 5420805, 5458396, 5472495, 5481958, 5701991, 5702036, 5702148, 5702212, 5702252, 5702287, 5745214, 5942250, 6420050, 6429274, 6437368, 6437387, 6438873, 6447131, 6453785, 6473881, 6509979, 6708733, 6710677, 6714002, 6917783, 6917852, 6917894, 6918172, 6918173, 6918332, 6918445, 6918452, 6918612, 6925666, 7060958, 7251185, 9554199, 9798098, 9799453, 9841438, 9843941, 9846332, 9865808, 9868219, 9869053, 9871508, 9875547, 9883509, 9897518, 9897771, 9907157, 9913795, 9919776, 9926694, 9934547, 10363606, 10918539, 11158972, 11513733, 11561674, 11616712, 11870423, 11949636, 11954221, 11954316, 11954353, 11954369, 11957468, 11961431, 11972243, 11972532, 12300053, 12313906, 12313911, 12606303, 12634263, 12714644, 12874922, 13018150, 13020033, 13041095, 14010989, 14515707, 14798494, 15895902, 16051947, 16132369, 16213022, 16213698, 16218996, 16219353, 16220118, 16759566, 16760658, 17750985, 17753757, 18526330, 18632363, 18647121, 18943026, 20054915, 21120116, 21637635, 21637642, 21893738, 21893804, 21982135, 22141508, 22811280, 23509770, 23631982, 23653552, 23657872, 23663407, 23663409, 23663418, 23663959, 23663989, 23665411, 
23665999, 23667642, 23669636, 23674183, 23674255, 23674745, 23675763, 23680530, 23681059, 23684814, 23688663, 23693301, 23694214, 23702389, 24181458, 24721429, 24761485, 24799587, 24847961, 24847981, 24867460, 24867465, 24867475, 24883465, 24916955, 25077872, 25113755, 25796773, 40469526, 44119558, 44202892, 44260118, 44266812, 44386560, 45006151, 45006158, 45039955, 45356876, 45356931, 45357558, 45357932, 45358013, 45358120, 45358130, 45358140, 45358148, 45358149, 45488525, 46174093, 46397498, 46780650, 46780910, 46783539, 46783786, 46783814, 46863906, 46878350, 46882877, 50989825, 51026956, 51340230, 51398089, 53384387, 53394893, 53486221, 53486290, 53486322, 54194814, 54605501, 54675840, 54676228, 54677470, 54677971, 54677972, 54677977, 54682045, 54684589, 54690031, 54697648, 54708862, 54714524, 56841932, 56842111, 56845155, 57347755, 57486087, 67668959, 67804972, 67986221, 70470286, 70678885, 71306882, 71587162, 72774967, 72941490, 72941625, 73758129, 73759663, 73759808, 74787565, 77906397, 78577433, 90488794, 91711382, 91826463, 91873711, 91881846, 92131836, 92462493, 102004404, 102601886, 117072385, 117072403, 117072410, 118701141, 118701402, 118984459, 122130078, 122130111, 122130185, 122130213, 122130768, 122173054, 122173183, 122361610, 123134657, 124081055, 124463365, 126968472, 126968501, 126968801, 126969212, 126969455, 129009998, 129010022, 129010033, 129010043, 129316829, 129317859, 129317898, 129628207, 129628892, 129670532, 129735029, 131632430, 131635023, 131676243, 131750284, 131954647, 131954667, 132399051, 132399058, 133112890, 133126366, 133126370, 133562807, 133659920, 133687604, 134129698, 134159361, 134460917, 134612785, 134687786, 134688123, 134688323, 134688977, 134689786, 134693106, 134693125, 134693234, 134694728, 134694860, 135413496, 135413505, 135414247, 135484078, 135515521, 135565709, 136040192, 137177332, 137699687, 137705034, 137705717, 137705725, 137705994, 137706376, 137706400, 137795135, 138059757, 138107776, 138113311, 
138113507, 138113581, 138114182, 138114743]

len(cids)

# Write your code in this cell.

Acknowledgment

This module is essentially a copy of the 2019 Cheminformatics module 1: “Getting Molecular Properties through PUG-REST” created by Sunghwan Kim with a Public Domain copyright by Robert Belford. Minor edits are made by Ehren Bucholtz, including fixing some deprecated SMILES definitions within PubChem’s glossary.


By Robert Belford

© Copyright 2023.