2.2 PUG-REST Activity#
Getting Molecular Properties through PubChem’s PUG REST Web Interface#
PubChem’s Power User Gateway (PUG) REST interface is a web-based Application Programming Interface (API) that allows programmatic access to PubChem’s chemical database. Specifically, it is a RESTful API, meaning it follows the Representational State Transfer (REST) architectural style. REST uses standard web protocols (like HTTP) to let users retrieve information using simple URL-based queries.
As a REST API, each request for data sent to the server is independent and contains all the information required to fulfill the request. The server does not retain any information from previous requests or from other user sessions. Requests are therefore easy to automate, since no connection or session needs to be maintained.
In this notebook, we will explore the API and retrieve chemical data from PubChem programmatically.
Objectives#
Learn the basic approach to getting data from PubChem through PUG-REST
Retrieve a single property of a single compound.
Retrieve a single property of multiple compounds.
Retrieve multiple properties of multiple compounds.
Write a for loop to make the same kind of requests.
Process a large amount of data by splitting it into smaller chunks.
1. The Shortest Code to Get PubChem Data#
Let’s suppose that we want to get the molecular formula of prednisone (a synthetic steroid that reduces inflammation). You can get this by searching for prednisone at PubChem, or you can get the data directly from your web browser (Chrome, Safari, Internet Explorer, etc.) by typing in the following URL:
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/prednisone/property/MolecularFormula/txt
Getting the same data using a computer program is not very difficult. This task can be done with three lines of Python code.
Line 1: First, we import the requests library (https://3.python-requests.org/), which is a collection of pre-written Python code that makes it easy to access information from the web. This library allows us to send requests for information to PubChem and handle the response in our Python code.
import requests
Line 2: Second, we need to ask the server for data. We use the get() function from the requests library. This kind of request is called a GET request, which is a common way to ask a web service for information. The PUG-REST request URL (enclosed within a pair of quotes ' ' or " ") is provided as the argument inside the parentheses. The result will be stored in a variable called res, which is short for “result” or “response”. This creates a Response object, which contains all the data sent back from the server, including the actual information we asked for.
res = requests.get('https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/prednisone/property/MolecularFormula/txt')
Line 3: The res variable contains not only the requested data but also some additional information about the request itself. To view the actual data returned by the server, we need to access the text of the response. This is done using res.text, which gives us the response as a string. Here, text is a property of the Response object we created in line 2. To view the response data, we can use the print() function to display the result in the notebook.
print(res.text)
C21H26O5
As another example, the following code retrieves the number of heavy (non-hydrogen) atoms of butadiene. (Note: we don’t need to import the requests library here because we have already done so. All we need to do is make the request and print the result.)
res = requests.get('https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/butadiene/property/HeavyAtomCount/txt')
print(res.text)
4
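Because res.text is a string, a numeric property such as HeavyAtomCount has to be converted before it can be used in a calculation. The sketch below assumes that the text output ends with a newline character and may contain more than one line (for example, if a name were to match several compounds). The helper name parse_int_lines is hypothetical and is not part of requests or PUG-REST.

```python
def parse_int_lines(text):
    """Convert a plain-text response (one integer per line) into a list of integers."""
    # splitlines() drops the newline characters; strip() removes stray spaces
    return [int(line.strip()) for line in text.splitlines() if line.strip()]

# Example with the text returned for butadiene above (assumed to end with a newline)
sample_text = "4\n"
counts = parse_int_lines(sample_text)
print(counts[0] + 1)   # the value can now be used in arithmetic
```

In the notebook, the same helper could be applied to res.text instead of sample_text.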
While many of the properties you can retrieve from PubChem are fairly intuitive, it’s important to use the exact property names that PubChem recognizes (the correct syntax). For example, we have used HeavyAtomCount and MolecularFormula. You can find a list of available properties and their correct syntax in the PubChem documentation.
Retrieve the molecular weight of ethanol in a “text” format.
# Write your code in this cell:
Retrieve the number of hydrogen-bond acceptors of aspirin in a “text” format.
# Write your code in this cell:
2. Formulating PUG-REST request URLs using variables#
In the earlier examples, we passed the PUG REST request URL directly into the requests.get() function by typing the full URL inside the parentheses. However, it’s also possible to build the URL using variables. The example below shows how you can create a request URL by combining (concatenating) smaller variables into a complete URL, which is then passed to requests.get(). This approach makes it easier to change parts of the request, such as the compound name or property, without rewriting the whole URL.
A PUG-REST request URL encodes three pieces of information (input, operation, output), preceded by the prologue common to all requests. In the following code cell, these pieces of information are stored in four different variables (pugrest, pugin, pugoper, pugout) and combined into a new variable, url.
pugrest = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
pugin = "compound/name/water"
pugoper = "property/MolecularFormula"
pugout = "txt"
url = pugrest + '/' + pugin + '/' + pugoper + '/' + pugout
print(url)
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/water/property/MolecularFormula/txt
Notice in the code cell above that we built the URL by concatenating variables and manually adding ‘/’ between each part. An alternative is Python’s join() string method, which combines a list of strings into a single string, inserting a specified separator between each element. This can make your code cleaner and more flexible when working with multiple pieces of a URL.
url = "/".join( [pugrest, pugin, pugoper, pugout] )
print(url)
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/water/property/MolecularFormula/txt
Here, the strings stored in the four variables are joined with the / character as a separator. Notice that the four variables are placed inside square brackets [ ], which creates a list containing those strings. The list is passed to the .join() method, which combines the elements into a single URL with / inserted between each part.
Now we can pass the URL to requests.get().
res = requests.get(url)
print(res.text)
H2O
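The URL-building steps above can be wrapped in a small function so that the same pattern is reusable. The following is a minimal sketch, not part of PubChem or requests: the function name build_property_url and its parameters are hypothetical. It also passes the compound name through urllib.parse.quote from the standard library, on the assumption that names containing spaces or other special characters should be percent-encoded before being placed in a URL.

```python
from urllib.parse import quote

PUGREST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def build_property_url(name, prop, fmt="txt"):
    """Build a PUG-REST URL that asks for one property of one named compound."""
    pugin = "compound/name/" + quote(name, safe="")   # percent-encode the name
    pugoper = "property/" + prop
    return "/".join([PUGREST, pugin, pugoper, fmt])

print(build_property_url("water", "MolecularFormula"))
print(build_property_url("acetic acid", "MolecularWeight"))
```

The returned string can be passed to requests.get() in the same way as the url variable above.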
Retrieve the rotatable bond count of phenylalanine in a “text” format using variables to generate the URL.
# Write your code in this cell:
Retrieve the charge of phosphate in a “text” format using variables to generate the URL.
# Write your code in this cell:
3. Making multiple requests using a for loop#
It might seem unnecessarily complicated to use variables to build a request URL compared to typing one full URL directly into requests.get(). If you are only making one request, it is definitely easier to write the one full URL.
However, if you are making many similar requests, manually typing each URL would be time-consuming and very error-prone. In this case, it is much more efficient to store the common parts of the URL as variables and use a loop to generate the full URLs, changing only the part that differs. For example, suppose you want to retrieve the SMILES strings for five different molecules. This approach would save a lot of repetitive typing and make your code easier to modify and reuse.
Note: In PubChem, the ConnectivitySMILES property contains the connectivity layer only and does NOT include stereochemical or isotopic information, while the SMILES property includes both stereochemical and isotopic information.
# create a list of compound names that we would like SMILES for
names = [ 'cytosine', 'benzene', 'motrin', 'aspirin', 'zolpidem' ]
Now the chemical names are stored in a list called names.
Using a for loop, we can go through each chemical name in the list, create a request URL by updating the name in the appropriate spot, and then retrieve the desired data for each molecule. This allows us to automate multiple requests efficiently, as shown in the example below.
pugrest = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
pugoper = "property/ConnectivitySMILES"
pugout = "txt"
for myname in names:    # loop over each element in the "names" list
    pugin = "compound/name/" + myname
    url = "/".join( [pugrest, pugin, pugoper, pugout] )
    res = requests.get(url)
    print(myname, ":", res.text)
cytosine : C1=C(NC(=O)N=C1)N
benzene : C1=CC=CC=C1
motrin : CC(C)CC1=CC=C(C=C1)C(C)C(=O)O
aspirin : CC(=O)OC1=CC=CC=C1C(=O)O
zolpidem : CC1=CC=C(C=C1)C2=C(N3C=C(C=CC3=N2)C)CC(=O)N(C)C
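Printing is convenient for a quick look, but storing the results makes them available for later steps. The sketch below collects one value per name in a dictionary. To keep the example independent of the network, the function takes a fetch argument (a hypothetical hook, not a requests feature) that receives a URL and returns the response text. The names collect_property and fake_fetch are likewise made up for illustration.

```python
def collect_property(names, prop, fetch):
    """Return a dictionary {name: value} for one property of several compounds."""
    pugrest = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
    results = {}
    for myname in names:
        url = "/".join([pugrest, "compound/name/" + myname, "property/" + prop, "txt"])
        results[myname] = fetch(url).strip()   # strip() removes a trailing newline
    return results

# Stand-in for a real request: returns canned text taken from the output above.
def fake_fetch(url):
    canned = {"benzene": "C1=CC=CC=C1\n", "aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O\n"}
    name = url.split("/")[-4]   # the compound name sits four segments from the end
    return canned[name]

smiles = collect_property(["benzene", "aspirin"], "ConnectivitySMILES", fake_fetch)
print(smiles["aspirin"])
```

In the notebook, one could pass lambda url: requests.get(url).text as the fetch argument to make real requests.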
When you make a lot of programmatic access requests using a loop, you should limit your request rate to or below five requests per second. Please read the following document to learn more about PubChem’s usage policies:
https://pubchemdocs.ncbi.nlm.nih.gov/programmatic-access$_RequestVolumeLimitations
Violation of usage policies may result in the user being temporarily blocked from accessing PubChem (or NCBI) resources.
In the for loop example above, we are only processing five chemical names, so it’s unlikely that we’ll exceed PubChem’s limit of five requests per second. However, it is much more common to work with hundreds or thousands of chemical names, and in that case the code is very likely to exceed this limit.
To prevent overloading the server and to follow PubChem’s usage guidelines, it is a good idea to slow down the request rate by adding a short delay to each request. This can be done using the sleep() function from Python’s time module.
For example, let’s suppose you have 12 chemical names to process (though in a real project, you will have many more). Rather than sending these requests immediately one after another, we will add a short pause to ensure we stay under the limit and be a good API user.
names = [ 'water', 'benzene', 'methanol', 'ethene', 'ethanol', \
'propene','1-propanol', '2-propanol', 'butadiene', '1-butanol', \
'2-butanol', 'tert-butanol']
import time
pugrest = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
pugoper = "property/ConnectivitySMILES"
pugout = "txt"
for i in range(len(names)):    # loop over each index (position) in the "names" list
    pugin = "compound/name/" + names[i]    # names[i] = the ith element in the names list.
    url = "/".join( [pugrest, pugin, pugoper, pugout] )
    res = requests.get(url)
    print(names[i], ":", res.text)
    time.sleep(0.2)    # pause for 0.2 seconds between requests to be polite to the server
water : O
benzene : C1=CC=CC=C1
methanol : CO
ethene : C=C
ethanol : CCO
propene : CC=C
1-propanol : CCCO
2-propanol : CC(C)O
butadiene : C=CC=C
1-butanol : CCCCO
2-butanol : CCC(C)O
tert-butanol : CC(C)(C)O
There are three noteworthy items in the above example:
First, the for loop iterates from 0 to len(names) − 1, that is, over [0, 1, 2, 3, …, 11]. (Recall that Python starts counting at 0.)
Second, the variable i is used (in names[i]) to generate the input part (pugin) of the PUG-REST request URL.
Third, after each request we pause for 0.2 seconds, so that at most 5 requests are made per second (5 requests × 0.2 s = 1 s).
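A fixed 0.2-second pause is simple, but the request itself also takes time, so the loop above runs somewhat slower than five requests per second. One possible refinement, sketched below under the assumption that we only want to guarantee a minimum spacing between the starts of consecutive requests, is to measure how long each request took and sleep only for the remainder. The helper name pause_needed is hypothetical.

```python
import time

def pause_needed(elapsed, min_interval=0.2):
    """Return how many seconds to sleep so that requests start at least min_interval apart."""
    return max(0.0, min_interval - elapsed)

start = time.monotonic()             # clock reading just before the request
# ... res = requests.get(url) would go here ...
elapsed = time.monotonic() - start   # time spent on the request
time.sleep(pause_needed(elapsed))    # sleep only for the remaining part of 0.2 s
```

If a request already took longer than 0.2 seconds, pause_needed returns 0.0 and the loop continues without an extra delay.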
The following is another method for spacing out the requests. In this case, after every 5 requests, we pause the program for 1 second to ensure we don’t overload the server.
import time
pugrest = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
pugoper = "property/ConnectivitySMILES"
pugout = "txt"
for i in range(len(names)):    # loop over each index (position) in the "names" list
    pugin = "compound/name/" + names[i]    # names[i] = the ith element in the names list.
    url = "/".join( [pugrest, pugin, pugoper, pugout] )
    res = requests.get(url)
    print(names[i], ":", res.text)
    if ( i % 5 == 4 ) :    # % is the modulo operator and returns the remainder of a division (true if i = 4, 9, ...)
        time.sleep(1)
water : O
benzene : C1=CC=CC=C1
methanol : CO
ethene : C=C
ethanol : CCO
propene : CC=C
1-propanol : CCCO
2-propanol : CC(C)O
butadiene : C=CC=C
1-butanol : CCCCO
2-butanol : CCC(C)O
tert-butanol : CC(C)(C)O
In this version of the code, we are still looping through the list of molecule names and updating the pugin variable each time. This time, however, the index i also serves to keep track of how many requests have been made. Using an if statement, we check whether i % 5 equals 4, which is true when i is 4, 9, 14, and so on, that is, when the number of requests made so far (i + 1) is a multiple of 5. If it is, we pause the program for one second using time.sleep(1). This method also helps us stay within PubChem’s limit of 5 requests per second by introducing a short break after every 5 requests.
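To see which iterations trigger the pause, the condition can be evaluated on its own. This small check uses only the loop indices for a 12-item list and makes no requests.

```python
n_items = 12   # same length as the names list above

# indices at which i % 5 == 4 is true, i.e. after the 5th, 10th, ... request
pause_indices = [i for i in range(n_items) if i % 5 == 4]
print(pause_indices)   # → [4, 9]
```

With 12 names, the program therefore pauses twice: once after the 5th request and once after the 10th.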
It’s important to note that PubChem’s request volume limit can be lowered during times of high server load. This process is called dynamic request throttling. When throttling is active, the server includes information in the HTTP response headers indicating the current system load and any temporary per-user limits. Based on this information, users are expected to adjust the speed at which they send requests to avoid overwhelming the system. We’ll explore how to interpret and respond to this throttling information later in the course.
Retrieve the XLogP values of linear alkanes with 1 to 12 carbons.
Use the chemical names as inputs.
Use a for loop to retrieve the XlogP value for each alkane.
Use the sleep() function to stop the program for one second for every five requests.
# Write your code in this cell:
Use the chemical names as inputs. Because the 20 common amino acids in living organisms predominantly exist in one chiral form (the L-form), the names should be prefixed with “L-” (e.g., “L-alanine”, rather than “alanine”), except for “glycine” (which does not have a chiral center).
Use a for loop to retrieve the stereochemical SMILES for each amino acid.
Use the sleep() function to stop the program for one second for every five requests.
# Write your code in this cell
4. Getting multiple molecular properties#
So far, all the examples in this notebook show how to retrieve a single molecular property for a single compound. We were able to automate this by using a for loop to repeat the process for a series of compounds.
In the next example, we request four properties for five compounds: the hydrogen-bond donor count, the hydrogen-bond acceptor count, XLogP, and the topological polar surface area (TPSA). Rather than requesting the compounds by name, we specify them by their PubChem Compound IDs (CIDs), listed as comma-separated values within the URL. The output format is also changed from txt to csv, so that each property appears in its own column.
pugrest = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
pugin = "compound/cid/4485,4499,5026,5734,8082"
pugoper = "property/HBondDonorCount,HBondAcceptorCount,XLogP,TPSA"
pugout = "csv"
url = "/".join( [pugrest, pugin, pugoper, pugout] ) # Construct the URL
print(url)
print("-" * 55) # Print "-" 55 times (to print a line for readability)
res = requests.get(url)
print(res.text)
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/4485,4499,5026,5734,8082/property/HBondDonorCount,HBondAcceptorCount,XLogP,TPSA/csv
-------------------------------------------------------
"CID","HBondDonorCount","HBondAcceptorCount","XLogP","TPSA"
4485,1,7,2.200,110.0
4499,1,7,3.300,110.0
5026,1,8,4.300,123.0
5734,1,5,0.2,94.6
8082,1,1,0.800,12.0
PubChem enforces a standard time limit of 30 seconds per request. If you try to retrieve too many properties for too many compounds all at once, the request might take longer than this limit, resulting in a timeout error.
To avoid this, it’s a good idea to split your list of compounds into smaller chunks and process each chunk separately. This keeps each request fast enough to complete within the time limit and helps ensure successful data retrieval.
cids = [ 443422, 72301, 8082, 4485, 5353740, 5282230, 5282138, 1547484, 941361, 5734, \
5494, 5422, 5417, 5290, 5245, 5026, 4746, 4507, 4499, 4497, \
4494, 4474, 4418, 4386, 4009, 4008, 3949, 3926, 3878, 3784, \
3698, 3547, 3546, 3336, 3333, 3236, 3076, 2585, 2520, 2351, \
2312, 2162, 1236, 1234, 292331, 275182, 235244, 108144, 104972, 77157, \
5942250, 5311217, 4564402, 4715169, 5311501]
# We will break the list of CIDs into chunks of 10 CIDs each
chunk_size = 10
if ( len(cids) % chunk_size == 0 ) :    # check if total number of cids is divisible by 10 with no remainder
    num_chunks = len(cids) // chunk_size    # sets number of chunks
else :    # if dividing by 10 leaves a remainder
    num_chunks = len(cids) // chunk_size + 1    # add one more chunk
print("# Number of CIDs:", len(cids) )
print("# Number of chunks:", num_chunks )
# Number of CIDs: 55
# Number of chunks: 6
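The same chunk count can be obtained without the if/else by using ceiling division, and the chunks themselves can be produced by stepping through the list with range(). The following is a sketch of that alternative, using a stand-in list of 55 integers rather than the real CIDs. The function name split_into_chunks is hypothetical.

```python
import math

def split_into_chunks(items, chunk_size):
    """Split a list into consecutive pieces of at most chunk_size elements."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

example = list(range(55))                 # stand-in for the 55 CIDs above
chunks = split_into_chunks(example, 10)
print(len(chunks), math.ceil(55 / 10))    # both give the number of chunks
print(len(chunks[-1]))                    # size of the last, shorter chunk
```

Slicing past the end of a list is safe in Python, which is why the last chunk simply comes out shorter instead of raising an error.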
pugrest = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
pugoper = "property/HBondDonorCount,HBondAcceptorCount,XLogP,TPSA"
pugout = "csv"
csv = ""    # sets a variable called csv to save the comma-separated output
for i in range(num_chunks) :    # sets number of requests to number of data chunks as determined above
    idx1 = chunk_size * i          # start index of the moving window of cids for this chunk
    idx2 = chunk_size * (i + 1)    # end index (exclusive) of the moving window of cids for this chunk
    pugin = "compound/cid/" + ",".join([ str(x) for x in cids[idx1:idx2] ])    # build pug input for this chunk of data
    url = "/".join( [pugrest, pugin, pugoper, pugout] )    # Construct the URL
    res = requests.get(url)
    if ( i == 0 ) :    # if this is the first request, store the whole result (including the header line)
        csv = res.text
    else :    # if this is a subsequent request, skip its header line and append the remaining rows
        csv = csv + "\n".join(res.text.split()[1:]) + "\n"
    if (i % 5 == 4):    # pause for one second after every five requests
        time.sleep(1)
print(csv)
"CID","HBondDonorCount","HBondAcceptorCount","XLogP","TPSA"
443422,0,5,3.1,40.2
72301,0,5,3.2,40.2
8082,1,1,0.800,12.0
4485,1,7,2.200,110.0
5353740,2,5,3.5,76.0
5282230,2,5,3.2,84.9
5282138,1,8,4.400,120.0
1547484,0,2,5.800,6.5
941361,0,4,6.000,6.5
5734,1,5,0.2,94.6
5494,0,6,5.0,57.2
5422,0,8,6.4,61.9
5417,0,5,3.2,40.2
5290,2,5,2.6,62.2
5245,5,8,-3.1,148.0
5026,1,8,4.300,123.0
4746,1,1,6.8,12.0
4507,1,7,2.900,110.0
4499,1,7,3.300,110.0
4497,1,8,3.100,120.0
4494,1,8,2.900,134.0
4474,1,8,3.800,114.0
4418,1,5,4.100,45.2
4386,2,3,4.400,49.3
4009,2,5,3.5,76.0
4008,1,9,5.600,117.0
3949,0,7,4.9,34.2
3926,1,5,6.0,35.6
3878,2,5,1.4,90.7
3784,1,8,4.300,104.0
3698,2,3,-0.2,68.0
3547,1,5,1.0,70.7
3546,3,5,-0.5,132.0
3336,1,1,5.5,12.0
3333,1,5,3.900,64.6
3236,0,2,3.8,20.3
3076,0,6,3.1,84.4
2585,3,5,4.200,75.7
2520,0,6,3.800,64.0
2351,0,3,5.3,15.7
2312,0,2,4.6,12.5
2162,2,7,3.000,99.9
1236,1,8,6.800,114.0
1234,0,7,3.800,73.2
292331,2,3,3.900,49.3
275182,1,8,6.1,72.9
235244,1,8,6.7,72.9
108144,2,5,3.9,117.0
104972,1,6,3.300,72.7
77157,1,4,3.2,49.8
5942250,2,5,3.5,76.0
5311217,1,7,4.500,90.9
4564402,0,4,4.1,45.5
4715169,2,3,-1.6,63.3
5311501,0,4,4.4,43.7
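Once the CSV text has been assembled, Python's built-in csv module can turn it into rows that are easier to work with than one long string. The sketch below uses a few lines copied from the output above and stores them in a variable called csv_text. That name is chosen here because a variable named csv, as in the cell above, would clash with the csv module if both were used in the same notebook.

```python
import csv
import io

# First rows of the output above, used as sample data
csv_text = '''"CID","HBondDonorCount","HBondAcceptorCount","XLogP","TPSA"
443422,0,5,3.1,40.2
72301,0,5,3.2,40.2
8082,1,1,0.800,12.0
'''

rows = list(csv.DictReader(io.StringIO(csv_text)))   # one dictionary per compound
for row in rows:
    print(row["CID"], float(row["XLogP"]), float(row["TPSA"]))
```

Each row is a dictionary keyed by the column names in the header line, and every value is read as a string, so numeric columns need float() or int() before any calculation.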
Below is the list of CIDs of known anti-inflammatory agents (obtained from PubChem via the URL: https://www.ncbi.nlm.nih.gov/pccompound?LinkName=mesh_pccompound&from_uid=68000893). Download the following properties of those compounds in a comma-separated format: heavy atom count, rotatable bond count, molecular weight, XLogP, hydrogen bond donor count, hydrogen bond acceptor count, TPSA, and stereochemical SMILES.
Split the input CID list into smaller chunks (with a chunk size of 100 CIDs).
Process one chunk at a time using a for loop.
Do not forget to add sleep() to comply with the usage policy.
cids = [ 471, 1981, 2005, 2097, 2151, 2198, 2206, 2214, 2244, 2307, 2308, 2313, 2355, 2396, 2449, 2462, 2466, 2581, 2662, 2794, 2863, 3000, 3003, 3033, 3056, 3059, 3111, 3177, 3194, 3230, 3242, 3282, 3308, 3332, 3335, 3342, 3360, 3371, 3379, 3382, 3384, 3394, 3495, 3553, 3612, 3672, 3715, 3716, 3718, 3778, 3824, 3825, 3826, 3935, 3946, 3965, 4009, 4037, 4038, 4044, 4075, 4159, 4237, 4386, 4409, 4413, 4487, 4488, 4495, 4534, 4553, 4614, 4641, 4671, 4692, 4781, 4888, 4895, 4921, 5059, 5090, 5147, 5161, 5208, 5228, 5339, 5352, 5359, 5362, 5468, 5469, 5475, 5480, 5509, 5733, 5743, 5744, 5745, 5753, 5754, 5755, 5834, 5865, 5875, 5876, 5877, 6094, 6213, 6215, 6247, 6436, 6741, 7090, 7497, 8522, 9053, 9231, 9642, 9782, 9878, 10114, 10154, 10170, 10185, 10206, 12555, 12938, 13802, 14982, 15209, 16490, 16533, 16623, 16639, 16752, 16923, 17198, 19161, 20469, 21102, 21700, 21800, 21826, 21975, 22419, 23205, 26098, 26248, 26318, 28718, 28871, 30869, 30870, 30951, 31307, 31378, 31508, 31635, 31799, 31800, 32153, 32327, 32798, 33958, 35375, 35455, 35935, 36833, 37425, 38081, 38503, 39212, 39941, 40000, 40632, 41643, 43261, 44219, 47462, 47795, 50294, 50295, 51717, 54445, 54585, 57782, 59757, 60164, 60490, 60542, 60712, 60726, 60864, 61486, 62074, 62924, 63006, 63019, 64704, 64738, 64746, 64747, 64927, 64945, 64971, 64982, 65394, 65464, 65655, 65679, 65762, 66249, 67417, 68700, 68704, 68706, 68731, 68749, 68819, 68865, 68869, 68917, 71246, 71354, 71364, 71386, 71398, 71414, 71415, 71771, 72158, 72300, 73400, 82153, 84003, 84429, 90763, 91626, 91670, 100472, 102011, 104762, 104943, 107641, 107738, 107793, 108068, 108130, 114753, 114840, 114917, 114999, 115239, 119032, 119286, 119365, 119607, 119828, 119871, 121928, 121957, 122139, 122179, 122182, 123619, 123673, 123723, 124978, 128191, 128229, 128571, 133021, 134896, 146364, 151075, 151166, 152165, 155354, 155761, 156391, 158103, 159557, 162666, 164676, 167928, 168928, 174093, 174277, 176155, 177976, 180604, 183088, 189821, 
192156, 196122, 196840, 196841, 200674, 201587, 219121, 222786, 229860, 235244, 236702, 259846, 263373, 275182, 292331, 425990, 439503, 439533, 441335, 441336, 442534, 442993, 443943, 443949, 443967, 444036, 445154, 445858, 446925, 479503, 485711, 490428, 501254, 522325, 546807, 578771, 584547, 610479, 633091, 633097, 636374, 636398, 656604, 656656, 656852, 657238, 667550, 927704, 969510, 969516, 1548887, 1548910, 2737488, 3033890, 3033980, 3045402, 3051696, 3055172, 4129359, 4306515, 4483645, 5018304, 5185849, 5280802, 5280914, 5280915, 5281004, 5281071, 5281515, 5281522, 5281792, 5282183, 5282193, 5282230, 5282387, 5282402, 5282492, 5283542, 5283734, 5284538, 5284539, 5311051, 5311052, 5311066, 5311067, 5311093, 5311101, 5311108, 5311169, 5311180, 5318517, 5320420, 5322111, 5352624, 5353725, 5353726, 5353740, 5353864, 5354499, 5377381, 5420804, 5420805, 5458396, 5472495, 5481958, 5701991, 5702036, 5702148, 5702212, 5702252, 5702287, 5745214, 5942250, 6420050, 6429274, 6437368, 6437387, 6438873, 6447131, 6453785, 6473881, 6509979, 6708733, 6710677, 6714002, 6917783, 6917852, 6917894, 6918172, 6918173, 6918332, 6918445, 6918452, 6918612, 6925666, 7060958, 7251185, 9554199, 9798098, 9799453, 9841438, 9843941, 9846332, 9865808, 9868219, 9869053, 9871508, 9875547, 9883509, 9897518, 9897771, 9907157, 9913795, 9919776, 9926694, 9934547, 10363606, 10918539, 11158972, 11513733, 11561674, 11616712, 11870423, 11949636, 11954221, 11954316, 11954353, 11954369, 11957468, 11961431, 11972243, 11972532, 12300053, 12313906, 12313911, 12606303, 12634263, 12714644, 12874922, 13018150, 13020033, 13041095, 14010989, 14515707, 14798494, 15895902, 16051947, 16132369, 16213022, 16213698, 16218996, 16219353, 16220118, 16759566, 16760658, 17750985, 17753757, 18526330, 18632363, 18647121, 18943026, 20054915, 21120116, 21637635, 21637642, 21893738, 21893804, 21982135, 22141508, 22811280, 23509770, 23631982, 23653552, 23657872, 23663407, 23663409, 23663418, 23663959, 23663989, 23665411, 
23665999, 23667642, 23669636, 23674183, 23674255, 23674745, 23675763, 23680530, 23681059, 23684814, 23688663, 23693301, 23694214, 23702389, 24181458, 24721429, 24761485, 24799587, 24847961, 24847981, 24867460, 24867465, 24867475, 24883465, 24916955, 25077872, 25113755, 25796773, 40469526, 44119558, 44202892, 44260118, 44266812, 44386560, 45006151, 45006158, 45039955, 45356876, 45356931, 45357558, 45357932, 45358013, 45358120, 45358130, 45358140, 45358148, 45358149, 45488525, 46174093, 46397498, 46780650, 46780910, 46783539, 46783786, 46783814, 46863906, 46878350, 46882877, 50989825, 51026956, 51340230, 51398089, 53384387, 53394893, 53486221, 53486290, 53486322, 54194814, 54605501, 54675840, 54676228, 54677470, 54677971, 54677972, 54677977, 54682045, 54684589, 54690031, 54697648, 54708862, 54714524, 56841932, 56842111, 56845155, 57347755, 57486087, 67668959, 67804972, 67986221, 70470286, 70678885, 71306882, 71587162, 72774967, 72941490, 72941625, 73758129, 73759663, 73759808, 74787565, 77906397, 78577433, 90488794, 91711382, 91826463, 91873711, 91881846, 92131836, 92462493, 102004404, 102601886, 117072385, 117072403, 117072410, 118701141, 118701402, 118984459, 122130078, 122130111, 122130185, 122130213, 122130768, 122173054, 122173183, 122361610, 123134657, 124081055, 124463365, 126968472, 126968501, 126968801, 126969212, 126969455, 129009998, 129010022, 129010033, 129010043, 129316829, 129317859, 129317898, 129628207, 129628892, 129670532, 129735029, 131632430, 131635023, 131676243, 131750284, 131954647, 131954667, 132399051, 132399058, 133112890, 133126366, 133126370, 133562807, 133659920, 133687604, 134129698, 134159361, 134460917, 134612785, 134687786, 134688123, 134688323, 134688977, 134689786, 134693106, 134693125, 134693234, 134694728, 134694860, 135413496, 135413505, 135414247, 135484078, 135515521, 135565709, 136040192, 137177332, 137699687, 137705034, 137705717, 137705725, 137705994, 137706376, 137706400, 137795135, 138059757, 138107776, 138113311, 
138113507, 138113581, 138114182, 138114743]
len(cids)
# Write your code in this cell.
Acknowledgment#
This module is essentially a copy of the 2019 Cheminformatics module 1: “Getting Molecular Properties through PUG-REST” created by Sunghwan Kim with a Public Domain copyright by Robert Belford. Minor edits are made by Ehren Bucholtz, including fixing some deprecated SMILES definitions within PubChem’s glossary.