Biomedical Statistics and Informatics
Volume 1, Issue 1, December 2016, Pages: 24-34

Statistical Aspects of the Interrelation Between the Biological Activity of Chemical Compounds and Their Molecular Structure

Mukhomorov V. K.

Universita Degli Studi di Napoli "Federico II", Via Cintia, Napoli, Italy

To cite this article:

Mukhomorov V. K. Statistical Aspects of the Interrelation Between the Biological Activity of Chemical Compounds and Their Molecular Structure. Biomedical Statistics and Informatics. Vol. 1, No. 1, 2016, pp. 24-34. doi: 10.11648/j.bsi.20160101.14

Received: December 7, 2016; Accepted: December 27, 2016; Published: January 20, 2017


Abstract: An attempt was made to construct an adequate model of the interrelation between the radioprotective properties of biologically active chemical compounds and their electronic and information factors. The biological activity (radiation protective effect) of chemical compounds has been analyzed in relation to their electronic sign and information function. Statistical comparison of qualitative indices has revealed that the electronic and information signs are the most informative characteristics of the molecules responsible for radiation protective action. Correlation equations are given for the electronic and information dependent change in the antiradiation properties of the molecule. Quantitative estimates were made associating the protective efficiency of the chemical compounds under study with variations in the electronic parameters and the dose of the chemicals.

Keywords: Bioactivity, Statistics, Molecular Structure, Electronic Sign, Information Function, Radioprotector, Statistical Criterion, Contingency, Correlation


1. Introduction

Knowledge of the quantitative stochastic interrelation between the chemical structure of a molecule and its biological activity has important theoretical and practical significance. It is necessary both to clarify the mechanism of the biochemical action of molecules and to search for promising new drugs. It is known that the classical apparatus of probability theory and mathematical statistics is the basis of the stochastic simulation of natural phenomena. The main part of such research is the estimation of the closeness of the causal relationships between the explanatory parameters and the response of the biological system.

A causal relationship implies that the recurrence of the same causes leads to the same consequences. However, a causal relationship can be subject to fluctuations due to random deviations. These fluctuations are caused by uncontrolled and unaccounted factors and are identified by statistical laws.

One of the most relevant issues of the modern chemistry of biologically active substances is the problem of creating new effective radioprotectors. The main demands on these drugs are a low effective dose, low toxicity and a lack of side effects. The existence of side effects significantly limits the practical applicability of radioprotectors. Statistical methods are the most rational in solving problems that are associated with the study of the action of a combination of factors on a biosystem. Since the effect of the interaction of drugs with a biosystem depends on many conditions, it has a probabilistic nature. Therefore, it is preferable to use a probabilistic model.

It is not always possible to construct an adequate model that describes the relationship of the chemical structure of a compound with its biological activity. If the model is overloaded with a large number of non-essential characteristics, the use of such a model becomes almost impossible. At the same time, nothing can compensate for the shortcomings of the model if the main link has been lost. Therefore, an adequate model should simulate the basic properties of the chemical compounds as closely as possible. Establishing the connection between molecular structure and biological activity will make it possible to carry out a targeted search for new chemicals, and can also contribute to deciphering the mechanisms of their bioactivity.

2. Method and Discussion

For a description of the interrelation of bioactivity with molecular structure, we use the descriptors (attributes), the calculation of which requires knowledge of only the structural formula of chemical compounds. We take into account the remark of Alexander P. and Bacq Z. [1] on the importance of the primary chemical structure of the drug in the mechanism of protection against ionizing radiation.

We use the average number of electrons in the outer shell of the atoms as a sign of the molecule [2]:

$$Z = \frac{1}{N}\sum_{i} n_i Z_i \qquad (1)$$

where $n_i$ is the number of atoms of the i-th kind and $Z_i$ is the number of electrons in the outer electron shell of such an atom. The summation is performed over all the kinds of atoms in the molecule; $N = \sum_i n_i$ is the total number of atoms. In [3] it was shown that the empirical pseudopotential can be represented in the following analytical form

(2)

where  and  are amendments to the Coulomb potential. Amendments depend on the distance r between the molecule and the electron.
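As an illustration of Eq. (1), the following minimal Python sketch computes the sign Z from a hand-counted atomic composition. The function name `mean_valence` and the dictionary layout are illustrative assumptions, not part of the original work; the outer-shell electron counts are those listed in the footnote to Table 1.

```python
# Outer-shell electron counts, as given in the footnote to Table 1.
VALENCE = {"H": 1, "C": 4, "N": 5, "O": 6, "P": 5, "S": 6, "Br": 7, "F": 7}


def mean_valence(atom_counts):
    """Average number of outer-shell electrons per atom, Eq. (1).

    atom_counts maps an element symbol to the number of such atoms
    in the molecule (hypothetical input format).
    """
    total_atoms = sum(atom_counts.values())
    total_electrons = sum(n * VALENCE[el] for el, n in atom_counts.items())
    return total_electrons / total_atoms


# Compound 48 of Table 1, H2NCH2CH2SPO3H2, with atoms counted by hand.
compound_48 = {"H": 8, "C": 2, "N": 1, "S": 1, "P": 1, "O": 3}
z_48 = mean_valence(compound_48)
print(round(z_48, 3))  # Table 1 lists Z = 3.125 for this compound
```

Only the structural formula is needed as input, which is the property of the descriptor emphasized in the text.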

Two groups of chemical compounds are given in Table 1 [4, 5]. The first group contains chemical compounds with an effective radioprotective action (dose ≤ 1 mM/kg, survival of more than 50%; these chemical compounds are marked with a "+" sign). The second group contains chemical compounds that have no anti-radiation activity even at high doses, > 2 mM/kg (these chemicals are marked with a "-" sign). This choice of chemical compounds imposes a restriction on the size of the sample.

Our goal is to find a classification rule that separates the active and non-active chemical compounds in a statistically reliable way. To do this, we use the association method (statistical methods for rates and proportions) for signs that have an alternative variation ("yes" or "no"). The observations and the sign Z of the molecules can be represented as a 2 × 2 table, or tetrachoric table (Table 2). We will analyze the interrelation between the bioactivity of the chemical compounds and the magnitude of the sign Z.

Table 1. Electronic and information factors of chemical compounds.

No. Chemical compound I.P. dose, mM/kg [4, 5] A.R.P. [4, 5] Z*) H, bit
1 H2N (CH2)4CH (NH)2CH2SH 0.34 + 2.346 1.460
2 H2NC (=NH)CH2CH2SH 0.61 + 2.571 1.611
3 H2NCH2CH2SCN 0.49 + 2.833 1.730
4 H2NCH2CH2CH2NHCH2CH2SH 0.56 + 2.273 1.418
5 (CH3)2NC(=NH)CH2SH 0.85 + 2.471 1.545
6 (CH3)3CNHCSNHCH2CH2OH 0.71 + 2.444 1.583
7 CH2=CHCH2NHCH2CH2SH 0.85 + 2.333 1.411
8 CH3CH2CH (NH2)CH2SH 0.95 + 2.235 1.378
9 (CH3)2CH (CH2)5NH (CH2)S2O3H 0.07 + 2.556 1.628
10 CH3 (CH2)6NH (CH2)2S2O3H 0.29 + 2.556 1.628
11 H2C=C (CH3)CH2SC (=NH)NH2 0.31 + 2.556 1.568
12 CH3NH (CH2)3NHCH2CH2CH2SPO3H2 0.31 + 2.606 1.798
13 H2N (CH2)5NHCH2CH2SPO3H2 0.62 + 2.606 1.798
14 H2NCH2C (CH3)2CH2NHCH2CH2SPO3H2 0.62 + 2.606 1.798
15 CH2=C (NH2)CH2CH2SH 0.15 + 2.400 1.472
16 H2N (CH2)5CH (NH2)CH2SPO3H2 0.21 + 2.606 1.798
17 Cyclo-C6H11NHP (O) (OH)SH 0.19 + 2.741 1.818
18 H2N (CH2)3NHCH2CH2CH2SPO3H2 0.32 + 2.667 1.849
19 H2NCH2CH (CH3)CH2NHCH2CH2SPO3H2 0.44 + 2.667 1.849
20 H2NCH2CH2CH (CH3)NHCH2CH2SPO3H2 0.66 + 2.667 1.849
21 L (+)=H2N (CH2)4CH (NH2)CH2SPO3H2 0.14 + 2.667 1.849
22 H2N (=NH)CH2SSCH2CH2 (=NH)NH2 0.07 + 2.667 1.641
23 H2NCH2CH2CH2NHCH2CH2SPO3H2 0.07 + 2.741 1.904
24 H2NC (=NH)NHCH2CH (CH3) CH2NH (CH2)SPO3H2 0.07 + 2.813 1.945
25 H2NC (=NH)NH (CH2)3NH (CH2)3SPO3H2 0.08 + 2.743 1.897
26 H2NC (=NH)CH2SH 0.13 + 2.727 1.686
27 H2N (CH2)3NHCH2CH (OH)CH2SPO3H2 0.82 + 2.774 1.890
28 CH3CH2CH2CH2NHP (O) (OH)SH 0.15 + 2.667 1.868
29 H2C=CHCH2NHCH2CH2SH 0.85 + 2.333 1.411
30 H2NCH2CH (OH)CH2NHCH2CH2SPO3H2 0.33 + 2.857 1.943
31 H2NC (=NH)NHCH2CH2NHCH2CH2SPO3H2 0.10 + 2.897 1.997
32 [structural formula not recovered] 0.10 + 2.813 1.649
33 [structural formula not recovered] 0.05 + 2.813 1.649
34 H2N CH2CH2SCN 0.49 + 2.833 1.730
35 H2NCH2CH2SSCH2CH2NH2 0.99 + 2.500 1.571
36 H2N (NH)CNHCH2CH2S2O3H 0.50 + 3.300 2.082
37 H2N (CH2)3NH (CH2)2SPO5H2 0.31 + 2.875 1.919
38 H2N- (CH2)4NH (CH2)2SPO5H2 0.77 + 2.875 1.919
39 CH3CONHCH2CH2SS (CH2)4SO2H 0.17 + 2.813 1.781
40 H2NCH2CH2SS CH2CONH2 0.60 + 2.727 1.794
41 L (-)-H2NCH2CH2CH (NH2)CH2SPO3H2 0.63 + 2.833 1.966
42 H2NC (=NH)NH (CH2)3NHCH2CH2SPO3H2 0.10 + 2.813 1.945
43 HO2S (CH2)4-SSS- (CH2)4SO2H 0.06 + 2.971 1.739
44 H2O3PSCH2CH2NH (CH2)3NH CH2CH2 SPO3H2 0.35 + 2.974 2.014
45 H2NCH2CH2SSCH2COOH 0.30 + 3.000 1.918
46 CH3S (CH2)3NHC (=NH)CH2S2O3H 0.19 + 3.000 1.939
47 H2NCH2CH (NH2)CH2SPO3H2 1.00 + 2.952 2.032
48 H2NCH2CH2SPO3H2 0.64 + 3.125 2.078
49 H2NC (=NH)NHCH2CH2SPO3H2 0.25 + 3.143 2.131
50 HSCH2CONHNHCOCH2SH 0.83 + 3.222 2.059
51 Histamine (H- imidazole-4- ethanamine) 0.90 + 2.588 1.447
52 Mexaminum 0.05 + 2.643 1.473
53 Serotonin (5-hydroxytryptamine) 0.06 + 2.720 1.514
54 Thiazolidin 0.85 + 2.333 1.411
55 H2NCH2CH2CH2CH2CH2SSH 0.24 + 2.381 1.454
56 H2NCH2CH2CH2NHCH2CH2CH2SPO3H 0.33 + 2.724 1.883
57  (CH3)2NC6H4CH (OH)S (CH2)NH2 0.78 + 2.600 1.600
58 CH2=CHCH2NHCSNH2 6.89 - 2.667 1.640
59 CH3CH (NH2)COSH 11.4 - 2.769 1.823
60 H2NCH2CH2SO2NH2 4.83 - 2.933 1.907
61 H2NSSO3H 4.65 - 4.222 1.891
62 H2NCH2COSH 11.0 - 3.000 1.961
63 CH3CH2CH2NHCSNH2 4.23 - 2.471 1.545
64 HCONHCH2CH (CH3)SH 3.36 - 2.625 1.717
65 H2NCH2CH2COSH 9.51 - 2.769 1.823
66  (CH3)C (SH)CH (NH2)COOH 13.4 - 2.938 1.875
67  (CH3)2NCSSH 4.12 - 2.769 1.669
68 CH3 CH (NH2)COSH 11.4 - 2.769 1.823
69 H2NCOCH (NH2)CH2SH 9.99 - 2.800 1.857
70 H2NC (=NH)SCH2CH2CH2SO3H 10.1 - 3.143 2.012
71  (CH3)2 NNHCH2CH2SH 4.16 - 2.316 1.457
72 CH3CH2OCOCH2NHCSSCH2CH3 5.07 - 2.522 1.491
73 H2C=CHCH2NHC (O)SCH2COOCH2CH3 4.93 - 2.846 1.174
74 HO (CH2)2CH2NHCH2CH2S2O3H 3.72 - 2.960 1.855
75 4- (2- Mercaptooxazolyl)-erythrite 8.97 - 3.000 1.807
76 H2NCH2CH2SC (O)CH2 3.91 - 2.733 1.774
77 BrC6H4O (CH2)4NHCH2CH2S2O3H 2.13 - 3.000 1.878
78 CH3CH2CH2NHCSNH2 4.23 - 2.471 1.545
79 CH3CH2SC (S)NHCH2COOH 5.59 - 3.053 1.925
80 HO2CCH2NHCONHCH2CH2SH 5.62 - 3.048 1.936
81 Tionicotinamide 4.71 - 3.067 1.706
82 CH3SC (O)CH2CH2NHCONHCH2CH2 SC (O)SCH3 12.3 - 3.031 1.918
83 HOCH2 (CHOH)2CH2NHCH2CH2S2O3H 3.07 - 3.067 1.853
84 HOCH2CHOHCH2NHCH2CH2S2O3H 7.60 - 3.077 1.880
85 2-Carboxypyrrolidine-1- dithiocarboxylic acid 5.24 - 3.211 1.958
86 CH3OCOCH2CH2SO2CH2CH (NH2)COOH 3.18 - 3.143 1.901
87 H2NC (=NH)SCH2CH2CH2SO3H 10.1 - 3.143 2.013
88 [H2NC (=NH)NHCH (COOH)CH2S]2- 3.09 - 3.167 2.017
89 N-Oxide 4-mercaptodihydropyridine 7.87 - 2.970 1.892
90 H2NCH2CHOHCH2S2O3H 5.35 - 3.263 1.970
91 H2NCH2CH (CH2OH)S2O3H 4.81 - 3.263 1.970
92 CH3C (=NH)SCH2CH2CH2S2O3H 5.08 - 3.130 1.951
93 2-Furyl-CH2NHC (=NH)CH2S2O3H 4.00 - 3.360 2.049
94 H2NCONHCH2CH2S2O3H 5.00 - 3.474 2.103
95 γ- (S-Purinyl) Thiopropylsulphonic acid 4.42 - 3.407 2.089
96 HCONHCH2 CH (CH3)SH 3.36 - 2.625 1.717
97 CF3CF2CH2OCOCH2CH2NHCH2CH2S2O3H 3.00 - 3.818 2.249
98 (NC)2C=C (SH)2 3.94 - 4.000 1.922
99 1, 2, 5-Thiadiazole-3-carboxylic acid 7.69 - 4.146 1.842
100 1, 2, 5-Thiadiazole -3, 4-dicarboxylic acid 4.60 - 4.462 2.162

*) The number of electrons in the outer shell of an atom: Z (H) = 1, Z (C) = 4, Z (N) = 5, Z (S) = 6, Z (P) = 5, Z (O) = 6, Z (Pb) = 4, Z (Br, F) = 7.

First of all, we need to set the threshold value Z(th) of the sign that separates effective radioprotectors from ineffective ones in a statistically significant way. We first determine the mean value of the sign Z for the sample of chemical compounds (Table 1). We obtained the following statistics for the average value of Z:

N = 100, Z(av) = 2.87 ± 0.04, Z(min) = 2.235, Z(max) = 4.462, Sz = 0.40. (3)

Here Z(min) and Z(max) are the minimum and maximum values of the sign Z; Sz is the standard deviation of the sample. The average value Z(av) should be compatible with the other elements of the sample. Typically, the maximum and minimum sample elements are the questionable ones. An element is an outlier of the set if the following inequality holds:

(4)

where f is the number of degrees of freedom and the critical value is the tabulated fractile of the τ-distribution of the maximum deviation [6]. Let's verify the compatibility of the sample points:

(5)

Here f is equal to 100. From inequality (5) it follows that the chemical compound number N = 100 (Z = 4.462) is not compatible with the other sample elements. Consequently, this chemical compound is to be excluded from the sample and the calculation of the average value must be repeated. After repeating the calculations, we found that the chemical compounds numbered 96, 97, 98, and 99 must also be excluded from the sample. Now the average value has the following statistics:

N = 95, Z(av) = 2.80 ± 0.03, Z(min) = 2.235, Z(max) = 3.474, Sz = 0.27. (6)

Here Z(min) and Z(max) are the minimum and maximum values of Z in the sample that contains N = 95 elements; f = 95. The sample satisfies the following inequality:

χ² = 3.094 < χ²(cr) = 14.1, p = 0.88, N = 95. (7)

Thus, the sample is uniform and fits the normal distribution. Here the p value is the significance level of the criterion, which determines the probability of error (~ 10%); f is the number of degrees of freedom. The Shapiro-Wilk criterion is also satisfied: W = 0.989 > W(cr) = 0.950.

Now we can determine the average value of Z for the effective and ineffective radioprotectors (N = 95). As a result, we obtained the following statistics:

N1 = 57, Z1(av) = 2.71 ± 0.03, Z1(min) = 2.235, Z1(max) = 3.300, Sz1 = 0.24,

N2 = 38, Z2(av) = 2.95 ± 0.04, Z2(min) = 2.316, Z2(max) = 3.474, Sz2 = 0.27. (8)

The values of Z are located around Z1(av) and Z2(av) for the effective and ineffective chemical compounds, respectively. Using tabulated values of the t-distribution, we can verify whether the distinction in the average values of the sign Z (Z2(av) > Z1(av)) is statistically significant. First, we compare the variances of the samples: F = 1.34 < F(cr). That is, the distinction of the variances is not statistically significant. Then we use the following inequality [7]:

(9)

Inequality (9) shows that at the 5% significance level the null hypothesis of equality of the average values can be rejected. Consequently, the difference between the average values of Z for the effective and ineffective chemical compounds is statistically significant.
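The comparison of the two means can be sketched from the summary statistics (8) alone. The code below is an assumption about the form of the test: it uses the usual pooled-variance Student statistic, which is the standard choice when the variances do not differ significantly, and the helper name `pooled_t` is hypothetical. Because the inputs are the rounded values printed in (8), the variance ratio and t value only approximate the figures of the original calculation.

```python
import math


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Student t statistic for two independent samples, pooled variance."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean2 - mean1) / std_err


# Rounded summary statistics from (8): effective vs ineffective compounds.
n1, mean_z1, s_z1 = 57, 2.71, 0.24
n2, mean_z2, s_z2 = 38, 2.95, 0.27

f_ratio = (s_z2 / s_z1) ** 2          # larger variance over smaller variance
t_value = pooled_t(mean_z1, s_z1, n1, mean_z2, s_z2, n2)
dof = n1 + n2 - 2

print(f"F = {f_ratio:.2f}, t = {t_value:.2f}, degrees of freedom = {dof}")
```

The resulting t value is then compared with the tabulated critical value of the t-distribution for the chosen significance level and the printed number of degrees of freedom.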

In the first approximation, we can assume that the average value Z(av) = 2.80 is a threshold that separates chemical compounds with different radioprotective efficiency. However, it is better to choose the threshold value by repeatedly testing various Z values close to Z(av) (for example, within the error of the mean). One can then use the value of Z that leads to a more convincing statistical inference. This approach is demonstrated in the search for the classification rules by statistical methods for rates and proportions.

According to the analysis, it is preferable to choose the threshold Z(th) = 2.87. Importantly, the chemical compounds (NN = 97 - 100) have values of the sign Z noticeably larger than the average value Z(av) and therefore do not violate the inequality Z > Z(th).

We need to verify whether the separation of the chemical compounds into two conditional groups is the result of random factors. It is convenient to start the description of the classification with the construction of the table of mutual contingency (or association) [8, 9] (cross-selection method). Figure 1 shows the distribution of the chemical compounds over the quadrants of the rectangular 2 × 2 table (table of "four fields"). Each cell of the table gives the number (frequency) qij of objects. Obviously, the closer the contingency table is to the diagonal form, the better the classification model describes the phenomenon. For the objects in each quadrant, we do not assume the existence of a functional mathematical relationship between the dependent variable and the explanatory variable.

Figure 1. (A) The distribution of chemical compounds over the quadrants of the 2 × 2 table. The lower left quadrant contains the chemical compounds (number q11) with Z ≤ Z(th) and D ≤ D(th) = 1 mM/kg. The upper right quadrant contains the chemical compounds (number q22) with Z > Z(th) and D > 2 mM/kg. The upper left and lower right quadrants contain the numbers of chemical compounds q12 and q21 with crossed signs: Z ≤ Z(th), D > 2 mM/kg and Z > Z(th), D ≤ D(th) = 1 mM/kg. (B) Enlarged view of the two lower quadrants of panel (A).

Contingency (association) method is applicable, if the sample size satisfies the following inequality: . It is generally believed that the frequencies qij meet the inequality of  subject to i j [8].

To determine the Pearson contingency coefficient Φ [9] between the radioprotective efficacy and value of sign Z we use the following equation:

(10)

Here the number of degrees of freedom is equal to f = N – 2; q11 = 45 is the number of effective chemical compounds with the sign value Z ≤ Z(th) (D ≤ D(th) = 1 mM/kg); q12 = 12 is the number of effective chemical compounds with Z > Z(th) (D ≤ D(th) = 1 mM/kg); q22 = 29 is the number of ineffective chemical compounds with Z > Z(th) (D > 2 mM/kg); q21 = 14 is the number of ineffective chemical compounds with Z ≤ Z(th) (D > 2 mM/kg) (Table 2). For tetrachoric contingency tables the Yule coefficient of association can also be used [8]:

$$Q = \frac{q_{11} q_{22} - q_{12} q_{21}}{q_{11} q_{22} + q_{12} q_{21}} \qquad (11)$$

The coefficient Q = 0.77 points to the existence of an interrelation between the signs. Obviously, this coefficient lies in the range -1 ≤ Q ≤ 1.
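The Yule coefficient can be checked directly against the frequencies of Table 2. The sketch below assumes the standard definition of Q (difference over sum of the diagonal products); the function name `yule_q` is illustrative. Both the values for the threshold 2.87 and the bracketed values for the threshold 2.80 are evaluated.

```python
def yule_q(q11, q12, q21, q22):
    """Yule coefficient of association for a 2 x 2 table."""
    concordant = q11 * q22   # objects on the main diagonal
    discordant = q12 * q21   # misclassified objects
    return (concordant - discordant) / (concordant + discordant)


# Frequencies from Table 2: threshold 2.87, and threshold 2.80 (in brackets).
q_threshold = yule_q(q11=45, q12=12, q21=14, q22=29)
q_average = yule_q(q11=36, q12=21, q21=12, q22=31)

print(round(q_threshold, 2), round(q_average, 2))  # Table 2 lists 0.77 (0.63)
```

A purely diagonal table (q12 = q21 = 0) gives Q = 1, which is the limiting case of a perfect classification rule.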

The signs RE (the radioprotective efficiency) and Z are independent if the product of the marginal (unconditional) proportions is equal to the joint proportion (see Table 2). For example, the product of the marginal proportions P2 · 1P = 0.41 × 0.57 ≈ 0.23, whereas the joint proportion is p12 = 0.12. These proportions differ considerably. The greater the distinction, the stronger the interdependence of the signs RE and Z.

The application of the threshold value Z(th) = 2.87 leads to more convincing statistical results than the use of the average value Z(av) = 2.80. In brackets (see Table 2), we report the statistical results obtained for the average value. Using the average value also suggests a correlation of the signs at the significance level α = 0.05. In this case, however, the interrelation is too weak: Φ = 0.19. Therefore, it is preferable to use the threshold value 2.87. We can verify the adequacy of the model using the value of the empirical error. The error is determined by the fraction of misclassified objects: Δ = (q12 + q21)/N. Using the data in Table 2, we found the following value of the empirical error of the model: Δ = 0.26. Application of the threshold value Z(th) = 2.87 reduces the empirical error of the model by approximately 21% (from Δ = 0.33).
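The independence check and the empirical error can both be reproduced from the four frequencies of Table 2. In the sketch below the empirical error is taken as the share of off-diagonal objects, which is an assumption that reproduces the tabulated Δ values; the function names are hypothetical.

```python
def empirical_error(q11, q12, q21, q22):
    """Fraction of misclassified objects (off-diagonal cells)."""
    return (q12 + q21) / (q11 + q12 + q21 + q22)


def independence_gap(q11, q12, q21, q22):
    """Joint proportion p12 minus the product of its marginal proportions.

    A value close to zero would indicate independent signs.
    """
    n = q11 + q12 + q21 + q22
    p12 = q12 / n
    row_share = (q12 + q22) / n      # P2: compounds with Z > Z(th)
    column_share = (q11 + q12) / n   # 1P: effective compounds
    return p12 - row_share * column_share


error_threshold = empirical_error(45, 12, 14, 29)   # threshold 2.87
error_average = empirical_error(36, 21, 12, 31)     # threshold 2.80
relative_gain = (error_average - error_threshold) / error_average
gap = independence_gap(45, 12, 14, 29)

print(f"errors: {error_threshold:.2f} vs {error_average:.2f}")
print(f"relative reduction: {relative_gain:.0%}, independence gap: {gap:.3f}")
```

The relative reduction of the error corresponds to the figure of about 21% quoted in the text.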

Table 2. The table of classifications.

The sign of Z Radioprotective efficacy (RE) The total sum
Effective chemical compounds, D ≤ 1 mM/kg Ineffective chemical compounds, D > 2 mM/kg
Z ≤ Z(th)
q11 = 45 (36) q21 = 14 (12) 59 (48)
p11 = 0.45 (0.36) p21 = 0.14 (0.12) P1 = 0.59 (0.48)
Z > Z(th)
q12 = 12 (21) q22 = 29 (31) 41 (52)
p12 = 0.12 (0.21) p22 = 0.29 (0.31) P2 = 0.41 (0.52)
The total sum 57 (57) 43 (43) N = 100
1P = 0.57 (0.57) 2P = 0.43 (0.43) 1.00

Q = 0.77 (0.63), Φ = 0.39 (0.19), *),

SE = 0.09 (0.09), K = 0.43 (0.32), |rtet| = 0.68 (0.53), Δ = 0.26 (0.33).

*) chi-square we calculated using the equation (17).

Let's examine the representativeness of the sample (Table 1). Using a table of random numbers [10], we draw a partial sample from the data of Table 1. The method of random numbers avoids involuntary and systematic errors in the preparation of the sample. As a result, we obtained the following sequence of random numbers:

03, 47, 43, 73, 86, 36, 96, 46, 63, 71, 62, 33, 26, 16, 80, 45, 60, 11, 14, 10, 74, 24, 67, 42, 81, 57, 20, 53, 32, 37, 27, 07, 51, 79, 89, 76, 66, 56, 50, 90. (12)

A series of random numbers can be obtained starting from any point of the table of random numbers. We wrote down all the random numbers that do not exceed 96 [6]. Matching these numbers with the ordinal numbers of the chemical compounds in Table 1, we obtained a partial sample of 40 items. In the partial sample the chemical compounds are thus selected without bias [10]. The statistics of the partial sample are as follows:

N = 40, Z(av) = 2.82 ± 0.04, Z(min) = 2.316, Z(max) = 3.300, Sz = 0.23.

N1 = 24, Z1(av) = 2.78 ± 0.04, Z1(min) = 2.333, Z1(max) = 3.300, Sz1 = 0.21,

N2 = 16, Z2(av) = 2.88 ± 0.07, Z2(min) = 2.316, Z2(max) = 3.263, Sz2 = 0.25. (13)

This result is similar to the statistics (6), while the sign Z is represented in the same proportion as in the original sample.
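The composition of the partial sample follows directly from the sequence (12) and the layout of Table 1, in which rows 1-57 are marked "+" and rows 58-100 are marked "-". The following sketch splits the drawn numbers accordingly; the function name `split_partial_sample` is illustrative only.

```python
# Sequence (12) of random numbers drawn from the table of random numbers.
RANDOM_NUMBERS = [
    3, 47, 43, 73, 86, 36, 96, 46, 63, 71,
    62, 33, 26, 16, 80, 45, 60, 11, 14, 10,
    74, 24, 67, 42, 81, 57, 20, 53, 32, 37,
    27, 7, 51, 79, 89, 76, 66, 56, 50, 90,
]

LAST_EFFECTIVE = 57  # rows 1-57 of Table 1 carry "+", rows 58-100 carry "-"


def split_partial_sample(numbers, last_effective=LAST_EFFECTIVE):
    """Split drawn row numbers into effective and ineffective compounds."""
    unique = list(dict.fromkeys(numbers))  # drop repeats, keep draw order
    effective = [k for k in unique if k <= last_effective]
    ineffective = [k for k in unique if k > last_effective]
    return effective, ineffective


effective_rows, ineffective_rows = split_partial_sample(RANDOM_NUMBERS)
print(len(effective_rows), len(ineffective_rows))  # (13) reports N1 = 24, N2 = 16
```

The share of effective compounds in the partial sample, 24/40 = 0.60, is close to the share 57/100 in the full sample, which is what the representativeness check requires.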

The standard error of the contingency coefficient can be assessed using the following equation:

(14)

Testing of the significance is carried out using the chi-square test [9]:

(15)

i.e., at the α = 0.05 significance level the null hypothesis can be rejected. For normally distributed data, one can additionally use the tetrachoric coefficient of association (-1 ≤ rtet ≤ 1):

$$r_{tet} = \cos\left(\frac{\pi}{1 + \sqrt{q_{11} q_{22} / (q_{12} q_{21})}}\right) \qquad (16)$$

However, if the distribution of the marginal frequencies of the two-by-two table is non-uniform, this coefficient becomes unreliable. Therefore, the Pearson chi-square criterion with the Yates continuity correction is commonly used [8, 9]:

(10.8)

(17)

Here N = q11 + q12 + q22 + q21 is the sum of all frequencies. Inequality (17) shows that there is a statistically significant interrelation of the signs. However, criterion (17) does not give an idea of the strength of this interrelation. The closeness of the linkage between the signs can be assessed using the Pearson coefficient of mutual contingency:
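The chi-square criterion with the Yates correction and the tetrachoric coefficient can be sketched for the frequencies of Table 2 as follows. The chi-square expression is the textbook continuity-corrected form for a 2 × 2 table. The cosine-type approximation of the tetrachoric coefficient is an assumption, chosen here because it reproduces the |rtet| values listed in Tables 2 and 3; the function names are hypothetical.

```python
import math

CHI2_CRITICAL_05 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def yates_chi_square(q11, q12, q21, q22):
    """Pearson chi-square for a 2 x 2 table with Yates continuity correction."""
    n = q11 + q12 + q21 + q22
    numerator = n * (abs(q11 * q22 - q12 * q21) - n / 2) ** 2
    denominator = (q11 + q12) * (q21 + q22) * (q11 + q21) * (q12 + q22)
    return numerator / denominator


def tetrachoric_cos(q11, q12, q21, q22):
    """Cosine approximation of the tetrachoric coefficient (assumed form)."""
    odds_ratio = (q11 * q22) / (q12 * q21)
    return math.cos(math.pi / (1 + math.sqrt(odds_ratio)))


chi2 = yates_chi_square(45, 12, 14, 29)
r_tet = tetrachoric_cos(45, 12, 14, 29)

print(f"chi-square = {chi2:.2f}, significant at 0.05: {chi2 > CHI2_CRITICAL_05}")
print(f"r_tet = {r_tet:.2f}")  # Table 2 lists |rtet| = 0.68
```

With a single degree of freedom, a corrected chi-square far above the critical value supports rejecting the hypothesis that the signs are independent, in line with the conclusion drawn from inequality (17).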

(18)

The mean-square contingency index is equal to:

(19)

Using equation (18) we determine the coefficient of mutual contingency K = 0.43(0.32), which confirms the interrelation of dichotomous signs.

The study of the structure-activity interrelationship of the molecules showed that the electronic sign Z is associated with the Shannon information function [11]:

$$H = -\sum_{i=1}^{n} p_i \log_2 p_i \qquad (20)$$

where $p_i = n_i/N$, and the following relations hold for $p_i$: $\sum_{i=1}^{n} p_i = 1$, $0 \le p_i \le 1$; n is the number of varieties of atoms in the molecule, $n_i$ is the number of atoms of the i-th kind, and N is the total number of atoms. The ratio $p_i$ determines the relative share of the i-th kind of atom in the molecule [12]. The Shannon function is an integral characteristic of the molecule that determines the measure of uncertainty (or diversity) of the structure of the chemical compound. The larger the value of the function H, the more diverse (in the relative content of atoms) the multicomponent system.
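A short Python sketch of the information function (20) is given below. It assumes that the relative share of each kind of atom is its count divided by the total number of atoms and that the logarithm is taken to base 2, so that H is measured in bits as in Table 1. The function name `shannon_information` and the input format are illustrative.

```python
import math


def shannon_information(atom_counts):
    """Shannon information function of a molecule, in bits.

    atom_counts maps an element symbol to the number of such atoms
    (hypothetical input format).
    """
    total = sum(atom_counts.values())
    shares = [n / total for n in atom_counts.values() if n > 0]
    return -sum(p * math.log2(p) for p in shares)


# Compound 48 of Table 1, H2NCH2CH2SPO3H2, with atoms counted by hand.
compound_48 = {"H": 8, "C": 2, "N": 1, "S": 1, "P": 1, "O": 3}
h_48 = shannon_information(compound_48)
print(round(h_48, 3))  # Table 1 lists H = 2.078 bit for this compound
```

A molecule built from a single kind of atom gives H = 0, and H grows as the atoms are spread more evenly over more kinds.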

Using the data of Table 1 we define the average value of the information function:

N = 100, H(av) = 1.80 ± 0.02, H(min) = 1.174, H(max) = 2.249, SH = 0.21. (21)

We verify the compatibility of the elements of the sample on the basis of H:

(22)

Here f is equal to 100. Consequently, the sample does not contain incompatible elements. Statistics of average values of information functions for effective radioprotectors will be as follows:

N1 = 57, H1(av) = 1.76 ± 0.03, H1(min) = 1.378, H1(max) = 2.131, SH1 = 0.21. (23)

This subset is close to a normal distribution, and the Shapiro-Wilk criterion is satisfied: W = 0.951 > W(cr) = 0.947. Let's verify the compatibility of the elements of this subset:

(24)

Here f is equal to 57. These inequalities point to the absence of incompatible elements.

For the inefficient radioprotectors statistics of the average value will be as follows:

N2 = 43, H2(av) = 1.85 ± 0.03, H2(min) = 1.174, H2(max) = 2.249, SH2 = 0.20. (25)

Checking of elements of the second subset leads to inequalities:

(26)

Here f is equal to 43. From the second inequality it follows that element number 16 of the subset (compound 73 in Table 1, H = 1.174 bit) is incompatible with the other elements of the subset. After excluding this element, we obtained the following statistics for the information function:

N2 = 42, H2(av) = 1.87 ± 0.03, H2(min) = 1.457, H2(max) = 2.249, SH2 = 0.17. (27)

This subset is close to a normal distribution. The Shapiro-Wilk statistic exceeds the critical value: W = 0.964 > W(cr) = 0.942. The examination of the uniformity of the subset leads to the following inequalities:

(28)

Here f is equal to 42. Thus, the subset comprises only compatible elements. Let's see whether the distinction between the average values H1(av) and H2(av) is statistically significant. We first test the distinction between the variances of the two subsets: F = 1.52 < F(cr). That is, the distinction in variance is not statistically significant. Therefore, we must use the following inequality:

N = 99, N1 = 57, N2 = 42, SH1 = 0.21, SH2 = 0.17. (29)

Inequality (29) rejects the null hypothesis on equality of the average values of the information functions.

Again, we will use the association method for qualitative signs. As the boundary value we choose the average value of the information function (21): H(av) = 1.80 bit. The numerical data are contained in Table 3.

Table 3. The table of classifications.

The sign of H, bit Radioprotective efficacy (RE) The total sum
Effective chemical compounds, D ≤ 1 mM/kg Ineffective chemical compounds, D > 2 mM/kg
H ≤ H(av)
q11 = 31 q21 = 11 42
p11 = 0.31 p21 = 0.11 P1 = 0.42
H > H(av)
q12 = 26 q22 = 32 58
p12 = 0.26 p22 = 0.32 P2 = 0.58
The total sum 57 43 N = 100
1P = 0.57 2P = 0.43

Q = 0.55, Φ = 0.07, *),

SE = 0.10, K = 0.25, |rtet| = 0.46, Δ = 0.37.

*) Chi-square we calculated using the equation (17).

Thus, the sign H serves as a boundary between effective radioprotectors and ineffective chemicals. Variation of the threshold H(av) = 1.80 bit does not improve the statistical results.

Let's examine these classification rules for chemical compounds that have anti-radiation activity and were not included in the original sample: 1) NH2CH2CH2CH2SH (Dose: 3.79 mM/kg; Z = 2.73, H = 1.43 bit), 2) (CH3)2S=O (Dose: 6.4-12.8 mM/kg; Z = 2.60, H = 1.57 bit), 3) NH2CH2CH2NHCOCH2SH (Dose; Z = 2.63, H = 1.77 bit), 4) cysteine (Dose: 1.56-1.94 mM/kg; Z = 2.36, H = 1.49 bit), 5) disulfide β-mercaptoethylamine (Dose: 0.99-1.18 mM/kg; Z = 2.50, H = 1.57 bit), 6) S-β-aminoethylisothiuronium (AET) (Dose: 1.68-2.10 mM/kg; Z = 2.63, H = 1.63 bit), 7) (CH3)2N-C6H5-CH (OH)-S-CH2CH2NH2 (Dose: 0.88-1.77 mM/kg; Z = 2.55, H = 1.56 bit). Obviously, the signs of these chemical compounds satisfy the inequalities Z ≤ Z(th), H ≤ H(av).
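The check of the seven additional compounds can be written as a simple two-threshold rule. The sketch below assumes that a compound is classified as a candidate effective radioprotector when both Z ≤ 2.87 and H ≤ 1.80 bit, which is a reading of the classification rules of Tables 2 and 3; the names `predicted_effective` and `CANDIDATES` are hypothetical.

```python
Z_THRESHOLD = 2.87   # threshold of the electronic sign
H_THRESHOLD = 1.80   # threshold of the information function, bit


def predicted_effective(z, h):
    """Two-threshold classification rule (assumed conjunction of both signs)."""
    return z <= Z_THRESHOLD and h <= H_THRESHOLD


# (label, Z, H) for the compounds that were not part of the original sample.
CANDIDATES = [
    ("NH2CH2CH2CH2SH", 2.73, 1.43),
    ("(CH3)2S=O", 2.60, 1.57),
    ("NH2CH2CH2NHCOCH2SH", 2.63, 1.77),
    ("cysteine", 2.36, 1.49),
    ("disulfide of beta-mercaptoethylamine", 2.50, 1.57),
    ("AET", 2.63, 1.63),
    ("(CH3)2N-C6H5-CH(OH)-S-CH2CH2NH2", 2.55, 1.56),
]

results = {name: predicted_effective(z, h) for name, z, h in CANDIDATES}
for name, verdict in results.items():
    print(f"{name}: {'passes' if verdict else 'fails'} both thresholds")
```

A compound such as number 100 of Table 1 (Z = 4.462, H = 2.162 bit) fails both conditions under the same rule.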

The analysis has shown the molecular signs of Z and H are interconnected. For the effective radioprotectors the interrelation can be described by the following linear regression (Fig. 2):

Figure 2. Scattering pattern of the electronic and information signs. H and Z values are taken from Table. 1; - - - - linear approximation (31). We have marked by triangles (Δ) the effective radioprotectors (N1 = 57). We have marked by bold dots (•) the chemical compounds that do not have effective radioprotective effect (N2 = 43). ─── non-linear approximation which is defined by the equation (32).

H = A + B·Z, R = 0.87 > R(cr) = 0.22, N1 = 57, S1 = 0.122. (30)

The absolute term A and the regression coefficient B are equal to:

A = – 0.332 ± 0.338, SA = 0.169, B = 0.772 ± 0.124, SB = 0.062,

RMSE = 0.109, ,

F =153.3 >>  = 7.12, t = 9.5 >

(31)

Here the residual statistic estimates the scatter about the regression line; SA and SB are the standard errors of the regression parameters; R is the sample correlation coefficient. The number of connections is equal to m = 1; the number of degrees of freedom is equal to f = N1 - m - 1 [8]. The confidence limits for the free term A and the regression coefficient B at the significance level α = 0.05 were determined according to the formula: A ± t(α, f)·SA, B ± t(α, f)·SB.
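The regression parameters quoted above can be used to estimate H from Z for an effective radioprotector. The sketch assumes the linear form H = A + B·Z and that the quoted confidence half-widths are the standard errors multiplied by a Student coefficient; both are assumptions made for illustration, and the names are hypothetical.

```python
# Regression parameters and their standard errors as quoted in the text.
A, B = -0.332, 0.772
S_A, S_B = 0.169, 0.062
HALF_WIDTH_A, HALF_WIDTH_B = 0.338, 0.124


def predict_h(z):
    """Information function estimated from the electronic sign (assumed linear form)."""
    return A + B * z


# Student coefficient implied by the quoted confidence limits.
t_from_a = HALF_WIDTH_A / S_A
t_from_b = HALF_WIDTH_B / S_B

h_at_mean = predict_h(2.71)  # mean Z of the effective compounds, see (8)
print(f"implied t: {t_from_a:.2f} and {t_from_b:.2f}")
print(f"H at Z = 2.71: {h_at_mean:.2f}")  # mean H of effective compounds is 1.76
```

Both parameters imply the same coefficient, close to the two-sided 5% point of the t-distribution for about 55 degrees of freedom, and the line passes through the point of mean values, as a least-squares line should.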

For chemical agents which do not possess effective radiation protective action, this interrelation is nonlinear (Figure 2) and can be approximated by the following analytical form:

, , , ,

N2 = 43, R = 0.89, RMSE = 0.074,

(32)

We can obtain additional information about the nonlinear dependence H(Z) (Figure 2) if we construct a variation series of the grouped chemical compounds. Six to eight groups are usually used for a sample size of N ≈ 40-60. First we rank the variation series (for example, in ascending order of Z). Then the data are grouped by the factor Z. It is convenient to group the data at regular intervals. We chose the number of groups equal to n = 6. We can determine the width of the interval using the following equation: h = (Z(max) − Z(min))/n. For each group, we calculate the mean values Zi(av) and Hi(av), where i = 1, 2,..., 6. After that, the ratios of the average values are compared. We give some relationships:

,

,

,

(33)

This ratio should be close to a constant value for a linear approximation to hold. The frequencies of the sample units in the groups (3(1), 9(2), 10(3), 13(4), 5(5), 3(6)) are close to a normal distribution: W = 0.902 > W(cr) = 0.788.
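The grouping procedure described above can be sketched as follows. The interval width is assumed to be the range of Z divided by the number of groups, and the six (Z, H) pairs are a small illustrative subset of the ineffective compounds of Table 1 (numbers 71, 63, 59, 62, 70 and 94), not the grouping actually used in the paper; the function name `group_means` is hypothetical.

```python
def group_means(pairs, n_groups):
    """Group (Z, H) pairs into equal-width Z intervals and average each group.

    Returns a list of (count, mean Z, mean H) tuples, one per interval.
    """
    zs = [z for z, _ in pairs]
    z_min, z_max = min(zs), max(zs)
    width = (z_max - z_min) / n_groups
    if width == 0:
        raise ValueError("all Z values coincide; grouping is undefined")
    groups = [[] for _ in range(n_groups)]
    for z, h in pairs:
        # The maximum value falls on the upper edge and joins the last group.
        index = min(int((z - z_min) / width), n_groups - 1)
        groups[index].append((z, h))
    summary = []
    for members in groups:
        if members:
            mean_z = sum(z for z, _ in members) / len(members)
            mean_h = sum(h for _, h in members) / len(members)
            summary.append((len(members), mean_z, mean_h))
        else:
            summary.append((0, None, None))
    return summary


# Illustrative subset of Table 1 (ineffective compounds), not the full sample.
pairs = [(2.316, 1.457), (2.471, 1.545), (2.769, 1.823),
         (3.000, 1.961), (3.143, 2.012), (3.474, 2.103)]
summary = group_means(pairs, n_groups=3)

# Slopes between consecutive group means; a constant slope suggests linearity.
slopes = [(h2 - h1) / (z2 - z1)
          for (_, z1, h1), (_, z2, h2) in zip(summary, summary[1:])]
print([count for count, _, _ in summary], [round(s, 2) for s in slopes])
```

A slope that decreases from one pair of groups to the next, as in this small subset, is the kind of behaviour that motivates a nonlinear approximation.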

The separation into groups allows us to calculate the empirical correlation ratio, equal to 0.84, from the variance between the groups and the total variance of the original sample of 43 elements. Obviously, the empirical correlation ratio varies from zero to unity. It allows us to quantify the impact of Z on the variation of the resultant variable H.

Hereafter we may calculate the theoretical correlation ratio as the square root of the ratio of the variance of the balanced (fitted) values of the information function, equal to 0.02, to the variance of the empirical (actual) values of the resultant variable, S² = 0.025. The theoretical correlation ratio is equal to 0.89 (the coefficient of determination is equal to 0.79). It is known that a non-linear relationship between the signs is strong (on the Chaddock scale) if the correlation ratio lies between 0.7 and 0.9.
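The theoretical correlation ratio can be recomputed from the two variances quoted above. The sketch assumes that the ratio is the square root of the explained share of the variance; the band labels of the Chaddock scale are the commonly quoted ones, with "strong" used for the 0.7-0.9 band as in the text, and the function names are hypothetical.

```python
import math


def correlation_ratio(explained_variance, total_variance):
    """Square root of the explained share of the total variance."""
    return math.sqrt(explained_variance / total_variance)


def chaddock_label(value):
    """Qualitative strength of a relationship on the Chaddock scale."""
    magnitude = abs(value)
    if magnitude < 0.1:
        return "negligible"
    if magnitude < 0.3:
        return "weak"
    if magnitude < 0.5:
        return "moderate"
    if magnitude < 0.7:
        return "noticeable"
    if magnitude < 0.9:
        return "strong"
    return "very strong"


eta = correlation_ratio(explained_variance=0.02, total_variance=0.025)
print(f"correlation ratio = {eta:.2f}, strength: {chaddock_label(eta)}")
```

Because the variances are quoted with only one or two significant digits, the recomputed ratio and its square agree with the published values only to within rounding.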

Figure 3. (1) Scattering pattern of the electronic and information signs for radioprotector series: CH3(CH2)mNHCH2CH2SSO3H (m = 0, 1,…, 17) [4]. The regression equation: ,, , , RMSE = 0.0006. (2) Series of chemical compounds: CH3(CH2)mNH(CH2)nSPO3H2 (m = 2, 3, 4, n = 2, 3) [13]. The regression equation: , , , , RMSE = 0.0009. (3) Series of chemical compounds: NH2(CH2)mSH (m = 2, 3, 4, 5) [5]. The regression equation: , , , ,  RMSE = 0.0011.

Figure 4. Relationship of the information function with the additional contribution π to the hydrophobicity of radioprotectors: (1) CH3(CH2)mNHCH2CH2SSO3H, (2) CH3(CH2)mNH(CH2)nSPO3H2 and (3) NH2(CH2)mSH. (1) The regression line is approximated by the following function: H(π) = A·exp(-C·π) + B, A = 0.682 ± 0.004, B = 1.26 ± 0.004, C = 0.198 ± 0.003, RMSE = 0.002. (2) The regression line is approximated by the following function: H(π) = A·exp(-C·π) + B, A = 0.968 ± 0.030, B = 1.32 ± 0.04, C = 0.191 ± 0.013, RMSE = 0.0004. (3) The regression line is approximated by the following function: H(π) = A·exp(-C·π) + B, A = 0.541 ± 0.015, B = 1.11 ± 0.02, C = 0.337 ± 0.023, RMSE = 0.0007.

Figure 5. The frequency of appearance of nitrogen (1), oxygen (2), hydrogen (3), sulfur (4), carbon (5) and phosphorus (6) in the molecules (Table 1). In panel (D) there is no line for phosphorus. In Table 1, there is only one ineffective agent that contains a phosphorus atom.

As the analysis has shown, the information function is related to the value of π. The value π = 0.52 [14] defines the additional contribution of the CH2 group of atoms to the hydrophobicity of molecules. Figure 4 shows this relationship for the radioprotectors CH3(CH2)mNHCH2CH2SSO3H (m = 0, 1,…, 17), CH3(CH2)mNH(CH2)nSPO3H2 (m = 2, 3, 4, n = 2, 3) and NH2(CH2)mSH (m = 2, 3, 4, 5).

The positive interrelation between the signs Z and H is not random. The information function determines the diversity of the molecular structure, which in turn is determined by the number of different atoms forming a bound complex of atoms, i.e., a molecule. At the same time, the structure of the molecule is not an arbitrary set of various atoms, but is determined by the valence electrons in the outer electron shell. Apparently, this quantum-chemical property establishes the interrelation of the two signs Z and H for molecular structures. The RMSE is so low that the regression equations (Figures 3 and 4) approach a functional dependence.

Some distinctions between effective and ineffective radioprotectors can be found by analyzing the frequency of appearance of the atoms in the molecule. Figure 5 shows the frequency of occurrence of the atoms (C, H, N, O, S, P) in the molecule.

Using the data of Table 1, we can approximately indicate the frequency of occurrence of atoms in a molecule of a hypothetical effective agent (for a homogeneous sample): P ~ 1, S ~ 1, N ~ 2, O ~ 3, C ~ 5-6, H ~ 17 (Figure 5). At the same time, the most probable distribution of atoms in an ineffective agent (hypothetical molecule) is as follows: P ~ 1, N ~ 1, O ~ 1, S ~ 2, C ~ 4, H ~ 8-10.

3. Conclusion

The proposed classification rules make it possible to identify similarities between molecular structures. These rules can be practically useful in a preliminary forecast of the bioactivity of new chemical compounds. It should be noted that calculating the signs Z and H requires only knowledge of the chemical structural formula, which greatly simplifies the preliminary search for new bioactive chemicals. The classification rules indicate whether an effective biological action can be expected from a chemical compound. The ability to separate biologically active chemical compounds from inactive ones on the basis of the sign Z is apparently due to the existence of a real molecular electrostatic potential. The magnitude of this potential varies from molecule to molecule. Moreover, for effective chemical compounds the electrostatic potential lies below a certain threshold (in absolute value). The method described in this article has yielded positive results in studies of the antifungal activity and toxicity of chemical compounds [15]. It has also been used in the analysis of the activity of carcinogenic chemicals [16].

However, it should be noted that these rules are not sensitive to isoelectronic molecular systems or to isomeric molecules. The approach gives the most reliable results when analyzing homologous series of chemical compounds. Homologous series are generally characterized by signs that satisfy the compatibility condition.

Abbreviations

I. P. - intraperitoneal, A. R. P. - antiradiation protection, RE - radioprotective efficiency, RMSE - root mean square error.


References

  1. Alexander P., et al. (1955) Mode of action of some substances which protect against the lethal effects of x-rays. Radiat. Res. Vol. 2 (2): p. 392.
  2. Veljković V., Lalović D. (1977) Simple theoretical criterion of chemical carcinogenicity. Experientia. Vol. 33 (9): p. 1228.
  3. Veljković V., Lalović D. (1973) General model pseudopotential for positive ions. Phys. Lett. A. Vol. 45 (1): p. 59.
  4. Sweeney T. R. (1979) A Survey of Compounds from the Antiradiation Drug Development Program. Washington.
  5. Romantcev E. F. (1968) Radiation and chemical protection. Moscow. (in Russian).
  6. Handbook of Applicable Mathematics. (1984). Vol. VI. Statistics. Part B. John Wiley & Sons. Chichester-New York-Brisbane-Toronto-Singapore.
  7. Pustyl'nik E. I. (1978) Statistical methods for the analysis and processing of observations. Moscow (in Russian).
  8. Förster E., Rönz B. (1979) Methoden der Korrelations- und Regressionsanalyse. Berlin.
  9. Fleiss J. L. (1981) Statistical Methods for Rates and Proportions. Chichester-New York-Brisbane-Toronto-Singapore.
  10. Urbach V. Y. (1975) Statistical analysis in biological and medical studies. Moscow (in Russian).
  11. Shannon C. (1948) A mathematical theory of communication. Bell System Technical Journal. Vol. 27: p. 379.
  12. Mukhomorov V. K. (2012) Modeling of chemical compounds bioactivity. Relationships of structure - bioactivity. Lambert Academic Publishing. Saarbrücken. Germany. p. 165. ISBN: 978-3-659-19941-7. (in Russian).
  13. Yaschunsky V. G. (1975) Progress in the search for chemical protective agents against radiation. Russian Chemical Reviews. Vol. 44 (3): p. 260.
  14. Leo A., Hansch C., Elkins D. (1971) Partition coefficients and their uses. Chem. Reviews. Vol. 71 (6): p. 525.
  15. Mukhomorov V. K. (2014) Bioactivity-structure. Interrelation of electronic and information factors of biologically activity of chemical compounds. Trends Journal of Sciences Research. Vol. 1 (1): p. 38.
  16. Mukhomorov V. K. (2011) Entropy approach to the study of biological activity of chemical compounds: The other side of radioprotectors. Advances in Biological Chemistry. Vol. 1 (1): p. 1.

Footnotes

¹ This sequence of numbers is close to the Fibonacci series: 1, 1, 2, 3, 5, 8, 13.
