package molenc
Sources
sha256=510ab13cf4ba517c6ef74d1c37087146fa28736e26c7bd4bd91503c5bfe53944
md5=54ab92f2ce0572c195cde171994dae19
Description
Chemical fingerprints are lossy encodings of molecules. molenc encodes molecules using unfolded-counted fingerprints (i.e., a potentially very long but sparse vector of positive integers).
Currently, Faulon fingerprints are supported; atom pair fingerprints might be added in the future. Atom types are currently the quadruplet (#pi-electrons, element symbol, #HA neighbors, formal charge). Pharmacophore features (a more abstract, fuzzier atom typing scheme) might also be supported in the future.
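As an illustration only, the atom-type quadruplet can be modeled as an OCaml record. The field names and the `(pi,element,HA,charge)` string format below are hypothetical, not molenc's internal representation; the string form simply gives each atom type a stable label that can serve as a fingerprint feature.

```ocaml
(* Hypothetical sketch of the atom-type quadruplet; not molenc's actual API. *)
type atom_type = {
  pi_electrons : int;    (* number of pi electrons *)
  element : string;      (* element symbol, e.g. "C", "N" *)
  heavy_neighbors : int; (* number of heavy-atom (HA) neighbors *)
  formal_charge : int;   (* formal charge, may be negative *)
}

(* Render an atom type as a string label. Separators keep negative
   charges and multi-letter elements unambiguous. *)
let string_of_atom_type t =
  Printf.sprintf "(%d,%s,%d,%d)"
    t.pi_electrons t.element t.heavy_neighbors t.formal_charge
```

For example, each carbon of ethane (no pi electrons, one heavy-atom neighbor, neutral) is labeled `(0,C,1,0)`.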
Bibliography:
Carhart, R. E., Smith, D. H., & Venkataraghavan, R. (1985). Atom pairs as molecular features in structure-activity studies: definition and applications. Journal of Chemical Information and Computer Sciences, 25(2), 64-73.
Kearsley, S. K., Sallamack, S., Fluder, E. M., Andose, J. D., Mosley, R. T., & Sheridan, R. P. (1996). Chemical similarity using physiochemical property descriptors. Journal of Chemical Information and Computer Sciences, 36(1), 118-127.
Faulon, J. L., Visco, D. P., & Pophale, R. S. (2003). The signature molecular descriptor. 1. Using extended valence sequences in QSAR and QSPR studies. Journal of Chemical Information and Computer Sciences, 43(3), 707-720.
James, C. A., et al. (2016). OpenSMILES specification, v1.0 (2016-05-15). http://opensmiles.org/opensmiles.html
Published: 24 Jul 2019
README
molenc
Molecular encoder using rdkit and OCaml.
The implemented fingerprint is J.-L. Faulon's "Signature Molecular Descriptor". This is an unfolded-counted chemical fingerprint. Such fingerprints are less lossy than most well-known chemical fingerprints, e.g. ECFP4.
Advantages: because the fingerprint is unfolded, encoding creates no feature collisions. Encoding also builds a feature dictionary, which can later be used to map a given feature index back to its atom environment.
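A minimal sketch of why unfolded counting avoids collisions, assuming each atom environment has already been turned into a canonical string (plain strings stand in for Faulon signatures here). The names `create_dict`, `encode` and `feature_of_index` are hypothetical, not part of molenc: every new environment string receives the next free index, so distinct environments never share an index, and the same table maps an index back to its environment.

```ocaml
(* Hypothetical sketch of unfolded-counted encoding with a feature dictionary. *)
type dict = {
  index_of : (string, int) Hashtbl.t; (* environment string -> feature index *)
  mutable next : int;                 (* next unused feature index *)
}

let create_dict () = { index_of = Hashtbl.create 1024; next = 0 }

(* Return the index of a feature, assigning a fresh one if it is new.
   Distinct strings always get distinct indices: no collisions. *)
let index_of_feature d feat =
  match Hashtbl.find_opt d.index_of feat with
  | Some i -> i
  | None ->
    let i = d.next in
    Hashtbl.add d.index_of feat i;
    d.next <- i + 1;
    i

(* Encode one molecule, given as the list of its atom-environment strings,
   as a sparse vector of (feature index, count) pairs sorted by index. *)
let encode d (envs : string list) : (int * int) list =
  let counts = Hashtbl.create 16 in
  List.iter
    (fun env ->
       let i = index_of_feature d env in
       let c = match Hashtbl.find_opt counts i with Some c -> c | None -> 0 in
       Hashtbl.replace counts i (c + 1))
    envs;
  Hashtbl.fold (fun i c acc -> (i, c) :: acc) counts []
  |> List.sort compare

(* Reverse lookup: recover the atom environment behind a feature index. *)
let feature_of_index d i =
  Hashtbl.fold (fun feat j acc -> if j = i then Some feat else acc) d.index_of None
```

Because the dictionary is shared across calls, a feature first seen in one molecule keeps its index in later molecules. The reverse lookup here is a linear scan for brevity; a second table indexed by feature number would make it constant-time.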
We recommend using a radius of zero to one (molenc_d -r 0:1 ...).
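To give an idea of what the radius controls, here is a deliberately simplified sketch, not molenc's signature code: at radius 0 an atom's environment is just its atom type, and at radius 1 it also lists the sorted types of its bonded neighbors. Real Faulon signatures are canonical trees that also encode bond types and extend further for larger radii. The `mol` type and the `environments` function are hypothetical; `~lo`/`~hi` only mimic the `-r 0:1` range.

```ocaml
(* Simplified, hypothetical sketch of radius-limited atom environments. *)
type mol = {
  types : string array;       (* atom type label of each atom *)
  neighbors : int list array; (* indices of the atoms bonded to each atom *)
}

(* Radius 0: the atom's own type. *)
let env_radius0 m a = m.types.(a)

(* Radius 1: the atom's type plus the sorted types of its neighbors
   (bond types are ignored in this simplification). *)
let env_radius1 m a =
  let nbr_types =
    List.sort compare (List.map (fun b -> m.types.(b)) m.neighbors.(a)) in
  m.types.(a) ^ "(" ^ String.concat "," nbr_types ^ ")"

(* All environments for radii lo..hi; only radii 0 and 1 exist here. *)
let environments m ~lo ~hi =
  let per_radius r =
    List.init (Array.length m.types) (fun a ->
        if r = 0 then env_radius0 m a else env_radius1 m a) in
  [0; 1]
  |> List.filter (fun r -> lo <= r && r <= hi)
  |> List.map per_radius
  |> List.concat
```

For the heavy atoms of ethanol (C-C-O), radius 0 yields `C`, `C`, `O` and radius 1 yields `C(C)`, `C(C,O)`, `O(C)`; the radius-1 strings tell the two carbons apart, while larger radii produce ever more specific (and rarer) features.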
The fingerprint can use atom types (#pi-electrons, element symbol, #HA neighbors, formal charge) or, for a fuzzier description of your molecules, rdkit pharmacophore features (Donor, Acceptor, PosIonizable, NegIonizable, Aromatic, Hydrophobe; TODO, not yet implemented).
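The pharmacophore feature names listed above map naturally onto an OCaml variant. This is a hypothetical sketch: molenc would obtain the features from rdkit, which is not modeled here, so `distinct_labels` just takes already-assigned labels and shows how a coarser typing can merge atoms that differ by atom type.

```ocaml
(* Hypothetical sketch: pharmacophore feature labels as a variant type. *)
type ph4 = Donor | Acceptor | PosIonizable | NegIonizable | Aromatic | Hydrophobe

let string_of_ph4 = function
  | Donor -> "Donor"
  | Acceptor -> "Acceptor"
  | PosIonizable -> "PosIonizable"
  | NegIonizable -> "NegIonizable"
  | Aromatic -> "Aromatic"
  | Hydrophobe -> "Hydrophobe"

(* Count distinct labels among the atoms of a molecule. Fewer distinct
   labels means a coarser (fuzzier) description. *)
let distinct_labels labels = List.length (List.sort_uniq compare labels)
```

For instance, if two atoms with different atom types, say `(0,N,1,0)` and `(0,O,1,0)`, were both assigned `Donor`, the atom-type labeling would count two distinct labels while the pharmacophore labeling would count one.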
Dependencies (8)
- ocaml >= "4.04.0" & < "5.0"
- conf-python-3
- conf-rdkit
- minicli
- dolog < "4.0.0"
- batteries
- dune < "3.0"
- bst
Dev Dependencies
None
Conflicts
None