Add data mining actors (WEKA) and cheminformatics actors (CDK)
Josep Maria Campanera Alsina requested the incorporation of data mining actors based on WEKA and cheminformatics actors based on CDK. The email correspondence between him and Tim McPhillips follows:
Tim McPhillips wrote:
I think this is an excellent idea. It would greatly increase the number of data mining algorithms available in Kepler, and Kepler would benefit from all current and future developments in Weka.
I am not a Weka specialist, but as far as I know, Weka uses standard interfaces for most of its components and algorithms (e.g. one interface for all classifiers, one for all filters, and so on), so it should be possible to write one or more fairly generic wrappers to incorporate Weka functionality into Kepler.
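The generic-wrapper idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Classifier` interface, `MajorityClassifier`, and `ClassifierActor` are hypothetical stand-ins written for this example, not the actual Weka or Kepler APIs. The sketch shows how a single wrapper could load any algorithm by class name, provided all algorithms implement one shared interface.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GenericWrapperSketch {

    /** Hypothetical shared interface; a stand-in, NOT Weka's real classifier API. */
    public interface Classifier {
        void train(List<double[]> features, List<String> labels);
        String predict(double[] features);
    }

    /** Toy algorithm: always predicts the most frequent training label. */
    public static class MajorityClassifier implements Classifier {
        private String majority = "unknown";

        @Override
        public void train(List<double[]> features, List<String> labels) {
            Map<String, Integer> counts = new HashMap<>();
            int best = 0;
            for (String label : labels) {
                int n = counts.merge(label, 1, Integer::sum);
                if (n > best) {
                    best = n;
                    majority = label;
                }
            }
        }

        @Override
        public String predict(double[] features) {
            return majority;
        }
    }

    /** Hypothetical generic wrapper: one class serves every Classifier implementation. */
    public static class ClassifierActor {
        private final Classifier delegate;

        /** Instantiates the algorithm reflectively from its fully qualified class name. */
        public ClassifierActor(String className) throws ReflectiveOperationException {
            delegate = Class.forName(className)
                    .asSubclass(Classifier.class)
                    .getDeclaredConstructor()
                    .newInstance();
        }

        /** Trains the wrapped algorithm and classifies one query point. */
        public String fire(List<double[]> trainX, List<String> trainY, double[] query) {
            delegate.train(trainX, trainY);
            return delegate.predict(query);
        }
    }

    public static void main(String[] args) throws Exception {
        // The algorithm is chosen by name only; the wrapper code never changes.
        ClassifierActor actor = new ClassifierActor(MajorityClassifier.class.getName());
        List<double[]> x = Arrays.asList(new double[] {0.1}, new double[] {0.9}, new double[] {0.2});
        List<String> y = Arrays.asList("active", "inactive", "active");
        System.out.println(actor.fire(x, y, new double[] {0.5}));
    }
}
```

Because the wrapper depends only on the shared interface, a new algorithm would need no new wrapper code, only a different class name. Whether the real libraries fit this pattern as cleanly would have to be checked against their documentation.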
There is another machine learning package, RapidMiner (formerly "YALE"; http://rapid-i.com/, http://sourceforge.net/projects/yale), which extends Weka. It might be useful to look at how they have incorporated Weka, or even to use RapidMiner as a basis for incorporation into Kepler.
Josep Maria Campanera Alsina wrote:
Hi all again,
I'd like to know if there are any plans in the Kepler project related to two very useful open source Java tools:
- WEKA, http://www.cs.waikato.ac.nz/ml/weka/ . The best-known and most popular Java library for data mining.
- CDK, the Chemistry Development Kit, http://cdk.sourceforge.net . A Java library for structural chemo- and bioinformatics.
In other words,
(1) Are there any plans to integrate them into the Kepler core in the near future?
(2) Are there any Kepler workflows available that already use these tools?
(3) What would the strategy be for integrating them into the platform? That is, since they are written in Java, is there an "easy" standard procedure for implementing or embedding them in Kepler?
Hopefully, we will see these useful tools embedded in Kepler soon!