From: Ben Goertzel (ben@goertzel.org)
Date: Sat May 11 2002 - 09:24:40 MDT
> > In bio & finance we got really awesome results in terms of being able to
> > recognize fancier patterns than anyone else.
>
> What results? Why are they awesome? According to my already-given
> estimate of the system, Novamente does have a genuine AI capability in
> the domain of pattern recognition and may be able to achieve a genuine
> AI capability in solving some goal-oriented problems in the patterns it
> can recognize, so recognizing fancier patterns than anyone else is
> exactly what I *think* Novamente ought to be able to do, but it would
> still help to have a specific example of a novel pattern that Novamente
> recognized. I could, after all, be too optimistic.
Here is a simple example of a pattern recognized by the system quite
recently, using some *very* simple methods compared to the full Novamente
capability...
*******
LOW, MOD_LOW, EXTR_HIGH, DECREASE, INCREASE are predicates applied to
numbers w/respect to a given set of numbers (establishing a Context). Here
the context is a dataset of gene expression values regarding the yeast cell
cycle.
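To make "predicates applied to numbers with respect to a Context" concrete, here is a minimal Python sketch. It assumes the predicates are quantile bands over the context; that is only one plausible reading. The band boundaries, the EXTR_LOW and MID labels, and the name make_context are all hypothetical, not taken from Novamente.

```python
from bisect import bisect_left

# Hypothetical band boundaries (upper bound on the fraction of context
# values lying below x). EXTR_LOW and MID are illustrative additions.
BANDS = [
    (0.10, "EXTR_LOW"),
    (0.30, "LOW"),
    (0.45, "MOD_LOW"),
    (0.55, "MID"),
    (0.70, "MOD_HIGH"),
    (0.90, "HIGH"),
    (1.00, "EXTR_HIGH"),
]

def make_context(values):
    """Return a function mapping a number to its band label within `values`."""
    ordered = sorted(values)
    n = len(ordered)

    def label(x):
        # Fraction of context values strictly below x.
        frac = bisect_left(ordered, x) / n
        for upper, name in BANDS:
            if frac < upper:
                return name
        # x is at or above every context value.
        return BANDS[-1][1]

    return label

context = list(range(100))      # toy context, not real expression data
label = make_context(context)
print(label(5), label(40), label(95))
```

The point of the sketch is only that the same number can be LOW in one context and HIGH in another, since the label depends on the whole set of numbers it is compared against.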
SIC1, PCL2, CLN3, SWI5 are shorthands for gene expression values.
For instance, SIC1 is shorthand for
expression(GeneNode SIC1), which is a function from time-values to numbers.
The variable D is a category learned from the Stanford Genome Database.
A relatively simple pattern in a yeast gene expression dataset, with a
fairly clear human meaning, is then:
----
C = ( LOW(SIC1) OR MOD_LOW(SIC1) ) AND ( LOW(PCL2) ) AND (LOW(CLN3)) OR (MOD_LOW(CLN3) )
D = "involved in transcriptional regulation of CUP1"
C AND EXTR_HIGH( SWI5) --> DECREASE(SWI5) AND INCREASE(D)
C AND (MOD_HIGH(SWI5) OR HIGH(SWI5)) --> INCREASE(SWI5) AND INCREASE(D)
----
**********
I've omitted the quantitative truth values of the patterns...
Many patterns found are far more complicated, but in this application
simple patterns are favored because the patterns are intended to be viewed
by human biologists, so that they can stimulate the human mind to further
hypotheses.
No one told the system to look at these particular 4 genes (there are 6100
yeast genes), or to look at transcriptional regulation as opposed to the
hundreds of other properties of genes in the databases the system was
given.... We did tell the system to look for patterns involving
combinations of certain predicates, though.
This is classic "data mining" -- the system is thrown a bunch of data and
looks for interesting patterns. It happens, however, to be a datamining
problem that has proved unsolvable for traditional datamining methods, in
spite of several years' effort. (Only several years because gene
expression data only recently became available, due to the recent invention
of gene chips & spotted microarrays.)
It's far from AGI of course ... the system is not posing its own problems
based on its own experience; it's being given a particular data analysis
problem by a script and then spitting out answers after hours of
processing.
It does involve two levels of analysis, a breakdown that's sort of
interesting from a cognitive perspective. Level 1 is finding some part of
the perceptual data that's worth paying attention to (e.g. a "coherent" set
of genes). Level 2 is finding the best possible patterns in this
coherent/attention-worthy part.
Of course I am leaving out all the details of how this app was set up,
which involves loads of tricks.
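Returning to the first rule of the pattern, C AND EXTR_HIGH(SWI5) --> DECREASE(SWI5) AND INCREASE(D), here is a toy Python sketch of how such a rule could be checked against a labeled time series. Several things in it are assumptions rather than anything stated above: the CLN3 clause of C is read as (LOW OR MOD_LOW), "-->" is read as "holds at the next time point", DECREASE is read as a plain drop in value, INCREASE(D) is left out, and the data is invented. The names holds_C and rule_support are hypothetical, and the fired/confirmed counts are a crude stand-in, not the omitted quantitative truth values.

```python
def holds_C(lv):
    """Condition C, with the CLN3 clause read as (LOW OR MOD_LOW): an assumption."""
    return (lv["SIC1"] in {"LOW", "MOD_LOW"}
            and lv["PCL2"] == "LOW"
            and lv["CLN3"] in {"LOW", "MOD_LOW"})

def rule_support(levels, swi5):
    """Return (times the antecedent fires, times the consequent also holds).

    levels: list of dicts, one per time point, mapping gene -> band label
    swi5:   raw SWI5 values at the same time points
    """
    fired = confirmed = 0
    # The last time point has no successor, so it cannot be checked.
    for t in range(len(levels) - 1):
        lv = levels[t]
        if holds_C(lv) and lv["SWI5"] == "EXTR_HIGH":
            fired += 1
            # DECREASE(SWI5), read here as a drop at the next time point.
            if swi5[t + 1] < swi5[t]:
                confirmed += 1
    return fired, confirmed

# Invented toy data, four time points.
levels = [
    {"SIC1": "LOW",     "PCL2": "LOW", "CLN3": "MOD_LOW", "SWI5": "EXTR_HIGH"},
    {"SIC1": "MOD_LOW", "PCL2": "LOW", "CLN3": "LOW",     "SWI5": "EXTR_HIGH"},
    {"SIC1": "HIGH",    "PCL2": "LOW", "CLN3": "LOW",     "SWI5": "EXTR_HIGH"},
    {"SIC1": "LOW",     "PCL2": "LOW", "CLN3": "LOW",     "SWI5": "LOW"},
]
swi5 = [9.0, 8.0, 8.5, 2.0]

print(rule_support(levels, swi5))  # (2, 1)
```

On this toy series the antecedent fires at the first two time points and the drop in SWI5 follows only once, hence (2, 1).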
There is a paper to be submitted to Journal of Computational Biology that
tells more, but it specifically doesn't go into the details of Novamente
AI, instead trying to explain the application-specific behavior of
Novamente in more conventional terms (so as to be able to explain what's
going on in the context of an isolated research paper).
Previous approaches to the gene expression data analysis problem are much
simpler and less useful, involving either clustering, or recognition of
simple pairwise patterns between genes, or linear time dependencies. These
previous approaches do not reveal much of the genetic regulatory structure
underlying the dataset, but Novamente-based analysis does.
Again, please do not accuse me of confusing this work with AGI. Of course,
it's "just" datamining.
We find that doing this sort of application, in addition to its intrinsic
value (better understanding of gene interactions helps lead to disease
cures etc.), helps us to tune the various mechanisms involved. There is a
risk that tuning a cognitive mechanism for datamining is actually tuning it
to perform BADLY on more closely AGI-related tasks, but we have a
theoretical perspective that at least helps us to ward off this problem.
Also, please don't think that all our work is narrowly focused on such
datamining apps. Most of it now is focused on building basic stuff that's
useful both for datamining and for AGI, but there's also some work going on
that is not useful for datamining in the short term, only for AGI.
Finally, this particular R&D is at an early stage. In 6 months we should
have a *really solid* system for gene expression data analysis; we're just
beginning with this application area at the moment.
-- Ben