
Outline for an information theoretic search engine

You know how they say "age is but a number"? Document ranking by a search engine/conversational AI can also come down to "just a number". That number, entropy, plays a role in statistical mechanics and information theory. A cool number, therefore.
@earnmyturns @yoavgo @annargrs Great to hear Dr Pereira's voice. I still think DL is overly complex, and that ranking is best done by assigning each document a number: its minimal entropy in comparison to the query.
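The tweets do not define "minimal entropy" precisely. A minimal sketch, assuming it means the cross-entropy of the query's word distribution under a smoothed unigram model of each document (the standard language-modeling retrieval score, which ranks documents the same way as query likelihood and KL-divergence retrieval). The function names `cross_entropy` and `rank`, the whitespace tokenizer, and the Dirichlet smoothing parameter `mu=2000` are illustrative assumptions, not the preprint's method.

```python
import math
from collections import Counter

def cross_entropy(query_tokens, doc_tokens, coll_counts, coll_len, mu=2000.0):
    """Cross-entropy (bits per query token) of the query's empirical word
    distribution under a Dirichlet-smoothed unigram model of one document."""
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    q_counts = Counter(query_tokens)
    q_len = len(query_tokens)
    h = 0.0
    for term, qc in q_counts.items():
        p_coll = coll_counts[term] / coll_len
        if p_coll == 0.0:
            # Term occurs nowhere in the collection: it would make every
            # document's score infinite, so skipping it leaves the ranking unchanged.
            continue
        p_doc = (doc_counts[term] + mu * p_coll) / (doc_len + mu)
        h -= (qc / q_len) * math.log2(p_doc)
    return h

def rank(query, docs, mu=2000.0):
    """Return (doc_index, score) pairs sorted by ascending cross-entropy,
    so the first pair is the single 'most relevant' document."""
    tokenized = [d.lower().split() for d in docs]  # naive whitespace tokenizer (assumption)
    coll_counts = Counter(t for toks in tokenized for t in toks)
    coll_len = sum(coll_counts.values())
    q = query.lower().split()
    scores = [cross_entropy(q, toks, coll_counts, coll_len, mu) for toks in tokenized]
    return sorted(enumerate(scores), key=lambda pair: pair[1])

if __name__ == "__main__":
    docs = [
        "radar backscatter entropy minimization",
        "the cat sat on the mat",
        "entropy of a gas in statistical mechanics",
    ]
    best_index, best_score = rank("radar entropy", docs)[0]
    print(best_index, round(best_score, 3))
```

Taking only the first pair of `rank(...)` mirrors the idea of returning one relevant document rather than a result list; the smoothing keeps a document that lacks a query word from receiving an infinite score.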
@drnic1 I also believe physics principles can play a role in data science. I propose a search engine that I claim is based on the same minimal-entropy principles used to fight RADAR backscatter. RADAR is used in astrophysics.
@GaryMarcus I am taking the opportunity to once more point out that Shannon's Information Theory is underutilized. While it has not been implemented yet, this "conversational AI" design would do a lot of what Meena does.
@SmallerNNsPls @yoavgo Maybe I am repeating myself and being a bore. Information theory is underutilized. Data point: using noisy channel theory in conversational AI yields one relevant document in response to an information need.
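One standard way to connect the noisy-channel framing with a "minimal entropy" score, sketched under assumptions the tweets do not state: the document d is the channel input and the query q the output, P(d) is uniform, θ_d is a unigram model of d, c(w,q) is the count of word w in q, p̂_q(w) = c(w,q)/|q| is the query's empirical distribution, and H(p, θ) = -Σ_w p(w) log P(w | θ) is cross-entropy. The last step uses H(p, θ) = H(p) + D(p ‖ θ), where H(p̂_q) does not depend on d.

```latex
\begin{align*}
\hat{d} &= \arg\max_{d} P(d \mid q) = \arg\max_{d} P(q \mid d)\,P(d) \\
\log P(q \mid d) &= \sum_{w} c(w,q)\,\log P(w \mid \theta_d) = -|q|\; H(\hat{p}_q, \theta_d) \\
\hat{d} &= \arg\min_{d} H(\hat{p}_q, \theta_d) = \arg\min_{d} D(\hat{p}_q \,\|\, \theta_d) \quad \text{(uniform } P(d)\text{)}
\end{align*}
```

Under these assumptions the single decoded document is the one of minimal cross-entropy with respect to the query.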
@dennybritz I have designed a facility for "conversational AI", or search engine. Not coded up yet. Returns one "relevant" document, technically defined.
@odsc Another example: Claude Shannon's Information Theory. Von Neumann suggested that Shannon use the notion of "entropy", which links information theory and physics, so that some physics formulae may apply to search engines...
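Concretely, the shared notion is that Shannon's entropy and the Gibbs entropy of statistical mechanics have the same functional form over a probability distribution p_i, differing only in the logarithm base and Boltzmann's constant k_B. This is a standard identity, not a result of the preprint:

```latex
\begin{align*}
H(X) &= -\sum_i p_i \log_2 p_i && \text{(Shannon, bits)} \\
S &= -k_B \sum_i p_i \ln p_i && \text{(Gibbs, J/K)} \\
S &= (k_B \ln 2)\, H(X)
\end{align*}
```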
@ProgressSW 1) Find the most relevant document in a document collection; 2) Do question answering on that doc with Preference Semantics (Yorick Wilks) and Natural Logic (Manning & MacCartney). Conversational AI! "Common sense reasoning".
@atbolsh @ylecun @GaryMarcus Towards a chat bot: (1) Use the code designed here to find the most relevant document in a document set (not implemented). (2) Use knowledge bases (Yorick Wilks) and natural logic (Manning & MacCartney) to reason with common sense.
@yoavgo Syntax is but a tale, full of sound and fury, told by an idiot, signifying nothing. So you don't need it, and may instead use "minimal entropy" as a figure relating a document in a set to a query. The figure can be interpreted as "relevance".
@MIT_CSAIL Shannon's work is underutilized. I use "minimum entropy" to relevance rank, and use the most relevant document to perform QA on. I also use MI/KLD, a well-known combination, for near-100% accurate predictive analytics.
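For reference, the two quantities named here are related: mutual information (MI) is the KL divergence (KLD) between a joint distribution and the product of its marginals. The sketch below computes both for discrete distributions; the function names are hypothetical, and it does not reproduce the predictive-analytics method or its accuracy claim, which the tweet does not describe.

```python
import math

def kl_divergence(p, q):
    """D(p || q) in bits for two discrete distributions given as equal-length lists."""
    d = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # by convention 0 * log(0 / q) = 0
        if qi == 0:
            return math.inf  # p puts mass where q has none
        d += pi * math.log2(pi / qi)
    return d

def mutual_information(joint):
    """I(X; Y) in bits from a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]        # marginal of X
    py = [sum(col) for col in zip(*joint)]  # marginal of Y
    flat_joint = [joint[i][j] for i in range(len(px)) for j in range(len(py))]
    flat_indep = [px[i] * py[j] for i in range(len(px)) for j in range(len(py))]
    # MI = KL divergence from the independence model px * py
    return kl_divergence(flat_joint, flat_indep)
```

For example, two perfectly correlated fair coins, `[[0.5, 0.0], [0.0, 0.5]]`, share 1 bit of mutual information, while independent ones share none.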
@thetalperry @yoavgo Syntax is but fury, told by an idiot, signifying nothing. Relevance of a document vis-a-vis a query is but a number: the minimal entropy of the document compared to the query. Back to the basics with Claude Shannon.
@wellformedness Ken Church also used the Noisy Channel model by Claude Shannon - for spell correction. It inspired me to use the Noisy Channel for designing a search engine/Conversational AI.
@CharlotteHase Could "search relevance" be just a number, much like age, that indicates the "minimal entropy" between an information need and a given document in the collection? Could this be part of Stephen Wolfram's research program?
@hardmaru @AaronHertzmann "What Shannon knew" is being underutilized. I am trying to put it to use in designing a search engine based on "entropy", part of his theory.
@stephen_wolfram On the link between information and physics, Von Neumann suggested to Claude Shannon that the latter use the notion of entropy. So there is a relation between information theory and statistical mechanics.
@ForbesTech @WallStManeet @katiedjennings Aside for your edification: electrograms can be read by way of entropy minimization, which is also applied to RADAR echoes and, in my proposal, to search engines/conversational AI.
@marcusborba @Comatose_D @FavioVaz @heizelvazquez @KirkDBorne @bobehayes @mvollmer1 @YvesMulkers @Strategy_Gal @jenstirrup @rwang0 @schmarzo @data_nerd Nice. I'd like to emphasize that pure "information theory" is underutilized, in my humble opinion. I propose a search engine/method for conversational AI based on Shannon's work. I have also found a way to use Shannon's theory to do syntactic bracketing.
@roireichart @jasonbaldridge IMHO, "information theory" is underutilized as statistics for NLP. Grouping words with information theory: Search engine/conversational AI based on "information theory".
@HealthITNews Chat bot/Conversational AI design based on information theory:
@jasonbaldridge McClelland: "In short, we argue that language evolved for communication about situations and our systems should address this goal." Viewing language as communication, I designed a search engine (no implementation yet).
@HC_Finance So radiology uses "entropy minimization", just like dealing with RADAR backscatter, and just like my proposed search engine/chat bot. I say we have a universal communication principle here: "Communication improves as it is less chaotic", in mathematical detail.
@BethCarey12 People need to be able to talk to their computer. This has been a long-standing goal of NLP. Chomsky worked on "command and control" systems. My effort returns the 1-best doc from a collection, then applies "common sense reasoning".
@WIRED I recommend the writings of Dutch physicist Erik Verlinde, who concludes the universe is made up of information. And yes, there is a connection between Shannon's "information theory" and "statistical mechanics" centered around entropy.
"PeerJ Preprints" is a venue for early communication or feedback before peer review. Data may be preliminary.

Additional Information

Competing Interests

The author declares that they have no competing interests.

Author Contributions

Koos Vanderwilt wrote the paper, reviewed drafts of the paper.


The author received no funding for this work.
