What Is Information?
In books and papers on brain science, cognitive science, and related fields, one of the most frequently used terms is information. We are told that brains and their various subunits, down to the level of a single neuron, process information, store it, retrieve it, transmit it, and so on. They do, indeed. The point, however, is that we are not told what information is.
Perhaps information is meant to be understood in the sense first given to it by C. Shannon? If so, this would be a huge misunderstanding, for at least two reasons. First, his approach is entirely content-neutral: it concerns only technical and economic, quantitative problems of data transmission and communication. Brain activity, on the other hand, is concerned with regulation and control, where the content of information matters a great deal. Furthermore, since according to Shannon's approach information is what reduces uncertainty, the whole idea presupposes such things as knowledge of a priori probabilities, a requirement which can hardly be attributed to, say, frogs and butterflies. Shannon's measure serves well the purposes of mathematicians and engineers dealing with well-specified communication problems, but it is useless with regard to systems which must cope with the full variety of environmental stimuli.
I suppose that what is taken for granted here is a commonsense, mentalistic connotation: information is thought to be a piece of knowledge. If this is the assumption being made, we must either flatly reject it because of its strong anthropocentric bias, or we must treat it figuratively, as a conventional term of art with no objective counterpart in reality.
Consider the genetic code, for example. We are often told that genes contain information on all phenotypic properties, as if genes were a kind of blueprint. The cells themselves do not, of course, "know" that their aggregated activity will in some distant future lead to definite phenotypic properties. In no phase of the processes involved is "information" concerning these phenotypic properties available to the genes or cells. It is we who know that such a relationship obtains. Now, if any information at all is to be available to the cells, it must be something which determines their activity. And a future state of affairs cannot, of course, do that; that would be sheer teleology.
My thesis is that "information" is epistemologically a realist category: what I call information is something "out there" in objective reality, not just a useful term of art residing only in the investigator's mind. It has no mass, energy, or spatial extension; it cannot be seen, touched, or smelled. Nevertheless it is a distinct, objective entity.
What, then, is information?
If information is "out there", then it should be defined in terms appropriate to the systems able to make use of it, that is, in terms of the resources available to such systems. Our definition should therefore be formulated in terms of what was already present in the physical world several billion years ago, for it was then that the first informational systems emerged. What was there, then? Well, different physical substances (gases, liquids, solid bodies) having different structures and properties and bound by different physical forces. Was there anything else? Yes, there were differences between them.
Difference is not a real entity, to be sure, but we are not bound to limit ourselves to the mode of existence of "real beings". Differences are something objective after all; they are "out there", whether perceived or not. Difference is a kind of relation, namely a relation of nonidentity between physical entities or their properties. As such, difference is an irreal entity: it can be changed by some action but is unable to act on its own.
The fundamental feature of animate systems (which are all informational systems) is their ability to discriminate and select. From single cells to plants to animals and human beings, the behavior of animate systems depends on what they can discriminate, be it the concentration of certain substances or the magnitude of physical parameters like temperature, humidity, or shape. What counts here is thus some detected difference between things that can be distinguished by a system. "Difference" and "detection" are thus the two key words in our enterprise of grasping what information is. Information is, roughly speaking, any detected difference. (1)
To arrive at a more precise definition I shall define some auxiliary concepts as follows:
Collection = a spatiotemporally ordered (and thus taken in a collective, rather than a distributive sense) subset of any set, whose elements can be repeated in the collection many times. (2)
Alphabet = a set of physical states which can be realized in, and discriminated by, some system. (3)
Code = an alphabet's collection realized somewhere in the system (=the inner code). Also any collection of physical states outside the system which can be transformed into the first one or into which the first one can be transformed (=the outer code). (4)
Repertoire = a set of differences in code which can be detected by the system. (5)
Information = a repertoire's collection detected somewhere within the system.
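The chain of definitions above can be given a minimal computational sketch. The Python rendering, and every name in it, is my illustrative assumption, not part of the definitions themselves; in particular, the repertoire is modeled simply as unordered pairs of distinct states, and information is read here as juxtaposition-type differences between adjacent states of a code.

```python
from itertools import combinations

# Alphabet: a set of physical states the system can realize and discriminate.
alphabet = {"r", "g", "b"}

# Code: a spatiotemporally ordered collection of alphabet states
# (elements may repeat).
code = ["r", "g", "g", "b", "r"]

# Repertoire: the set of differences detectable by the system, modeled
# here simply as unordered pairs of distinct states.
repertoire = {frozenset(p) for p in combinations(alphabet, 2)}

def information(code):
    """Information: the collection of detected differences, read here as
    differences between adjacent states of the code."""
    return [(x, y) for x, y in zip(code, code[1:])
            if frozenset((x, y)) in repertoire]

print(information(code))  # [('r', 'g'), ('g', 'b'), ('b', 'r')]
```

Note that the repeated "g" in the code yields no detected difference on this reading, just as the definitions require: a difference presupposes nonidentity.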
Information is an abstract entity. It has no separate existence of its own, because no difference can exist unless there are real states of affairs between which the difference holds, and which constitute its code. (6) The same information can be encoded in various ways: e.g., a piece of music can be encoded on magnetic tape, on a piece of paper, or on a compact disc. Different codes encode the same information if the collections involved are isomorphic.
Separate information, or information "in itself", has no immanent meaning. It is just detected difference, full stop. Yet information is apparently about something. On the other hand, informational systems can detect only their own states, not states of their environment. Yet information can serve as a reliable basis for adequate, successful behavior in that environment. How can this be? To what does information owe its significance and usefulness? Two factors seem relevant here. First, for an organism to be able to collect information about its environment, something in that environment must act upon it, affecting its surface receptors. Second, for information to be a reliable guiding factor for an organism's behavior, the environmental impact has to be deterministic (same cause, same effect on repeated trials). Otherwise the information collected would be useless: it would be mere noise, not information.
Fortunately for organisms, the natural world is deterministic enough for information to be sufficiently unambiguous. (7)
In such deterministic circumstances a kind of unequivocal correspondence between the state of the affected animal's receptors and various aspects of the environment obtains. Now, whenever there is some more or less invariant or rigid correspondence between some A and B, B can be said to represent A, and can thus serve as a characteristic or token of that A. It is therefore some objectively occurring correspondence which grants information its representational character, thus making it information about something. And insofar as a detected difference represents some external state of affairs, information has cognitive content. It constitutes a bridge-head on which behavior can rely.
The relation of representation is transitive: if A represents B and B represents C, then A represents C as well. Starting from the receptor/environment interface, therefore, the relation of representation can extend in both directions, into external and internal codes. It does not matter much, then, what kind of and how many transformations of the codes occur, save that they are all deterministic enough. Due to this, information can be transduced, transformed, or processed while still representing the same thing.
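The transitivity claim can be sketched as a closure computation over a represents-relation. The station names below (cortex, ganglion, retina, surface) are merely my hypothetical chain of deterministic code transformations, not a claim about actual neural stages.

```python
def transitive_closure(rep):
    """Close the 'represents' relation under transitivity: whenever
    (A, B) and (B, C) are present, add (A, C)."""
    rep = set(rep)
    changed = True
    while changed:
        changed = False
        for a, b in list(rep):
            for b2, c in list(rep):
                if b == b2 and (a, c) not in rep:
                    rep.add((a, c))
                    changed = True
    return rep

# A hypothetical chain of deterministic code transformations:
rep = {("cortex", "ganglion"), ("ganglion", "retina"), ("retina", "surface")}
closed = transitive_closure(rep)
print(("cortex", "surface") in closed)  # True
```

However many intermediate transformations the chain contains, the innermost code still represents the outermost state of affairs, which is the point of the paragraph above.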
The term "difference" needs an important further qualification. Difference is a two-place predicate, always holding between some X and Y. Now, depending on the way each of the two members X and Y is present during detection, difference can be of two types: actual and virtual. Difference is actual if what is being detected is a juxtaposition of concurrent states, say, when we see a multicolored surface. Here both X (e.g. "blue") and Y (e.g. "white") are present, both are realized in the visual system, and the difference between them can be said to be "transverse". Difference is virtual if what is being detected is a selection of one from a number of possible states belonging to the system's alphabet. This can happen when, say, we see just blue sky and nothing else. There is no actual differentiation in the perceptual field then; only X is actually present. Yet the visual system does get some information: that what is seen is blue and not red or white. The process of discrimination is here set against certain background states, and the difference is thus "longitudinal", not transverse.
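The two types of detection can be contrasted in a small sketch; the function names and the toy perceptual field are my illustration under the distinction just drawn.

```python
def transverse(field):
    """Actual (juxtaposition-type) difference: both members of each
    difference are concurrently present in the perceptual field."""
    return {(x, y) for x, y in zip(field, field[1:]) if x != y}

def longitudinal(state, alphabet):
    """Virtual (selection-type) difference: only one member is present;
    the difference holds against the alphabet's other possible states."""
    return {(state, other) for other in alphabet - {state}}

# A multicolored surface yields transverse differences:
print(transverse(["blue", "white", "white"]))  # {('blue', 'white')}

# A uniformly blue sky yields no transverse difference, yet longitudinal
# ones remain: blue, and not red or white.
print(longitudinal("blue", {"blue", "red", "white"}))
```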
It is important to see that in such "selection-type" cases there really is some difference involved. Consider the measurement of temperature. What is the information here? Is it the height of the mercury column in the thermometer? No, that would be just code, not information. The information is the virtual difference between the present reading and some conventionally established standard value, e.g. the zero level, or at least a difference between the present reading and other possible readings. If the thermometer were not graduated, we would either get no information about temperature at all, or we would have to wait until the mercury climbed or dropped to indicate that it was getting warmer or colder.
Now, although the differences in such cases are virtual only, the information itself is not. It is just another type of information, indeed the most elementary one. I will subsequently call it parainformation (by contrast, the hitherto discussed "juxtaposition-type" information will be called structural information).
There are many systems which can deal with parainformation only. Let us examine some of them briefly.
Photocell-operated systems (doors, elevators, etc.) discriminate just two states, "open electrical circuit" and "closed circuit", disjunctively present. (Para)information occurs whenever one state changes into the other.
In homeostatic devices the alphabet also consists of two states, e.g., "above some preset value" and "below it", as in the case of a thermostat. Parainformation is the virtual difference between the actual temperature and the preset one.
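The thermostat case can be sketched in a few lines; everything below is an illustrative toy, not a description of any real device.

```python
def thermostat(actual, preset):
    """A thermostat's whole alphabet: 'above' or 'below' the preset value.
    The parainformation is the detected virtual difference between the
    actual temperature and the preset one, not the temperature itself."""
    return "above" if actual > preset else "below"

print(thermostat(22.5, 20.0))  # above
print(thermostat(18.0, 20.0))  # below
```

The device never detects a juxtaposition of concurrent states; it only selects one of two possibilities, which is exactly what makes its information parainformation.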
Biological cells can also handle situations where only parainformation is available. Let us consider what is called the "genetic code". The alphabet here consists of four nucleotides (commonly abbreviated as C, A, G, and T in DNA, with U replacing T in RNA) which can be discriminated by some enzyme. The code is any linear sequence of the nucleotides in a DNA or RNA chain. At that structural level no information whatever is present. Information enters the scene only when the double helix splits and a polymerase enzyme detects which one of the four nucleotides occurs at a particular place in the chain, and then adds to it a complementary one (the processes of replication and transcription). What is involved is thus parainformation, for no concurrent states are detected.
What about translation, where we are told that most of the 64 possible triplets of nucleotides in the messenger-RNA (codons), created during transcription of the DNA's structure, "encode" one of the 20 amino acids necessary to build any protein? At first it may seem that what must be detected here are differences between concomitant states, i.e., between particular nucleotides within a triplet (say A/C/G or G/G/C). A closer examination of the mechanisms involved in codon detection reveals, however, that this is not the case. A particular codon is detected by the complementary triplet (anticodon) in the transfer-RNA, which then adds a specific associated amino acid to the polypeptide under construction. It follows that there is no detection of differences between particular nucleotides within a codon, but only detection of virtual differences between different codons as wholes (that is, selection-type differences). This is, therefore, also a case of parainformation.
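The point that codons are detected as wholes can be sketched as follows. The base-pairing rules and the three codon-to-amino-acid entries are standard biochemistry; the function names and the reading of detection as whole-codon lookup are my illustration of the argument above.

```python
# RNA base pairing used by the anticodon to detect a codon as a whole.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon(codon):
    """The transfer-RNA triplet complementary to an mRNA codon
    (written 5'-to-3', hence the reversal)."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))

def translate(mrna, codon_table):
    """Each codon is selected as a whole from the 64-element repertoire;
    bases within a codon are never compared with one another."""
    return [codon_table[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3)]

# A tiny fragment of the standard codon table, for illustration:
table = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly"}
print(anticodon("AUG"))               # CAU
print(translate("AUGUUUGGC", table))  # ['Met', 'Phe', 'Gly']
```

Notice that `translate` indexes the table by entire triplets: nowhere does it compare one base of a codon with another, mirroring the claim that only selection-type differences between whole codons are detected.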
Finally, consider computers. In computers the alphabet consists of just two states (say, 0V and 5V, usually referred to as 0 and 1, the so-called "bits"). For simplicity's sake, let us limit ourselves to 8-bit computers. The code is then any 8-bit sequence (a "byte"), and the repertoire consists of 256 possible bytes. What is the information involved here? Is it the differences between zeros and ones within a byte? No; bytes are the smallest operational units here, for they stand for elementary parts of programming languages and data (letters, numbers, punctuation marks). It would be silly, then, to independently process bits within a byte standing, say, for the letter "a". The differences detected in computers are therefore not differences between single bits within a byte, but either differences between subsequent bytes (say, 11101001/11111101), that is, of the juxtaposition type, or virtual, selection-type differences referring back to the repertoire (for if two subsequent bytes were identical, as is often the case, there would otherwise be no information for the computer, which is not the case). In other words, there is both parainformation and structural information in computers.
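The byte-level reading can be sketched directly; the classification below is my rendering of the distinction just made, not a description of any actual hardware.

```python
def byte_differences(stream):
    """Differences are detected at the byte level: juxtaposition-type
    ('structural') when adjacent bytes differ, selection-type ('para')
    when they are identical yet still selected from the 256-byte
    repertoire."""
    return [("structural" if prev != cur else "para", cur)
            for prev, cur in zip(stream, stream[1:])]

stream = [0b11101001, 0b11111101, 0b11111101]
print(byte_differences(stream))  # [('structural', 253), ('para', 253)]
```

Even the repeated byte still carries information on this account, because it is selected against the 255 bytes that did not occur.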
Parainformation is the elemental, primordial type of information, for it is the simplest one and it is the building block of any other type of information. There can be no structural information without underlying parainformation. (8) We can therefore say in general that structural information is composed of pieces of parainformation. This suggests the concept of orders of information. Code can be said to be zero-order information, for it represents only a potential to become information. Consequently, parainformation is first-order information, composed of selections from the code; and structural information is second-order information, for juxtaposed members of appropriate differences are themselves pieces of parainformation.
The question arises whether there is higher-order information. The answer is "yes": it is what I call metainformation.
Because information is a collection of differences, i.e. a set in the collective sense (so that any subset of such a collection is its element, which does not hold for sets regarded in the distributive sense), it is an additive entity: a sum of information is still information. (9) Separate pieces of information can thus merge to create more information, e.g., when we slowly raise our eyelids, allowing more and more data to be received.
But there is another possible case, where combined pieces of information do not extensively merge into a bigger whole, but instead become associated while remaining separate. Notice that all the cases presented above concerned information as currently received from receptors and then processed. There is, however, another important possibility beyond this, namely, when a system adds previously received information from its own resources (memory) to the current inflow of receptor-based information. If, say, my auditory system detects the series of phonemes "c-a-t", that is information: a collection of differences between adjacent pitches. But if hearing this causes me to imagine a four-legged, furry creature with vibrissae, that too is a kind of information, although my sensory apparatus detected no such creature in its perceptual field. It is added to what is detected by the sense organs. The newly formed, resultant information is an association of separate pieces of information. (10) I call this type of information metainformation because 1/ it is a collection of collections, and 2/ it "comes after".
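Metainformation as an association of a detected collection with a recalled one can be sketched as follows. The memory pairing below is my illustration only, and nothing here is a claim about neural mechanism.

```python
# Currently detected collection (phonemes) paired with a stored
# collection (remembered features) -- an illustrative toy memory.
memory = {("c", "a", "t"): ("four-legged", "furry", "vibrissae")}

def metainform(detected):
    """Metainformation: an association of the detected collection with a
    recalled one -- a collection of collections, kept separate rather
    than merged into one bigger collection."""
    recalled = memory.get(detected)
    return (detected, recalled) if recalled is not None else None

print(metainform(("c", "a", "t")))
```

The result keeps the two constituent collections distinct inside a higher-order structure, which is what distinguishes association from mere additive merging.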
The three types of information (parainformation, structural information, and metainformation) emerged stepwise during the evolutionary process, which started with organisms able to handle parainformation only (cells, multicellular organisms, plants, primitive animals), then developed creatures able to deal with structural information (more advanced animals like insects, with a central nervous system but without plastic memory), and finally produced creatures capable of making use of metainformation (a prerequisite of which is having a RAM-type memory).
With these three types of information at hand, we can explain and understand the operation and behavior of all intelligent systems, from a single cell to human beings and computers.
(1) To become information, some objectively occurring difference must be detected by some system. To be able to do that, the system must have detectors. Consider the visual system. Its receptors, the rods and cones in the retina, do not yet detect differences; they just react physically to incident light. Detection occurs only at the next stage, in the bipolar and ganglion cells. These can detect differences in light intensity or wavelength thanks to their ingeniously arranged receptive fields, which consist of two parts (usually center and surround) reacting in opposite ways to the same stimulus: if one produces excitation, the other causes inhibition.
(2) If, say, we have a triplet [0,1,2] (where the figures can stand for any physical state or property), then its collections can be not only, say, 012 or 210, but also 00122101, etc. (omitted commas indicate that the subsets are taken in the collective sense). Any sequence of nucleotides in RNA, say UGCACGUUA, can serve as another example of what I mean by a collection.
(3) It is important that both conditions (realization and discrimination) be met. Consider the eardrum. In humans, the auditory system can discriminate frequencies anywhere within a range of 20-20,000 Hz. This does not mean, however, that the alphabet consists of an indefinite number of elements, for the system can hardly discriminate tones separated by less than 2 Hz. On the other hand, although acoustic waves at frequencies below the 20 Hz threshold do cause the eardrum to oscillate, and thus do produce a change of physical state within the auditory system, they are not themselves elements of the alphabet, because they are not discriminated by the auditory nerves.
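The double condition can be sketched as an admission rule. The range and the roughly 2 Hz discrimination threshold come from the note above; the greedy admission procedure itself is my simplifying assumption.

```python
def auditory_alphabet(frequencies, lo=20.0, hi=20_000.0, jnd=2.0):
    """Admit a frequency into the alphabet only if it is both realized
    (within the 20-20,000 Hz range) and discriminated (at least ~2 Hz
    away from every frequency already admitted)."""
    alphabet = []
    for f in sorted(frequencies):
        if lo <= f <= hi and all(abs(f - g) >= jnd for g in alphabet):
            alphabet.append(f)
    return alphabet

# 10 Hz and 25,000 Hz fail realization; 441 Hz fails discrimination:
print(auditory_alphabet([10, 440, 441, 443, 25_000]))  # [440, 443]
```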
(4) The process of (longitudinal) transformation, converting a given code into another one within the system or outside it, is encoding. Encoding can consist in, say, the transformation of mechanical oscillations of a microphone's membrane into a train of electronic impulses, tape magnetizations, eardrum oscillations, etc.
(5) If an alphabet consists of, say, three colors, red (r), green (g), and blue (b), with no mixing allowed, then the repertoire will comprise such elements as r/b, r/g, g/b, but also br/gb, rgb/rbg, and so on, where "/" stands for a difference.
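The repertoire of this note can be enumerated mechanically; the function below is my sketch, truncated at length-2 collections to keep the output small.

```python
from itertools import combinations, product

def build_repertoire(alphabet, max_len=2):
    """All differences X/Y between distinct collections over the
    alphabet, up to a given length: r/b and g/b, but also br/gb, etc."""
    collections = ["".join(p)
                   for n in range(1, max_len + 1)
                   for p in product(sorted(alphabet), repeat=n)]
    return {f"{x}/{y}" for x, y in combinations(collections, 2)}

rep = build_repertoire({"r", "g", "b"})
print("b/r" in rep, "br/gb" in rep)  # True True
```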
(6) Information is always somehow encoded; there is no such thing as pure, nonembodied information. In consequence, whenever information occurs, there must be a certain underlying matter-energy flow, i.e. a certain physical process involved.
(7) For instance, the wavefront of light impinging upon the retinae is, because of physical laws, homomorphic with the geometry and texture of the surfaces from which it is reflected; ethereal substances affecting an animal's nostrils are characteristic of particular bodies in the environment; air-pressure waves produced by animals or by natural phenomena are fairly specific to them; and so on.
(8) We cannot perceive any visual scene consisting of certain patterns or gestalts without seeing constituent colors. We cannot hear a melody without discriminating single pitches. And so on.
(9) This, by the way, explains why information can be processed, e.g. integrated.
(10) The structure thus formed is, formally speaking, a specific form of collection: a graph.