(Following is an attempt to capture some of my thoughts following an interesting student discussion. I think I mostly failed during that discussion, but it was fun.)

How do we know when two sets and have the same “size”? The layman answer is, “When they both have the same number of elements,” or even better, “When we can ‘pair up’ elements from and , with each element a part of exactly one pair.”

The mathematician formalizes this idea as a bijection, that is, a function that is both:

- Injective, or “one-to-one”: if , then . Intuitively, distinct elements of map to distinct elements of .
- Surjective, or “onto”: for every element , there exists some such that . Intuitively, we “cover” all of .

Bijections are handy tools in combinatorics, since they provide a way to count elements in , which might be difficult to do directly, by instead counting elements in , which might be easier.

So far, so good. Now suppose that a student asks *why* injectivity and surjectivity are *the* critical properties of functions that “pair up” elements of two sets?

This question struck a chord with me, because I remember struggling with it myself while learning introductory algebra. The *ideas* behind these two properties seemed vaguely “dual” to me, but the structure of the actual *definitions* looked very different. (I remember thinking that, if anything, the properties that most closely “paired” with one another were *injective* and *well-defined* (i.e., if , then ; compare this with the above definition of an injection). Granted, well-definedness might seem obvious most of the time, but sometimes an object can have many names.)

One way to answer such a question is to provide alternative definitions for these properties, that are equivalent to the originals, but where that elegance of duality is more explicit:

- A function is injective if and only if, for all functions , implies .
- A function is surjective if and only if, for all functions , implies .

It’s a great exercise to show that these new definitions are indeed equivalent to the earlier ones. But if that’s the case, and if these are more pleasing to look at, then why don’t we *always* use/teach them instead?

I think there are at least two practical explanations for this. First, these definitions are *inconvenient*. That is, to use them to show that is a bijection, we end up having to drag a *third* (and fourth) arbitrary hypothetical set (and ) into the mix, with corresponding arbitrary hypothetical functions to compose with . With the usual definitions, we can focus all of our attention solely on the two sets under consideration.

Second, these new definitions can be *dangerous*. Often the function we are trying to show is a bijection preserves some useful structural relationship between the sets and (e.g., may be a homomorphism)… but to use the new definitions, we must consider *all possible* additional functions , not just that restricted class of functions that behaves similarly to . We have strayed into category theory, where monomorphisms and epimorphisms generalize the notions of injections and surjections, respectively. I say “generalize,” because although for sets, these definitions are equivalent, in other categories (rings, for example), things get more complicated, where a monomorphism may not be injective, or (more frequently) an epimorphism may not be surjective.

I’m curious, have you ever come across a nice visual illustration of a bijection, and the distinction between injective and surjective? For me, wordy definitions like these often “click” once I see a picture.

There are at least a couple of different ways to visualize a function f:X->Y, and the properties of injective/surjective/bijective. First, treating X and Y as arbitrary sets (as opposed to, say, real numbers), we can view f as a (directed) bipartite graph, with elements of X in the “left” partite set, and elements of Y on the right, with arrows mapping elements x to elements f(x). The Wikipedia page on injections has several examples of these.

First, *any* function f must have *exactly one* arrow *leaving* every element of X. Furthermore, if f is injective, then no two arrows can “meet” at the same element of Y. Finally, if f is surjective, then there must be at least one arrow *head* to every element of Y.

Another way to visualize what’s happening is to focus on the case where the sets X and Y are both R, the set of real numbers. Then a function f:X->Y can be visualized by drawing a graph in the x-y plane like we did in high school; each point on the graph corresponds to one “arrow” in the bipartite graph visualization described above, mapping some x (on the horizontal axis) to its corresponding y=f(x) on the vertical axis. (For example, f(x)=x, or f(x)=x^2, are both functions.)

Remember the “vertical line test” from high school? That’s testing that the function f is “well-defined” (see the definition in the original post)– each element on the x-axis can’t map to more than one element on the y-axis.

An injection, on the other hand, passes the *horizontal* line test: two different x-values can’t map to the *same* y-value. (f(x)=x is an injection, but f(x)=x^2 is not.) And finally, a surjection “covers/spans” all of the y-axis; i.e., there is a point on the graph for every given y-value. (Again, f(x)=x is a surjection, but f(x)=x^2 is not.)