More than most fields, mathematicians define themselves by their discipline rather than their topic. A proposition is not considered to be mathematically known until it has been proven. Since people employed as academics are judged by the production of new knowledge, mathematicians see their business as proving things. Therefore, mathematics has come to be seen not as the study of numbers, but as the study of things which can be proven. This leads to the curious result that a person who spent their time learning all about numbers, on reading facts about numbers asserted by others, and on thinking about numbers, but not on proving anything or on reading others' proofs, would not be considered a mathematician by many mathematicians, even though most other human beings would call em a mathematician (and according to most dictionary definitions, such a person would be a mathematician).

It should be noted that the identification of mathematics with things that can be proven, where "proven" is identified with "an irrefutable argument", would be objected to by some philosophers, who believe that irrefutable arguments exist about certain philosophical topics which are not in the form of math-style proofs (perhaps "I think therefore I am" is one, or more to the point "You think, therefore you are"; philosophically speaking, it is clear to anyone able to understand that sentence that they think, therefore it follows that they exist; but by the standards of a mathematical proof, no justification has been provided for the premise "you think", and so "you are" has only been proven contingent on the acceptance of the axiom "you think").

In mathematics there are "existence proofs". The name is misleading; an existence proof is really just a proof that some construct could POTENTIALLY exist in the sense that its definition is not inherently contradictory (not a proof of ACTUAL existence in the sense that, e.g., a human with your name and age exists in your town this year). In order words, it is a proof that NOT(it is necessarily true that the construct does not exist).

in mathematics often names for things are 'stolen' from ordinary English and then used to mean something very technical and specific. I say the names are 'stolen' because they are used to mean something very different from the ordinary English meaning. I mention this to save you from wasting time trying to find a substantial connection between mathematical definitions and the ordinary English meanings of their names; typically the connection is so slight so as to be almost irrelevant.

For example, the names 'group' and 'set' both mean something very specific (and different from each other) in math. These technical meanings are NOT derived from some close analysis of the exact difference between what 'group' and 'set' mean in English; rather, in English, both of these words mean collections of objects, and in math, at one time, someone (Galois) came up with some very technical thing that needed a name, and it was a special type of collection of object, so he stole the name 'group' for his concept, and then later on someone else came up with some other very technical thing that needed a name, and it was also a special type of collection of object, and perhaps they knew that 'group' was already claimed, so they stole the name 'set' for their concept. If you took someone who knew English but no math and taught them the definitions of these two concepts and then asked which should be called 'group' and which should be called 'set', they wouldn't reliably be able to tell you.

Similarly we have 'normal matrices' and 'normal subgroups'. The definitions of these technical concepts are only tangentially related to the English definition of 'normal', and you probably would not be able to derive the mathematical definitions of these just from knowing all about matrices and subgroups and also knowing the ordinary English definition of the word 'normal'. Nor would you gain great insight into the concept of 'normalacy' in the ordinary English sense from pondering the mathematics of normal matrices and normal subgroups.

Addition, its inverse subtraction, multiplication, and its inverse division are called elementary arithmetic. If you start with positive integers and only apply addition and multiplication, you will always get positive integers back. If you start with non-negative integers, you will always get non-negative integers back. If you allow subtraction, you may get negative integer outputs, too. If you allow division, you may get rational number outputs (fractions), and you also have the tricky case of division-by-zero, which is usually thought of as 'undefined' or an illegal operation, but in some contexts is defined to be infinity, with the sign of the infinity being the sign of the numerator (often 0/0 is still a special case here; sometimes an 'unsigned infinity' or NaN? is introduced for this) [1].

For rational or real numbers, elementary arithmetic obeys the "field axioms".

There are other things that obey the field axioms; such things are called "fields". One example is 'finite fields'; for each prime number p, there is a finite field "of order" p, which is basically the integers mod p (that is, adding p to any number gives you the same number back; adding p+1 is the same as adding 1; etc; you can see that there are really only p distinct elements in this field). There are also more complicated finite fields eg [2]. The concept of a 'field' does not include exponentiation or further hyperoperations ( https://en.wikipedia.org/wiki/Field_%28mathematics%29#Exponentiation ).

Elementary arithmetic can be axiomatized by the Peano axioms.

"Successor" is the unary operation S(x) = x + 1 (also called "increment").

You may have noticed that addition is repeated successor, multiplication is repeated addition, and exponentiation is repeated multiplication. This suggests that these three operations are really facets of the same concept, and that they could be generalized to a three-input function that can be addition, multiplication, or exponentiation in the first two arguments depending on the third argument. The best-known formalization of this idea is an operation called "hyperoperation".

Some important arithmetic functions are:

- deriving from hyperoperation:
- successor, addition, multiplication, exponentiation (hyperoperations)
- decrement, subtraction, division, logarithm (inverses of hyperoperations)
- "exp" and "ln" are just exponentiation and logarithm with base e ('e' is a constant called Euler's number, about 2.72)
- nth-root ('nroot' or 'rootn') and its special case 'sqrt' (where n=2). Since exponentiation is not symmetric in its arguments, logarithm is one of its inverses, and nth-root is the other one; nth-root can also be defined as exponentiation to a fraction, eg x^(1/2) = sqrt(x) .

deriving from positive vs negative integers:

- abs (absolute value): make the number positive (or at least non-negative): abs(0) = 0, abs(x) = x when x >= 0, and abs(x) = -x when x < 0. You can think of this as isolating the 'magnitude' of the number from its sign.
- sign: sign(x) is -1 when x < -0, 0 when x = 0, 1 when x > 0

deriving from trying to do division with integers instead of reals:

- modulus/remainder ('mod'/'rem'): "what is left over" after you try to do division with integers (instead of reals), eg 13 / 5 = 2.6, but if you confine yourself to integer answers, you can say 13 / 5 = 2 with 3 left over (that is, 5*2 + 3 = 13). Often the 'integer division' operation here is called 'div' or 'quotient', so div(13,5) = 2 and mod(13,5) = 3. Mod and remainder are identical on positive numbers but differ on negatives: mod(-21,4) = 3 but rem(-21,4) = -1, because 21 + 4*6 = 3 but 4*-5 - 1 = -21; so 'rem' is trying to first do 'div' to find the least multiple of the second input that doesn't overshoot the magnitude of the first (so -5 is the greatest-magnitude multiple of 4 that gets close to -20 without overshooting it in magnitude), whereas 'mod' is imagining that the integers are like steps around a circle, where the number of steps required to go all the way around is the second input (so if there are 4 steps around the circle, then after starting at -21 and going around the circle 6 times (4*6 steps), you end up at position 3).

deriving from converting reals to integers:

- floor, ceil: the greatest integer less than the input, the least integer greater than the input; eg floor(5.7) = floor(5.1) = 5, floor(-5.1) = -6, ceil(5.1) = 6.
- round: the nearest integer to the input; eg round(5.1) = 5, round(5.7) = 6, round(-5.1) = -5, round(-5.7) = -6. When the real number input is exactly halfway in between two integers (ie it ends in '.5' eg 3.5), there must be a 'tiebreaking' convention; there is no consensus on which tiebreaking convention to use, some examples are: upon tie, use ceil; upon tie, use floor; upon tie, round towards zero; upon tie, round away from zero; upon tie, round towards nearest even number (the dual 'towards nearest odd number' is rarely used).

You also hear about the 'reciprocal' function, which is just a special case of division: reciprocal(x) = 1/x.

There are also the inequality operators, which can be thought of as Boolean functions, that is, functions from inputs to an output which is always either "TRUE" or "FALSE":

- x < y, x <= y, x == y, x >= y, x > y

The main "trig" (trigonometry) functions are:

- sin
- cos
- tan

These relate the angles in triangles to the ratios of the lengths of the sides of right triangles:

- sin: opposite/hypotenuse
- cos: adjacent/hypotenuse
- tan: opposite/adjacent

That is, the input to these function is an angle, and the output is the indicated ratio. This won't make much sense without the diagram: https://en.wikipedia.org/wiki/Trigonometric_functions#/media/File:Trigonometry_triangle.svg

The input, which are angles, can be in units of degrees (360 degrees go all the way around the circle), or radians (2*pi radians go all the way around the circle).

A mnemonic to help remember which is which is: SOHCAHTOA, which doesn't look very memorable until you read it aloud; it is pronounced SO-KUH-TOE-UH: Sin Opposite over Hypotenuse, Cos Adjacent over Hypotenuse, Tan Opposite over Adjacent.

Another way to think about these is that if you start at the rightmost point of a unit circle (which is point (1,0)) and proceed to trace the circle counterclockwise by the input angle, then the point you end up at has a horizontal coordinate of cos(angle) and a vertical coordinate of sin(angle) (note that cos(0) = 1 and sin(0) = 0). This can be seen by drawing right triangles inside a unit circle: https://en.wikipedia.org/wiki/Trigonometric_functions#/media/File:Webysther_20160305_-_Ciclo_trigonom%C3%A9trico.svg (see also https://www.khanacademy.org/math/trigonometry/unit-circle-trig-func/unit-circle-definition-of-trig-functions/v/unit-circle-definition-of-trig-functions-1 ).

tan(a) is equal to sin(a)/cos(a) (b/c (opposite/hypotenuse)/(adjacent/hypotenuse) = opposite/adjacent), so it's the ratio of the vertical coord to the horizontal coord (note that this ratio goes to infinity at +-pi/2, or +-90 degrees, where the horizontal coordinate is zero). In the unit circle interpretation, tan is the length of a certain tangent vertical line segment, see https://upload.wikimedia.org/wikipedia/commons/c/c7/Tangent-unit-circle.svg and https://www.youtube.com/watch?v=6-tsSJx-P18 .

There are also the inverses of these: asin, acos, atan (which are the same as arcsin, arccos, arctan). Because there are multiple angle inputs that map to the same output with sin, cos, or tan, a convention is needed

Sometimes you also hear about sec (secant), csc (cosecant), cot (cotangent). These are just fancy names for the reciprocals of cos, sin, tan.

https://en.m.wikipedia.org/wiki/Algebraic_number#Examples

todo

define monic polynomial in math chapter

http://mathworld.wolfram.com/ConstructibleNumber.html

hierarchy of reals, closed form, algebraic, transcendental (also define rational, irrational) etc

todo

sub-(whatever; eg subset, subgroup), direct product, homomorphism

relations note: a set can be composed with a relation to get another set

functions

arity

infix notation for binary functions

boolean functions

sets, characteristic functions (mb this is not exactly universal algebra)

injective, surjective, bijective

image

inverse

invertible

inverse image

monotone functions

equivalence relation

equivalence classes

total order

partial order

hasse diagram

preorder

graph

DAG

tree

isomorphism

automorphism

homomorphism

quotient

endomorphism

embedding

inclusion maps

cover

direct products

semilattices

lattices

galois connection

groups

Group theory could have arose (i'm not saying this is how it actually happened, historically) from noticing some properties of addition and multiplication (except that for multiplication to be considered a group, zero must be omitted) and abstracting those properties; then noticing that the same properties are also useful for describing the compositions of permutation operations, and the compositions of rotation and reflection operations. Note that often it makes more sense to think of groups as fundamentally describing the composition of permutation operations, instead of thinking of them as generalizations of addition and multiplication.

For historical reasons, commutative groups are usually called 'Abelian' instead of commutative. I'll say commutative here.

cyclic groups

the symmetric groups

fields

tarski/dedekind's high school algebra axioms (and the tarski high school algebra problem)

example of noncommutative yet associative operations: rotations, string concatenation

example of nonassociative operation: subtraction

example of commutative yet nonassociative operation: "Consider the set of all points on a line (or a plane, or 3-dimensional space --- it won't matter for the example) and let the operation send each pair (P,Q) of points to the midpoint of the segment PQ" -- http://math.stackexchange.com/questions/327422/is-all-algebraic-commutative-operation-always-associative

https://en.wikipedia.org/wiki/Commutative_non-associative_magmas : consider the game rock paper scissors; let's have r,p,s and when you apply 'op' (represented here by infix '*') to two of them, you get the one that 'wins', eg r*p = p*r = p, because paper beats rock. * is commutative but non-associative; r*(p*s) = r != s = (r*p)*s

this table of (the names given to) generalizations of groups (by dropping various group axioms) is useful: [3]

rings and fields: one of the applications of groups is the characterization of addition and multiplication. By adding in more properties of addition and multiplication, we get rings, and from adding in even more, we get fields. Note that not too much intuition can be drawn from the choice of names 'group', 'ring', 'field'; don't try to visualize a ring as a ring on your finger, for example; just imagine that these are arbitrary words used to describe a bunch of things with attached operation(s) which obey certain axioms.

A ring is a set with two binary operations, call them + and * (or addition and multiplication), with the following axioms:

- the ring is a commutative group under +
- the operation * is associative, and it has an identity (btw, structures meeting these two conditions are called 'monoids')
- the operation * is distributive with respect to addition:
- a * (b + c) = a*b + a*c
- (b + c) * a = b*a + c*a

Note that some authors don't have the condition that * has an identity; but other authors call that a 'rng' and reserve 'ring' for the structure given here (the one with the multiplicative identity requirement). A 'commutative ring' means a ring in which the * operation is commutative (the + operation is always commutative); but some books just call commutative rings 'rings'.

A field is a commutative ring such that every element, except zero, has a multiplicative inverse. Note: the 'except zero' there is what prevents us just from saying that addition and multiplication are just two groups on the same set of elements.

An example of an abstract field which feels pretty different from ordinary addition and multiplication over numbers is the finite field with four elements called "GF(4)":

Addition +

0 1 A B |

0

1A B0 1 A B |

1 0 B A |

A B 0 1 |

B A 1 0 |

Multiplication

0 1 A B |

0

1A B0 0 0 0 |

0 1 A B |

0 A B 1 |

0 B 1 A |

Note that when restricted to the subset {0,1}, this field works like boolean logic, where + is XOR and * is AND. Note that, counterintuitively, in this field you can't start with 0 and successively add 1 to generate all of the elements; 0+1 = 1 but 1+1 = 0, so you never get to A or B.

(from [4])

To summarize, groups, rings, and fields are about operators over some set of objects and the properties of these operators. A group is the most general and a field is the most specific, and a ring is in the middle.

Continuing to summarize, a group is a bunch of objects with one binary operation, 'multiplication', such that the operation is associative and invertible and one object is the multiplicative identity (there's also the closure/totality axiom but i'll skip over that in this paragraph). A field is a bunch of objects with two binary operations, 'addition' and 'multiplication', such that addition is a commutative group, multiplication is a commutative group except that it's not fully invertible because zero doesn't have a multiplicative inverse, and we have distributivity of multiplication over addition. A ring is somewhere in the middle; we have two operations, addition and multiplication, and addition is a commutative group, and we have distributivity, but multiplication need not be commutative, nor have any multiplicative inverses.

Matrices are rectangular arrays of numbers. But if that's ALL they were, then why is matrix multiplication this weird thing, instead of just multiplying the corresponding numbers together (elementwise multiplication)? The answer is that, really, 'arrays of numbers' is a data representation that is more general than matrices. Matrices just happen to be one thing that can be represented using rectangular arrays. In math, matrices are fundamentally the same as linear functions. There is a theorem that any linear function can be represented as a 2-d matrix (todo).

(todo define linear functions)

Linear functions (=linear transformations) have been shown to be nothing more than combinations of rotations and stretchings (and creation or deletion of dimensions, but you can think of that as an extreme version of stretching) (todo)

The determinant is the formula for measuring how much area/volume/etc the image of a unit square has after the transformation has been applied to it. If one dimension was deleted, then the transformed unit square has 'zero' area/etc in the image, in the same way that the area of a rectangle is non-zero but the area of a line segment is zero. This is why zero determinants are useful for detecting deletion of dimensions.

Eg a linear transformation with no stretching. The determinant of an isometry is either 1 or -1.

Isometries correspond to 'orthonormal' matrices; an orthonormal matrix is one whose rows and columns are both unit vectors and orthogonal to each other.

Orthonormal matrices have the properties that:

Q^T Q = Q Q^T = I

or in other words,

Q^T = Q^{-1}

The complex analog of 'orthogonal matrix' is 'unitary matrix'. Like orthonormal matrices, unitary matrices represent isometries; the absolute value of their determinant is 1; and they preserve norms, that is, for any two complex vectors, if you multiply each of them by the same unitary matrix and then take their inner product, you get the same result as if you had taken the inner product directly.

http://matrixmultiplication.xyz/

A pair of random high-dimensional vectors are likely to be almost orthogonal.

Unlike in 3-D, most of the volume in a hypercube is near the 'corners'.

Similarly, if instances of some class vary along very many dimensions, it is likely that the vast majority of instances are outliers along at least one dimension.

Covering a high-dimensional hypercube with a grid of points requires exponentially many grid points (in the dimension of the space) -- it is usually infeasible to sample all of these points.

A small set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved (the Johnson–Lindenstrauss? lemma [5]).

These sorts of facts can make dealing with high-dimensional spaces unintuitive. They can also be put to use, eg the Random Projection method of dimensionality reduction, the [6] of signal reconstruction for signals known to be sparse, or random construction of LDPC error correcting codes.

"Homotopy of two continuous mappings f,g: X -> Y. A formalization of the intuitive idea of deformability of one mapping into another." -- http://www.encyclopediaofmath.org/index.php/Homotopy

"In topology, two continuous functions from one topological space to another are called homotopic if one can be "continuously deformed" into the other, such a deformation being called a homotopy between the two functions.

"The two dashed paths shown above are homotopic relative to their endpoints. The animation represents one possible homotopy." -- http://en.wikipedia.org/wiki/Homotopy

"More exactly, two mappings and are called homotopic (denoted by ) if there exists a family of continuous mappings f_t : X -> Y, continuously depending on a parameter t \in [0,1], such that f_0 = f, and f_1 = g." -- http://www.encyclopediaofmath.org/index.php/Homotopy

"Consider a world where objects are made of elastic rubber. Two objects are considered equivalent if they can be deformed into each other without tearing the material. If such a transformation between X and Y exists, we say they are topologically equivalent..." -- http://www.ayasdi.com/_downloads/Topological_Analysis_of_Population_Activity_in_Visual_Cortex.pdf

"...it is evident that a possible reason for two objects not to be equivalent is that they differ in the number of holes. Thus, simply counting holes can provide a signature for the object at hand. Holes can exist in different dimensions. A one-dimensional hole is exposed when a one-dimensional loop (a closed curve) on the object cannot be deformed into a single point without tearing the loop. If two such loops can be deformed into one another they define the same hole, which should be counted only once. Analogous definitions can be invoked in higher dimensions. For example, a two-dimensional hole is revealed when a closed two-dimensional oriented surface on the object cannot be deformed into a single point.

This notion of counting holes of different dimensions is formalized by the definition of Betti numbers. The Betti numbers of an object X can be arranged in a sequence, b ( X )=( b 0 , b 1 , b 2 , I ), where b 0 represents the number of connected components, b 1 represents the number of one- dimensional holes, b 2 the number of two-dimensional holes, and so forth. An important property of Betti sequences is that if two objects are topologically equiv- alent (they can be deformed into each other) they share the same Betti sequence. One must note, as we will shortly illustrate, that the reverse is not always true: two objects can be different but have the same Betti sequence. " -- http://www.ayasdi.com/_downloads/Topological_Analysis_of_Population_Activity_in_Visual_Cortex.pdf

total function vs (non-total) partial function

computable vs enumerable vs not enumerable

primitive recursion, mu-recursion/turing machines/lambda calc/etc

universality, interpreters

various machine models, polynomial time emulators

halting problem

Peano arithmetic is sometimes taken to be the definition of the natural numbers. The second-order formulation of Peano arithmetic is quite simple and intuitive; see [7].

Peano arithmetic is incomplete (Gödel's First Incompleteness Theorem (1931)) (there are statements about the natural numbers that are true, but that are unprovable from the Peano axioms; in fact, there is no decidable set of axioms that can lead to proofs of all and only the true statements about the natural numbers (so-called true arithmetic)).

Peano arithmetic was first given in a second-order form (with quantification over sets) with finite axioms, but later was reformulated into a first-order form (also called an 'elementary' form; "In mathematical logic, an elementary theory is one that involves axioms using only finitary first-order logic, without reference to set theory or using any axioms which have consistency strength equal to set theory." -- [8] ) at the cost of (a) having an infinite number of axioms (an axiom schema) instead of a finite number, and (b) some basic properties of addition and multiplication must be included in the first-order axioms (in second-order Peano arithmetic, these properties could be deduced [9]), (c) the first-order ('elementary') Peano arithmetic has non-standard models (the second-order Peano arithmetic has only one model, up to isomorphism).

The restriction of (first-order, I assume?) Peano arithmetic to only addition (no multiplication) is decidable, and similarly the restriction to just multiplication (no addition) is decidable. [10].

Links:

- https://en.wikipedia.org/wiki/Peano_axioms
- Relationships between Constructive, Predicative and Classical Systems of Analysis by Solomon Feferman says "The work described here shows that, as least as far as currently ((scientifically)) applicable mathematics is concerned, we do not need to go beyond systems of strength PA ."
- Section 1.2, subsection The Elementary Theory of Peano Arithmetic, page 19, PDF page 23

Note: is this system different from and stronger than the second-order formulation of Peano arithmetic? it seems to me, yes, because of comprehension axioms, but i'm not sure; [11] makes it sound like Z_2 is just second-order Peano.

Links:

modus ponens/syllogism

direct proof

proof by cases

proof by contrapositive

proof by contradiction

proof by cancellation

proof by symmetry

perhaps see also http://www.people.vcu.edu/~rhammack/BookOfProof/BookOfProof.pdf , http://mail.csufresno.edu/~mnogin/math145fall05/math145fall05-book.pdf for more

how to prove the nonexistence of something: show that every possibility has a set of properties that together preclude them from being the target contradiction diagonalization

(algebra)

didn't we already do this above?

over well-ordered sets

well-founded ness? see https://en.wikipedia.org/wiki/Mathematical_induction#Heuristic_justification vs https://en.wikipedia.org/wiki/Transfinite_induction

complete/strong induction

and extending it even to uncountables

and, or, not nand/nor

refer to http://boole.stanford.edu/cs353/handouts/book3.pdf for Post's theorem: (Post) A necessary and sufficient condition that a set of operations form a basis for the nonzeroary Boolean operations is that it contain operations that are respectively nonaffine, nonmonotone, nonstrict, noncostrict, and nonselfdual

Category theory can be seen as arising from (at least) 3 applications:

- theorems concerning directed graphs, where 'paths' between edges are explicitly represented, and furthermore, in a category, every node must have a self-edge, and every path between nodes must be represented as an explicit edge in between them, and there is a 'composition' operator that takes two edges, an edge from node A to node B and an edge from node B to node C, and returns the edge representing this path.
- a representation of homomorphisms between structures as a graph, where the structures are nodes and the homomorphisms are edges
- a way to look at the foundations of mathematics that is based on functions rather than on sets

todo

constructions of the real numbers

continuity

what else?

derivatives

integrals

defining curves and shapes according to how the slope of the path that is traced out if the shape were drawn one point at a time (eg you can specify a ramp as 'put your pencil on the paper, go straight to the right for awhile, then the path turns upwards so that for every unit of space traversed horizontally, the path rises two units of space vertically')

formulas relating the amount of change in one variable to the amount of change in another

many applications of calculus to dynamics involve looking at prcoesses via their 'shapes' if time were a spatial axis; for instance, if you throw a ball vertically into the air, and then graph it's position on the vertical axis and time on the horizontal, then it'll trace out a slanted line, and the velocity of the ball is the slope of the line.

Linear algebra is about:

- transformations that involve only stretching/shrinking and rotation
- linear transformations, defined as: transformations that distribute over scalar addition and multiplication (eg f(x + y) = f(x) + f(y) and f(kx) = kf(x)
- vectors and matrices
- the concept of 'direction' and coordinate axes, eg North vs East, vertical vs. horizontal, which direction did you throw the ball in, how many dimensions a space has

connecting these:

- vectors can represent a point in a coordinate space, but any point can represent a direction (draw an arrow beginning at the origin and pointing to the point). It turns out that any linear transformation of a vector can be written as a matrix (matrix multiplication is how you apply the transformation to an input), and any matrix multiplied by a vector transforms it linearly. Stretching/shrinking are like rescaling a coordinate axis, and rotations are like exchanging each coordinate axis for another (such that they are all still at right angles to each other)

todo

div, grad, curl

change-of-variables, hessian

Informally, a set of strategies is a Nash equilibrium if no player can do better by unilaterally changing their strategy.

Nash's Existence Theorem: if we allow mixed strategies, then every game with a finite number of players in which each player can choose from finitely many pure strategies has at least one Nash equilibrium. [12]

roots closed form solution thru quartics non-closed form solution for some quintics

taylor series expansion

todo other elementary school algebra II topics, what are they?

"A polynomial is uniquely determined by its roots ((and their multiplicities)) and its value at any single point which is not a root." [13] [14] [15]

"The fundamental theorem of algebra": "every non-constant single-variable polynomial with complex coefficients has at least one complex root." [16]

The polynomial "factor theorem": "a polynomial f(x) has a factor (x - k) if and only if f(k)=0 (i.e. k is a root)" [17]

A polynomial p of degree n "has n roots in the complex plane, if they are counted with their multiplicities." [18]

"that two polynomials of the same degree n with n+1 points in common are the same." [19]

logs

the approximation for log(1+x) for small x

properties of e

intution behind e^(i*pi) = -1

Recursion theory is an old-fashioned name for computability theory. It deals with questions like "what is computable?". It is the branch of mathematics that sprang up around Church and Turing's initial work. Although it deals with computation, it has a logic-y flavor, rather than a computer-sciencey-flavor.

A question is "computable" if you can write a computer program that answers the question (within finite time).

The definition of "if you can write a computer program that answers the question" is rigorously defined in recursion theory. There are multiple, provably equivalent ways of defining this, based on how you define the "computer" and what kind of "programs" it takes; such definitions are called "models of computation". One model of computation is the Turing machine. Another is Church's lambda calculus. Anything which is provably equivalent to the Turing machine is called "Turing-equivalent". The Church-Turing hypothesis is that the formal notion of Turing-equivalence is equivalent to the informal notion of a procedure which can be precisely specified and, for any given input, either accomplished with finite resources, or which does not terminate (note: this informal phrasing is my own and may be wrong).

The "finite resources" part is confusing because, since the amount of resources needed may be different for different inputs, in general there is no upper bound on the amount of resources needed. For example, an ideal Turing machine can execute programs that use an arbitrarily large amount of memory. We say that real "computers" are "Turing machines" but technically speaking they are not, because they have finite memory. However, they are extremely similar in spirit; a Turing machine can be said to be an ideal computer with infinite memory.

Another model of computation is Herbrand-Godel's notion of "general recursion", which first defines something called "primitive recursion" and then extends it to "general recursion" (now you see why this field is named "recursion theory", because that's what some of them called computation back in the beginning.

Incidentally "primitive recursion" can generate almost any practically useful function (an example of a function it can't do is a function called Ackerman's function), but it is somewhat difficult to write programs in the style of primitive recursion (for example, you can't have 'while' loops), so most general-purpose programming languages would not fit the primitive recursive constraints.

Here's an example of something that isn't computable. The "halting problem". Given a computer program, tell me if this program will hang (get caught in an infinite loop) or terminate.

The proof that the halting problem isn't computable is that, if there were an algorithm that, given any program, would say if that program hangs, then you could write a wrapper around that algorithm that said, if the input program would hang, then terminate; but if the input program would terminate, then go into an infinite loop. Now run this program, and as input give it itself. If it computes that it will terminate, then it hangs; and if it computes that it will hang, then it terminates.

A set is computable if, given any potential element, you can compute whether or not that element is in the set.

A synonym of computable is "decidable". This sounds better when the question being answered is a yes/no question. So, the halting problem isn't decidable. A set is decidable if you can decide (using some computer program, within finite time) whether any given element belongs.

If something isn't decidable, it may still be "recursively enumerable". A set is recursively enumerable if there is some computer program which prints out every element of the set one by one (although the set may be infinitely large, as long as any given element will be printed at some finite point in time, it's recursively enumerable).

A synonym of "recursively enumerable" is "semidecidable". If someone asks you if element e is in set S, and S is not decidable, but it is semidecidable, then one thing you can do is run the program that prints out all of the elements of S. If e ever comes up, then you can answer, "yes, it's in the set", but if e never comes up, then you'll be waiting forever. Implementing this strategy in a computer program, you get a program which terminates if the input is in set S, and does not terminate if the input is not in set S.

A computer program which never hangs, no matter what input is given, is called a 'total function'. A computer program which might hang on at least one possible input is called a 'partial function'. This terminology makes sense if whenever the program hangs on some input, you say, "the output corresponding to that input is undefined". The picture is that for total functions, every input has a defined output, but for a partial functions, some inputs have no defined output; partial functions only give you defined outputs for part of the space of inputs.

If you have an 'oracle', that is, a magic box that will evaluate an uncomputable function (such as the halting problem) for you, now what else can you compute, and what is uncomputable (technically speaking, the questions and answers are usually taken to be phrased in terms of the oracle corresponding to an uncomputable (non-recursive) set of natural numbers, and for any natural number you can ask the oracle, "Is this number in your set?")? The "now what else can you compute" is called "relative computability". These sort of questions induce a hierarchy of sets called Turing degrees. A lot of work in recursion theory is results on the structure of Turing degrees. Depending on your philosophical beliefs you might find this work irrelevant.

There are attempts to redo the rest of mathematics while only allowing computable sets. These go by names like as "computable mathematics", "recursive mathematics", "computable model theory", "recursive analysis", etc.

interestingly, one NP problem is finding a proof for a given proposition, given an upper limit on proof length. Therefore, the P vs NP problem is very fundamental.

'Model checking' refers to exploring whether a quantified proposition is true or false by going thru every possible combination of assignments to bound variables and seeing if there are any falsifying assignments. This can only be done if the types of the variables have only a finite number of values (eg it would take an infinite amount of time to check every integer).

some other ppl's comments on fields of study:

" In order to go back to a time when mathematics and physics had not yet diverged, you probably have to return to the days of Euler, or even Newton. If you read William Dunham's excellent book, "The Calculus Gallery", in the very first chapter, you will find a discussion of Newton's very basic work on binomial series--something that physicists use almost every day. However, as that book progresses (it goes in chronological order), once you get to Cauchy, who worked in the early 19th century, you begin to see mathematicians turn their focus to questions that no longer address computational questions that could possibly find direct use by physicists, but instead are more or less exist only in the minds of mathematicians.

There certainly is a branch of real analysis, called "classical analysis", which tends to focus more on concrete examples of infinite series, which of course have roots in basic calculus and physics. You can in fact prove a great deal of interesting things about specific infinite series, but you will not invent algebraic geometry or abstract algebra, or be able to fully appreciate the scope of modern 20th century mathematics, if you confine yourself to studying the properties of just one concrete object. If you do, though, you will eventually find yourself studying complex analysis. Somebody who takes this route--that is, to reject the abstract flavor of 20th mathematics--can learn a great deal about mathematics that also happens to be very useful to physicists. On the other hand, you would be missing out on a great deal of fascinating connections between the abstract mathematics of the 20th century, and applications to problems in computer science.

On the other hand, if you really want to study pure mathematics, using proofs, you'll have to be somewhat patient before you can see applications to physics. These applications will come sporatically. Right away, linear algebra is an example of a very basic pure mathematics subject which is absolutely essential to physics. At about the same level is calculus and infinite series. Then, in mechanics, you will probably encounter what physicists call "analytical mechanics", which is an application of a pure mathematics subject called the "calculus of variations". Your understanding of this subject doesn't have to be very deep to start using it in physics, though. At a slightly higher level is complex analysis, which vastly improves your understanding of infinite series. The next application I can think of is quite a bit more advanced; it concerns general relativity, and could be thought of as an advanced setting for multivariable calculus, but which is inspired by the beauty of Euclidian geometry. I am talking about what used to be called "advanced calculus", but now has many different names. The key object of study is what is called a "manifold". A mathematician would call the subject "differential geometry", whereas a physicist would emphasize the use of objects called tensors. At this point, you will begin to see non-trivial mathematics being used in physics, but with the annoyance that physicists continue to rely on computations rather than complete proofs."

"Some major fields of mathematics are:

1. Algebra. This is like and unlike what you may call algebra today. It is the study of how things are built and decomposed. Indeed, it notes that many "things" can be described entirely in terms of how they are built and decomposed. It is often a good place to begin for programmers as it espouses a way of thinking about the world not dissimilar to the way we model domains in while programming. Some books include Algebra: Chapter 0 by Aluffi and Algebra by MacLane?.

2. Combinatorics. This is the study of "counting", but counting far more complex than anything meant by that word in normal usage. It is often a first field of study for teaching people how to read and speak proofs and theorems and therefore is well recommended. It is also where the subfield of graph theory (mostly) lies which makes it more readily accessible to programmers with an algorithms background. I can recommend West's Introduction to Graph Theory, but only with the caveat that it is incredibly dry and boring---you will get out of it what you put into practicing the proofs and nothing more.

3. Topology. This is the study of what it means for one thing to be "near" another. Similarly, it is the study of what it means to be "smooth". It's a somewhat more abstract topic than the others, but in modern mathematics it holds a privileged role as its theorems tend to have surprising and powerful consequences elsewhere in mathematics. I don't know any good introductory material here---perhaps Munkres' Topology.

4. Calculus and Analysis. This is the study of "smooth things". It is often the culminating point of American high school mathematics curricula because it has strong relationship with basic physics. Due to this interplay, it's a remarkably well-studied field with applications throughout applied mathematics, physics, and engineering. It is also the first "analyst's" field I've mentioned so far. Essentially, there are two broad styles of reasoning in mathematics, the "algebraicist's" and the "analyst's". Some people find that they love one much more than the other. The best intro book I know is Spivak's Calculus.

5. Set Theory. This is, on its surface, the study of "sets" which are, often, the most basic mathematical structure from which all others arise. You should study it eventually at this level to improve your mathematical fluency---it's a bit like learning colloquial English as compared to just formal English. More deeply, it is a historical account of the philosophical effort to figure out what the absolute basis of mathematics is---a study of foundations. To understand Set theory at this level is far more challenging, but instrumental for understanding some pieces of Logic. This can therefore be a very useful branch of study for the computer scientist investigating mathematics. I don't know a good introductory book, unfortunately.

6. Number Theory. This is, unlike the others above excepting "surface" Set theory, a branch which arises from studying the properties of a single, extremely interesting mathematical object: the integers. Probably the most obvious feature of this field is the idea that numbers can be decomposed into "atomic" pieces called prime numbers. That idea is studied generally in algebra, but the properties of prime numbers escape many of the general techniques. I don't know a good introductory book, unfortunately.

7. Measure Theory and Probability Theory. Measure theory is the study of the "substance" of things. It generalizes notions like length, weight, and volume letting you build and compare them in any circumstance. Furthermore, if you bound your measure, e.g. declare that "all things in the universe, together, weigh exactly 1 unit", then you get probability theory---the basis of statistics and a form of logical reasoning in its own right. I don't know a good introductory book, unfortunately.

8. Linear Algebra. A much more "applied" field than some of the others, but one that's surprisingly deep. It studies the idea of "simple" relationships between "spaces". These are tackled in general in (general) algebra, but linear algebra has vast application in the real world. It's also the most direct place to study matrices which are vastly important algebraic tools. I don't know a good introductory book, unfortunately.

9. Logic. A much more philosophical field at one end and an intensely algebraic field at the other. Logic establishes notions of "reasoning" and "judgement" and attempts to state which are "valid" for use as a mathematical language. Type Theory is closely related and is vital for the development of modern programming languages, so that might be an interesting connection. I don't know a good introductory book, unfortunately. "

---

Archimedian property

"...it is the property of having no infinitely large or infinitely small elements....An algebraic structure in which any two non-zero elements are comparable, in the sense that neither of them is infinitesimal with respect to the other, is said to be Archimedean...The Archimedean property appears in Book V of Euclid's Elements as Definition 4: Magnitudes are said to have a ratio to one another which can, when multiplied, exceed one another." -- https://en.wikipedia.org/wiki/Archimedean_property

i believe the following function is 0 at 0, is 1 at infinity, and is asymmetrically sigmoidal with the fastest rise occurring at 1, at which point the value is 1/3:

(2/3.)*(sign(x - 1)*(1-1/(1.+abs(x - 1))) + .5)

(i don't know the name of that function as i made it up; but most likely someone else has discovered it before me)

(todo: the function appears to be smooth, so find an expression for it without 'sign' and 'abs')

other sigmoidal functions: the logistic function 1/(1 + exp(-x)), the Gompertz function e^(-b*e^(-c*t)) (asymmetric), tanh, x/sqrt(1+x^2), arctan, the error function 'erf'

- normalizing to a unitless quantity
- normalize a function to [0, ?]
- normalizing range to [0,1] (ie https://en.wikipedia.org/wiki/Normalization_%28image_processing%29 ; ie 'rescaling' in https://en.wikipedia.org/wiki/Feature_scaling )
- normalizing to mean 0, std 1 (ie 'standardization' in https://en.wikipedia.org/wiki/Feature_scaling)
- normalize to a model probability distribution
- Z-score (align to a gaussian distribution)
- quantile normalization
- https://en.wikipedia.org/wiki/Histogram_equalization

- normalize to vector length of 1 (ie 'Scaling to unit length' in https://en.wikipedia.org/wiki/Feature_scaling)
- https://en.wikipedia.org/wiki/Normalization_%28statistics%29
- algebraic 'normal form's
- normalize the area under a curve to 1 (eg normalize a probability distribution function; [20] )
- todo subtractive and divisive normalization: https://www.google.com/search?client=ubuntu&channel=fs&q=subtractive+and+divisive+normalization

---

"A prime number (or a prime) is a natural number greater than 1 that has no positive divisors other than 1 and itself. A natural number greater than 1 that is not a prime number is called a composite number. For example, 5 is prime because 1 and 5 are its only positive integer factors, whereas 6 is composite because it has the divisors 2 and 3 in addition to 1 and 6." [21]

Also called "the fundamental theorem of arithmetic".

"every integer greater than 1 either is prime itself or is the product of prime numbers, and that this product is unique, up to the order of the factors.[3][4][5] For example,

1200 = 24 × 31 × 52 = 3 × 2 × 2 × 2 × 2 × 5 × 5 = 5 × 2 × 3 × 2 × 5 × 2 × 2 = etc. " [22]

https://en.wikipedia.org/wiki/Euclid's_theorem https://primes.utm.edu/notes/proofs/infinite/

https://en.wikipedia.org/wiki/Primorial

It is called a 'conjecture' because it's something that some people think may be a theorem, but which has not been proven.

https://en.wikipedia.org/wiki/Goldbach's_conjecture

"the prime number theorem, ... says that the probability that a given, randomly chosen number n is prime is inversely proportional to its number of digits, or to the logarithm of n." [23]

---

todo: at least high school geometry

Equilateral, Isosceles and Scalene

right triangle

hypotenuse

Pythagorean theorem

"The sum ... of the interior angles of a triangle ... is always 180 degrees." (in Euclidean space) [24]

interior angle and exterior angle, https://en.wikipedia.org/wiki/Angle#supplementary_angle https://en.wikipedia.org/wiki/Internal_and_external_angle

https://en.wikipedia.org/wiki/Exterior_angle_theorem

"Two triangles are said to be similar if every angle of one triangle has the same measure as the corresponding angle in the other triangle. The corresponding sides of similar triangles have lengths that are in the same proportion, and this property is also sufficient to establish similarity." [25]

todo basic trigonometry in triangles

In 2-D, we call them 'polygons'. In 3-D, we call them 'polyhedra'. The general, n-dimensional term for this sort of thing is 'polytope'.

In 2-D, for every integer 3 and greater choice of number of sides, there is a polygon. But in 3-D, there are only 5 polyhedra which are convex and regular, and we call them "the five Platonic solids".

The 5 Platonic solids (convex regular polytopes in 3-D) are:

- tetrahedron: 4 triangular faces (3-d analog of triangle)
- cube: 6 square faces
- octahedron: 8 triangular faces
- dodecahedron: 12 pentagonal faces
- icosahedron: 20 triangular faces

" In 4 dimensions, there are exactly six regular polytopes. The 'hypertetrahedron' - mathematicians call it the '4-simplex' - with 5 tetrahedral faces: Some people call this a '5-cell', 'pentatope' or 'pentachoron'. The 'hypercube' - science fiction writers call it the 'tesseract' - with 8 cubical faces: The 'hyperoctahedron' - mathematicians call it the '4-dimensional cross-polytope' or '16-cell', with 16 tetrahedral faces: Some people call this an 'orthoplex', or a 'hexadecachoron'. The 'hyperdodecahedron' - mathematicians call it the '120-cell' - with 120 dodecahedral faces. ... The 'hypericosahedron' - mathematicians call it the '600-cell' - with 600 tetrahedral faces: ... Last but not least, the '24-cell', with 24 octahedral faces. This is a denizen of the 4th dimension with no analog in lower dimensions:

You might things would keep getting more complicated in higher dimensions. But it doesn't! 4-dimensional space is the peak of complexity as far as regular polytopes go. From then on, it gets pretty boring. This is one of many examples of how 4-dimensional geometry and topology are more complicated, in certain ways, than geometry and topology in higher dimensions. And the spacetime we live in just happens to be 4-dimensional. Hmm.

There is a kind of hypertetrahedron, called the 'n-simplex', having (n+1) faces, all of which are (n-1)-simplices. There is a kind of hypercube, called the 'n-cube', having 2n faces, all of which are (n-1)-cubes. And there is a kind of hyperoctahedron, called the 'n-dimensional cross-polytope', having 2n faces, all of which are (n-1)-simplices. "

Note that tetrahedrons, hypertetrahedrons, and simplices are analogs of triangles. Note that cubes, hypercubes/tesseracts are analogs of squares.

Links:

In 2-D, there are 3 three non-degenerate regular tessellations/tilings of the plane:

- square
- triangular
- hexagonal

In 3-D, there is only one:

- cube

In 4-D, there are three:

- hypercube
- 16-cell
- 24-cell

In 5-D and above, there is only one:

- hypercube

Links:

---

Links:

- Axiomatic Characterizations of Information Measures by I Csiszár, 2008
- http://www.stat.cmu.edu/~cshalizi/350/2008/lectures/06a/lecture-06a.pdf

---

-- n-th roots of unity: they look scary but if you choose to think of {e^i x} as abstract mumbo-jumbo which is shorthand for cos x + i sin x, and if you think of that as abstract mumbo-jumbo which is shorthand for a position on a circle, then their algebraic form is actually pretty intuitive. Remember, the right point on the circle is labeled 0 (and 1), the left point is -1, the top is i and the bottom is -i.

so: what's the square roots of 1? 1 and -1. What are the fourth roots of 1? Well, in reality, just 1 and -1. But nooo, we want -1 to have roots too. So now we have i*i = -1. So that means, i*i*i*i = 1. And, (-i)^4 = 1, also. But this can easily be visualized by thinking that exponentiation represents multiplication of an angle around the circle; so, since i represents 90 degrees, i^2 represents 180 degrees (-1), and i^3 represents 270 degrees (-i). So, the fourth roots of unity are just those angles A such that if you start at the right point and then go around by A degrees, and you do that four times, then you get back to where you started.

The mathematical reason that exponentiation represents multiplication of an angle is that an angle is represented by the label at the corresponding point at that angle on the circle; so, for example, 90 degrees is labeled i. Now you represent that label with cos x + i sin x; in this case, x must be pi/2 (+ 2\pi n, of course) for the result to be i. And you represent that with e^(i x). Now, if you take (e^ix)^k, going up to the exponent steps down the exponentiation (it's like, if you're a god on earth, you're just an ordinary guy in heaven), turning it into multiplication. So you get e^(ixk); or, e^(i x'), where x' = xk. Now you can see that xk is going to be used as the amount of radians in the new angle. So the old amount (x) got multiplied by k.

So that's the use of this whole contrivance. It's like operator overloading in obejct-oriented programming. Sometimes you want some term (a container object, in OOP terms) to hole an angular value, but you don't want it to work the normal way, where if you add two of these things, then the angular values inside them add. Nooo, you want it so that if you MULTIPLY two of these guys, then the values inside get added. Now, exponentiation is like iterated multiplication, so if you raise one of these guys to 2, then that doubles the angular value that it is holding.

So that's all this stuff is. In physics, imaginary quantities aren't ever measured directly, only real quantities are measured (get it? real?). So if you see a term like ix, you gotta wonder. Well, in physics, those ix's don't actually get used either (in my experience). Whenever you see an ix, it's usually because, somewhere down the line, it's going to be the argument of an exp function (exp(x) = e^x). Now, exp(x) is perfectly normal when x is real, but when x is imaginary, it's just a container object for angles, which turns multiplication on the outside into addition on the inside. And recall that complex numbers, a + ib, when used as the argument of an exp, separate into a real argument and an imaginary one: e^(a + ib) = (e^a)(e^{ib}). The first one is perfectly sensible (recall that it, too, turns multiplication on the outside into addition on the inside), and the second one can be thought of as a container for angles which turns multiplication on the outside into addition on the inside.

i guess a simpler way to say this is: e^x is a container for angles which turns multiplication on the outside into addition on the inside. complex numbers, at least when they are used as the arguments to exps, are just another way to represent angles.

one thing that makes it confusing is that the angle representation for inputs and outputs is different. by this i mean, consider e^{i x} = y. I'll say that "ix" is the "input", and "$y$" is the "output". Let's see what we get for an input of i \pi/2:

e^{i \pi/2} = cos \pi/2 + i sin \pi/2 = i

So, we see that the output $i$ represents the angle \pi/2.

But the input corresponding to this was $i \pi/2$. So, on the input side, $\pi/2$ is represented by $i \pi/2$; but on the output side, $\pi/2$ is represented by $i$.

so, when you see an i, you can think of it as: "something which, when given as an argument to an exp, represents an angle of 1 radian; and, when it is the output of an exp, represents a 90 degree angle".

and the point of using this sort of representation instead of just using radians directly is that, (what? why not just use e^pi and its ilk? is it b/c you want to have the canceling out of increments of 2 \pi done automatically? is it b/c you want to have the complex plane as the output, for convenience?)

Feynman apparently called it a "jewel". I call its use an obstruction (in physics, not in pure mathematics). Now, even I will admit that it makes derivations more concise, but it obscures pictorial intuition.

--

S1 \subseteq S2 is defined to mean: if x \in S1, then x \in S2

You can switch back and forth between these two ways of saying the same thing.

A 'binary relation between two sets S1 and S2' is a table where one set is the 'row header', one set is the 'column header', and each entry in the table is either 'IN' or 'OUT' (or equivalently, TRUE or FALSE). Equivalently, it is the set of coordinates of those table entries that say 'IN'; equivalently, it is a set of pairs (s1, s2) where s1 \in S1 and s2 \in S2; equivalently, it is a subset of S1 X S2.

Notationally, relations are often written as capital letters for example 'R'. If R is a binary relation, as a convenient notation we write "xRy" as a shorthand for "(x,y) \in R"

Binary relations where the sets S1 and S2 are the same set are equivalent to the specification of which edges exist in graphs; in this case, the set is taken to be the set of nodes in the graph, and xRy is taken to be TRUE just when there is an edge from x to y.

An n-ary relation is the multidimensional analog of a binary relation; eg one could have a subset of S1 x S2 x S3. "An example of a ternary relation (i.e., between three individuals) is: "X was introduced to Y by Z", where \left(X, Y, Z\right) is a 3-tuple of persons; for example, "Beatrice Wood was introduced to Henri-Pierre Roché by Marcel Duchamp" is true, while "Karl Marx was introduced to Friedrich Engels by Queen Victoria" is false." -- [26]

todo symmetric, antisymmetric, reflexive, transitive. Symmetric relations on one set are equivalent to undirected graphs.

R^{-1} is just R with the ordering switched, eg

(b,a) \in R^{-1} iff (a,b) \in R

Relational composition is a binary function on relations; it takes two relations as input and produces a relation as output. It is written ';'.

R1;R2 is the relation such that (x,z) \in R1;R2 iff \exists y s.t. (such that) (x,y) \in R1 and (y,z) in R2.

---

sin(x) ~= x when x is small (and d(sin(x))/dx ~= 1 when x is small; which makes sense because d(sin) = cos, and cos(0) = 1)

ln(1+x) =~ x when x is small (and d(ln(1 + x))/dx ~= 1 when x is small; which makes sense because d(ln) = 1/x, d(ln(1+x)) = 1/(1+x) and 1/(1+x) = 1)

(note the general form of both of these; we found a formula whose derivative is 1 when x = 0)

---

Abstract math is the only school subject where i felt like via learning it, i became smarter in general, in addition to learning about that particular field or learning particular skills such as reading quickly. I dunno if i actually became smarter, it just felt that way.

A lot of people think they're bad at math. Well, I've learned a bit of math (even though not as much, and not as well, as someone who majored in math), and i'm here to tell you: it's not you; math is hard for almost everyone. It's just a way of thinking that is unnatural for humans; indeed, math thoughts are so different from the kinds of thoughts that we seem to be 'built for' that they even feel a little 'alien'. Often when i'm learning new math i read some math expression, laboriously understanding each symbol, and then when i get to the end, i go, "huh?". Or, even worse (and, sadly, more common), my mind gets so bored in the middle of the math expression that my mind goes blank and i have to start over at the beginning again. Although i am one of those people who claim to 'enjoy math', for me this process is somewhat unpleasant; it's work that i have to do in order to reach the goal of understanding something new. What i enjoy is the moment of understanding, not the work that it takes to get there; the work is just the price paid to get to the part i like. When i do get through the expression, and i feel i've 'understood' each part of it, and have a glimmer of 'understanding' for the whole thing, often that 'understanding' thought feels like it is fragile, like the memory of a dream, and i lose it easily. I end up having to read over each expression many times before the 'understanding' gets solid enough that i don't lose it. Then i come back another day and find that i forgot the understanding, and i have to do it again (this second time goes much quicker than the first, though).

I bet that a lot of people think they're bad at math because in other areas of life, that sort of thing doesn't happen, and they assume that the people who are good at math don't go through that. They think that because when something new comes up in math class, they see some other people get it right away while they are still confused. It's true that sometimes when something new is presented, i can grasp it quickly; but this is usually because i have had more practice thinking about somewhat similar things in the past. When my mathy friends explain something new to me, my typical experience is not really being able to keep up; that's fine, because i know that if i go back and study it for long enough, i will learn it. It is my belief that anyone (excluding people with very serious mental disabilities) can bring themselves to the same level of understanding of math as anyone else (excluding a few 'geniuses') if they put in enough time (and assuming there is someone else there willing to answer their questions as they come up). Perhaps some people have 'math talent' that allows them to spend less time for the same amount of understanding, but i don't think that some levels of understanding are themselves unreachable (again, except possibly for the levels of understanding reached by a few 'geniuses').

Now there is a creative side to math, and just like art and music, all the studying in the world may not give you the 'math creativity' needed to do new in the field that is as wonderful as those lucky souls who have more talent than you; but creating is different from understanding. I am saying that anyone can understand. You just have to put in the time.

Mathematical thinking is often slower than other kinds of thought. I mean both that you have to have patience, and also that you seem to be using inherently 'lower frequency' parts of mind in order to understand new math. Often, it is not possible to just say to your mind "hurry up!" and have it think about math faster; in fact that is often counterproductive because by doing this you interrupt yourself, bringing the slow processing to halt and causing feelings of frustration. Rather, you have accept the slower pace of thought, and relax into the deep, slower parts of your consciousness.

I'm not saying you should not work towards becoming fast at math. But the way you become fast is through practice and repetition, not through hurrying.

A lot of math requires you to understand previously learned math to a much greater degree than non-math studies. And, unlike some other fields which have a lot of facts which must just be memorized, often math has a deep structure that, if you can grasp it, makes it easier to learn and to remember. This means that, assuming that you are sure that you will later study more math build on some prerequisite, then spending a little more time acquiring a better understanding, and a speedy facility in manipulation, in the prerequisite is an investment that may pay off in saving you time learning the new thing later.

It's amazing how something that seems impossibly confusing at first becomes easier and easier each time you think thru it.

Another amazing thing is that often if you 'sleep on it', or sometimes even just spend some time away from it, something which seemed confusing suddenly seems much less so when you come back to it. For this reason I highly recommend reading math once, and then re-reading it again another day. I think your brain works on this stuff in some unconscious 'background process'.

Also, if some math expression is giving you a lot of trouble, it's okay to give up on it for now and come back tomorrow (assuming you do actually come back to it!). There is a good chance that it will be easier tomorrow. You don't have to get to the point of 'understanding' some confusing piece of math in order for that piece to magically get easier to read the next day (but you DO have to seriously think hard about it for at least a few minutes, whether or not that thought leads to understanding).

You should also try to 'space out' your re-reads. Studies have shown that if you study something twice with a lot of time in between, you remember it for longer than if you studied it twice for the same amount of time but with less time in between.

Besides better understanding, revisiting math many times makes you faster at it. This is important because if you are trying to solve a math problem you want to be able to consider the possible next steps in manipulating each expression quickly enough so that your mind will automatically 'flip through them' and consider them for you, in the same way that if you are in your kitchen and you are thirsty, your mind automatically figures out that there is a cup available in the cabinet that could be filled with water and brought to your mouth. Being able to 'skip over' the small steps in your mind's eye in this way allows you to 'see further' overall, all the way to a distant possible solution.

You understand math a lot better if you do practice problems (if you are not doing homework in a class, and you are looking for practice problems, be sure to find some for which you can get the answers to check afterwards!).

One thing that you should be consciously trying to do when learning or doing math is always converting mathematical concepts and expressions between different corresponding thought representations. If there is a logical statement, maybe find a corresponding English sentence; then maybe find a corresponding graphical diagram. If there is an integer, maybe think of it as corresponding to many factors multiplied together. If there is a binary relation, think of it as a table, and then think of it as a set of pairs, and then think of it as the edges in a graph.

This builds up your intuition. In addition, many mathematical theorems (and definitions) are literally formalized correspondences like this, different ways of conceptualizing the same thing. Furthermore, shifting between different formal representations of the same fact is often a large portion of many proofs.

Many other theorems, if not identifying different ways of conceptualizing the same thing, identify correspondences in terms of "if, using representation A, fact X is true, then if you shift to representation B, fact Y must be true".

Above, i recommend exploratory shifting between different representations as a tactic, both in your mind for the purpose of understanding, and on paper for the purpose of solving math problems. Another thing i recommend is consciously working to build up an internal 'library' of different representations as you learn math. Consider my example of an integer represented as the product of its factors. For example, 12 = (2*2*3). The operation of multiplication on two integers corresponds to the operation of mixing together all of the factors of both of them; eg 6 * 4 = (2*3) * (2*2) = (2*2*2*3). This should become a tool in your mental toolbox; when you see an integer, you can now ask yourself, "what other corresponding representations do i know for integers?" and "a product of factors" is one of the items on that list; if you mentally examine the concept 'product of factors representation of integers', you should have a list of correspondences stored there, such as 'multiplication of integers corresponds to joining together their factors'.

In addition to your library of alternate representations, as you learn you should be building up a library of techniques for manipulating math expressions. Stuff like "if i need to solve a polynomial of degree 2 or less, i can use the quadratic formula", or 'one technique for evaluating an integral is integration-by-parts'. For each technique, you should remember what it gets you (eg solution to an equation; evaluation of an integral) and when it can/should be used (eg when the equation is a polynomial of degree 2 or less; when you have an integral and it would be easier to split it into parts and integrate one part and take the derivative of the other part).

Memorizing the actual technique is less important as it can be looked up, but still very useful, because once you remember the technique and can do it quickly, your brain will start to automatically guess the shape of the result, which will help you notice unexpected places where the application of the technique may be of use.

Building up a library of techniques, and memorizing them, may seem unimportant as they are not as 'fundamental' as fundamental concepts, and they are not as much of an investment towards easier understanding of later fundamental concepts; but this is crucial to (a) being able to quickly read through and understand a series of manipulations, and (b) being able to solve math problems.

Of course, just like with any other subject, you should also be learning an internal library of concepts (such as 'prime numbers') and an internal library of little facts (such as '7 is prime').

By the way, i'm not saying that you have to write down all these 'libraries', although you could. But if you don't have time to do any extra work, then just keep this stuff in mind as you learn.

It is sometimes said that Michelangelo claimed that when looking at an uncarved block of marble, he perceived the possiblity of the sculpture that could be made from this particular block and merely carved away the rest of it, freeing the shape that he had perceived.

To make an analogy to this, sometimes when trying to understand some topic in math, often what you are doing is not so much trying to gain some piece of information, but rather to exorcise erroneous thoughts that you have about the topic, leaving only the actual truth. The process can be subtractive, rather than additive. Like Michelangelo's marble, sometimes your immature understanding already contains the proper understanding of the subject; the problem is just that it also contains a bunch of errors.

When actually working on math, as opposed to studying it in a course, you might think that the thing to do is to clearly conceptualize the problem you wish to solve, and then work directly towards its solution. This is the correct strategy for many areas of life, and you should do this some in math, but there is something else that is even more important. You should look for, and come up with from scratch, TRACTABLE assumptions, problems, and concepts which are similar to, even if not identical to, the main problem; and you should 'play around' with these other related assumptions, combining them and manipulating them symbolically and seeing what you can prove of them and their combinations, to see if they are particularly 'generative' in the sense of yielding many interesting theorems, or conclusions that seem closer to your goal than when you started. In fact, what i mean by "tractable" and "generative" might be the same thing; maybe a 'tractable' problem is one whose assumptions are 'generative'. Note that the 'tractable' assumptions that you work with might even be things that you know for a fact to be untrue; for example, you might assume that some function is linear when you know that it is not, or you might assume that people are always rational when you know that they are not. Spending time working with assumptions that you know to be wrong seems senseless, but empirically it has shown to be a good technique for making progress in math (and applied math).

I put 'play around' in quotes to emphasize that this is a productive use of your time, part of the work, not just something for fun. Of course, this sort of thing, while often crucial for doing new math, is sometimes also useful for learning math.

Diagonalization proof that there are more reals than integers

Continuum hypothesis, and its independence

---

If there are a bunch of people (or objects of any sort) with many attributes, it is exceedingly unlikely that any one of them will not be extreme (ie far from average) on any attribute.

" banku_brougham 22 hours ago

The concept of no members of a group fitting into the average range for all observations is reminiscent of the 'curse of dimensionality'. Can anyone with a data science background make the connection here?

Secondly, this seems to explain why everyone hates autocorrect.

reply

okintheory 22 hours ago

Yes, this can be made precise: For example, consider a uniform distribution on a d-dimensional cube with side length 1. Only a (1/2)^d fraction of the mass is within 0.25 distance of the average in every coordinate.

In high dimensions, almost all the mass is "near the boundary" in at least one coordinate.

reply

carlob 21 hours ago

Or as one of my professor would say (after showing the calculation for n-dimensional spheres): "and that's why infinite dimensional oranges are all skin!"

"

Markov chains: http://setosa.io/ev/markov-chains/

---

Links:

- https://en.wikipedia.org/wiki/Template:Areas_of_mathematics
- https://en.wikipedia.org/wiki/Elementary_mathematics

---

" In particular, abstract algebra and category theory have a lot to say about pairs of functions such that one is injective and the other "reverses" that. toFloat :: Integer -> Float and round :: Float -> Integer form this relation, e.g. The most important thing to note is that round is a one-sided inverse of toFloat, or, that for all integers x, round(toFloat(x)) = x but it's not the case that for all Floats, y, toFloat(round(y)).

When you've got a situation like this, you call the pair a "section" and "retraction". " -- https://gist.github.com/garybernhardt/122909856b570c5c457a6cd674795a9c?#gistcomment-1857104

---

mb links:

https://betterexplained.com/articles/linear-algebra-guide/ http://joshua.smcvt.edu/linearalgebra/ https://minireference.com/static/tutorials/linear_algebra_in_4_pages.pdf http://ml-cheatsheet.readthedocs.io/en/latest/linear_algebra.html

---

https://marckhoury.github.io/counterintuitive-properties-of-high-dimensional-space/ https://news.ycombinator.com/item?id=16524452

---

"In linear algebra, a n x n real matrix M is said to be positive definite if the scalar (z^T)M(z) is strictly positive for every non-zero column vector z of n real numbers. Here z^T denotes the transpose of z.[1]" -- [27]

"A matrix is positive definite if (x^T)A(x) > 0 for all vectors x where x != 0." -- [28]

---

https://en.wikipedia.org/wiki/Isometry

---

In math the word 'conjugate' seems to be used to indicate a relationship of flipped sign or of orthogonality. This doesn't match the usual use (and etymological root) of the word to indicate a relationship of conjunction (eg "the conjugates were conjoined"). I think the reason seems is that the mathematical uses can be traced to an algebraic operation called 'conjugation', which does involve 'conjoining' stuff, see [29].

---

tips:

https://news.ycombinator.com/item?id=17088592

physicsAI 30 minutes ago [-]

I thought I would also add my two cents, though there have been many excellent responses already. I am about to defend my PhD? in Physics at MIT.

First of all - great idea! It is never too late to learn math and physics! In fact, with hard work and commitment, anybody can muster them to a high level.

(1) Reading =/= understanding in math and physics. You understand a topic only if you can solve the problems.

(2) Work through the solved problems you encounter in textbooks carefully.

(3) Most people around me have never read any physics textbook cover to cover. E.g. reading Halliday, Resnick & Walker completely might take you years! Not all topics are equally important. Focus on the important parts.

(4) You need guidance on what is important and what is not. Online courses, college material (especially problem sets!), teaching webpages could be a helpful guide. Checkout MIT OCW, once you are ready.

(5) Finding someone to talk to is really useful. You will likely have questions. Cultivating some relationship that allows you to ask questions is invaluable.

(4) College courses in math and physics have a very definitive order. It is really difficult to skip any step along the way. E.g. to understand special relativity, you must first understand classical physics and electrodynamics.

(5) Be prepared that the timescales in physics are long. Often, what turns people off is that they do not get things quickly (e.g. in 15-30 minutes). If you find yourself thinking hours about seemingly simple problems, do not despair! That is normal in physics.

(6) You have to 'soak in' physics. It takes time. Initially, you might feel like you do not make a lot of progress, but the more you know, the quicker it will get. Give yourself time and be patient and persistent.

(7) Often, just writing things down helps a lot with making things stick. It is a way of developing 'muscle memory'. So try and take notes while reading. Copying out solved problems from textbooks is also a good technique.

(8) Counterintuitive: If you get completely stuck, move on! Learning often happens in non-linear ways. If you hit an insurmountable roadblock, just keep going. When you return in a few days/weeks, things will almost certainly be clearer.

reply

---

another good eigenvector visualization

http://setosa.io/ev/eigenvectors-and-eigenvalues/

---

soVeryTired 1 day ago [-]

For a large fraction of probability theory, you only need two main facts from linear algebra.

First, linear transforms map spheres to ellipsoids. The axes of the ellipsoid are the eigenvectors.

Second, linear transforms map (hyper) cubes to parallelpipeds. If you start with a unit cube, the volume of the parallelpiped is the determinant of the transform.

That more or less covers covariances, PCA, and change of variables. Whenever I try to understand or re-derive a fact in probability, I almost always end up back at one or the other fact.

They're also useful in multivariate calculus, which is really just stitched-together linear algebra.

reply

qmalzp 1 day ago [-]

I think the first point is only true for symmetric matrices (which includes those that show up in multivariable calc). In general, the eigenvectors need not be orthogonal.

reply

soVeryTired 1 day ago [-]

Yep, you could well be right. The image of an ellipse under a linear transform is definitely an ellipse, but I'm not sure about the eigenvectors in the general case.

The symmetric case is by far the most relevant for probability theory though.

reply

tprice7 22 hours ago [-]

In general it's the eigenvectors of the positive-semidefinite (hence symmetric) part of the left polar decomposition.

reply

---

sannee 1 day ago [-]

Eigen{vectors,values} seemed like this totally arbitrary concept when I first learned about them. Later it turned out that they are actually really awesome and pop up all the time.

Multivariable function extrema? Just look at the eigenvalues of the hessian. Jacobi method convergence? Eigenvalues of the update matrix. RNN gradient explosion? Of course, eigenvalues.

reply

---

to remember which is which between the symbols '<' vs '>' (less-than and greater-than):

- pretend they are alligator mouths, about to eat the larger quantity
- or, the big side of the symbol goes next to the big number (or similarly, the side with 2 vertices is next to the big number, the side with 1 vertex is next to the small number)
- or, pretend that they are spears, and the bigger, more 'powerful' numbers are poking the smaller, 'weaker' number with the pointy part
- or, hold up your left hand and make a less-than sign; remember "Left hand, Less than" (they both start with 'L').

---

to remember which axis on a graph is X and which is Y:

" chriswaugh on Aug 7, 2013

unvote [-] |

When working with axis - "X is a-cross"

Chris2048 on Aug 7, 2013

unvote [-] |

" You have to run, before you can fly, You have to X, before you can Y ",

Rhyming reminder of which axis is vertical, and which horizontal...

test1235 on Aug 7, 2013

unvote [-] |

'Y to the sky!'

michaelmartin on Aug 7, 2013

unvote [-] |

Wise up! ;) (Y's up) " [30]

---

ok this is more applied math than math but here is a short piece on how to practically design a simple P, PI, or PID controller (Proportional, Proportional Integral, or Proportional Integral Derivative):

http://www.wescottdesign.com/articles/pid/pidWithoutAPhd.pdf

discussion: https://news.ycombinator.com/item?id=16257156

---

http://parrt.cs.usfca.edu/doc/matrix-calculus/index.html

---

http://www.cs.yale.edu/homes/aspnes/classes/202/notes.pdf

---

http://explained.ai/matrix-calculus/index.html

---

http://setosa.io/ev/markov-chains/

---

https://arxiv.org/pdf/1807.08416.pdf

---

https://www.math3ma.com/blog/matrices-probability-graphs https://news.ycombinator.com/item?id=19323123

---

---

https://www.dhruvonmath.com/2018/12/31/matrices/

---

http://msc2010.org/mediawiki/index.php?title=MSC2010 https://en.m.wikipedia.org/wiki/Outline_of_mathematics https://en.m.wikipedia.org/wiki/Glossary_of_areas_of_mathematics https://en.m.wikipedia.org/wiki/Mathematics_Subject_Classification https://en.m.wikipedia.org/wiki/Category:Fields_of_mathematics https://en.m.wikipedia.org/wiki/Lists_of_mathematics_topics https://en.m.wikipedia.org/wiki/Areas_of_mathematics

---

great free online book on abstract linear algebra:

http://linear.axler.net/LinearAbridged.html

---

https://www.meltingasphalt.com/interactive/going-critical/

---

one way to characterize different fields of study is by their content, and another is by their methods, and another is by their 'discipline', where by 'discipline' i mean a negative concept of what they do NOT permit themselves to do. You might think of this as a content/form dichotmy where method/discipline is the form part. The method/discipline characterization of (at least modern) mathematics is proof; in math we don't content ourselves with things that seem like they may be true, or things for which we have a lot of evidence that they are true (for example, 2 is prime, 3 is prime, 4 isn't prime, 5 is prime, 6 isn't prime, 7 is prime, it seems like we have some evidence that at least every other number is prime? oops...), rather we demand proof. A friend once remarked that mathematicians made a bargain with the universe that everything they say will be true; the tradeoff is that they have no guarantee that it will be useful...

to categorize math by its content, you once might have said that math is about numbers (and even that leaves out some of the ancient focus on geometry) but modern mathematics seems to have expanded a bit, and now it's about anything 'formal'. Formal here means not the modern sense of stuffy rituals and etiquette, but the ancient sense of 'pertaining to form' rather than to content.

To explain the content/form dichotomy, it's kind of like color/shape. For an example, think of an apple. I can tell you "this apple is big", and if you know the basic rules of English and you know what apples are and what bigness is, you understand what i mean. But even if you didn't know what 'apples' are and what 'big' means, you'd still understand that i said something of the same form as "this flibbernoodle is slibberdegibbet". What 'apples' are and what 'big' is is 'content' here, and what's left over after you take that out is the shape of the sentence, the form. And in the form alone there is some meaning; a famous example is "Every man is mortal; Socrates is a man; therefore Socrates is mortal". Even if you don't know who Socrates is, even if you don't know what 'mortal' means, in fact even if you don't know what 'man' is, you can recognize that some of the reasoning here is correct; if it's true that "Every A has property B" and also that "C is an A", then you can deduce that "C has property B". That's the sort of thinking you can do about form, without considering content.

---

information theory

https://colah.github.io/posts/2015-09-Visual-Information/

---