Response to discussion about memorization and learning

drafting my response, eventual response here: https://forum.openbagel.com/t/some-anki-stats/62/8

Notes on Kevin’s Open Bagel Post

First, this discussion is good and made me realize there are a few different ideas going on here. I’ve also realized that Anki and spaced repetition tools are better described as tools for “not forgetting” than as tools for memorization. I would describe memorization as a type of “learning”, in a broad sense of the term. I’ll address your points before putting down the updated version of my own thinking.

Memorization = going through a list of flashcards until you know all of them

Periodic table

There are some groupings in the periodic table, and using them would definitely help you learn the connections and make it easier to remember. However, even if some elements have “logically derivable” names (I’m not sure any do, honestly), some definitely do not, since they were named after the person who discovered the element, a country, or some other arbitrary source. So you can get part of the way with connections: for example, Berkelium is 97 and Californium is 98, which are easy to remember together. But as far as I know, if you forget that Curium is 96, there isn’t any way to get to that from other information.

This is not to even mention the abbreviations, which are definitely just memorization. Remembering that Cr is Chromium and Cm is Curium is about as much a memorization problem as it gets (like vocabulary).

Now, this brings up an interesting thought. The more “memorizable” something is in this sense, the more “brittle” the knowledge feels. That type of knowledge also feels less “important”: perhaps it’s required for fluency (easily and smoothly communicating in a language or a given subject), but losing it is not a blow to some deeper level of understanding of a subject area. Blanking on a definition or abbreviation is really only a time cost, not a knowledge cost.

Back to your points:

Even though I don’t think the periodic table is this, your points about the benefits of richer learning over rote memorization are good and interesting

(i)

First, great points. It’s made me think a lot and I’ve written a lot. You’ve triggered me to reframe a couple things which I think will be helpful for understanding the different types of stuff we’re talking about here. Before that though, I’ll respond to your points.

Periodic Table. Caveat: I remember very little of the logical backing for the periodic table (part of why I wondered how pure memorization would work). There are groupings in the periodic table, but I don’t think it’s possible to deduce a forgotten element name or (especially) abbreviation if you don’t recall it yourself. Still, it’s a fair point: if there were a way to remember the table without memorization, that would be all of (i-iii). Your point about multiplication tables is this in a nutshell, I think. From this point already though, you can kind of feel that the more “memorization required” some given content is, the less “intellectually valuable” it is. This alludes to your second question, addressed below.

Geography. There’s no doubt: knowing history or other contextual information makes it much easier to remember. But if you’re starting from scratch with no prior knowledge, and all you need is to know the capital of every country, I can’t imagine you could do it in less than 5 minutes per country over a lifetime with anything other than spaced repetition. Maybe SketchyMicro could: if they really built an amazing memory palace of all the countries, maybe you wouldn’t need spaced repetition to recall that information for the rest of your life. Also, even if spaced repetition technically solves the brittleness of remembering cognitively isolated information, relying on an external tool to maintain your memory is a different type of brittle, and an interesting discussion in itself.

Your questions:

  1. In a sense, this asks what exactly the difference is between learning through reasoning and learning through memorization. I’m guessing the operative difference here is that the memorizer learns that x = y, whereas the reasoner also understands why x = y, and thus has some greater context and connection to the world outside of x = y. In that sense, I don’t think it’s elitism: you should pick reasoner over memorizer pretty much every time, simply because it’s more knowledge. Now, maybe the memorizer knows a lot more variations of x = y than the reasoner, who spent time learning the why. But in most subject areas, going deeper into foundational understanding is going to be more important than gaining memorized breadth. That’s obvious in arithmetic and maybe slightly less so in history, but it’s still pretty clear that understanding the reasons for the civil war is more important than remembering the dates of all the battles, for example.
  2. I don’t think there is any field that is valuable by itself and also sufficiently covered by rote memorization alone (at least in the way we’re discussing memorization; I’ll present a different way in a subsequent post). Vocabulary is probably the closest thing to this, but it’s not really a whole subject space (and even vocabulary has etymology, so the words aren’t just random, unrelated bits). I also can’t really think of anything we teach as truly just rote memorization.
  3. Your follow up I think is the answer to this; Nielsen does a pretty good job of it. There’s a spectrum of value: at the minimum it saves time; at the maximum, it makes a subject area a first language and allows you to think in that space more easily (super-chunking?). Even more basically, the value of memorization depends on what you memorize and how long you remember it for. So if you don’t memorize valuable information (chunks that either save you time or help you move up and down levels of abstraction easily) and don’t use spaced repetition, then there’s probably not much value in memorization.

Like you’ve said, I don’t think learning through memorization and reasoning are mutually exclusive. I’ll speak to that more in a bit, but in order to get out of this writing hole I’ll send this first part along for now and not try to trim this part down any more :)

Below are a few different use cases for spaced repetition. They are not super distinctly separated from each other, but they helped me categorize the stuff we’re talking about:

  1. Using it as a part of a learning process, with two (kind of different, but kind of similar) ways of doing so that I can immediately see:
  2. Trying to learn something only through spaced repetition exposure
  3. Not forgetting something you’ve already learned

(1)

  • Internalizing definitions or items required for higher order learning
  • Speeding up repetitive or difficult tasks
  • Both of the above

(2)

I’m guessing that this is, to some degree, what teachers fear when they think about students using memorization rather than reasoning to learn a subject. I think that’s fair.

Now, I do have a side theory that a lot more of learning is reducible to “memorizable” ideas than we normally give credit for being, but that’s a slightly different topic.

The periodic table and German states are examples of me trying (2), I think. This is probably the weakest use case and the most “brittle”, because it’s asking spaced repetition to lift the whole load of learning and of making that learning meaningful (i.e. learning by memorization alone).

Some topics are not high volume and you can just memorize them (German states). Some things are higher volume, and chunking them into groups would have been better than going through everything slowly and randomly like I did (e.g. the periodic table). That’s despite these cards being pretty well designed: they offer visual and written mnemonic hooks, as you can see from the images above, and they are actually designed to be learned in groups, but I accidentally randomized them instead.

I don’t know of any whole subjects that are “worth learning” that are like this. Usually the memorization, even if critical, is a part of a higher level whole (e.g. memorizing law for a lawyer, multiplication tables to do arithmetic). At which point, this pretty much switches back to being use case (1).

This is why for use case (2) with any non-trivial content, the design of the cards is important, and there needs to be desire for the learner to put in some effort to connect it mnemonically. I started a couple music interval recognition decks but stopped them because I quickly felt it was too hard to learn through only basic exposure and spaced repetition, and that combined with feeling like learning them was hardly any benefit in my life made going through those cards grueling.

And to be clear, I have not really heard of anybody advocating for (2) for anything more than things like vocabulary and other classic “memorization is sufficient” types of learning. Pretty much everyone that uses Anki says that creating your own cards is the way to make the tool much more useful, and admits that a lot of the work is in understanding a concept well enough to create well-defined, atomic cards. It was more my own deviation, thinking it was an interesting idea that, with essentially zero prior interest or contextual knowledge, I could spend a cumulative <5 minutes to learn that Magdeburg is the capital of Sachsen-Anhalt, or whatever, and retain that “for the rest of my life” (conditions apply, lol).

(3)

This is kind of similar to (1), but the main difference is that you’ve removed all the burden of learning from the spaced repetition process. This allows you to kind of backtrack and ask yourself, “what is an example of something that, if I forgot how to do it, would mean my knowledge had regressed?” This seems to enable some more free form types of spaced repetition prompts, like every repetition is a new problem of a specific kind that I should be able to solve, and I have to solve it to make sure that I haven’t forgotten how to do it. This is a bit outside of the normal research done regarding spaced repetition, since forgetting how to solve a problem isn’t usually remedied simply by seeing the solution (though it could be!). Nonetheless, even this kind of “spaced practice” probably has a similar type of forgetting curve and could be optimized with similar algorithms.

Memorization is sometimes mistaken for the absence of richer learning, but I think they can coexist (and fix that “brittle” feeling). I would say that learning most things involves at some point or another remembering information “like the back of your hand”, i.e. you do not remember how or why you learned this fact, but it became ingrained in your memory. Remembering that piece of information alone is often useless without the greater context (e.g. how to take the derivative of an exponential function without knowing what a derivative is), but it still seems to be a part of the learning that is happening. The worry here is that memorization would “remove” the connection from the more “connected” learning happening, but I’m not sure it needs to.

Regarding Memorization as compared to learning

  • the two types of students, dull one and the clever one.
  • the benefits of the dull student are an example of how memorization can be useful: not memorizing 12*12 and thus not knowing how to do 12*13, but memorizing the steps for multiplication. Learning that 12*13 is the same kind of problem as 12*12 is the kind of thing that I don’t see as being within the scope of memorization, and is the core problem of learning.
  • In fact, I’ll run with that theory for the core problem of learning:
      • I know how to apply x to a, but how do I know when a new thing comes along that I should also be applying x to?
      • More generally, there is a set of actions and set of objects (not mutually exclusive), when do I apply what actions to what objects.
      • And in this scenario, memorization is not necessarily useful for figuring out which objects require which actions (as far as I can tell right now). However, I think memorization can be useful for making it easier to learn a set of possible objects or actions.
      • Curious if there is a lot or little sympathy for that general idea of learning. Seems to be missing something (many things?), but it’s compelling enough.
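To make the dull-student example concrete, here is a minimal Python sketch of the two kinds of memorization. The names `times_table`, `recall_fact`, and `long_multiplication` are made up for illustration, and the table size (up to 12 by 12) is my assumption about what a typical times table covers.

```python
# Memorized facts: every product in a times table up to 12 x 12 (assumed size).
times_table = {(a, b): a * b for a in range(1, 13) for b in range(1, 13)}


def recall_fact(a: int, b: int):
    """Pure recall: returns None for anything that was never memorized."""
    return times_table.get((a, b))


def long_multiplication(a: int, b: int) -> int:
    """Memorized steps: multiply a by each digit of b, shift by place, add up.

    Assumes non-negative integers.
    """
    total = 0
    for place, digit in enumerate(reversed(str(b))):
        total += a * int(digit) * 10 ** place
    return total


print(recall_fact(12, 12))          # a memorized fact
print(recall_fact(12, 13))          # just outside the table
print(long_multiplication(12, 13))  # the procedure still applies
```

The lookup fails one step outside the table, while the memorized procedure keeps working. What neither function captures is recognizing that 12*13 calls for multiplication in the first place, which is the part I’m calling the core problem of learning.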

More simply though, memorization is not to be used alone for learning, but in conjunction with chunking.

Chunking is what we do subconsciously while we’re learning. We understand big ideas in ways we can easily recall them.

Anki is a little different than memorization. Memorization depends on the information you choose to remember. I would call Anki a tool for not forgetting, and how you use it can span the spectrum from “rote memorization” (German states) to “chunk management”.

For example, I wouldn’t say a teacher should change their lesson plan to incorporate more memory based techniques in the person to person activities. Rather, they could modify some homework to be oriented around spaced repetition (with the option of having some of that homework be simply recall, and not application).

Side Theory

Sawyer’s dull and clever students come to mind. The clever student may learn by “reasoning”, thinking using language and logic, etc etc, not sure what else describes the process of learning through reasoning alone. Meanwhile, the dull student could do one of two things. Memorize the multiplication tables and have no idea what 12*13 is, or see the steps taken for multiplication and memorize those. The latter

Here are a few questions to frame the discussion, in an order I think helps them flow:

  1. How are memory palaces (SketchyMicro) and spaced repetition (Anki) related to memorization?
  2. What can spaced repetition provide after learning?
  3. What can spaced repetition provide while learning?

How are memory palaces (SketchyMicro) and spaced repetition (Anki) related to memorization?

As far as I can tell, memorization is taking a set of information, and learning each piece of information one at a time. Memory palaces and spaced repetition are tools for “not forgetting” pieces of information, and they both can be essentially independent from the information itself. This is why they are generalizable strategies for “not forgetting”. I imagine they work best when put together, but I think the fundamentally interesting part of spaced repetition is that it’s not a mnemonic device itself, but instead an “effortless” retention system with pretty clear time costs.

Spaced repetition after learning

Most people who use Anki or discuss spaced repetition would say that it’s best as a tool to be used after learning, either as preparation for future learning or, most often, as maintenance in order to not forget. And that’s the key: spaced repetition is not actually a memorization tool as much as it’s a tool for not forgetting.

When you’ve learned a subject, it is also easier to answer the question: What is something that, if I forgot it, would indicate a regression in my understanding? In theory (even if Anki doesn’t support it), every answer to that question could be a spaced repetition “card”. For example, maybe every repetition is a new problem of a specific kind that I should be able to solve, and I have to solve it to make sure that I haven’t forgotten how to do it. This is a bit outside of the scope of traditional spaced repetition research, since sometimes seeing the solution to a problem you forgot how to solve doesn’t make you recall how to solve it yourself. Nonetheless, I think this falls under the umbrella of things that have a forgetting curve (https://en.wikipedia.org/wiki/Forgetting_curve), and thus could be optimized using spaced repetition algorithms.
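As a rough illustration of what “optimizing against a forgetting curve” could look like, here is a small Python sketch. It assumes the simplified exponential form of the curve, R = exp(-t / S), where S is a “stability” in days. The update rule in `review`, the growth factor of 2.5, and the 90% retention target are all hypothetical choices of mine, not Anki’s actual scheduler.

```python
import math

INITIAL_STABILITY = 1.0  # hypothetical starting stability, in days


def retention(days_elapsed: float, stability: float) -> float:
    """Estimated recall probability under an exponential forgetting curve."""
    return math.exp(-days_elapsed / stability)


def next_interval(stability: float, target_retention: float = 0.9) -> float:
    """Days to wait until estimated recall decays to the target retention."""
    return -stability * math.log(target_retention)


def review(stability: float, recalled: bool, growth: float = 2.5) -> float:
    """Hypothetical update: grow stability on success, reset it on a lapse."""
    return stability * growth if recalled else INITIAL_STABILITY


# Simulate five reviews of one card (or one "problem of a specific kind").
stability = INITIAL_STABILITY
intervals = []
for recalled in [True, True, False, True, True]:
    stability = review(stability, recalled)
    intervals.append(next_interval(stability))
print(intervals)
```

Nothing here cares whether the “card” is a fact or a freshly generated problem. The only input is whether I succeeded, which is why I think the same machinery could cover spaced practice.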

What’s funny is, we don’t usually consider a thing to be “learned” if we haven’t retained it, so in a way we have to lower the bar for what counts as “learned” before we can find value in using spaced repetition. The classic example might be reading a book or article, feeling like you’ve gained some new understanding, but only retaining some of the core ideas, or only being able to recall the core ideas from certain mental entry points that happened to stick around in your brain longer than others. I think the real win of spaced repetition is being able to “choose what you remember” from things you learn, and not relying on your brain and whatever subset of that learning it chooses to retain without assistance.

Spaced repetition while learning

But we’ve mostly been talking about the value of this spaced repetition while we’re learning, like in a classroom setting. A lot of this is discussed above, but I think it boils down to the following:

  1. Spaced repetition can save you time by helping you avoid having to look up commonly accessed but easily forgotten information.
  2. Spaced repetition can help you retain (and maybe even internalize / “grok”) chunks of information, making it easier and more fluid to think at higher levels of abstraction (like dreaming in a foreign language).
  3. If you buy my theory of learning below, it may be at least theoretically possible to structure all learning in the form of retaining atomic pieces of information through spaced repetition.

In fact, I have been thinking about how spaced repetition could be used in a classroom setting. @kevji I would be curious if you thought there was some content that could be surfaced to students using spaced repetition that would help bolster their learning. Or maybe just a spaced repetition deck to take away after class to help them forget less of it over time! My gut is that this understanding of cognition and memory is under-utilized in education, but of course I am also over-emphasizing it here because it’s the topic of discussion.

A theory of learning

Here is a theory of learning that I’m interested to hear if anyone agrees with (@kevji, you’ll see hints of category theory here, at least in my malnourished understanding of it, so some bias in my worldview for sure).

There are two, not mutually exclusive sets:

  • All objects (X)
  • All actions (Y)

All learning is connecting elements from X using elements from Y, and internalizing those connections. To the degree that this is true, then I think all learning is dealing with bits of information that can be made atomic. This is because we can move up and down degrees of abstraction as needed (using conceptual chunking), and turn all our understanding and knowledge into connections between objects. Objects and actions are probably not general enough, so I’ll give some examples (possibly not correct).

Object = x^2
Action = take the derivative
Resulting Object = 2x

Object = the variable x in the context of a math equation
Action = what does it represent
Resulting Object = a complex number

Object = complex number
Action = expand contents
Resulting Object = any number represented by the equation a+bi where a is a real number, …

Object = complex number
Action = why do they exist?
Resulting Object = in order to solve some problem some mathematicians discovered at some point…

Object = cause of the civil war
Action = expand contents
Resulting Object = [slavery, tight election, cotton industry, Fox News, …]

Object = [it’s end of day on Friday, everyone is tired, a lot of code has changed]
Action = should we push to production?
Resulting Object = No.

Etc..
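One way to see how these examples could become atomic, memorizable bits is to write them down as data. The Python sketch below is only an illustration: the names `Connection`, `as_card`, `build_index`, and `follow` are made up, and I’ve normalized “a complex number” to “complex number” so that the result of one connection can serve as the object of the next.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """One atomic piece of learning: applying an action to an object."""
    obj: str
    action: str
    result: str

    def as_card(self) -> tuple[str, str]:
        """Front and back of a flashcard for this connection."""
        return (f"{self.obj} | {self.action}", self.result)


CONNECTIONS = [
    Connection("x^2", "take the derivative", "2x"),
    Connection("the variable x in the context of a math equation",
               "what does it represent",
               "complex number"),  # normalized from "a complex number"
    Connection("complex number", "expand contents",
               "any number represented by the equation a+bi "
               "where a is a real number, …"),
]


def build_index(connections):
    """Look up a result by (object, action)."""
    return {(c.obj, c.action): c.result for c in connections}


def follow(index, start, actions):
    """Chain connections: each result becomes the next object."""
    current = start
    for action in actions:
        current = index[(current, action)]
    return current


index = build_index(CONNECTIONS)
cards = [c.as_card() for c in CONNECTIONS]
print(follow(index,
             "the variable x in the context of a math equation",
             ["what does it represent", "expand contents"]))
```

Each `Connection` is a card on its own, while `follow` is a crude stand-in for moving up and down levels of abstraction by chaining cards together.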

To the degree learning can be represented by atomic pieces of information like this, I don’t see how learning with reasoning and memorization are fundamentally different. To me, it seems like memorization might just be implying that the order of learning doesn’t matter. But once we concede that order matters, then I kind of want to call reasoning “memorization with style”.

It still feels like there’s a difference between grokking (https://en.wikipedia.org/wiki/Grok) something and memorizing it. If we break our intuition down to its atomic bits, there probably is a translation between our intuition and something memorizable. But does memorizing something connect the right neural pathways in our minds to eventually grok that something? I don’t immediately see a reason why not, assuming all our learning can be reduced to something like I describe above.

Sorry, that was long but I had fun writing it :)
