Analytics: Leeches

This is a new series where I combine a few things I am currently learning into a topic I have no business pretending to know anything about.  In addition to teaching myself Japanese, I am also trying to teach myself programming and data analysis.  It’s going very slowly, but I am hoping to figure out a few things that will make my Anki studying a little more efficient.

My first target is those damn leeches.  Leeches are what Anki calls the cards that you keep forgetting over and over.  According to the SuperMemo site, around 50% of your time can be spent learning 2.5% of the material.  The 2.5% taking up half of your time is the leeches.  Depending on your goals, wouldn’t it be nice to identify that 2.5% of the material and spend that 50% of your time learning twice as much?  Personally, I would rather learn 97.5% of core twice as fast before spending the time to learn that last 2.5%.
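To put a number on how expensive leeches are, here is a quick back-of-the-envelope calculation using only the two figures quoted above (50% of the time, 2.5% of the material). The function name and parameters are made up for illustration.

```python
def relative_cost(time_share: float, material_share: float) -> float:
    """How many times more study time a unit of hard material takes
    compared with a unit of everything else."""
    hard_rate = time_share / material_share              # time per unit of hard material
    easy_rate = (1 - time_share) / (1 - material_share)  # time per unit of the rest
    return hard_rate / easy_rate


cost = relative_cost(time_share=0.50, material_share=0.025)
print(f"A leech costs about {cost:.0f}x as much study time as an ordinary card")
```

If those figures hold for your deck, each leech eats roughly as much time as 39 ordinary cards, which is why skipping them would let you cover the other 97.5% in half the time.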

Unfortunately we don’t know which vocab words make up that hard 2.5%, and even worse, Anki gives us almost no tools to find them.  All Anki gives us is one setting: once you fail a card more than a set number of times (default is 7), Anki will suspend that card.  The thinking is that you are likely to learn a new card in less additional time than it would take to keep trying (and failing) to learn the one you’ve failed so many times already.  But I’ve always wondered which setting has you learning the most material in the least amount of time.

This is the question I set out to answer.  I wrote a small program that counts the number of reps it takes for a card to either be learned or become a leech.  I considered a card to be “learned” once its interval surpassed 4 months.  I did this for every card I’ve studied, then averaged the reps to learn a card and the reps to become a leech.  The result is the average number of reps it would take to learn a card assuming a given leech threshold in Anki.
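For anyone who wants to try something similar, here is a rough sketch of that kind of counting. It is not the original program, and it makes several assumptions: each review is a hypothetical `(failed, interval_in_days)` pair, "4 months" is taken as 120 days, a card becomes a leech when its failure count reaches the threshold, cards that are neither learned nor leeches yet are ignored, and "average reps to learn a card" is read as total reps spent divided by the number of cards learned. The sample histories are invented.

```python
from typing import Iterable, List, Optional, Tuple

LEARNED_INTERVAL_DAYS = 120  # assumption: "4 months" taken as 120 days

# Hypothetical review format: (failed, interval after the review in days)
Review = Tuple[bool, int]


def card_outcome(reviews: Iterable[Review], leech_threshold: int) -> Optional[Tuple[str, int]]:
    """Return ("learned" or "leech", reps used), or None if the card is undecided."""
    failures = 0
    for reps, (failed, interval_days) in enumerate(reviews, start=1):
        if failed:
            failures += 1
            if failures >= leech_threshold:
                return ("leech", reps)  # card would be suspended here
        elif interval_days > LEARNED_INTERVAL_DAYS:
            return ("learned", reps)
    return None


def reps_per_learned_card(histories: List[List[Review]], leech_threshold: int) -> Optional[float]:
    """Total reps spent on decided cards divided by the number of cards learned."""
    total_reps = 0
    learned = 0
    for history in histories:
        outcome = card_outcome(history, leech_threshold)
        if outcome is None:
            continue  # still in progress: skipped (an assumption)
        kind, reps = outcome
        total_reps += reps
        if kind == "learned":
            learned += 1
    return total_reps / learned if learned else None


# Invented review histories, purely for illustration
example_histories = [
    [(False, 10), (False, 40), (False, 130)],
    [(True, 1), (True, 1), (False, 5), (True, 1)],
    [(False, 10), (True, 1), (False, 30), (False, 150)],
]

for threshold in (1, 2, 3):
    print(threshold, reps_per_learned_card(example_histories, threshold))
```

Running the same function over a range of thresholds gives one curve per deck, which is the shape of data the first graph plots.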
image (1)

The above graph shows the results for the 4 decks I’ve been studying.  The first thing to notice is that my “core sentences” and “Japanese for Busy People” decks are much easier than my “core vocabulary” and “kanji” decks.  The other thing to notice is that for all decks except kanji, setting the leech threshold to the lowest setting results in learning the largest number of cards in the fewest reps.  Kanji appears to be most efficient with the leech threshold set to 8, but any number higher than 4 appears to be just fine.  The final thing to notice is that all of the vocabulary and sentence decks have a similar curve, and a very smooth one.  I take this to suggest that for all vocabulary decks I study, setting leeches to the lowest setting will result in learning the most vocab words in the fewest reps.  However, this isn’t the only consideration.
image (3)

The second graph shows the ratio of learned cards to suspended leeches for each deck and each leech threshold.  As you can see with the “hard” vocab and kanji decks, at lower thresholds Anki is suspending more cards than I would be learning.  In fact, setting the leech threshold to 1 for core vocab and kanji would result in learning only 18% of the vocab deck and 6% of the kanji deck.  This is hardly desirable, but finding a good balance between efficiency and completeness might make sense for some people.  For instance, setting the threshold to 9 for kanji and 6 for core vocab gets me into the 50-60% coverage range.  That still seems less than optimal to me, and it is something I have to think about, as unfortunately there is no clear-cut answer.
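The efficiency versus completeness trade-off can be sketched in a few lines. This is only an illustration under assumptions: coverage is taken to mean learned cards divided by learned plus suspended cards, the function names are hypothetical, and the outcome lists are invented rather than taken from my decks.

```python
from typing import Dict, List, Optional


def coverage(outcomes: List[str]) -> float:
    """Fraction of decided cards that ended up learned rather than suspended."""
    learned = outcomes.count("learned")
    leeches = outcomes.count("leech")
    decided = learned + leeches
    if decided == 0:
        raise ValueError("no learned or leech cards to compare")
    return learned / decided


def lowest_threshold_with_coverage(
    outcomes_by_threshold: Dict[int, List[str]], target: float
) -> Optional[int]:
    """Smallest leech threshold whose coverage meets the target, if any."""
    for threshold in sorted(outcomes_by_threshold):
        if coverage(outcomes_by_threshold[threshold]) >= target:
            return threshold
    return None


# Invented outcomes for a 5-card deck at three thresholds
example_outcomes = {
    1: ["learned"] + ["leech"] * 4,
    2: ["learned"] * 2 + ["leech"] * 3,
    3: ["learned"] * 3 + ["leech"] * 2,
}

for threshold, outcomes in sorted(example_outcomes.items()):
    print(threshold, f"{coverage(outcomes):.0%}")
print(lowest_threshold_with_coverage(example_outcomes, target=0.5))
```

Picking the lowest threshold that still clears a coverage target is just one way to balance the two graphs; where to set the target is a personal call.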

That’s it for now.  Please put your thoughts, criticism, praise and especially suggestions in the comments, as I’m happy to make this better with your help.

 
