What I read in 2021 | dataand.me

My relationship with reading borders on pathological (and by “borders on” I mean “has literally been a topic of discussion in therapy”). I've gotten it somewhat under control (let's use my 2014 Goodreads Reading Challenge as the benchmark for “a bit out of control”), which means I can look back on my 2021 year in books without too much self-recrimination.

The data

For all its faults (and there are many), Goodreads is what I've used to log my reading for the past seven(ish) years. If nothing else, it has a nice enough export function, which lets you (or me, or whoever) retrieve your reading data as a CSV.

I stashed my exported data in Google Sheets. So, I’ll use {googlesheets4} to read it into R with its sheet ID, and make our lives easier by passing it straight through janitor::clean_names().

library(tidyverse)
library(googlesheets4)
library(skimr)

gr_data <- read_sheet("1PqnJ2UOaYnfIRCVSvlYlyfeQ4OlynMiPq0eCIfDbjLU") |>
  janitor::clean_names()
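If you'd rather skip Google Sheets, the export can also be read straight from the CSV. Here's a minimal base-R sketch, assuming a Goodreads-style header row. The tiny inline CSV is made-up stand-in data, and clean_names_sketch() is a hypothetical helper that only roughly approximates what janitor::clean_names() does (lowercase, then snake_case); it is not janitor's implementation.

```r
# Made-up stand-in for a Goodreads export: headers mimic the export's
# style ("Book Id", "Author l-f", ...), but the values are fictional
csv_text <- 'Book Id,Title,Author l-f,Number of Pages,Exclusive Shelf
101,"An Example Book","Doe, Jane",320,read'

# check.names = FALSE keeps the raw headers so we can clean them ourselves
gr_raw <- read.csv(text = csv_text, check.names = FALSE,
                   stringsAsFactors = FALSE)

# Hypothetical helper: a rough approximation of janitor::clean_names()
clean_names_sketch <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)  # collapse runs of non-alphanumerics to "_"
  gsub("^_+|_+$", "", x)           # trim leading/trailing underscores
}

names(gr_raw) <- clean_names_sketch(names(gr_raw))
names(gr_raw)
# book_id, title, author_l_f, number_of_pages, exclusive_shelf

# The ID comes through as a number (integer here), hence as.character() later
class(gr_raw$book_id)
```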

Since I just want to see the books I read in 2021, I'm going to filter by bookshelves, keeping only the books on my 2021-reads shelf. (The count of books on this shelf, 188, matches the count of books read this year according to my Goodreads 2021 Reading Challenge, and I'm too lazy to track down the five books that go missing when I filter by date.) I'm also going to convert book_id to a character string, since it seems to come through as numeric by default.

read_2021 <- gr_data |> 
  filter(str_detect(bookshelves, "2021-reads")) |> 
  mutate(book_id = as.character(book_id))
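Why filter by shelf rather than by date? A base-R sketch with three made-up books shows the difference: a row with no date_read can still be on the 2021-reads shelf, but a date-based filter silently drops it, because comparing NA yields NA. (dplyr::filter() likewise drops rows where the condition is NA.) The data frame and its values are hypothetical.

```r
# Hypothetical toy data: book 102 is on the 2021 shelf but has no date_read
books <- data.frame(
  book_id     = c(101, 102, 103),
  bookshelves = c("2021-reads", "2021-reads, mystery", "to-read"),
  date_read   = as.Date(c("2021-03-01", NA, NA)),
  stringsAsFactors = FALSE
)

# Shelf-based filter (base-R equivalent of the str_detect() step above)
by_shelf <- books[grepl("2021-reads", books$bookshelves, fixed = TRUE), ]
by_shelf$book_id <- as.character(by_shelf$book_id)  # IDs as strings

# Date-based filter: format(NA, "%Y") is NA, and which() discards NA
in_2021 <- format(books$date_read, "%Y") == "2021"
by_date <- books[which(in_2021), ]

nrow(by_shelf)  # both shelved books survive
nrow(by_date)   # the undated book is gone
```

Book 102 is exactly the kind of entry that makes the shelf count and the date count disagree.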

Summary stats

Let’s peep a quick summary of the data using skimr::skim().

skimr::skim(read_2021)

Data summary
Name	read_2021
Number of rows	188
Number of columns	31
_______________________
Column type frequency:
character	15
list	1
logical	6
numeric	7
POSIXct	2
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
book_id	0	1.00	4	8	188
title	0	1.00	4	145	188
author	0	1.00	9	22	134
author_l_f	0	1.00	10	23	134
additional_authors	145	0.23	9	46	36
isbn	6	0.97	10	10	182
isbn13	25	0.87	13	13	163
binding	0	1.00	5	21	9
bookshelves	0	1.00	10	93	97
bookshelves_with_positions	0	1.00	16	138	188
exclusive_shelf	0	1.00	4	4	1
my_review	183	0.03	116	1156	5
recommended_for	188	0.00	NA	NA	0
recommended_by	188	0.00	NA	NA	0
condition	188	0.00	NA	NA	0

Variable type: list

skim_variable	n_missing	complete_rate	n_unique	min_length	max_length
publisher	0	1	108	0	1

Variable type: logical

skim_variable	n_missing	mean	count
spoiler	188	NaN	:
private_notes	188	NaN	:
original_purchase_date	188	NaN	:
original_purchase_location	188	NaN	:
condition_description	188	NaN	:
bcid	188	NaN	:

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
my_rating	0	1.00	3.22	0.98	1	3.00	3.00	4.00	5	▁▅▇▆▂
average_rating	0	1.00	4.02	0.37	2	3.81	4.06	4.25	5	▁▁▃▇▂
number_of_pages	7	0.96	328.12	124.81	51	256.00	320.00	387.00	1152	▃▇▁▁▁
year_published	0	1.00	2014.94	7.56	1964	2011.00	2018.00	2020.00	2022	▁▁▁▂▇
original_publication_year	19	0.90	2008.56	16.15	1963	2002.00	2017.00	2020.00	2022	▁▁▁▂▇
read_count	0	1.00	1.00	0.00	1	1.00	1.00	1.00	1	▁▁▇▁▁
owned_copies	0	1.00	0.00	0.00	0	0.00	0.00	0.00	0	▁▁▇▁▁

Variable type: POSIXct

skim_variable	n_missing	complete_rate	min	max	median	n_unique
date_read	5	0.97	2021-01-03	2021-12-30	2021-08-23	157
date_added	0	1.00	2012-03-21	2021-12-29	2021-07-11	147

Points of interest

For one thing, I don't make use of most of the 31 variables stored in the Goodreads export. For another, all of these books are on my read shelf: the exclusive_shelf variable can be “currently reading”, “read”, or “want to read”, and here it has only one distinct value, with a minimum and maximum length of four, the same number of letters as in the word “read”.
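That inference comes from skim()'s character summary, where n_unique is the number of distinct values and min/max are string lengths. It's easy to confirm directly; this quick sketch uses a stand-in vector rather than the real column.

```r
# Stand-in for read_2021$exclusive_shelf: 188 identical shelf values
exclusive_shelf <- rep("read", 188)

length(unique(exclusive_shelf))  # skim's n_unique
range(nchar(exclusive_shelf))    # skim's min and max (string lengths)
```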

My book ratings (my_rating) are roughly normally distributed, or as close as you can get when you can only give a book 1, 2, 3, 4, or 5 stars. I'm admittedly stingy with stars for this very reason: if I start handing out five-star ratings willy-nilly, my ratings become devoid of value.
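Star ratings are discrete, so a frequency table is a more direct look at their shape than a histogram. A minimal base-R sketch with a made-up ratings vector (not the real ratings): wrapping it in factor(levels = 1:5) makes sure a star value that never gets used still shows up with a count of zero.

```r
# Hypothetical ratings, for illustration only
ratings <- c(3, 4, 3, 5, 2, 3, 4, 1, 3, 4)

# Fix the levels to 1:5 so empty star values are reported as 0
counts <- table(factor(ratings, levels = 1:5))
counts

mean(ratings)    # pulled around by the tails
median(ratings)  # the typical rating
```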

The length of the books I read this year varied wildly! Though the mean number_of_pages (328) seems pretty typical, a standard deviation of 125 pages is a big swing. With a sample size of 181 (seven entries are missing page counts), you wouldn't think that a single book would make a huge difference. That said, James Clavell's Shōgun clocked in at 1,152 pages, which is pretty darn hefty.
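How much of that spread is Shōgun's doing? The skim() summary is enough for a back-of-the-envelope answer. This sketch assumes the 1,152-page maximum is Shōgun (as stated above) and that skim()'s sd is R's usual sample standard deviation with an n - 1 denominator; since the inputs are rounded, the results are approximate.

```r
# Summary stats for number_of_pages, as reported by skim() above
n        <- 181     # 188 books minus 7 with no page count
mean_all <- 328.12
sd_all   <- 124.81
x_max    <- 1152    # Shōgun

# Recover the total and the sum of squares from mean and sd (n - 1 denominator)
total  <- n * mean_all
sum_sq <- sd_all^2 * (n - 1) + n * mean_all^2

# Remove the single longest book and recompute mean and sd for the rest
n_rest    <- n - 1
mean_rest <- (total - x_max) / n_rest
sd_rest   <- sqrt((sum_sq - x_max^2 - n_rest * mean_rest^2) / (n_rest - 1))

round(c(mean_rest = mean_rest, sd_rest = sd_rest), 1)
```

With these inputs the standard deviation drops to roughly 109 pages without Shōgun, so that one book accounts for about a quarter of the total squared deviation.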

There are, indeed, five entries that have nothing in them for date_read (the n_missing for date_read is five). I think this is because I entered these a few days after reading them: giving the “Started Reading” and “Finished Reading” dates in the Goodreads interface is not the same as clicking “I'm finished” in the “Update Progress” interface, at least when it comes to getting a date_read.

Example of “My Activity” for a book without a date_read entry. The start and finish dates are given (2021-02-22 and 2021-02-23, respectively), but the exported data does not include a date_read.

Example of the Goodreads interface for updating your progress on reading a book. The “I'm finished” button seems to beget the date_read in the exported data.

If I decide to do any sort of temporal chart of my reading over the year, I’ll have to go in and manually fix the missing date_read entries, since the “started reading” and “finished reading” dates are not part of the data export.
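Here's a minimal sketch of what that manual fix could look like in base R, assuming you copy each missing finish date by hand from the book's “My Activity” page. The book IDs, the manual_dates lookup, and books_rev are all hypothetical; only the 2021-02-23 date comes from the example screenshot.

```r
# Hypothetical data: book "b2" is missing its date_read
books <- data.frame(
  book_id   = c("b1", "b2", "b3"),
  date_read = as.Date(c("2021-01-03", NA, "2021-03-10")),
  stringsAsFactors = FALSE
)

# Hand-entered finish dates, keyed by book_id (IDs are made up)
manual_dates <- c(b2 = "2021-02-23")

# Fill in only the rows that are missing a date and have a manual entry
needs_fix <- is.na(books$date_read) & books$book_id %in% names(manual_dates)
books$date_read[needs_fix] <- as.Date(manual_dates[books$book_id[needs_fix]])

# Sort by date so any running totals accumulate in reading order
books_rev <- books[order(books$date_read), ]
books_rev
```

Sorting at the end matters for anything cumulative, since a running total assumes the rows are in the order the books were finished.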

Books over time

So, having manually fixed the entries with missing date_read (and saved the result as read_2021_rev), let's take a peek at what my reading looked like over the course of the year.

read_2021_rev |> 
  arrange(date_read) |>                      # cumsum() needs rows in date order
  mutate(date_read = as.Date(date_read)) |>  # scale_x_date() expects Date, not POSIXct
  ggplot(aes(x = date_read, y = cumsum(read_count))) +
  geom_line() +
  scale_x_date(NULL,
               breaks = scales::date_breaks(width = "1 month"),
               labels = scales::label_date_short()) +
  labs(
    title = "Sum of books Mara read over the course of 2021",
    alt = "With x-axis range from January 2021 to January 2022, shows relatively steady increase in cumulative sum of books read over time (from zero to ~200, where y-max = 188).",
    x = "Time",
    y = "Total books read"
  ) +
  hrbrthemes::theme_ipsum_rc()

Figure 1: Sum of books read over the course of 2021.

Not particularly riveting. It’s a pretty steady climb, and I think the slope increases mainly where I went on series benders (e.g. the Parker books at the end of the summer), and also after the Batpig died in November (I read when I’m sad).
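The slope of a cumulative curve is just books per period, so counting books per month is one way to pin down where the benders happened. A minimal base-R sketch with made-up dates standing in for read_2021_rev$date_read; fixing the factor levels to all twelve months keeps quiet months in the table as zeros.

```r
# Hypothetical finish dates, for illustration only
date_read <- as.Date(c("2021-01-03", "2021-01-20", "2021-02-23",
                       "2021-08-01", "2021-08-05", "2021-08-09", "2021-08-30"))

# Books finished per month, with every month of 2021 present
month <- factor(format(date_read, "%Y-%m"),
                levels = sprintf("2021-%02d", 1:12))
books_per_month <- table(month)
books_per_month

# The steepest stretch of the cumulative curve
names(which.max(books_per_month))

# Running monthly totals recover the cumulative curve, month by month
cumsum(books_per_month)
```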

What about pages?

read_2021_rev |> 
  arrange(date_read) |>                                     # running total needs date order
  mutate(date_read = as.Date(date_read)) |>                 # scale_x_date() expects Date
  mutate(number_of_pages = replace_na(number_of_pages, 0)) |>  # unknown lengths count as 0
  mutate(total_pages = cumsum(number_of_pages)) |> 
  ggplot(aes(x = date_read, y = total_pages)) +
  geom_line() +
  scale_x_date(NULL,
               breaks = scales::date_breaks(width = "1 month"),
               labels = scales::label_date_short()) +
  scale_y_continuous(labels = scales::label_comma()) +
  labs(
    title = "Sum of pages Mara read over the course of 2021",
    alt = "With x-axis range from January 2021 to January 2022, shows relatively steady increase in cumulative sum of pages read over time (from zero to ~60000, where y-max = 59390).",
    x = "Time",
    y = "Total pages read"
  ) +
  hrbrthemes::theme_ipsum_rc()

Figure 2: Sum of pages read over the course of 2021.

Still looks like a steady climb. Translation: Nothing much to see here.
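Why the replace_na() step in the pages chart? cumsum() propagates NA, so a single book without a page count would blank out every running total after it. A tiny base-R illustration with made-up page counts, where ifelse() stands in for tidyr::replace_na():

```r
# Hypothetical page counts, one of them missing
pages <- c(320, NA, 256, 1152)

cumsum(pages)  # one NA poisons every later running total

# Treat the unknown length as 0 pages, as the chart code does
pages_filled <- ifelse(is.na(pages), 0, pages)
cumsum(pages_filled)
```

The trade-off is that books without a page count add nothing, so the running total undercounts slightly. As a sanity check, 181 books × 328.12 mean pages ≈ 59,390, which matches the chart's y-max, as you'd expect if the seven missing counts contribute zero.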

The books

Keeping in mind that just because I read a book doesn’t mean I recommend it (I have a thing about finishing books—that “thing” being that I have to do it), here’s a little widget of what I read in the year 2021 Anno Domini.

Reuse

CC BY-SA 4.0

Citation

BibTeX citation:

@online{averick2021,
  author = {Averick, Mara},
  title = {What {I} Read in 2021},
  date = {2021-12-31},
  url = {https://dataand.me/blog/2021-12_what-i-read-in-2021/},
  doi = {10.59350/dzdwa-es082},
  langid = {en-US}
}

For attribution, please cite this work as:

Averick, Mara. 2021. “What I Read in 2021.” December 31, 2021. https://doi.org/10.59350/dzdwa-es082.