Animals slaughtered in the Netherlands

animal welfare

data cleaning

data visualization

Published

September 18, 2022

In this post I take a look at how many animals are slaughtered in the Netherlands. The goal is to find and clean the data needed to answer this question and to get a better grasp of exactly how many animals are killed here every year. It’s both an exercise in data cleaning and in calibrating one’s beliefs about this topic, as this is something that’s too easy to avoid thinking about, while probably being one of the most important things you should think about.

Data

The data on the number of animals slaughtered in the Netherlands can be found on StatLine (https://opendata.cbs.nl/statline#/CBS/nl/). This is a database managed by CBS, the national statistical office of the Netherlands. Specifically, we’re going to take a look at the meat production numbers (vleesproductie). These numbers can be found in this table: https://opendata.cbs.nl/#/CBS/nl/dataset/7123slac/table?ts=1663151766192. We can adapt what is in the table by changing various filters. By default it shows both the number of animals and the weight of the animals. I’m only interested in the number of animals, so I deselect the weight-related rows. I also see that they offer data on more dates than is shown by default, so I select all of the dates. I then download the data as a .csv file using the button in the top right corner. Now we can start cleaning the data for our purposes.

Setup

Run the following setup code if you want to follow along. You can download the data yourself or use my file (meat-production-netherlands.csv).

Code

# Load packages
library(tidyverse)

# Load data
data <- read_csv2("meat-production-netherlands.csv")

# Set default ggplot
theme_set(theme_minimal())

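Note that we have to use read_csv2() because the data values are separated by semicolons rather than commas. read_csv2() also expects a comma as the decimal mark, which is why the semicolon is needed as a separator in the first place. This is an annoying default in the Netherlands (and probably elsewhere in Europe).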
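To see what read_csv2() does differently, here is a minimal sketch with a made-up two-row file in the same style. The column names and values are hypothetical; the point is only that semicolons are treated as field separators and the comma as the decimal mark.

```r
library(readr)

# Made-up example data (not from the CBS file): fields are separated by
# semicolons and the decimal mark is a comma.
csv_text <- "animal;count\npigs;1,5\ngoats;2,25\n"

# I() tells readr to treat the string as literal data instead of a file path
example <- read_csv2(I(csv_text), show_col_types = FALSE)

# The counts are parsed as the numbers 1.5 and 2.25
example$count
```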

Data cleaning

Let’s begin by inspecting the first few rows of the data.

Code

head(data)

First 6 rows of the data.

It should be no surprise, but the data is in Dutch. Let’s translate the data, starting with the columns. One of the columns is called Aantal slachtingen (x 1 000), which means number of slaughtered animals in units of 1000. Instead of translating this directly, I will simply rename it to count and multiply the values by 1000.

Code

data <- data %>%
  rename(
    animal = Slachtdieren,
    period = Perioden,
    count = `Aantal slachtingen (x 1 000)`
  ) %>%
  mutate(count = count * 1000)

Let’s clean up the period column next. It seems like it contains the year and the month (in Dutch). I can translate the month names to English, but I first want to make sure that all data values are structured the same way. count() is a great function to inspect that.

Code

data %>%
  count(period)

Curiously, not all rows in the data contain both the year and the month. Some only have the year. This is important because it means we can’t just sum the number of slaughtered animals per year. If we did, we would get twice the actual number, because we would add up both the annual total and each month of that year. The last several months also have an asterisk in the month name. This asterisk indicates that the data for these months has not yet been finalized.
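A tiny made-up example shows the double counting. The numbers are hypothetical and only two months are included to keep it short.

```r
# Hypothetical rows in the same shape as the CBS data: one annual row
# plus the monthly rows that add up to it (only two months, for brevity)
toy <- data.frame(
  period = c("2021", "2021 januari", "2021 februari"),
  count = c(300, 100, 200)
)

# Summing everything counts each animal twice
naive_total <- sum(toy$count)

# Annual rows have no space (and hence no month) in the period value
annual_total <- sum(toy$count[!grepl(" ", toy$period)])

naive_total   # 600
annual_total  # 300
```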

What I want to do next is create a new column that only contains the year and another column that contains the month. Creating the year column is easy because we can use parse_number() to extract the year from the data. The month is a bit trickier, but we can use a regular expression to remove the year, leaving us with the month. We use str_remove() and tell it to remove a string pattern that consists of 4 digits and an optional space. We can also use it to remove the asterisk from the more recent months, but first we add a column to say whether the numbers are final or not based on this asterisk. After doing that, we recode the month values that need to be translated and also convert the empty string to a missing value. Finally, we remove the period column because we don’t need it anymore.

Code

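data <- mutate(data,
    # Extract the year (the first number in the string)
    year = parse_number(period),
    # Remove the 4-digit year and the optional space, leaving the month
    month = str_remove(period, "[0-9]{4} ?"),
    # An asterisk marks numbers that are not yet final
    final = if_else(str_detect(period, "\\*"), "no", "yes"),
    month = str_remove(month, "\\*"),
    # Translate the month names that differ between Dutch and English
    month = recode(month,
      "augustus" = "august",
      "februari" = "february",
      "januari" = "january",
      "juli" = "july",
      "juni" = "june",
      "maart" = "march",
      "mei" = "may",
      "oktober" = "october"
    ),
    # Annual rows have no month, so turn the empty string into NA
    month = na_if(month, ""),
    period = NULL
  )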
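To make the steps concrete, here they are applied to three example values that follow the pattern seen in the period column: a year only, a year with a month, and a provisional month with an asterisk. The specific values are chosen for illustration.

```r
library(readr)
library(stringr)

periods <- c("2021", "2021 januari", "2022 juli*")

# parse_number() picks up the first number in each string
year <- parse_number(periods)                   # 2021 2021 2022

# Removing the year (and the optional space) leaves the month part
month_raw <- str_remove(periods, "[0-9]{4} ?")  # "" "januari" "juli*"

# The asterisk flags provisional numbers and is then stripped
provisional <- str_detect(periods, "\\*")       # FALSE FALSE TRUE
month <- str_remove(month_raw, "\\*")           # "" "januari" "juli"
```

The empty string in the first position is what later gets converted to a missing value, which is how annual rows can be told apart from monthly ones.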

Next are the animals. Let’s take a look at the unique values we have.

Code

count(data, animal)

Hmm… it looks like there are a few challenges here. First, we seem to have both total values and non-total values, so we should take care to separate these. Second, we need to figure out what each word means. Even my Dutch is not helping me in understanding each type of animal.

Let’s first simply translate the values so we get a better grasp of what we are dealing with. The translations won’t be direct translations. Instead, I already think about what kind of categories make sense and how I want to plot the data later, so I translate the values into names that will also be useful later.

Code

data <- mutate(data, 
  animal = recode(animal, 
    "Eenhoevigen" = "ungulates (mostly horses)",
    "Geiten (totaal)" = "goats",
    "Kalkoenen" = "turkeys",
    "Kalveren jonger dan 9 maanden" = "calves (< 9 months)",
    "Kalveren van 9 tot en met 12 maanden" = "calves (9-12 months)",
    "Koeien" = "cows",
    "Overig pluimvee" = "poultry (misc)",
    "Overige kippen" = "chicken (mostly layers)",
    "Rundvee (totaal)" = "cattle",
    "Schapen incl. lammeren" = "sheep",
    "Schapenlammeren" = "lambs",
    "Stieren" = "bulls",
    "Totaal kalveren" = "calves",
    "Totaal volwassen runderen" = "adult cattle (total)",
    "Vaarzen" = "heifers",
    "Varkens (totaal)" = "pigs",
    "Vleeskuikens" = "broilers"
  )
)

Translating the words was very helpful to better understand the data. One thing that’s clear is that some of the values are totals of other values. Below I list which values in the data are actually sums of other values:

adult cattle: Total of cows, heifers, and bulls
cattle: Total of adult cattle and calves
calves: Total of calves (< 9 months) and calves (9-12 months)

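If we are interested in what the totals are made of, we could remove the total rows and reconstruct them later. That works for adult cattle and cattle, but not for calves, because the distinction between younger and older calves was only introduced in 2009. So let’s instead remove the values that the totals are made of and keep only the top-level totals. For cattle this means dropping the intermediate totals (adult cattle and calves) as well, since both are already included in cattle. The same goes for lambs, which are already counted in sheep (Schapen incl. lammeren).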

Code

data <- filter(data, !animal %in% c("adult cattle (total)", "cows", 
  "heifers", "bulls", "calves","calves (< 9 months)", 
  "calves (9-12 months)", "lambs")
)
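Before dropping the component rows, it can be reassuring to check that a total really equals the sum of its parts. Below is a sketch of such a check on made-up numbers; with the real data the same comparison would have to be done per period.

```r
library(dplyr)

# Hypothetical counts for a single period; real values would come from `data`
toy <- tibble(
  animal = c("cows", "heifers", "bulls", "adult cattle (total)"),
  count = c(500, 60, 40, 600)
)

# Sum of the component categories
parts <- toy %>%
  filter(animal %in% c("cows", "heifers", "bulls")) %>%
  summarize(count = sum(count)) %>%
  pull(count)

# The reported total
total <- toy %>%
  filter(animal == "adult cattle (total)") %>%
  pull(count)

# TRUE when the total matches the sum of its parts
parts == total
```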

This leaves us with the following animals.

Code

count(data, animal)

This looks fine to me, which means we are almost done with the data cleaning. At this point I want to create two separate data frames: one that only contains the annual data and one that contains the monthly data. This is easy to do because we can take all the annual data by simply selecting the rows with a missing value in the month column. Since the month column is useless in that data frame, we remove it.

Code

data_annual <- data %>%
  filter(is.na(month)) %>%
  select(-month)

data <- filter(data, !is.na(month))

As a final step we can combine the year and month columns from the monthly data frame into a single column, which will be useful for plotting the data later. This requires a special function from the zoo package.

Code

data <- mutate(data,
  month = str_to_sentence(month),
  month = match(month, month.name),
  year_month = paste(year, month, "1", sep = "-"),
  year_month = lubridate::as_date(year_month),
  year_month = zoo::as.yearmon(year_month)
)
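To illustrate the first steps of this conversion on their own, here is a small sketch that uses base R's as.Date() in place of lubridate and leaves out the final zoo::as.yearmon() step. The month names and the year are example inputs.

```r
month_en <- c("january", "march", "october")

# Capitalize the first letter so the names match R's built-in month.name
month_cap <- paste0(toupper(substring(month_en, 1, 1)), substring(month_en, 2))

# The position in month.name gives the month number
month_num <- match(month_cap, month.name)  # 1 3 10

# Build a date for the first day of each month in an example year
first_day <- as.Date(paste(2021, month_num, "1", sep = "-"))
format(first_day, "%Y-%m-%d")  # "2021-01-01" "2021-03-01" "2021-10-01"
```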

Data analysis

With the data cleaned up we can start to ask some questions. Let’s begin with a graph that shows as much of the data as possible. That means plotting the monthly data for each animal.

Code

ggplot(data, aes(x = year_month, y = count)) +
  geom_point(size = 1) +
  geom_line(alpha = .25) +
  facet_wrap(~ animal, scales = "free")

Number of slaughtered animals per month and animal

A few observations:

The numbers are very high, particularly for some animals (e.g., broilers)
Some animals used to be slaughtered in larger numbers than now
- This is clearly the case for turkeys and poultry (misc), but also for other animals such as cattle and pigs
Some numbers are relatively low (e.g., for horses)
The data sometimes fluctuates quite a bit from month to month, so an annual view might show clearer patterns

Given these observations, let’s create a subset focusing only on six categories of animals that are still being slaughtered in large numbers and let’s also plot the annual data.

Code

data_annual_subset <- data_annual %>%
  filter(animal %in% c("broilers", "goats", "sheep", "cattle", "pigs", 
    "chicken (mostly layers)")
  ) %>%
  filter(final == "yes")

ggplot(data_annual_subset, aes(x = year, y = count)) +
  geom_point(size = 1) +
  geom_line(alpha = .25) +
  facet_wrap(~ animal, scales = "free")

Number of slaughtered animals per year and animal

Okay, parsing this graph I see that a lot of chickens are slaughtered every year, particularly broiler chickens. The numbers are so high that ggplot has switched to scientific notation to represent them. Interestingly, though, the number of slaughtered broiler chickens has decreased somewhat in the last two years. I don’t know why that is. I also see that some animals are slaughtered more and more over the years (e.g., cattle, non-broiler chickens, goats, and pigs), although I’m also surprised to see that for some animals we’ve had worse years, particularly for cattle and pigs. For those animals we see a huge drop around the year 2000. The reason for that drop was the outbreak of foot-and-mouth disease and subsequent regulation. I thought things were getting worse and worse, but apparently it was already worse a while ago.
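If the scientific notation on the axes is hard to read, one option is to format the axis labels with the scales package. This is a sketch on made-up numbers, not the plot used in this post; scales::label_comma() is a formatter that writes numbers out in full with thousands separators.

```r
library(ggplot2)

# Made-up annual counts, only to have something to plot
toy <- data.frame(
  year = 2019:2021,
  count = c(610000000, 590000000, 530000000)
)

p <- ggplot(toy, aes(x = year, y = count)) +
  geom_point(size = 1) +
  geom_line(alpha = .25) +
  # Write the axis labels out in full instead of in scientific notation
  scale_y_continuous(labels = scales::label_comma())

p
```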

Let’s home in on some exact numbers. Below we create a table to show the number of slaughtered animals in 2021, per animal.

Code

data_annual %>%
  filter(year == 2021) %>%
  arrange(desc(count)) %>%
  select(animal, count)

Number of slaughtered animals in 2021

Oof. That’s over 500 million chickens! For reference, the Netherlands had a population of 17.17 million in 2021.

How many animals were slaughtered in total, in 2021?

Code

count_total_2021 <- data_annual %>%
  filter(year == 2021) %>%
  summarize(count_total = sum(count)) %>%
  pull(count_total)

Apparently a total of 541,039,500, or 17.16 animals per second. That means that about…

…have died since you started reading this blog post. That’s a bit of a bummer to end on, but then, this post was never going to have a happy ending.
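For reference, the per-second figure can be reproduced from the 2021 total with a bit of arithmetic, and the same rate gives a rough number for any reading time. The total and the population figure are the ones quoted above; the five-minute reading time is an assumption.

```r
count_total_2021 <- 541039500        # total reported above
seconds_per_year <- 365 * 24 * 60 * 60

per_second <- count_total_2021 / seconds_per_year
round(per_second, 2)                 # 17.16

# Assumed reading time of five minutes
reading_seconds <- 5 * 60
round(per_second * reading_seconds)  # 5147

# Slaughtered animals per inhabitant, using the population figure above
per_inhabitant <- count_total_2021 / 17170000
round(per_inhabitant, 1)             # 31.5
```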

This post was last updated on 2022-09-22.
