knitr::opts_chunk$set(
  message = FALSE,
  warning = FALSE
)
library(janitor)
library(dplyr)
library(tidyr)
library(ggplot2)
library(tidylog)
library(stringr)
library(stats)
library(infer)
library(readxl)
library(lubridate)
library(visdat)
library(knitr)
library(rmarkdown)
library(kableExtra)
library(purrr)

1. Introduction

This report compiles and analyses data I collected on my own job applications from March 2024 to September 2025. Every time I noticed an interesting job description, I followed a two-step process: 1) I copied the job description into a Word document, and 2) I filled in an Excel spreadsheet with data from the job description, such as the company name, the job title and the date of application. The first objective was to keep the job details at hand in case I was called for an interview after the posting had disappeared, which would otherwise have hindered my preparation. The second objective was to track my applications in order to report them later to my unemployment agency, as I was required to submit a minimum number of applications per month and to provide details such as the company name, location, link to the application and job title. After more than a year, the data had become rich enough to analyse and to practise the data wrangling skills I had acquired during 2025. Besides being a stimulating and entertaining exercise, the analysis also gave me a useful overview of my applications.

2. Data and methodology

Data collection process

The data comes from an Excel spreadsheet and a Word document. The Excel spreadsheet includes the company name, the job title, the date of application, the month of application, the status of the application, the date of the reply, the sector, the organisation type, and the minimum years of experience. The Word document includes the descriptions of jobs I applied to and of some I did not apply to. One KPI I wanted to include in the analysis but had not tracked was the number of acronyms per job description. I was able to extract this variable thanks to a Python script written by ChatGPT, which produced an Excel document listing company names, acronyms and job-title keywords extracted from the Word document.
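The Python script itself is not reproduced in this report. Purely as an illustration of the idea, the extraction can be sketched in R with stringr, assuming that an acronym is any whole word made of two or more capital letters. The sample text and the regular expression are hypothetical and are not the ones used to build job_acronym.

```r
library(stringr)

# Hypothetical snippet of a job description (not taken from the real data)
job_text <- "Experience with ESG reporting under CSRD and GRI. ESG data skills required."

# Assumption: an acronym is a whole word made of 2 or more capital letters
acronyms <- str_extract_all(job_text, "\\b[A-Z]{2,}\\b")[[1]]

# Count the occurrences of each acronym in this description
acronym_counts <- table(acronyms)
acronym_counts
```

A rule this simple would miss mixed-case acronyms such as "SBTi" or "D&I", which do appear in the real table, so the actual script must have used a broader rule.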

Description of the two Excel tables

The data is composed of two tables: job_acronym and job_applications. The first has 5 columns and shows the number of occurrences of specific acronyms per job title. It is not fully cleaned yet, as it still contains some duplicates. The second has 12 columns and includes more information on the job applications and descriptions. It also needs to be cleaned, as R did not assign the correct class to some columns on import.

Data Import

Loading files into R

job_acronym <- read_excel("~/Documents/Job data analysis/job_acronym.xlsx") %>% 
  clean_names() %>% 
  rename(meaning = `x5`) %>% 
  glimpse()
# still have to make sure the count is right per position

job_applications <- read_excel("~/Documents/Job data analysis/Job applications March 2024 - September 2025.xlsx") %>% 
  clean_names() %>% 
  glimpse()
# still have to clean the columns types here

Data cleaning

Handling missing values

job_acronym_NA <- job_acronym %>% 
  mutate(company = if_else(company == "Unknown", NA_character_, company))

This new table is identical to the previous one, except that the “Unknown” companies are replaced by NA, the standard value R uses to indicate missing data. The table contains 3 missing values, now labelled as NA.
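As a side note, dplyr also offers na_if() for this kind of replacement. The sketch below uses a made-up three-row table rather than the real job_acronym data, and is only meant to show the equivalent call.

```r
library(dplyr)

# Toy data standing in for job_acronym (company names are made up)
toy <- tibble(company = c("Acme", "Unknown", "Globex"))

# na_if() turns every value equal to "Unknown" into NA
toy_na <- toy %>%
  mutate(company = na_if(company, "Unknown"))

# Number of missing company names after the replacement
sum(is.na(toy_na$company))
```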

Formatting columns

Cleaning columns and class types

job_applications1 <- job_applications %>% 
  mutate(date_of_application = ymd(date_of_application),
         date_of_reply = ymd(date_of_reply),
         min_years_of_experience = as.double(min_years_of_experience)) %>% 
  glimpse()

Cleaning duplicates

job_acronym1 <- job_acronym_NA %>%
  group_by(company, job_title, acronym) %>% 
  mutate(count = sum(count)) %>% 
  ungroup() %>% 
  distinct(company, job_title, acronym, count)
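
To illustrate what this step does, the sketch below applies the same idea to a made-up table in which one company, job title and acronym combination appears twice. It uses summarise() instead of mutate() followed by distinct(); on data like this both approaches should give one row per combination with the counts added up.

```r
library(dplyr)

# Toy data: the first two rows are duplicates of the same combination
toy <- tibble(
  company   = c("Acme", "Acme", "Globex"),
  job_title = c("ESG Analyst", "ESG Analyst", "Consultant"),
  acronym   = c("ESG", "ESG", "GRI"),
  count     = c(2, 3, 1)
)

# Collapse duplicates and add up their counts
toy_dedup <- toy %>%
  group_by(company, job_title, acronym) %>%
  summarise(count = sum(count), .groups = "drop")

toy_dedup
```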

3. Descriptive Analysis

3.1 Companies

First, I want to know the total number of companies I applied to.

Number of distinct companies applied to

distinct_companies <- job_acronym1%>%
  distinct(company) %>% 
  nrow()

There are 77 distinct companies in the job_acronym table.

distinct_company_names <- job_applications1 %>%
  mutate(name_lowercase = str_to_lower(name)) %>%
  distinct(name_lowercase) %>%
  nrow()

There are 112 distinct companies in the job_applications1 table. This means that either some companies I applied to did not include the selected acronyms in their job description, or some applications were not part of the sample I used to extract the acronyms.

3.2 Sectors

Most frequent sectors

In the below table, I want to know which sectors I have applied to most frequently.

frequent_sectors <- job_applications1 %>% 
  filter(!is.na(sector)) %>% 
  group_by(sector) %>% 
  summarise(n=n()) %>% 
  ungroup() %>% 
  arrange(desc(n))

kable(frequent_sectors, caption = "Sectors I applied to") %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE) %>%
  scroll_box(height = "400px")
Sectors I applied to
sector n
Consulting 24
Environment 16
Recruitment 16
Finance 9
Manufacture 8
Banking 7
human rights 7
Commodities 6
Government 6
Biodiversity 5
Chemicals 5
Energy 5
Automation 4
Aviation 4
Beverage 4
Certification 4
Science 4
Shipping 4
Academia 3
Standard-setter 3
Insurance 2
Manufacture/Watchmaking 2
Sport 2
Automobile 1
Culture 1
Diplomacy 1
Education 1
Furniture 1
Garment 1
Garnment 1
IT 1
Manufacturing 1
Media 1
Pharmaceutical 1
Real estate 1
Standards-setter 1
Supply chain 1
Sustainability 1
Sustainable Finance 1
Water 1
chemicals 1

Comment

There are several duplicated sectors because of typos, lower case instead of capital letters, plural instead of singular forms, and so on. To avoid this issue, I have to standardize the sector names.

cleaned_sectors <- job_applications1 %>% 
  mutate(
    sector_lowercase = sector %>%
      str_trim() %>%
      str_to_lower()
  )
cleaned_sectors1 <- cleaned_sectors %>% 
  mutate(
    sector_clean = case_when(
      str_detect(sector_lowercase, "garnment") ~ "garment",
      str_detect(sector_lowercase, "standards-setter") ~ "standard-setter",
      str_detect(sector_lowercase, "manufacture|manufacture/watchmaking") ~ "manufacturing",
      TRUE ~ str_to_lower(sector_lowercase)
    )
  )
cleaned_sectors1 %>%
  count(sector_clean, sort = TRUE)
## # A tibble: 37 × 2
##    sector_clean      n
##    <chr>         <int>
##  1 consulting       24
##  2 environment      16
##  3 recruitment      16
##  4 manufacturing    11
##  5 finance           9
##  6 banking           7
##  7 human rights      7
##  8 chemicals         6
##  9 commodities       6
## 10 government        6
## # ℹ 27 more rows

Comment

The above table shows that I mostly applied to jobs in the consulting sector, then in the environment and recruitment sectors, which are tied at 16 applications each. There are 36 distinct sectors plus an NA category (37 rows), which means that the group size for each is very small.

frequent_sectors_clean <- cleaned_sectors1 %>% 
  count(sector_clean, sort=TRUE)

Now that I have the number of applications per sector, I build a plot and exclude the NA category, to only keep existing sector names.

frequent_sectors_clean %>% 
  filter(!is.na(sector_clean)) %>% 
  arrange(desc(n)) %>% 
  slice_max(order_by=n, n = 15) %>% 
  ggplot(aes(x=reorder(sector_clean, -n), y= n, fill = sector_clean))+
  geom_col(alpha = 0.7)+
  theme_minimal()+
  labs(title = "Top 15 most frequent sectors in applications", x="Sectors", y = "Number of applications")+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

In order to reduce the number of sectors, I group them into six broader categories. Sectors that are missing or not listed in any category (here “pharmaceutical”) are left as NA.

cleaned_sectors_less <- cleaned_sectors1 %>% 
  mutate(sector_grouped = case_when(
    sector_clean %in% c("finance", "banking", "sustainable finance", "insurance", "real estate") ~ "Finance & Business",
    sector_clean %in% c("manufacturing", "automobile", "aviation", "garment", "furniture", "chemicals", "commodities", "beverage", "shipping") ~ "Industry & Manufacturing",
    sector_clean %in% c("sustainability", "environment", "biodiversity", "water", "energy") ~ "Sustainability & Environment",
    sector_clean %in% c("consulting", "supply chain", "certification", "standard-setter", "recruitment", "automation", "it") ~ "Consulting & Services",
    sector_clean %in% c("government", "diplomacy", "human rights", "culture", "sport") ~ "Public & Social Sector",
    sector_clean %in% c("academia", "education", "science", "media") ~ "Knowledge & Education",
    TRUE ~ NA_character_
  ))
cleaned_sectors_less %>% 
  count(sector_grouped) %>% 
  arrange(desc(n))
## # A tibble: 7 × 2
##   sector_grouped                   n
##   <chr>                        <int>
## 1 Consulting & Services           54
## 2 Industry & Manufacturing        39
## 3 Sustainability & Environment    28
## 4 Finance & Business              20
## 5 Public & Social Sector          17
## 6 Knowledge & Education            9
## 7 <NA>                             3
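
The NA row in this output combines applications with no recorded sector and sectors that no case_when() branch covers. A minimal sketch of how the unassigned values could be listed is shown below. The toy table and the shortened category lists are made up for the example.

```r
library(dplyr)

# Toy data: one sector is not covered by any category, one is missing
toy <- tibble(sector_clean = c("finance", "pharmaceutical", NA, "water"))

toy_grouped <- toy %>%
  mutate(sector_grouped = case_when(
    sector_clean %in% c("finance", "banking") ~ "Finance & Business",
    sector_clean %in% c("water", "energy") ~ "Sustainability & Environment",
    TRUE ~ NA_character_
  ))

# Original sector values that ended up without a category
unassigned <- toy_grouped %>%
  filter(is.na(sector_grouped)) %>%
  count(sector_clean)

unassigned
```

Running the same filter on cleaned_sectors_less would show which sector names still need a category.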
cleaned_sectors_less %>%
  count(sector_grouped) %>%
  ggplot(aes(x = reorder(sector_grouped, n), y = n, fill = sector_grouped)) +
  geom_col(alpha = 0.8) +
  geom_text(aes(label = n), hjust = -0.3, size = 3.5) +
  coord_flip() +
  labs(
    title = "Number of Applications by Sector",
    subtitle = "Distribution of job applications across grouped sectors",
    x = "Sector",
    y = "Number of Applications",
    caption = paste0("n = ", nrow(cleaned_sectors_less))
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.title = element_text(size = 11, face = "bold"),
    legend.position = "none"
  )

Comment

The above graph shows that I mainly applied to the Consulting & Services sector (54 applications) and least to the Knowledge & Education sector (9 applications).

3.3 Job Requirements: Years of Experience

Descriptive statistics

In the table below, I examine the mean, the minimum, the maximum, the median and the standard deviation of the number of years of experience required in the job description sample.

year_of_exp <- job_applications1 %>% 
  filter(!is.na(min_years_of_experience)) %>% 
  summarise(mean = mean(min_years_of_experience),
            st_var = sd(min_years_of_experience),
            min = min(min_years_of_experience),
            max = max(min_years_of_experience),
            median = median(min_years_of_experience),
            group_size = n())
year_of_exp
## # A tibble: 1 × 6
##    mean st_var   min   max median group_size
##   <dbl>  <dbl> <dbl> <dbl>  <dbl>      <int>
## 1  3.30   1.80     0    10      3         71

Comment

The median and the mean are very similar, both close to 3 years, which matches my own 3 years of experience. Even though I sometimes applied to jobs requiring more experience than I have, at least half of the applications with a stated requirement asked for 3 years or fewer, in line with my profile.
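
The share of applications at or below a given threshold can also be computed directly instead of being read off the median. The sketch below uses a made-up vector of requirements, not the real min_years_of_experience column.

```r
# Made-up requirements in years; NA means no requirement was stated
years <- c(0, 2, 3, 3, 5, 10, NA)

# Proportion of stated requirements at or below 3 years
share_at_most_3 <- mean(years <= 3, na.rm = TRUE)
share_at_most_3
```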

job_applications1 %>%
  filter(!is.na(min_years_of_experience)) %>%
  ggplot(aes(x = min_years_of_experience)) +
  geom_histogram(binwidth = 1, fill = "#3498db", alpha = 0.7, color = "white") +
  geom_vline(aes(xintercept = median(min_years_of_experience)),
             color = "#e74c3c", linetype = "dashed", linewidth = 0.8) +
  annotate("text", x = median(job_applications1$min_years_of_experience, na.rm = TRUE) + 0.3,
           y = Inf, vjust = 2, label = "Median", color = "#e74c3c", size = 3.5) +
  scale_x_continuous(breaks = 0:10) +
  labs(
    title = "Distribution of Minimum Years of Experience Required",
    subtitle = "Across all job applications",
    x = "Minimum Years of Experience",
    y = "Count",
    caption = paste0("n = ", nrow(filter(job_applications1, !is.na(min_years_of_experience))))
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.title = element_text(size = 11, face = "bold")
  )

Distribution plot

job_applications1 %>%
  filter(!is.na(min_years_of_experience)) %>%
  ggplot(aes(x = "", y = min_years_of_experience)) +
  geom_violin(fill = "#3498db", alpha = 0.4, linewidth = 0.3, trim = FALSE) +
  geom_boxplot(width = 0.2, fill = "#3498db", alpha = 0.7, outlier.alpha = 0.5) +
  scale_y_continuous(breaks = 0:10) +
  labs(
    title = "Distribution of Minimum Years of Experience Required",
    subtitle = "Across all job applications",
    x = "All applications",
    y = "Minimum Years of Experience",
    caption = paste0("n = ", nrow(filter(job_applications1, !is.na(min_years_of_experience))))
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.title = element_text(size = 11, face = "bold"),
    axis.text = element_text(size = 10)
  )

Comment

This graph shows that a large share of my applications were for jobs requiring 2 to 3 years of experience, followed by jobs requiring 4 to 6 years.

3.4 Job Requirements: Acronyms and technical language

Number of acronym occurrences

job_acronym_total <- job_acronym1 %>% 
  group_by(acronym) %>% 
  summarise(total = sum(count)) %>% 
  ungroup() %>% 
  arrange(desc(total))

job_acronym_total %>% 
  kable(caption = "Number of occurrences per acronym") %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE) %>%
  scroll_box(height = "400px")
Number of occurrences per acronym
acronym total
ESG 156
CSRD 34
EU 27
GHG 23
GRI 23
LCA 22
UN 22
ACV 19
TCFD 17
ISO 15
CV 13
RSE 13
IT 11
ES 10
ESRS 10
CENMAT 9
EHS 9
CNC 8
CDP 7
CSR 6
D&I 6
EPR 6
NGO 6
SASB 6
HR 5
MS 5
PhD 5
SAP 5
CEO 4
CORSIA 4
HES 4
IFRS 4
KPI 4
SDG 4
CET 3
COP 3
CROSSEU 3
CSDDD 3
E&S 3
IFC 3
L&D 3
NNY 3
PCAF 3
SAF 3
TA 3
AI 2
ASAP 2
ArcGIS 2
BPM 2
CDI 2
CDM 2
CEFR 2
CFA 2
CRM 2
DwP 2
EFTA 2
EM 2
GES 2
GIS 2
GSSB 2
ISS 2
LEED 2
NDCs 2
PEF 2
PRI 2
PoS 2
SBTi 2
SFRD 2
SME 2
SQL 2
TNFD 2
UPR 2
ACCA 1
ACE 1
AFOLU 1
AP 1
BIA 1
BU 1
CA 1
CADO 1
CAEP 1
CDD 1
COC 1
COO 1
CPA 1
CRREM 1
CoP 1
DAW 1
DDTrO 1
DGNB 1
EIA 1
EMEA 1
EPF 1
EPP 1
ERP 1
ERPDs 1
ESAP 1
ESDD 1
ESIA 1
ESMP 1
ESMS 1
ESPR 1
FCF 1
FMCG 1
GCF 1
GIIN 1
HBE 1
HRC 1
I&D 1
ICAO 1
INSTRAW 1
IPCC 1
ISCC 1
ISSB 1
JEDI 1
LGBTQ 1
LGBTQI+ 1
MBA 1
MRV 1
NFR 1
OHCHR 1
OPIM 1
OSAGI 1
PACTA 1
PAI 1
PLM 1
PMO 1
PWM 1
RED 1
REDD+ 1
RH 1
RJC 1
ROI 1
RTFC 1
SBTN 1
SBU 1
SER 1
SES 1
SMEI 1
SPOC 1
THG 1
TORs 1
UDB 1
UE 1
UNFCCC 1
UNGC 1
UNIFEM 1
VIP 1
VUCA 1
eDNA 1

This new table has only 2 columns: “acronym” and “total”. It shows the total number of occurrences of each acronym across the whole job_acronym table, including repetitions within the same job description.

distinct_acronyms_per_job <- job_acronym1 %>% 
  group_by(job_title, company) %>% 
  summarise(distinct_acronym = n()) %>% 
  ungroup()%>% 
  select(-company) %>%
  arrange(desc(distinct_acronym))

distinct_acronyms_per_job %>% 
  head(10) %>% 
  kable(caption = "Number of distinct acronyms per job description")
Number of distinct acronyms per job description
job_title distinct_acronym
ESG Officer 19
ESG Analyst 10
ESG Reporting Officer 10
Sustainability Reporting Analyst 10
Consultant 9
Environmental Sustainability Specialist Operations 9
Human Rights Officer 9
Assistant Manager Sustainability Programs 8
Chargé.e de données durabilité environnementale 8
Environmental, Social and Governance (ESG) Auditor – Associate/Senior Associate 8

This table counts the number of distinct acronyms per job title and company. I drop the company column before display to preserve some confidentiality, which leaves two columns: “job_title” and “distinct_acronym”.

total_acronyms_per_job <- job_acronym1 %>% 
  group_by(job_title, company) %>% 
  summarise(total_acronyms = sum(count)) %>% 
  ungroup() %>% 
  select(-company) %>% 
  arrange(desc(total_acronyms))

total_acronyms_per_job %>% 
  head(10) %>% 
  kable(caption = "Number of acronyms per job description")
Number of acronyms per job description
job_title total_acronyms
ESG Officer 27
ESG Reporting Officer 22
ESG Analyst 19
Sustainability & LCA Specialist 18
ESG Data and Compliance Controller 17
Environmental Sustainability Specialist Operations 17
Global Sustainability Manager 17
Human Rights Officer 16
Programme Analyst 16
Sustainability Project Manager 16

This table is also computed per job title and company, with the company column dropped before display. It shows the total number of acronym occurrences per job description, repetitions included.

job_acronym_total %>%
  slice_max(total, n = 15) %>%
  ggplot(aes(x = reorder(acronym, total), y = total, fill = total)) +
  geom_col(alpha = 0.8) +
  geom_text(aes(label = total), hjust = -0.3, size = 3.5) +
  coord_flip() +
  scale_fill_gradient(low = "#a8d0e6", high = "#3498db") +
  labs(
    title = "Top 15 Most Frequent Acronyms in Job Descriptions",
    subtitle = "Total occurrences across all job postings",
    x = "Acronym",
    y = "Total Occurrences",
    caption = paste0("Total unique acronyms: ", nrow(job_acronym_total))
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.title = element_text(size = 11, face = "bold"),
    legend.position = "none"
  )

Comment

In the above plot, every occurrence of an acronym is counted, including repetitions within the same job description. The plot therefore does not show how many job descriptions mention a given acronym.

Top 15 acronyms by distinct jobs

The first step is to add a column that counts, for each acronym, the number of distinct job descriptions mentioning it.

number_of_jobs_w_acronym <- job_acronym1 %>% 
  group_by(acronym) %>% 
  distinct(company, job_title, count)%>%
  mutate(n_jobs_w_acronym = n()) %>% 
  ungroup()
number_of_jobs_w_acronym %>% 
  distinct(acronym, n_jobs_w_acronym) %>% 
  arrange(desc(n_jobs_w_acronym)) %>% 
  slice_max(order_by = n_jobs_w_acronym, n=15) %>% 
  ggplot(aes(x=reorder(factor(acronym), n_jobs_w_acronym), y= n_jobs_w_acronym, fill=acronym)) +
  geom_col()+
  coord_flip()+
  theme_minimal()+
  labs(title = "Top 15 acronyms mentioned by distinct job descriptions",
       x = "Acronyms",
       y = "Number of distinct job descriptions")

Comment

Comparing this second plot with the first one, the top 3 acronyms “ESG”, “CSRD” and “EU” are the same. The number of “ESG” occurrences is not a surprise, since the term is closely linked with sustainability positions. “CSRD” and “EU” can indicate the importance of EU regulation for the companies and organisations, especially in 2025, as the EU Directive on sustainability disclosure was being debated. On November 13, 2025, the European Parliament voted for the “Omnibus” proposal on sustainability reporting, a simplified version of the 2024 proposition. This version removes roughly 90% of companies from the scope of the CSRD: environmental and social reporting requirements would apply only to businesses with more than 1750 employees on average and a net annual turnover of over €450 million.

In the future, we might see a decline in these EU disclosure-related acronyms, on the assumption that EU regulation on sustainability reporting becomes less of a priority. Yet the topic of sustainability reporting for businesses might not fall out of trend completely: some companies might want to anticipate future regulations, as other jurisdictions adopt sustainability-related financial standards such as ISSB.

Other reporting standards or frameworks, such as “GHG”, “GRI” and “TCFD”, are each mentioned in around 15 job descriptions. I assume that GHG was mentioned along with “Protocol”, but some descriptions may mention GHG alone; in either case, this shows an interest in carbon accounting, which is part of sustainability reporting. This observation tends to confirm that, while interest in anticipating EU regulation may fade, there is still strong interest in assessing and reporting on impact through other recognised standards and frameworks.

One notable difference between the two plots is the position of “LCA” (Life Cycle Assessment) and “ACV” (Analyse de Cycle de Vie, its French equivalent): they rank high in the first plot but barely make it into the top 15 of acronyms mentioned by distinct job descriptions. This suggests that a few job descriptions repeat these acronyms many times, whereas fewer than 10 job descriptions reference Life Cycle Assessment in a sample covering 77 distinct companies.
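
The gap between the two rankings can be reproduced on a toy example. The sketch below assumes one row per job description and acronym, as in job_acronym1, and uses made-up job titles and counts: an acronym repeated many times in a single description leads on total occurrences but not on the number of descriptions.

```r
library(dplyr)

# Toy data: "LCA" is repeated 12 times, but in a single job description
toy <- tibble(
  job_title = c("Job A", "Job A", "Job B", "Job C"),
  acronym   = c("LCA", "ESG", "ESG", "ESG"),
  count     = c(12, 1, 2, 1)
)

toy_summary <- toy %>%
  group_by(acronym) %>%
  summarise(
    total_occurrences = sum(count), # basis of the first plot
    n_jobs            = n(),        # basis of the second plot
    .groups = "drop"
  )

toy_summary
```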

Average acronyms per job description

av_n_acronyms <- number_of_jobs_w_acronym %>% 
  group_by(job_title, company) %>% 
  mutate(n_acronyms = n()) %>% 
  ungroup() %>% 
  summarise(average_number_of_acronyms = mean(n_acronyms),
            min = min(n_acronyms),
            max = max(n_acronyms)) 
kable(av_n_acronyms, caption = "Average number of Acronyms per job description with minimum and maximum values")
Average number of Acronyms per job description with minimum and maximum values
average_number_of_acronyms min max
5.981723 1 19
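
One caveat about this figure: n_acronyms is added with mutate(), so the mean is taken over acronym rows and not over job descriptions. A description with many acronyms therefore contributes many rows and pulls the average up. If the intent is an average in which each job description counts once, the counts can be reduced to one row per job first. The toy comparison below uses made-up data to show the difference. The minimum and maximum are the same under both approaches.

```r
library(dplyr)

# Toy data: Job A has 3 acronyms, Job B has 1
toy <- tibble(
  job_title = c("Job A", "Job A", "Job A", "Job B"),
  company   = c("Acme", "Acme", "Acme", "Globex"),
  acronym   = c("ESG", "GRI", "LCA", "ESG")
)

# Row-weighted mean: Job A's value is repeated on each of its 3 rows
row_weighted <- toy %>%
  group_by(job_title, company) %>%
  mutate(n_acronyms = n()) %>%
  ungroup() %>%
  summarise(avg = mean(n_acronyms)) %>%
  pull(avg)

# Per-job mean: one value per job description
per_job <- toy %>%
  count(job_title, company, name = "n_acronyms") %>%
  summarise(avg = mean(n_acronyms)) %>%
  pull(avg)

c(row_weighted = row_weighted, per_job = per_job)
```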

4. Relational Analysis

4.1 Sector & Years of experience

Descriptive statistics

In this chapter, I want to determine if there is a relationship between the sector and the years of experience required. First, I analyze the descriptive statistics of the years of experience and group them by the sector.

cleaned_sectors_less %>% 
  filter(!is.na(min_years_of_experience)) %>% 
  group_by(sector_grouped) %>% 
  summarise(mean = mean(min_years_of_experience),
            st_var = sd(min_years_of_experience),
            min = min(min_years_of_experience),
            max = max(min_years_of_experience),
            median = median(min_years_of_experience),
            group_size = n()) %>% 
  ungroup()
## # A tibble: 7 × 7
##   sector_grouped                mean st_var   min   max median group_size
##   <chr>                        <dbl>  <dbl> <dbl> <dbl>  <dbl>      <int>
## 1 Consulting & Services         3.75   1.74   2       8    3           20
## 2 Finance & Business            2.88   1.46   1       5    2.5          8
## 3 Industry & Manufacturing      3.17   1.54   1       5    3           18
## 4 Knowledge & Education         2.67   2.07   0       5    2.5          6
## 5 Public & Social Sector        3.33   1.37   2       5    3            6
## 6 Sustainability & Environment  2.79   1.53   0.5     5    2.5         12
## 7 <NA>                         10     NA     10      10   10            1

Comment

Leaving aside the single NA observation, the group means range from 2.67 to 3.75 years, a spread of about one year. This is consistent with the overall mean of 3.30 years observed in section 3.3.

cleaned_sectors_less %>%
  filter(!is.na(min_years_of_experience), !is.na(sector_grouped)) %>%
  ggplot(aes(x = sector_grouped, y = min_years_of_experience, fill = sector_grouped)) +
  geom_boxplot(alpha = 0.7, outlier.alpha = 0.5) +
  geom_jitter(width = 0.2, alpha = 0.3, size = 2) +
  scale_y_continuous(breaks = 0:10) +
  labs(
    title = "Years of Experience Required by Sector",
    subtitle = "Distribution of minimum years of experience per sector",
    x = "Sector",
    y = "Minimum Years of Experience",
    caption = paste0("n = ", nrow(filter(cleaned_sectors_less, !is.na(min_years_of_experience), !is.na(sector_grouped))))
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title = element_text(size = 11, face = "bold"),
    legend.position = "none"
  )

Inferential statistics

The next step is to test if the observations are normally distributed with a Shapiro-Wilk test.

Normality test

shapiro.test(cleaned_sectors_less$min_years_of_experience)
## 
##  Shapiro-Wilk normality test
## 
## data:  cleaned_sectors_less$min_years_of_experience
## W = 0.90014, p-value = 3.486e-05

Comment

The p-value is lower than 0.05, so I reject the hypothesis that the data is normally distributed. A non-parametric test, the Kruskal-Wallis test, is therefore used to determine whether the required years of experience differ across sectors.

kruskal.test(min_years_of_experience ~ sector_grouped, data = cleaned_sectors_less)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  min_years_of_experience by sector_grouped
## Kruskal-Wallis chi-squared = 3.4804, df = 5, p-value = 0.6264

Comment

The p-value is 0.6264, which is above 0.05, so I fail to reject the null hypothesis. There is no statistically significant difference in the years of experience required across sectors: all sectors tend to require a similar level of experience in my dataset. Two possible reasons are that the sample is small (a personal job search), which reduces statistical power, and that sustainability jobs may genuinely cluster around similar experience requirements regardless of sector.
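
The test output reports df = 5, that is, six groups minus one. With the formula interface, kruskal.test() drops observations whose group is missing, so the NA category does not count as a group. The sketch below checks this behaviour on made-up data with three named groups and one missing group label.

```r
# Made-up data: three named groups plus one observation with a missing group
toy <- data.frame(
  years  = c(1, 2, 3, 2, 4, 5, 3, 6, 7, 10),
  sector = c("A", "A", "A", "B", "B", "B", "C", "C", "C", NA)
)

res <- kruskal.test(years ~ sector, data = toy)

# Degrees of freedom = number of non-missing groups - 1
unname(res$parameter)
```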

4.2 Organisation Type & Years of Experience

Descriptive statistics

job_applications1 %>% 
  filter(!is.na(min_years_of_experience)) %>% 
  group_by(org_type) %>% 
  summarise(mean = mean(min_years_of_experience),
            st_var = sd(min_years_of_experience),
            min = min(min_years_of_experience),
            max = max(min_years_of_experience),
            median = median(min_years_of_experience),
            group_size = n()) %>% 
  ungroup()
## # A tibble: 9 × 7
##   org_type                            mean st_var   min   max median group_size
##   <chr>                              <dbl>  <dbl> <dbl> <dbl>  <dbl>      <int>
## 1 Academia                            4     1.41    3       5      4          2
## 2 Accounting profession and auditors  2     0       2       2      2          3
## 3 Company (listed)                    3.73  2.37    1      10      3         11
## 4 Company (unlisted)                  3.42  1.46    1       5      3         19
## 5 Government                          4     1.15    3       5      4          4
## 6 NGO / not for profit                3.44  1.60    0.5     6      3         17
## 7 Trade association                   1     0       1       1      1          2
## 8 intergovernmental organization      1.75  0.886   0       3      2          8
## 9 <NA>                                4.8   2.59    2       8      4          5

Comment

I observe that, this time, the mean years of experience differs slightly across organisation types. Yet I also note that the group size for some organisation types is low (below 5 observations).

job_applications1 %>%
  filter(!is.na(min_years_of_experience), !is.na(org_type)) %>%
  ggplot(aes(x = org_type, y = min_years_of_experience, fill = org_type)) +
  geom_boxplot(alpha = 0.7, outlier.alpha = 0.5) +
  geom_jitter(width = 0.2, alpha = 0.3, size = 2) +
  scale_y_continuous(breaks = 0:10) +
  labs(
    title = "Years of Experience Required by Organisation Type",
    subtitle = "Distribution of minimum years of experience per organisation type",
    x = "Organisation Type",
    y = "Minimum Years of Experience",
    caption = paste0("n = ", nrow(filter(job_applications1, !is.na(min_years_of_experience), !is.na(org_type))))
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title = element_text(size = 11, face = "bold"),
    legend.position = "none"
  )

Inferential statistics

I already know that the years of experience are not normally distributed, so I can directly use the Kruskal-Wallis test to determine whether the organisation type plays a role in the years of experience.

kruskal.test(min_years_of_experience ~ org_type, data = job_applications1)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  min_years_of_experience by org_type
## Kruskal-Wallis chi-squared = 18.292, df = 7, p-value = 0.01072

Comment

The p-value of the test is 0.01072, which is below the 0.05 threshold. I can therefore reject the null hypothesis: there is a statistically significant difference in the years of experience across organisation types. This might mean that the organisation type influences the number of required years, or that some organisation types consistently require more or fewer years of experience. The next step is to run a post-hoc pairwise Wilcoxon test with Bonferroni correction to identify which groups differ significantly.

pairwise.wilcox.test(job_applications1$min_years_of_experience,
                     job_applications1$org_type,
                     p.adjust.method = "bonferroni")
## 
##  Pairwise comparisons using Wilcoxon rank sum test with continuity correction 
## 
## data:  job_applications1$min_years_of_experience and job_applications1$org_type 
## 
##                                    Academia Accounting profession and auditors
## Accounting profession and auditors 1.00     -                                 
## Company (listed)                   1.00     1.00                              
## Company (unlisted)                 1.00     1.00                              
## Government                         1.00     1.00                              
## intergovernmental organization     1.00     1.00                              
## NGO / not for profit               1.00     1.00                              
## Trade association                  1.00     1.00                              
##                                    Company (listed) Company (unlisted)
## Accounting profession and auditors -                -                 
## Company (listed)                   -                -                 
## Company (unlisted)                 1.00             -                 
## Government                         1.00             1.00              
## intergovernmental organization     0.24             0.21              
## NGO / not for profit               1.00             1.00              
## Trade association                  1.00             1.00              
##                                    Government intergovernmental organization
## Accounting profession and auditors -          -                             
## Company (listed)                   -          -                             
## Company (unlisted)                 -          -                             
## Government                         -          -                             
## intergovernmental organization     0.28       -                             
## NGO / not for profit               1.00       0.35                          
## Trade association                  1.00       1.00                          
##                                    NGO / not for profit
## Accounting profession and auditors -                   
## Company (listed)                   -                   
## Company (unlisted)                 -                   
## Government                         -                   
## intergovernmental organization     -                   
## NGO / not for profit               -                   
## Trade association                  1.00                
## 
## P value adjustment method: bonferroni

Comment

A pairwise Wilcoxon test with Bonferroni correction was conducted. Due to ties in the data, approximate p-values were computed.

The Bonferroni correction is very conservative: it adjusts p-values upward to reduce false positives, which can make it hard to detect differences in a small dataset. I therefore try a less conservative method, Benjamini-Hochberg, to see whether it detects differences across organisation types.

pairwise.wilcox.test(job_applications1$min_years_of_experience,
                     job_applications1$org_type,
                     p.adjust.method = "BH")
## 
##  Pairwise comparisons using Wilcoxon rank sum test with continuity correction 
## 
## data:  job_applications1$min_years_of_experience and job_applications1$org_type 
## 
##                                    Academia Accounting profession and auditors
## Accounting profession and auditors 0.199    -                                 
## Company (listed)                   0.786    0.165                             
## Company (unlisted)                 0.799    0.199                             
## Government                         1.000    0.162                             
## intergovernmental organization     0.162    0.928                             
## NGO / not for profit               0.798    0.199                             
## Trade association                  0.363    0.199                             
##                                    Company (listed) Company (unlisted)
## Accounting profession and auditors -                -                 
## Company (listed)                   -                -                 
## Company (unlisted)                 0.958            -                 
## Government                         0.671            0.723             
## intergovernmental organization     0.088            0.088             
## NGO / not for profit               0.996            0.996             
## Trade association                  0.162            0.162             
##                                    Government intergovernmental organization
## Accounting profession and auditors -          -                             
## Company (listed)                   -          -                             
## Company (unlisted)                 -          -                             
## Government                         -          -                             
## intergovernmental organization     0.088      -                             
## NGO / not for profit               0.709      0.088                         
## Trade association                  0.199      0.356                         
##                                    NGO / not for profit
## Accounting profession and auditors -                   
## Company (listed)                   -                   
## Company (unlisted)                 -                   
## Government                         -                   
## intergovernmental organization     -                   
## NGO / not for profit               -                   
## Trade association                  0.162               
## 
## P value adjustment method: BH

Comment

Although the Kruskal-Wallis test indicated a global significant difference across organisation types (p = 0.011), pairwise comparisons with both Bonferroni and Benjamini-Hochberg corrections revealed no individually significant pairs. This likely reflects the limited sample size, which reduces the statistical power needed to detect differences at the pairwise level.
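
To make the difference between the two corrections concrete, here is a minimal sketch using base R's p.adjust() on four made-up raw p-values. The values are hypothetical and are not taken from the tests above.

```r
# Hypothetical raw p-values from four pairwise comparisons (not from the report's data)
raw_p <- c(0.004, 0.02, 0.03, 0.20)

# Bonferroni multiplies each p-value by the number of comparisons (capped at 1)
p_bonferroni <- p.adjust(raw_p, method = "bonferroni")

# Benjamini-Hochberg scales each p-value by n / rank, so larger p-values are penalised less
p_bh <- p.adjust(raw_p, method = "BH")

comparison <- data.frame(raw_p, p_bonferroni, p_bh)
print(comparison)
```

With these made-up values, only one comparison stays below 0.05 under Bonferroni, while three do under Benjamini-Hochberg. This is the sense in which Benjamini-Hochberg is less conservative.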

4.3 Reply Time & Sector

New columns: reply time (days) and reply status

In this next part I compute the reply time for each application. Applications with no application date or no reply date are not removed; they simply get a missing value (NA) for the reply time. The reply time is stored in a new column called “days_to_reply”, obtained by subtracting the date of application from the date of reply and converting the result to a numeric value. I also create a column called “replied”, to later be able to compute the percentage of replies I received.

job_applications_clean <- cleaned_sectors_less %>%
  mutate(
    replied = !is.na(date_of_reply),
    days_to_reply = as.numeric(date_of_reply - date_of_application)
  )
job_applications_clean %>% 
  count(replied) %>%
  mutate(percentage = n / sum(n) * 100)
## # A tibble: 2 × 3
##   replied     n percentage
##   <lgl>   <int>      <dbl>
## 1 FALSE      87       51.2
## 2 TRUE       83       48.8
job_applications_clean %>%
  count(replied) %>%
  mutate(percentage = round(n / sum(n) * 100, 1)) %>%
  ggplot(aes(x = replied, y = percentage, fill = replied)) +
  geom_col(alpha = 0.8) +
  geom_text(aes(label = paste0(percentage, "%")), vjust = -0.5) +
  theme_minimal()+
  labs(
    title = "Percentage of applications with replies and without",
    subtitle = "In red applications without response, in blue replied applications",
    x = "Application replied",
    y = "Percentage")

Comment

Out of 170 applications, 48.8% received a reply, which is above the typical industry response rate of ~25%.
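
The logic behind the “replied” and “days_to_reply” columns can be sketched in base R on a few made-up dates. The dates below are hypothetical and only illustrate how missing reply dates and same-day replies behave.

```r
# Made-up application and reply dates (NA = no reply received)
applied <- as.Date(c("2025-01-10", "2025-02-03", "2025-03-01", "2025-03-15"))
reply   <- as.Date(c("2025-01-24", NA,           "2025-03-01", "2025-04-14"))

# An application counts as replied when a reply date exists
replied <- !is.na(reply)

# Subtracting two Date vectors gives a difftime in days; as.numeric() drops the unit
days_to_reply <- as.numeric(reply - applied)

# Share of applications with a reply, in percent
reply_rate <- mean(replied) * 100

print(data.frame(applied, reply, replied, days_to_reply))
print(reply_rate)
```

Note that the same-day reply gets a reply time of 0, so a later filter such as days_to_reply > 0 would drop it even though it counts as replied.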

Descriptive statistics

Descriptive statistics for the number of days to reply

job_applications_clean %>% 
  filter(days_to_reply > 0) %>% 
  summarise(mean = mean(days_to_reply),
            min = min(days_to_reply),
            max = max(days_to_reply),
            group_size = n(),
            median = median(days_to_reply),
            sd_var = sd(days_to_reply))
## # A tibble: 1 × 6
##    mean   min   max group_size median sd_var
##   <dbl> <dbl> <dbl>      <int>  <dbl>  <dbl>
## 1  29.8     1   174         78     21   29.7

Comment

The above table covers the 78 replies with a reply time above 0 days. The average number of days to reply is close to 30 days and the median is 21 days, with a minimum of 1 day and a maximum of 174 days (almost 6 months).

job_applications_clean %>%
  filter(days_to_reply > 0) %>%
  ggplot(aes(x = days_to_reply)) +
  geom_histogram(binwidth = 5, fill = "#3498db", alpha = 0.7, color = "white") +
  geom_vline(aes(xintercept = median(days_to_reply)), 
             color = "#e74c3c", linetype = "dashed", linewidth = 0.8) +
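  # Position the label next to the dashed line: use the same filtered values (> 0) as the plot
  annotate("text", x = median(job_applications_clean$days_to_reply[job_applications_clean$days_to_reply > 0], na.rm = TRUE) + 3, 
           y = Inf, vjust = 2, label = "Median", color = "#e74c3c", size = 3.5) +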
  labs(
    title = "Distribution of Reply Time",
    subtitle = "Number of days between application and reply",
    x = "Days to Reply",
    y = "Count",
    caption = paste0("n = ", nrow(filter(job_applications_clean, days_to_reply > 0)))
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.title = element_text(size = 11, face = "bold")
  )

Comment

The histogram shows that the majority of replies were received within the first 25 days, with a right-skewed distribution. A small number of applications took over 100 days to receive a reply, which inflates the mean relative to the median.

Distribution plot

job_applications_clean %>%
  filter(days_to_reply > 0) %>%
  ggplot(aes(x = "", y = days_to_reply)) +
  geom_violin(fill = "#3498db", alpha = 0.4, linewidth = 0.3, trim = FALSE) +
  geom_boxplot(width = 0.2, fill = "#3498db", alpha = 0.7, outlier.alpha = 0.5) +
  scale_y_continuous(breaks = seq(0, 180, by = 20)) +
  labs(
    title = "Distribution of number of Days to Reply",
    subtitle = "Across all replied applications",
    x = "All applications",
    y = "Number of days to reply",
    caption = paste0("n = ", nrow(filter(job_applications_clean, days_to_reply > 0)))
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.title = element_text(size = 11, face = "bold"),
    axis.text = element_text(size = 10)
  )

Comment

The above graph shows that the distribution of days to reply is mostly concentrated below 25 days.

Descriptive statistics

Let’s examine the mean, minimum and maximum values of the days to reply by sector.

job_applications_clean_mean <- job_applications_clean %>% 
  filter(days_to_reply > 0) %>% 
  group_by(sector_grouped) %>% 
  summarise(mean = mean(days_to_reply),
            st_var = sd(days_to_reply),
            min = min(days_to_reply),
            max = max(days_to_reply),
            median = median(days_to_reply),
            group_size = n()) %>% 
  ungroup() 
  
job_applications_clean_mean %>% 
    kable(
    caption = "Reply Time (days) by Sector",
    col.names = c("Sector", "Mean", "Std Dev", "Min", "Max", "Median", "N"),
    align = c("l", rep("c", 6))
  ) %>% 
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE
  )
Reply Time (days) by Sector

Sector                        Mean      Std Dev   Min  Max  Median  N
Consulting & Services         28.09524  33.50508  1    119  18      21
Finance & Business            34.58333  19.77353  2    60   34      12
Industry & Manufacturing      25.34783  23.25473  2    85   20      23
Knowledge & Education         28.20000  21.08791  10   62   19      5
Public & Social Sector        33.54545  48.15467  1    174  26      11
Sustainability & Environment  43.40000  22.73324  11   72   40      5
NA                            9.00000   NA        9    9    9       1

Comment

The group size per sector varies from 23 (Industry & Manufacturing) to 5 (Knowledge & Education and Sustainability & Environment), which affects how reliable each mean is. The Sustainability & Environment sector has the highest mean at 43 days, but also one of the smallest group sizes. The Industry & Manufacturing sector has the lowest mean at 25 days, and the largest group size.

Visualization

job_applications_clean %>%
  filter(days_to_reply > 0, !is.na(sector_grouped)) %>%
  ggplot(aes(x = sector_grouped, y = days_to_reply, fill = sector_grouped)) +
  geom_boxplot(alpha = 0.7, outlier.alpha = 0.5) +
  geom_jitter(width = 0.2, alpha = 0.3, size = 2) +
  labs(
    title = "Reply Time by Sector",
    subtitle = "Distribution of days to reply per sector",
    x = "Sector",
    y = "Days to Reply",
    caption = paste0("n = ", nrow(filter(job_applications_clean, days_to_reply > 0, !is.na(sector_grouped))))
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title = element_text(size = 11, face = "bold"),
    legend.position = "none"
  )

Comment

The Public & Social Sector contains the maximum value for the days to reply (174 days). The Consulting & Services sector has the lowest median (18 days), while the Industry & Manufacturing sector has the lowest mean (25 days).

Inferential statistics

In this chapter I want to test the relationship between the number of days to reply and other variables. But first, I have to test if it is normally distributed with a Shapiro-Wilk test.

shapiro.test(job_applications_clean$days_to_reply[!is.na(job_applications_clean$days_to_reply) & job_applications_clean$days_to_reply > 0])
## 
##  Shapiro-Wilk normality test
## 
## data:  job_applications_clean$days_to_reply[!is.na(job_applications_clean$days_to_reply) & job_applications_clean$days_to_reply > 0]
## W = 0.79231, p-value = 4.001e-09

Comment

The p-value of the test is 4.001e-09, which is below 0.05. The number of days to reply is therefore not normally distributed, which is consistent with the right-skewed histogram above. Since I still want to test the relationship between the number of days to reply and the sectors, I use the Kruskal-Wallis test, which does not assume normality.
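
As an illustration of how the Shapiro-Wilk test reacts to skewness, here is a small sketch on two artificial samples of 78 values each. Both samples are constructed from theoretical quantiles and are assumptions for the example, not the report's data.

```r
# Deterministic, evenly spaced probabilities so the example gives the same result on every run
probs <- ppoints(78)

normal_like <- qnorm(probs, mean = 30, sd = 10)   # symmetric, bell-shaped values
skewed_like <- qexp(probs, rate = 1 / 30)         # right-skewed values with a long tail

# Shapiro-Wilk: the null hypothesis is that the sample comes from a normal distribution
p_normal <- shapiro.test(normal_like)$p.value
p_skewed <- shapiro.test(skewed_like)$p.value

print(c(normal = p_normal, skewed = p_skewed))
```

The symmetric sample gives a p-value above 0.05, while the right-skewed sample gives a p-value far below 0.05, the same pattern as for the days to reply.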

kruskal.test(days_to_reply ~ sector_grouped, data = job_applications_clean)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  days_to_reply by sector_grouped
## Kruskal-Wallis chi-squared = 8.4901, df = 5, p-value = 0.1312

Comment

The p-value is 0.1312, which is higher than 0.05, so I fail to reject the null hypothesis. There is no statistically significant difference in the number of days to reply across sectors: the data do not provide evidence that reply times differ by sector. Note that, unlike the descriptive statistics above, this test is run without the days_to_reply > 0 filter, so it also includes any reply times of 0 or below.
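
To show what the Kruskal-Wallis test responds to, here is a minimal sketch on made-up reply times for three hypothetical sectors (A, B and C). The values are invented for illustration only.

```r
# Made-up reply times (days) for three hypothetical sectors, five applications each
toy <- data.frame(
  days   = c(1, 4, 7, 10, 13,   2, 5, 8, 11, 14,   3, 6, 9, 12, 15),
  sector = factor(rep(c("A", "B", "C"), each = 5))
)

# Interleaved values: the rank sums per sector are close, so the statistic is small
similar <- kruskal.test(days ~ sector, data = toy)

# Same sectors, but each one now occupies its own range of values
toy$days_shifted <- c(1:5, 6:10, 11:15)
different <- kruskal.test(days_shifted ~ sector, data = toy)

print(c(similar = similar$p.value, different = different$p.value))
```

In the first case the p-value is well above 0.05 (fail to reject the null hypothesis), in the second it is well below 0.05. The test works on ranks, so it does not require normally distributed values.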

4.4 Reply Time & Organisation Type

Descriptive statistics

Below I explore the relationship between the number of days to reply and the organization type. The first step is to observe how the mean, minimum and maximum values behave depending on the organization type.

job_applications_clean_mean2 <- job_applications_clean %>% 
  filter(days_to_reply > 0) %>% 
  group_by(org_type) %>% 
  summarise(mean = mean(days_to_reply),
            st_var = sd(days_to_reply),
            min = min(days_to_reply),
            max = max(days_to_reply),
            median = median(days_to_reply),
            group_size = n()) %>% 
  ungroup()

job_applications_clean_mean2 %>% 
    kable(
    caption = "Reply Time (days) by Organization type",
    col.names = c("Org Type", "Mean", "Std Dev", "Min", "Max", "Median", "N"),
    align = c("l", rep("c", 6))
  ) %>% 
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE
  )
Reply Time (days) by Organization type

Org Type                            Mean      Std Dev   Min  Max  Median  N
Academia                            27.00000  11.31371  19   35   27.0    2
Accounting profession and auditors  27.87500  36.27450  1    107  15.0    8
Company (listed)                    19.47059  18.85510  2    60   14.0    17
Company (unlisted)                  27.68000  22.98862  2    85   21.0    25
Government                          26.00000  13.28533  4    37   27.0    5
NGO / not for profit                34.40000  30.33809  1    119  26.0    15
Trade association                   26.50000  17.67767  14   39   26.5    2
intergovernmental organization      83.66667  81.68435  15   174  62.0    3
NA                                  74.00000  NA        74   74   74.0    1

Comment

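Among the replies with a reply time above 0 days, the most common organization type is the company (unlisted) type with 25 replies, followed by the company (listed) type with 17 and the NGO / not for profit type with 15.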

Visualization

job_applications_clean %>%
  filter(days_to_reply > 0, !is.na(org_type)) %>%
  ggplot(aes(x = org_type, y = days_to_reply, fill = org_type)) +
  geom_boxplot(alpha = 0.7, outlier.alpha = 0.5) +
  geom_jitter(width = 0.2, alpha = 0.3, size = 2) +
  labs(
    title = "Reply Time by Organisation Type",
    subtitle = "Distribution of days to reply per organisation type",
    x = "Organisation Type",
    y = "Days to Reply",
    caption = paste0("n = ", nrow(filter(job_applications_clean, days_to_reply > 0, !is.na(org_type))))
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title = element_text(size = 11, face = "bold"),
    legend.position = "none"
  )

Comment

The plot shows considerable variation in reply time across organisation types. Intergovernmental organizations show the widest spread and are the longest to reply, with a median of 62 days, whereas the median values of the remaining organization types are below 50 days. This group contains only 3 replies, so the estimate is fragile. Academia and Trade associations look more consistent, but each contains only 2 replies. The organization types with the lowest median days to reply are the listed companies (14 days) and the Accounting profession and auditors (15 days).
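
The intergovernmental row of the summary table (N = 3, minimum 15, median 62, maximum 174) pins down its three reply times, which makes it a convenient example of how a single long wait drives the mean in a very small group. The sketch below recomputes the row in base R; the 70-day replacement value is a hypothetical used only for comparison.

```r
# The three intergovernmental reply times implied by the summary table
# (N = 3, min = 15, median = 62, max = 174)
intergov_days <- c(15, 62, 174)

intergov_mean   <- mean(intergov_days)    # pulled upwards by the 174-day reply
intergov_median <- median(intergov_days)  # the middle value, unaffected by the size of the maximum
intergov_sd     <- sd(intergov_days)      # very large relative to the mean with only three points

# Hypothetical: replace the longest wait with 70 days and compare
adjusted_days   <- c(15, 62, 70)
adjusted_mean   <- mean(adjusted_days)
adjusted_median <- median(adjusted_days)

print(c(mean = intergov_mean, median = intergov_median, sd = intergov_sd))
print(c(mean = adjusted_mean, median = adjusted_median))
```

The mean drops from about 83.7 to 49 days when the single longest wait changes, while the median stays at 62 days. This is why the median is the safer summary for groups this small.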

4.5 Reply Time & Application Month

New columns: application month and reply month

job_applications4 <- job_applications_clean %>% 
  separate(month, into = c("month_name", "year"), sep = " ")
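
The separate() call above splits values such as “May 2024” on the space into a month name and a year. A minimal base R sketch of the same split, on made-up values and with an extra month number added as an assumption for illustration, could look like this:

```r
# Made-up values in the same "Month YYYY" format as the month column
month_raw <- c("May 2024", "October 2024", "June 2025", NA)

# Split each value on the single space; a missing value stays missing
parts <- strsplit(month_raw, " ", fixed = TRUE)
month_name <- vapply(parts, function(p) p[1], character(1))
year <- as.integer(vapply(parts, function(p) p[2], character(1)))

# month.name is a built-in vector of English month names; match() returns 1 to 12
month_number <- match(month_name, month.name)

print(data.frame(month_name, year, month_number))
```

A numeric month is convenient for ordering months chronologically in tables and plots, since month names alone sort alphabetically.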


job_applications4 %>% 
  select(-name, -website, -link_to_the_offer, -position, -sector, -status) %>% 
  kable(caption = "Applications") %>% 
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE
  ) %>%
  scroll_box(height = "400px")
Applications
month_name year date_of_application date_of_reply location org_type min_years_of_experience sector_lowercase sector_clean sector_grouped replied days_to_reply
May 2024 2024-05-05 NA Zurich Company (listed) NA automation automation Consulting & Services FALSE NA
May 2025 2025-05-05 2025-07-18 NA NA NA recruitment recruitment Consulting & Services TRUE 74
October 2024 2024-10-21 NA Geneva NA NA recruitment recruitment Consulting & Services FALSE NA
June 2025 NA NA Geneva NA NA recruitment recruitment Consulting & Services FALSE NA
August 2024 2024-08-19 2024-09-13 Geneva Company (unlisted) 5.0 aviation aviation Industry & Manufacturing TRUE 25
August 2025 2025-08-29 NA Geneva Company (unlisted) NA recruitment recruitment Consulting & Services FALSE NA
December 2024 2024-12-17 NA Geneva NA NA recruitment recruitment Consulting & Services FALSE NA
June 2024 2024-06-25 NA Geneva NGO / not for profit NA culture culture Public & Social Sector FALSE NA
April 2025 NA NA NA Company (listed) NA insurance insurance Finance & Business FALSE NA
NA NA NA NA Olten Company (unlisted) NA energy energy Sustainability & Environment FALSE NA
August 2025 2025-08-29 NA Geneva Company (unlisted) NA consulting consulting Consulting & Services FALSE NA
August 2025 2025-08-29 NA Geneva Company (unlisted) NA consulting consulting Consulting & Services FALSE NA
August 2025 2025-07-29 2025-08-06 Geneva Company (unlisted) 3.0 consulting consulting Consulting & Services TRUE 8
May 2024 2024-05-22 NA Lausanne NGO / not for profit 3.0 certification certification Consulting & Services FALSE NA
August 2025 2025-07-30 NA Geneva NGO / not for profit 5.0 certification certification Consulting & Services FALSE NA
January 2025 2025-01-18 NA NA Company (unlisted) NA recruitment recruitment Consulting & Services FALSE NA
February 2025 2025-02-12 2025-04-02 Geneva NGO / not for profit NA commodities commodities Industry & Manufacturing TRUE 49
May 2024 2024-05-22 NA Geneva NGO / not for profit NA commodities commodities Industry & Manufacturing FALSE NA
December 2024 2024-12-06 NA NA Company (unlisted) NA consulting consulting Consulting & Services FALSE NA
November 2024 2024-11-05 2024-11-12 Basel Company (unlisted) 3.0 garment garment Industry & Manufacturing TRUE 7
June 2025 2025-06-29 2025-07-18 Fribourg Company (unlisted) NA manufacture manufacturing Industry & Manufacturing TRUE 19
July 2025 2025-07-26 2025-08-21 Geneva NGO / not for profit 2.0 finance finance Finance & Business TRUE 26
November 2024 2024-11-28 2024-12-03 Meyrin Company (listed) 2.0 commodities commodities Industry & Manufacturing TRUE 5
July 2024 2024-07-02 2024-07-17 Geneva intergovernmental organization 1.0 science science Knowledge & Education TRUE 15
October 2024 2024-10-10 2024-12-11 Geneva intergovernmental organization 2.0 science science Knowledge & Education TRUE 62
February 2025 2025-02-20 NA Geneva intergovernmental organization 0.0 science science Knowledge & Education FALSE NA
NA NA NA NA NA intergovernmental organization NA science science Knowledge & Education FALSE NA
NA NA NA NA NA Company (unlisted) NA supply chain supply chain Consulting & Services FALSE NA
October 2024 2024-10-24 2024-11-14 Geneva Company (unlisted) NA automation automation Consulting & Services TRUE 21
December 2024 2024-12-02 2024-12-16 Pratteln Company (listed) 5.0 chemicals chemicals Industry & Manufacturing TRUE 14
November 2024 2024-11-20 NA Geneva NA 8.0 recruitment recruitment Consulting & Services FALSE NA
March 2025 2025-03-25 NA Geneva NGO / not for profit 3.0 biodiversity biodiversity Sustainability & Environment FALSE NA
NA NA NA NA Zurich Accounting profession and auditors NA consulting consulting Consulting & Services FALSE NA
March 2025 2025-03-21 2025-03-25 Zurich Accounting profession and auditors NA consulting consulting Consulting & Services TRUE 4
June 2025 2025-06-19 2025-06-19 Geneva Accounting profession and auditors NA consulting consulting Consulting & Services TRUE 0
July 2024 2024-07-29 2024-08-14 Geneva Company (unlisted) 4.0 consulting consulting Consulting & Services TRUE 16
June 2025 2025-06-29 2025-07-04 Lausanne Company (unlisted) NA consulting consulting Consulting & Services TRUE 5
April 2025 2025-04-11 NA Geneva NGO / not for profit 5.0 biodiversity biodiversity Sustainability & Environment FALSE NA
March 2025 2025-03-25 2025-04-29 Renens Academia 5.0 academia academia Knowledge & Education TRUE 35
June 2025 2025-06-20 2025-07-09 Lausanne Academia NA academia academia Knowledge & Education TRUE 19
August 2025 2025-08-19 2025-08-20 NA NGO / not for profit NA human rights human rights Public & Social Sector TRUE 1
February 2025 2025-02-04 NA Geneva Company (unlisted) 5.0 consulting consulting Consulting & Services FALSE NA
October 2024 2024-10-14 2024-11-12 Geneva Company (unlisted) NA finance finance Finance & Business TRUE 29
September 2024 2024-09-20 2024-01-24 Geneva Accounting profession and auditors NA consulting consulting Consulting & Services TRUE -240
January 2025 2025-01-20 2025-01-30 Geneva Accounting profession and auditors NA consulting consulting Consulting & Services TRUE 10
April 2025 2025-04-28 2025-08-13 Geneva Accounting profession and auditors NA consulting consulting Consulting & Services TRUE 107
February 2025 2025-02-21 NA Geneva NGO / not for profit NA automobile automobile Industry & Manufacturing FALSE NA
April 2025 2025-04-03 2025-04-07 Geneva Company (listed) 1.0 chemicals chemicals Industry & Manufacturing TRUE 4
June 2024 2024-06-09 NA NA NA NA recruitment recruitment Consulting & Services FALSE NA
July 2024 2024-07-02 2024-07-04 Geneva Company (listed) 3.0 chemicals chemicals Industry & Manufacturing TRUE 2
July 2024 2024-07-27 2024-08-17 Geneva Company (listed) 3.0 chemicals chemicals Industry & Manufacturing TRUE 21
August 2024 2024-08-07 NA Geneva NGO / not for profit NA finance finance Finance & Business FALSE NA
July 2024 2024-07-27 2024-08-14 Remote NGO / not for profit 3.0 standard-setter standard-setter Consulting & Services TRUE 18
NA NA NA NA NA NGO / not for profit NA standard-setter standard-setter Consulting & Services FALSE NA
May 2024 2024-05-05 2024-06-30 Geneva Company (listed) 3.0 water water Sustainability & Environment TRUE 56
NA NA NA NA NA NGO / not for profit NA human rights human rights Public & Social Sector FALSE NA
November 2024 2024-11-02 2024-11-06 Basel Company (listed) NA insurance insurance Finance & Business TRUE 4
July 2025 2025-07-26 NA NA NA NA recruitment recruitment Consulting & Services FALSE NA
November 2024 2024-11-20 2024-12-16 Pfäfikon Company (listed) 3.0 automation automation Consulting & Services TRUE 26
June 2024 2024-06-18 NA Geneva Trade association NA aviation aviation Industry & Manufacturing FALSE NA
December 2024 2024-12-25 2025-01-08 Geneva Trade association 1.0 aviation aviation Industry & Manufacturing TRUE 14
May 2025 2025-05-08 2025-06-16 Geneva Trade association 1.0 aviation aviation Industry & Manufacturing TRUE 39
July 2025 2025-07-15 2025-08-04 Geneva Company (unlisted) NA furniture furniture Industry & Manufacturing TRUE 20
NA NA NA NA NA Company (listed) NA real estate real estate Finance & Business FALSE NA
April 2025 2025-04-14 2025-04-16 Geneva Company (unlisted) NA finance finance Finance & Business TRUE 2
July 2024 2024-07-27 2024-08-11 Lausanne NGO / not for profit NA sport sport Public & Social Sector TRUE 15
February 2025 2025-02-04 2025-06-03 Geneva NGO / not for profit 2.0 standards-setter standard-setter Consulting & Services TRUE 119
August 2024 2024-08-29 2024-10-08 Gland NGO / not for profit 0.5 biodiversity biodiversity Sustainability & Environment TRUE 40
January 2025 2025-01-22 2025-04-04 Gland NGO / not for profit 5.0 biodiversity biodiversity Sustainability & Environment TRUE 72
March 2024 2024-03-14 2024-03-20 Geneva Accounting profession and auditors 2.0 consulting consulting Consulting & Services TRUE 6
May 2025 2025-05-08 2025-05-18 Geneva Company (listed) NA chemicals chemicals Industry & Manufacturing TRUE 10
October 2024 2024-10-18 NA Geneva NA 7.0 recruitment recruitment Consulting & Services FALSE NA
April 2025 NA NA NA NGO / not for profit NA certification certification Consulting & Services FALSE NA
May 2024 2024-05-10 NA Geneva NA 3.0 recruitment recruitment Consulting & Services FALSE NA
September 2024 2024-09-27 2024-11-04 Geneva Company (unlisted) 1.0 shipping shipping Industry & Manufacturing TRUE 38
December 2024 2024-12-11 NA Geneva Company (unlisted) NA shipping shipping Industry & Manufacturing FALSE NA
February 2025 2025-02-21 2025-02-21 Geneva Company (unlisted) NA shipping shipping Industry & Manufacturing TRUE 0
June 2024 2024-06-23 NA Remote NGO / not for profit NA sustainability sustainability Sustainability & Environment FALSE NA
October 2024 2024-10-03 NA Remote NGO / not for profit NA sustainable finance sustainable finance Finance & Business FALSE NA
September 2024 2024-09-22 2024-09-25 La Tour de Peliz Company (listed) NA beverage beverage Industry & Manufacturing TRUE 3
September 2024 2024-09-22 2024-09-22 Vevey Company (listed) NA beverage beverage Industry & Manufacturing TRUE 0
NA NA NA NA NA Company (listed) NA beverage beverage Industry & Manufacturing FALSE NA
May 2025 2025-05-02 2025-05-27 Vevey Company (listed) NA beverage beverage Industry & Manufacturing TRUE 25
August 2025 2025-08-29 NA Geneva Company (unlisted) NA manufacturing manufacturing Industry & Manufacturing FALSE NA
NA NA NA NA NA Company (unlisted) NA NA NA NA FALSE NA
September 2024 2024-09-21 2024-09-23 Geneva Company (listed) 5.0 chemicals chemicals Industry & Manufacturing TRUE 2
April 2025 2025-04-11 2025-05-07 Bern Government NA government government Public & Social Sector TRUE 26
June 2025 2025-06-20 2025-07-29 Geneva Company (unlisted) 2.0 banking banking Finance & Business TRUE 39
March 2025 2025-03-25 NA Geneva NGO / not for profit NA certification certification Consulting & Services FALSE NA
July 2025 2025-07-16 2025-07-21 Lausanne Company (unlisted) NA it it Consulting & Services TRUE 5
April 2024 2024-04-29 NA Geneva Company (unlisted) 5.0 banking banking Finance & Business FALSE NA
June 2024 2024-06-18 2024-08-13 Geneva Company (unlisted) 3.0 banking banking Finance & Business TRUE 56
August 2024 2024-08-07 2024-10-03 Geneva Company (unlisted) NA banking banking Finance & Business TRUE 57
NA NA NA NA Geneva Company (unlisted) NA banking banking Finance & Business FALSE NA
NA NA NA NA Lausanne Company (listed) NA NA NA NA FALSE NA
April 2025 2025-04-11 NA Geneva NGO / not for profit 5.0 environment environment Sustainability & Environment FALSE NA
January 2025 2025-01-03 2025-01-13 Lausanne NGO / not for profit NA media media Knowledge & Education TRUE 10
September 2024 2024-09-10 2024-09-11 Geneva Accounting profession and auditors NA consulting consulting Consulting & Services TRUE 1
October 2024 2024-10-03 2024-10-23 Geneva Accounting profession and auditors 2.0 consulting consulting Consulting & Services TRUE 20
December 2024 2024-12-19 2025-02-12 Geneva Accounting profession and auditors NA consulting consulting Consulting & Services TRUE 55
December 2024 2024-12-25 2025-01-14 Geneva Accounting profession and auditors 2.0 consulting consulting Consulting & Services TRUE 20
May 2024 2024-05-30 NA Lausanne Company (unlisted) 3.0 consulting consulting Consulting & Services FALSE NA
July 2024 2024-07-04 2024-07-19 Lausanne Company (unlisted) 5.0 consulting consulting Consulting & Services TRUE 15
February 2025 2025-02-21 2025-04-02 Lausanne Company (unlisted) 3.0 consulting consulting Consulting & Services TRUE 40
November 2024 2024-11-12 2024-11-12 Geneva Company (listed) 3.0 manufacture manufacturing Industry & Manufacturing TRUE 0
March 2025 2025-03-08 2025-04-25 Geneva Company (listed) NA manufacture manufacturing Industry & Manufacturing TRUE 48
June 2025 2025-06-20 NA Geneva Company (listed) 3.0 manufacture manufacturing Industry & Manufacturing FALSE NA
October 2024 2024-10-18 NA Geneva NA NA recruitment recruitment Consulting & Services FALSE NA
January 2025 2025-01-17 NA Geneva NA 4.0 recruitment recruitment Consulting & Services FALSE NA
June 2025 2025-06-27 NA Geneva NA NA recruitment recruitment Consulting & Services FALSE NA
November 2024 2024-11-05 2024-11-14 Ebikon Company (listed) 10.0 pharmaceutical pharmaceutical NA TRUE 9
August 2024 2024-08-19 2024-11-11 Geneva Company (unlisted) 5.0 manufacture/watchmaking manufacturing Industry & Manufacturing TRUE 84
December 2024 2024-12-19 2025-01-17 Geneva Company (unlisted) NA manufacture manufacturing Industry & Manufacturing TRUE 29
May 2025 2025-05-05 2025-05-21 Geneva Company (unlisted) 3.0 manufacture manufacturing Industry & Manufacturing TRUE 16
May 2025 2025-05-28 2025-08-21 Geneva Company (unlisted) 5.0 manufacture manufacturing Industry & Manufacturing TRUE 85
January 2025 2025-01-22 2025-01-24 Nyon Company (unlisted) NA consulting consulting Consulting & Services TRUE 2
November 2024 2024-11-07 2024-11-25 Basel Company (listed) NA automation automation Consulting & Services TRUE 18
July 2025 2025-07-21 NA Zurich Company (listed) NA finance finance Finance & Business FALSE NA
August 2025 2025-08-29 NA Zug Company (listed) NA consulting consulting Consulting & Services FALSE NA
May 2025 2025-05-28 NA NA NA 2.0 recruitment recruitment Consulting & Services FALSE NA
NA NA NA NA Lausanne NGO / not for profit NA energy energy Sustainability & Environment FALSE NA
May 2024 2024-05-10 NA Basel NGO / not for profit 6.0 recruitment recruitment Consulting & Services FALSE NA
August 2025 2025-07-29 2025-08-07 Bern NGO / not for profit NA human rights human rights Public & Social Sector TRUE 9
October 2024 2024-10-03 2024-10-30 Basel NGO / not for profit NA human rights human rights Public & Social Sector TRUE 27
April 2025 2025-04-04 2025-06-03 Gland Company (listed) NA banking banking Finance & Business TRUE 60
December 2024 2024-12-12 NA Nyon Company (unlisted) NA consulting consulting Consulting & Services FALSE NA
February 2025 2025-02-12 NA Geneva Company (unlisted) NA manufacture manufacturing Industry & Manufacturing FALSE NA
June 2025 2025-06-17 NA Geneva NGO / not for profit 2.0 biodiversity biodiversity Sustainability & Environment FALSE NA
April 2024 2024-04-20 NA Geneva Company (unlisted) 3.0 shipping shipping Industry & Manufacturing FALSE NA
July 2025 2025-07-06 NA Remote NGO / not for profit NA standard-setter standard-setter Consulting & Services FALSE NA
November 2024 2024-11-25 NA Geneva Company (unlisted) NA commodities commodities Industry & Manufacturing FALSE NA
December 2024 2024-12-11 2025-01-05 Zurich Company (unlisted) 1.0 banking banking Finance & Business TRUE 25
February 2025 2025-02-12 2025-02-25 Nyon NGO / not for profit NA sport sport Public & Social Sector TRUE 13
September 2024 2024-09-22 NA Geneva Government 3.0 diplomacy diplomacy Public & Social Sector FALSE NA
March 2025 2025-03-30 NA Remote intergovernmental organization 2.0 human rights human rights Public & Social Sector FALSE NA
NA NA NA NA Geneva Academia 3.0 academia academia Knowledge & Education FALSE NA
NA NA NA NA NA intergovernmental organization NA human rights human rights Public & Social Sector FALSE NA
March 2025 2025-03-21 2025-09-11 Remote intergovernmental organization 2.0 human rights human rights Public & Social Sector TRUE 174
March 2025 2025-03-02 NA Remote NGO / not for profit 5.0 education education Knowledge & Education FALSE NA
May 2024 2024-05-14 NA Geneva Company (unlisted) 5.0 manufacture/watchmaking manufacturing Industry & Manufacturing FALSE NA
June 2025 2025-06-17 2025-07-11 Stabio Company (listed) NA garnment garment Industry & Manufacturing TRUE 24
January 2025 2025-01-03 2025-01-30 Geneva Government NA government government Public & Social Sector TRUE 27
February 2025 2025-02-04 2025-03-12 Geneva Government NA government government Public & Social Sector TRUE 36
July 2025 2025-07-15 2025-08-21 Geneva Government 5.0 government government Public & Social Sector TRUE 37
July 2025 2025-07-04 2025-07-08 Geneva Government 5.0 government government Public & Social Sector TRUE 4
March 2025 2025-03-09 NA Geneva Government 3.0 government government Public & Social Sector FALSE NA
March 2024 2024-03-07 2024-03-18 Geneva Company (unlisted) 1.0 energy energy Sustainability & Environment TRUE 11
June 2024 2024-06-09 NA Geneva Company (unlisted) NA energy energy Sustainability & Environment FALSE NA
February 2025 2025-02-07 2025-03-17 Geneva Company (unlisted) NA energy energy Sustainability & Environment TRUE 38
January 2025 2025-01-10 NA Geneva Company (unlisted) NA commodities commodities Industry & Manufacturing FALSE NA
June 2024 2024-06-18 2024-07-28 Geneva NGO / not for profit NA finance finance Finance & Business TRUE 40
October 2024 2024-10-10 2024-11-03 Geneva NGO / not for profit 5.0 finance finance Finance & Business TRUE 24
January 2025 2025-01-22 2025-03-16 Geneva NGO / not for profit 2.0 finance finance Finance & Business TRUE 53
September 2025 2025-09-12 NA Geneva NGO / not for profit 3.0 finance finance Finance & Business FALSE NA
August 2024 2024-08-07 NA Geneva intergovernmental organization NA environment environment Sustainability & Environment FALSE NA
August 2024 2024-08-14 NA Geneva intergovernmental organization NA environment environment Sustainability & Environment FALSE NA
August 2024 2024-08-19 NA Geneva intergovernmental organization 2.0 environment environment Sustainability & Environment FALSE NA
September 2024 2024-09-26 NA Geneva intergovernmental organization 3.0 environment environment Sustainability & Environment FALSE NA
November 2024 2024-11-01 NA Geneva intergovernmental organization 2.0 environment environment Sustainability & Environment FALSE NA
January 2025 2025-01-10 NA Geneva intergovernmental organization NA environment environment Sustainability & Environment FALSE NA
February 2025 2025-02-24 NA Geneva intergovernmental organization NA environment environment Sustainability & Environment FALSE NA
March 2025 2025-03-21 NA Geneva intergovernmental organization NA environment environment Sustainability & Environment FALSE NA
April 2025 2025-04-28 NA Geneva intergovernmental organization NA environment environment Sustainability & Environment FALSE NA
May 2025 2025-05-28 NA Geneva intergovernmental organization NA environment environment Sustainability & Environment FALSE NA
May 2025 2025-05-28 NA Geneva intergovernmental organization NA environment environment Sustainability & Environment FALSE NA
May 2025 2025-05-28 NA Geneva intergovernmental organization NA environment environment Sustainability & Environment FALSE NA
July 2025 2025-07-25 NA Geneva intergovernmental organization NA environment environment Sustainability & Environment FALSE NA
July 2025 2025-07-26 NA Geneva intergovernmental organization NA environment environment Sustainability & Environment FALSE NA
NA NA NA NA NA NA NA commodities commodities Industry & Manufacturing FALSE NA
August 2024 2024-08-29 NA Remote NGO / not for profit 2.0 environment environment Sustainability & Environment FALSE NA
job_applications4_months <- job_applications4 %>% 
  mutate(
    month_name = factor(
      month_name,
      levels = month.name,  # jan → dec
      ordered = TRUE
    )
  )
levels(job_applications4_months$month_name)
##  [1] "January"   "February"  "March"     "April"     "May"       "June"     
##  [7] "July"      "August"    "September" "October"   "November"  "December"
job_application6 <- job_applications4_months %>% 
  mutate(
    reply_yearmonth = floor_date(date_of_reply, unit = "month")  # 2024-04-01, 2025-04-01
  )

Descriptive statistics

job_applications_month_summary <- job_application6


job_applications_mean <- job_applications_month_summary %>% 
  filter(days_to_reply > 0) %>% 
  group_by(month_name) %>% 
  summarise(mean = mean(days_to_reply),
            st_var = sd(days_to_reply),
            min = min(days_to_reply),
            max = max(days_to_reply),
            median = median(days_to_reply),
            group_size = n()) %>% 
  ungroup()

job_applications_mean %>% 
     kable(
    caption = "Descriptive statistics of the reply time range based on the application month") %>% 
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE
  )
Descriptive statistics of the reply time range based on the application month
month_name mean st_var min max median group_size
January 29.00000 27.856777 2 72 18.5 6
February 49.16667 36.240401 13 119 39.0 6
March 46.33333 64.957422 4 174 23.0 6
April 39.80000 44.228950 2 107 26.0 5
May 43.57143 29.010671 10 85 39.0 7
June 28.85714 17.082433 5 56 24.0 7
July 16.16667 9.768533 2 37 15.5 12
August 32.00000 30.298515 1 84 25.0 7
September 11.00000 18.018509 1 38 2.5 4
October 30.50000 15.808226 20 62 25.5 6
November 11.50000 8.689074 4 26 8.0 6
December 26.16667 15.328622 14 55 22.5 6
ggplot(job_applications_mean, aes(x = month_name)) +
  geom_linerange(aes(ymin = min, ymax = max, color = month_name),
                 linewidth = 1) +
  geom_point(aes(y = mean, colour = month_name),
    size = 2) +
  labs(
    title = "Reply Time Range by Application Month",
    x = "Months",
    y = "Days to Reply",
    colour = "Months")+
  theme_minimal()+
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1))

Comment

I notice that the reply time range is the longest for applications sent in February, March or April; applications sent in August also take longer to receive a reply. One possible explanation is that the start of the year is the slowest and busiest period for companies, with annual reports and other administrative tasks to manage, while the rest of the year is quieter.

The next step is to compute the number of replies per month of reply. I summarise the data into a new table with one row per reply month:

replies_per_month <- job_application6 %>%
  # only keep rows with a reply date (reply_yearmonth is derived from date_of_reply)
  filter(!is.na(date_of_reply), !is.na(reply_yearmonth)) %>%
  group_by(reply_yearmonth) %>%
  summarise(n_replies = n()) %>%
  ungroup()

Visualization

ggplot(replies_per_month, aes(x = reply_yearmonth, y = n_replies)) +
  geom_col(fill = "#3498db") +
  labs(title = "Number of Replies per Month", x = "Month-Year", y = "Replies") +
  theme_minimal() +
  scale_x_date(date_labels = "%b %Y", date_breaks = "1 month") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Comment

November 2024, January 2025, July 2025 and August 2025 were the months with the most replies, although the number of applications was about 10 per month.

Inferential statistics

I want to observe whether the number of replies is normally distributed across the dataset.

shapiro.test(replies_per_month$n_replies)
## 
##  Shapiro-Wilk normality test
## 
## data:  replies_per_month$n_replies
## W = 0.919, p-value = 0.1241

The p-value (0.124) is higher than 0.05, so the Shapiro-Wilk test does not reject the hypothesis that the number of replies per month is normally distributed. With only 18 observations, one aggregated count per month, there are no groups to compare, so an ANOVA or a Kruskal-Wallis test is not appropriate.
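
As a small illustration of how this decision can be read programmatically, the sketch below runs the same test on a made-up vector of 18 monthly counts. The vector `n_replies_example` is hypothetical and is not the data from this report; it only shows where the p-value sits in the returned object.

```r
# Hypothetical monthly reply counts (18 values), NOT the data from this report
n_replies_example <- c(2, 3, 5, 4, 6, 3, 2, 7, 8, 4, 5, 6, 3, 4, 9, 8, 5, 4)

# Shapiro-Wilk test: the null hypothesis is that the sample is normally distributed
sw <- shapiro.test(n_replies_example)

# A p-value above 0.05 only means the null hypothesis is not rejected;
# with so few points the test has limited power to detect non-normality
reject_normality <- sw$p.value < 0.05

sw$statistic       # the W statistic
sw$p.value         # the p-value used for the decision
reject_normality   # TRUE would mean normality is rejected at the 5% level
```

A non-significant result is therefore compatible with normality but does not demonstrate it.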

replies_per_month <- replies_per_month %>%
  mutate(time_index = row_number())  # 1, 2, 3... 18

cor.test(replies_per_month$time_index, 
         replies_per_month$n_replies, 
         method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  replies_per_month$time_index and replies_per_month$n_replies
## S = 656.83, p-value = 0.1923
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.3221566

Comment

A Spearman correlation was used to test whether the number of replies changed over time. Given that each month contains only one aggregated observation, group comparison tests such as ANOVA or Kruskal-Wallis were not appropriate. The results show a weak positive correlation between time and number of replies (rho = 0.322, p = 0.192), suggesting a slight upward trend over the job search period, though this trend does not reach statistical significance. The number of replies per month therefore appears relatively stable across the observation period.
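
One caveat with the `row_number()` index: a month without any reply never appears in `replies_per_month`, so it is skipped rather than counted as zero. Whether this happened in my data is not checked here. The sketch below, in base R and on hypothetical counts, shows one way to fill such gaps before computing the correlation; the names `reply_months`, `all_months` and `full_counts` are made up for the example.

```r
# Hypothetical reply months: May 2024 is deliberately missing (no replies)
reply_months <- as.Date(c("2024-03-01", "2024-04-01", "2024-06-01", "2024-07-01"))
n_replies    <- c(2, 4, 3, 6)

# Full monthly sequence between the first and the last reply month
all_months <- seq(min(reply_months), max(reply_months), by = "month")

# Align the counts on the full sequence and give missing months a count of zero
full_counts <- n_replies[match(all_months, reply_months)]
full_counts[is.na(full_counts)] <- 0

# The time index now covers every calendar month, including empty ones
time_index <- seq_along(all_months)
res <- cor.test(time_index, full_counts, method = "spearman")
res$estimate
```

With the zero month included, the ranks of the counts change, so rho can differ from the value obtained on the observed months only.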

4.6 Reply Status & Sector

Descriptive statistics

I want to know whether the response rate differs from sector to sector, so I create a new column with the percentage of applications in each reply status.

job_applications_month_summary %>%
  filter(!is.na(sector_grouped), !is.na(replied)) %>%
  group_by(sector_grouped, replied) %>%
  summarise(n = n()) %>%
  mutate(percentage = round(n / sum(n) * 100, 1)) %>% 
       kable(
    caption = "Descriptive statistics of the reply status based on the sector") %>% 
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE
  )
Descriptive statistics of the reply status based on the sector
sector_grouped replied n percentage
Consulting & Services FALSE 31 57.4
Consulting & Services TRUE 23 42.6
Finance & Business FALSE 8 40.0
Finance & Business TRUE 12 60.0
Industry & Manufacturing FALSE 13 33.3
Industry & Manufacturing TRUE 26 66.7
Knowledge & Education FALSE 4 44.4
Knowledge & Education TRUE 5 55.6
Public & Social Sector FALSE 6 35.3
Public & Social Sector TRUE 11 64.7
Sustainability & Environment FALSE 23 82.1
Sustainability & Environment TRUE 5 17.9

Visualization

job_applications_month_summary %>%
  filter(!is.na(sector_grouped), !is.na(replied)) %>%
  group_by(sector_grouped, replied) %>%
  summarise(n = n()) %>%
  mutate(percentage = round(n / sum(n) * 100, 1)) %>%
  ggplot(aes(x = sector_grouped, y = percentage, fill = replied)) +
  geom_col(position = "fill", alpha = 0.8) +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_manual(values = c("#e74c3c", "#3498db"), 
                    labels = c("No reply", "Replied")) +
  labs(
    title = "Reply Status by Sector",
    subtitle = "Proportion of replied vs non-replied applications per sector",
    x = "Sector",
    y = "Percentage",
    fill = "Reply Status"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title = element_text(size = 11, face = "bold")
  )+
  geom_text(
    # label only the replied segment; keeping every row lets position_fill()
    # centre the label inside that segment rather than at 50% of the bar
    aes(label = ifelse(replied, paste0(percentage, "%"), "")),
    position = position_fill(vjust = 0.5),
    size = 3.5, color = "white", fontface = "bold"
  )

Comment

The response rate is the lowest for the Sustainability & Environment sector (17.9%) and the highest for the Industry & Manufacturing sector (66.7%).

Inferential

In this case I want to test the relationship between a logical variable (replied) and a categorical variable (sector_grouped).

Chi-square test

chisq.test(table(job_applications_month_summary$replied, job_applications_month_summary$sector_grouped))
## 
##  Pearson's Chi-squared test
## 
## data:  table(job_applications_month_summary$replied, job_applications_month_summary$sector_grouped)
## X-squared = 19.424, df = 5, p-value = 0.001602

Comment

The p-value from the Chi-square test (0.0016) is below the 0.05 threshold, which means that there is a statistically significant association between reply status and sector.

fisher.test(table(job_applications_month_summary$replied, job_applications_month_summary$sector_grouped), 
            simulate.p.value = TRUE)
## 
##  Fisher's Exact Test for Count Data with simulated p-value (based on
##  2000 replicates)
## 
## data:  table(job_applications_month_summary$replied, job_applications_month_summary$sector_grouped)
## p-value = 0.001499
## alternative hypothesis: two.sided

Comment

The Fisher exact test confirms the Chi-square result, showing a statistically significant relationship between reply status and sector (p = 0.0015). This result suggests that the likelihood of receiving a reply is associated with the sector of the organisation. Given the warning raised by the Chi-square test about small cell counts, the Fisher test result is the more reliable of the two.
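
The small-cell issue can be checked directly from the expected counts. The sketch below rebuilds the contingency table from the counts printed in the descriptive table of this section (the object name `sector_counts` is introduced only for this example) and inspects what `chisq.test()` expects under independence.

```r
# Counts copied from the "reply status based on the sector" table above
sector_counts <- matrix(
  c(31,  8, 13, 4,  6, 23,   # replied == FALSE
    23, 12, 26, 5, 11,  5),  # replied == TRUE
  nrow = 2, byrow = TRUE,
  dimnames = list(
    replied = c("FALSE", "TRUE"),
    sector  = c("Consulting & Services", "Finance & Business",
                "Industry & Manufacturing", "Knowledge & Education",
                "Public & Social Sector", "Sustainability & Environment")
  )
)

# suppressWarnings() silences the small-cell warning so the result can be inspected
chi <- suppressWarnings(chisq.test(sector_counts))

chi$statistic             # should match the X-squared reported above
round(chi$expected, 1)    # expected counts under independence
sum(chi$expected < 5)     # number of cells below the usual threshold of 5
```

Only the Knowledge & Education column falls below the threshold, which is enough for R to flag the approximation as potentially inaccurate.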

4.7 Reply Status & Organisation Type

Descriptive statistics

job_applications_month_summary %>%
  filter(!is.na(org_type), !is.na(replied)) %>%
  group_by(org_type, replied) %>%
  summarise(n = n()) %>%
  mutate(percentage = round(n / sum(n) * 100, 1)) %>% 
  kable(
    caption = "Descriptive statistics of the reply status based on the organization type") %>% 
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE
  )
Descriptive statistics of the reply status based on the organization type
org_type replied n percentage
Academia FALSE 1 33.3
Academia TRUE 2 66.7
Accounting profession and auditors FALSE 1 9.1
Accounting profession and auditors TRUE 10 90.9
Company (listed) FALSE 8 29.6
Company (listed) TRUE 19 70.4
Company (unlisted) FALSE 21 44.7
Company (unlisted) TRUE 26 55.3
Government FALSE 2 28.6
Government TRUE 5 71.4
NGO / not for profit FALSE 22 59.5
NGO / not for profit TRUE 15 40.5
Trade association FALSE 1 33.3
Trade association TRUE 2 66.7
intergovernmental organization FALSE 18 85.7
intergovernmental organization TRUE 3 14.3

Visualization

job_applications_month_summary %>%
  filter(!is.na(org_type), !is.na(replied)) %>%
  group_by(org_type, replied) %>%
  summarise(n = n()) %>%
  mutate(percentage = round(n / sum(n) * 100, 1)) %>%
  ggplot(aes(x = org_type, y = percentage, fill = replied)) +
  geom_col(position = "fill", alpha = 0.8) +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_manual(values = c("#e74c3c", "#3498db"), 
                    labels = c("No reply", "Replied")) +
  labs(
    title = "Reply Status by Organization Type",
    subtitle = "Proportion of replied vs non-replied applications per organization type",
    x = "Organization Type",
    y = "Percentage",
    fill = "Reply Status"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title = element_text(size = 11, face = "bold")
  )+
  geom_text(
    # label only the replied segment; keeping every row lets position_fill()
    # centre the label inside that segment rather than at 50% of the bar
    aes(label = ifelse(replied, paste0(percentage, "%"), "")),
    position = position_fill(vjust = 0.5),
    size = 3.5, color = "white", fontface = "bold"
  )

Comment

The organization type with the highest response rate is the Accounting profession and auditors (90.9%), while intergovernmental organizations have the lowest rate (14.3%). This result is aligned with the analysis of the time to reply to the applications.
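
Because the groups differ a lot in size, these percentages are not equally reliable. As a rough sketch, an exact binomial confidence interval can be computed from the counts in the table above with base R's `binom.test()`. The two groups below are picked only as an illustration, and the object names are made up for the example.

```r
# Counts taken from the organization type table above
ci_academia   <- binom.test(x = 2,  n = 3)$conf.int    # 2 replies out of 3
ci_accounting <- binom.test(x = 10, n = 11)$conf.int   # 10 replies out of 11

# 95% exact (Clopper-Pearson) intervals for the reply rate
round(ci_academia, 2)
round(ci_accounting, 2)
```

The interval for a group of three applications covers most of the range between 0 and 1, so a rate such as 66.7% for Academia says very little on its own.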

Inferential

Fisher test

fisher.test(table(job_applications_month_summary$replied, job_applications_month_summary$org_type), 
            simulate.p.value = TRUE)
## 
##  Fisher's Exact Test for Count Data with simulated p-value (based on
##  2000 replicates)
## 
## data:  table(job_applications_month_summary$replied, job_applications_month_summary$org_type)
## p-value = 0.0009995
## alternative hypothesis: two.sided

Comment

Fisher’s exact test revealed a statistically significant relationship between reply status and organization type (simulated p ≈ 0.001). This indicates that the type of organization is significantly associated with whether an application receives a reply. Because the p-value is simulated from 2000 replicates, it varies slightly from one run to the next. Despite the significant global result, the small group sizes for some organisation types mean these findings should be interpreted with caution.
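
A simulated p-value depends on the random number generator, so a figure quoted in the text can drift away from the printed output between two knits. A minimal sketch of how to make it reproducible is shown below: the table is rebuilt from the counts printed above, and both the seed (2025) and the number of replicates (`B = 10000`) are arbitrary choices made for this example, not settings used in the report.

```r
# Counts copied from the organization type table above
org_counts <- matrix(
  c(1,  1,  8, 21, 2, 22, 1, 18,   # replied == FALSE
    2, 10, 19, 26, 5, 15, 2,  3),  # replied == TRUE
  nrow = 2, byrow = TRUE,
  dimnames = list(
    replied  = c("FALSE", "TRUE"),
    org_type = c("Academia", "Accounting profession and auditors",
                 "Company (listed)", "Company (unlisted)", "Government",
                 "NGO / not for profit", "Trade association",
                 "intergovernmental organization")
  )
)

# Fixing the seed before each call makes the Monte Carlo p-value repeatable
set.seed(2025)
p_first <- fisher.test(org_counts, simulate.p.value = TRUE, B = 10000)$p.value

set.seed(2025)
p_second <- fisher.test(org_counts, simulate.p.value = TRUE, B = 10000)$p.value

c(p_first, p_second)   # the two values are identical because the seed is the same
```

Raising `B` also makes the simulated value less coarse, since the smallest p-value that can be reported is 1 / (B + 1).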

4.8 Reply Status & Application Month

Descriptive statistics

I want to know the percentage of replies per application month.

job_applications_month_summary %>%
  filter(!is.na(month_name), !is.na(replied)) %>%
  group_by(month_name, replied) %>%
  summarise(n = n()) %>%
  mutate(percentage = round(n / sum(n) * 100, 1)) %>% 
       kable(
    caption = "Descriptive statistics of the reply status based on the application month") %>% 
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE
  )
Descriptive statistics of the reply status based on the application month
month_name replied n percentage
January FALSE 4 40.0
January TRUE 6 60.0
February FALSE 5 41.7
February TRUE 7 58.3
March FALSE 6 50.0
March TRUE 6 50.0
April FALSE 7 58.3
April TRUE 5 41.7
May FALSE 11 61.1
May TRUE 7 38.9
June FALSE 9 52.9
June TRUE 8 47.1
July FALSE 5 29.4
July TRUE 12 70.6
August FALSE 11 61.1
August TRUE 7 38.9
September FALSE 3 33.3
September TRUE 6 66.7
October FALSE 4 40.0
October TRUE 6 60.0
November FALSE 3 30.0
November TRUE 7 70.0
December FALSE 4 40.0
December TRUE 6 60.0

Visualization

job_applications_month_summary %>%
  filter(!is.na(month_name), !is.na(replied)) %>%
  group_by(month_name, replied) %>%
  summarise(n = n()) %>%
  mutate(percentage = round(n / sum(n) * 100, 1)) %>%
  ggplot(aes(x = month_name, y = percentage, fill = replied)) +
  geom_col(position = "fill", alpha = 0.8) +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_manual(values = c("#e74c3c", "#3498db"), 
                    labels = c("No reply", "Replied")) +
  labs(
    title = "Reply Status by Application Month",
    subtitle = "Proportion of replied vs non-replied applications per month",
    x = "Application Month",
    y = "Percentage",
    fill = "Reply Status"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title = element_text(size = 11, face = "bold")
  )+
  geom_text(
    # label only the replied segment; keeping every row lets position_fill()
    # centre the label inside that segment rather than at 50% of the bar
    aes(label = ifelse(replied, paste0(percentage, "%"), "")),
    position = position_fill(vjust = 0.5),
    size = 2.5, color = "white", fontface = "bold"
  )

Inferential Analysis

Fisher test

Fisher's exact test helps me test the relationship between the reply status and the application month.

fisher.test(table(job_applications_month_summary$replied, job_applications_month_summary$month_name), 
            simulate.p.value = TRUE)
## 
##  Fisher's Exact Test for Count Data with simulated p-value (based on
##  2000 replicates)
## 
## data:  table(job_applications_month_summary$replied, job_applications_month_summary$month_name)
## p-value = 0.6967
## alternative hypothesis: two.sided

Comment

The p-value is 0.70, which is above the 0.05 threshold. This means that the difference is not statistically significant and that the application month does not appear to influence the response rate.

5. Cross Analysis

This new table has several columns. Some job descriptions included in the applications table didn’t have any of the listed acronyms; they are kept rather than discarded because the two tables are merged with full_join. There is also more than one row per job description, since I still want to account for the different acronyms in each job description. There are also some duplicates. Finally, the “passed” entries don’t have a sector or any further information.

# I have to keep the same column names to join both tables
job_applications2 <- job_applications_clean %>% 
  rename(company = name) %>% 
  rename(job_title = position)

job_app_acronyms <- job_applications2 %>% 
  full_join(number_of_jobs_w_acronym, by = c("job_title", "company"))

Missing data from the merged table “job_app_acronyms”

vis_miss(job_app_acronyms)

Comment

This newly merged table has many rows per job application/description. The table also includes job positions I did not apply to, which therefore have no application or reply dates, month of application, sector, location, organisation type or minimum years of experience. There are some unknown companies because I was not able to retrieve the company names from the job descriptions when extracting the acronyms. In addition, some job positions appear in the job_application table but are missing from the acronym table because they didn’t include any of the acronyms I selected. Finally, there may be some mismatches, either because a job title is written slightly differently in the two tables or because I haven’t applied to all of the job descriptions I saved. The reverse is also possible, as I might not have saved all of the job descriptions I applied to.
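
To get a sense of how many rows fail to match, the join keys can be normalised and compared before merging. The sketch below uses base R and two tiny hypothetical tables; the company names, job titles and the helper `normalise_key()` are invented for the example and do not come from my data.

```r
# Hypothetical applications table (note the trailing space in one title)
applications <- data.frame(
  company   = c("Acme", "Globex", "Initech"),
  job_title = c("ESG Analyst", "Sustainability Officer ", "Reporting Manager"),
  stringsAsFactors = FALSE
)

# Hypothetical acronym table (note the different capitalisation)
acronyms <- data.frame(
  company   = c("Acme", "Globex"),
  job_title = c("esg analyst", "Sustainability Officer"),
  count     = c(3, 5),
  stringsAsFactors = FALSE
)

# Lower-case and trim the text so that cosmetic differences do not block a match
normalise_key <- function(x) tolower(trimws(x))

applications$key <- paste(normalise_key(applications$company),
                          normalise_key(applications$job_title), sep = "|")
acronyms$key     <- paste(normalise_key(acronyms$company),
                          normalise_key(acronyms$job_title), sep = "|")

# Rows present in one table but absent from the other
unmatched_applications <- applications[!applications$key %in% acronyms$key, ]
unmatched_acronyms     <- acronyms[!acronyms$key %in% applications$key, ]

unmatched_applications$company
nrow(unmatched_acronyms)
```

Only differences in case and surrounding whitespace are handled here; titles that are worded differently would still need to be matched by hand.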

5.1 Number of Acronyms & Sector

Descriptive statistics

I first create a column that computes the number of acronyms per job.

job_app_acronyms1 <- job_app_acronyms %>% 
  group_by(job_title) %>% 
  mutate(acronyms_per_job = sum(count)) %>% 
  ungroup()

Next, I compute the minimum, maximum and mean values of the number of acronyms per job based on the sector.

job_app_acronyms1_mean <- job_app_acronyms1 %>%
  filter(!is.na(acronyms_per_job)) %>% 
  group_by(sector_grouped) %>% 
  summarise(mean = mean(acronyms_per_job),
            st_var = sd(acronyms_per_job),
            min = min(acronyms_per_job),
            max = max(acronyms_per_job),
            median = median(acronyms_per_job),
            group_size = n()) %>% 
  ungroup()

job_app_acronyms1_mean %>% 
       kable(
    caption = "Descriptive statistics of the number of acronyms per job based on the sector") %>% 
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE
  )
Descriptive statistics of the number of acronyms per job based on the sector
sector_grouped mean st_var min max median group_size
Consulting & Services 22.337349 17.511862 1 44 16 83
Finance & Business 22.415385 8.442623 4 36 27 65
Industry & Manufacturing 10.607143 7.236151 1 36 10 84
Knowledge & Education 6.466667 2.825058 3 9 9 15
Public & Social Sector 10.058824 6.628260 1 16 16 17
Sustainability & Environment 12.185185 13.399303 1 36 5 27
NA 14.428571 13.766702 1 44 10 56

Comment

The table shows variation in the mean number of acronyms per job across sectors. The Finance & Business and Consulting & Services sectors have the highest averages (22.4 and 22.3 acronyms per job description respectively), suggesting more technical language is used in those postings. The Knowledge & Education sector has the lowest average (6.5), which may reflect less standardised reporting requirements.
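
The group sizes in this table count rows rather than jobs: Consulting & Services has 83 rows here against 54 applications in section 4.6, which suggests that jobs with many acronyms weigh more in the averages. The code above also groups by job_title only, so identical titles at different companies are summed together. A possible alternative, sketched below in base R on hypothetical data, is to aggregate to one row per company and job title before summarising; the table `acronyms_long` and its values are made up for the example.

```r
# Hypothetical long table: one row per (job, acronym) pair
acronyms_long <- data.frame(
  company   = c("A", "A", "A", "B", "C", "C"),
  job_title = c("Analyst", "Analyst", "Analyst", "Analyst", "Officer", "Officer"),
  sector    = c("Finance", "Finance", "Finance", "Consulting", "Finance", "Finance"),
  count     = c(3, 2, 1, 4, 5, 5)
)

# One row per job: a job is identified by company AND title, so two companies
# that use the same title are kept apart
per_job <- aggregate(count ~ company + job_title + sector,
                     data = acronyms_long, FUN = sum)
names(per_job)[names(per_job) == "count"] <- "acronyms_per_job"

# Sector summary in which every job counts exactly once
sector_mean <- aggregate(acronyms_per_job ~ sector, data = per_job, FUN = mean)
sector_mean
```

Summarising this way would change the figures in the table above, so it is shown as an option rather than as the method used in this report.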

Inferential statistics

I first need to examine whether the variable of the number of acronyms per job is normally distributed.

shapiro.test(job_app_acronyms1$acronyms_per_job)
## 
##  Shapiro-Wilk normality test
## 
## data:  job_app_acronyms1$acronyms_per_job
## W = 0.85644, p-value < 2.2e-16

Comment

The p-value is below the 0.05 threshold, which means that the data is not normally distributed. A Kruskal-Wallis test is therefore used to investigate the relationship between the number of acronyms per job and the sector.

kruskal.test(acronyms_per_job ~ sector_grouped, data = job_app_acronyms1)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  acronyms_per_job by sector_grouped
## Kruskal-Wallis chi-squared = 59.632, df = 5, p-value = 1.448e-11

Comment

The Kruskal-Wallis test returns a p-value below 0.05, which means that there is a statistically significant difference in the number of acronyms per job across sectors. This suggests that the technical complexity of job descriptions varies with the sector. A pairwise Wilcoxon test will therefore be conducted to identify which sectors differ. As observed in section 4.2, if no individual pairs reach significance, this likely reflects the limited sample size reducing statistical power rather than a true absence of difference.

pairwise.wilcox.test(job_app_acronyms1$acronyms_per_job,
                     job_app_acronyms1$sector_grouped,
                     p.adjust.method = "bonferroni")
## 
##  Pairwise comparisons using Wilcoxon rank sum test with continuity correction 
## 
## data:  job_app_acronyms1$acronyms_per_job and job_app_acronyms1$sector_grouped 
## 
##                              Consulting & Services Finance & Business
## Finance & Business           1.000                 -                 
## Industry & Manufacturing     0.022                 3.4e-13           
## Knowledge & Education        0.149                 2.4e-06           
## Public & Social Sector       0.438                 1.4e-05           
## Sustainability & Environment 0.020                 0.006             
##                              Industry & Manufacturing Knowledge & Education
## Finance & Business           -                        -                    
## Industry & Manufacturing     -                        -                    
## Knowledge & Education        0.200                    -                    
## Public & Social Sector       1.000                    1.000                
## Sustainability & Environment 1.000                    1.000                
##                              Public & Social Sector
## Finance & Business           -                     
## Industry & Manufacturing     -                     
## Knowledge & Education        -                     
## Public & Social Sector       -                     
## Sustainability & Environment 1.000                 
## 
## P value adjustment method: bonferroni
pairwise.wilcox.test(job_app_acronyms1$acronyms_per_job,
                     job_app_acronyms1$sector_grouped,
                     p.adjust.method = "BH")
## 
##  Pairwise comparisons using Wilcoxon rank sum test with continuity correction 
## 
## data:  job_app_acronyms1$acronyms_per_job and job_app_acronyms1$sector_grouped 
## 
##                              Consulting & Services Finance & Business
## Finance & Business           0.6859                -                 
## Industry & Manufacturing     0.0036                3.4e-13           
## Knowledge & Education        0.0213                1.2e-06           
## Public & Social Sector       0.0487                4.7e-06           
## Sustainability & Environment 0.0036                0.0015            
##                              Industry & Manufacturing Knowledge & Education
## Finance & Business           -                        -                    
## Industry & Manufacturing     -                        -                    
## Knowledge & Education        0.0251                   -                    
## Public & Social Sector       0.5901                   0.1835               
## Sustainability & Environment 0.2216                   0.8833               
##                              Public & Social Sector
## Finance & Business           -                     
## Industry & Manufacturing     -                     
## Knowledge & Education        -                     
## Public & Social Sector       -                     
## Sustainability & Environment 0.8337                
## 
## P value adjustment method: BH
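
The two outputs differ only in how the raw p-values are adjusted. The sketch below applies base R's `p.adjust()` to five hypothetical raw p-values (not taken from the tests above) to show why Bonferroni flags fewer comparisons than BH.

```r
# Hypothetical raw p-values from five pairwise comparisons
raw_p <- c(0.001, 0.012, 0.02, 0.03, 0.04)

adjusted <- data.frame(
  raw        = raw_p,
  # Bonferroni multiplies every p-value by the number of comparisons (capped at 1)
  bonferroni = p.adjust(raw_p, method = "bonferroni"),
  # BH scales each p-value by (number of comparisons / rank), which is less strict
  BH         = p.adjust(raw_p, method = "BH")
)
adjusted

# Number of comparisons below 0.05 under each correction
n_significant <- colSums(adjusted[, c("bonferroni", "BH")] < 0.05)
n_significant
```

On these made-up values a single comparison survives the Bonferroni correction while all five survive BH, the same kind of gap seen between the two tables above.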

Visualization

bonferroni_result <- pairwise.wilcox.test(job_app_acronyms1$acronyms_per_job,
                                           job_app_acronyms1$sector_grouped,
                                           p.adjust.method = "bonferroni")

bh_result <- pairwise.wilcox.test(job_app_acronyms1$acronyms_per_job,
                                   job_app_acronyms1$sector_grouped,
                                   p.adjust.method = "BH")
library(tibble)  # for rownames_to_column

pval_to_long <- function(test_result, method_name) {
  mat <- test_result$p.value
  
  # Make the matrix symmetric manually
  all_sectors <- union(rownames(mat), colnames(mat))
  n <- length(all_sectors)
  full_mat <- matrix(NA, nrow = n, ncol = n,
                     dimnames = list(all_sectors, all_sectors))
  
  for (r in rownames(mat)) {
    for (c in colnames(mat)) {
      full_mat[r, c] <- mat[r, c]
      full_mat[c, r] <- mat[r, c]  # mirror
    }
  }
  diag(full_mat) <- NA
  
  # Convert to long format
  expand.grid(Sector1 = all_sectors, 
              Sector2 = all_sectors,
              stringsAsFactors = FALSE) %>%
    mutate(p_value = map2_dbl(Sector1, Sector2, ~ full_mat[.x, .y]),
           method = method_name,
           significant = ifelse(!is.na(p_value), p_value < 0.05, NA))
}

bonferroni_long <- pval_to_long(bonferroni_result, "Bonferroni")
bh_long         <- pval_to_long(bh_result, "BH")

# Combine both
combined <- bind_rows(bonferroni_long, bh_long)

# Plot
combined %>%
  ggplot(aes(x = Sector1, y = Sector2, fill = p_value)) +
  geom_tile(color = "white", linewidth = 0.5) +
  geom_text(aes(label = ifelse(!is.na(p_value),
                               ifelse(p_value < 0.001, "<0.001", round(p_value, 3)),
                               "")),
            size = 2.8, color = "white", fontface = "bold") +
  scale_fill_gradient2(low = "#e74c3c",
                       mid = "#f39c12",
                       high = "#ecf0f1",
                       midpoint = 0.05,
                       na.value = "grey90",
                       name = "p-value",
                       limits = c(0, 1)) +
  facet_wrap(~ method, ncol = 2) +
  labs(
    title = "Pairwise Wilcoxon Test P-values by Correction Method",
    subtitle = "Red = significant (p < 0.05), lighter = not significant",
    x = NULL,
    y = NULL
  ) +
  theme_minimal(base_size = 11) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    strip.text = element_text(face = "bold", size = 12),
    legend.position = "bottom"
  )

6. Limitations

Sample size and representativeness

The dataset covers 170 applications made over 18 months in a specific field (sustainability/ESG). The findings therefore reflect my personal job search experience and cannot be generalised to broader labour market trends.

Self-reported and manually collected data

The data was collected manually, which introduces the risk of entry errors, inconsistencies in sector classification, and incomplete records. Some applications may have been omitted, and not all job descriptions were saved for acronym extraction.

Small group sizes

Several organisation types and sectors have fewer than 10 observations, which reduces the statistical power of inferential tests and makes pairwise comparisons unreliable even when a global test is significant. This is particularly relevant for the pairwise Wilcoxon tests in sections 4.2 and 5.1, where the Bonferroni correction may have been too conservative given the small sample, and the additional pairs detected by the BH correction should be interpreted with caution.

Aggregated monthly reply data

The monthly reply counts in section 4.5 represent one aggregated observation per month, which made group comparison tests such as ANOVA and Kruskal-Wallis inappropriate. A Spearman correlation was used instead to test for a trend over time, which is a more suitable but less powerful approach given the small number of time points (18 months).

Acronym extraction limitations

The acronym extraction relied on a predefined list of keywords, which may have missed relevant terms or misclassified others. The Python script output also contained duplicates that required manual cleaning. Furthermore, the acronym dataset does not cover all applications, meaning the cross-analysis in section 5 is based on a partial overlap between the two tables.

Reply status interpretation

A “reply” includes both positive and negative responses, meaning a high reply rate does not necessarily indicate success. This analysis does not distinguish between rejections, interview invitations, or other outcomes, which limits the practical interpretation of the reply rate findings.

Correction method dependency

The choice of p-value adjustment method meaningfully affects which pairwise comparisons are deemed significant. The Bonferroni correction identified 6 significant pairs while the BH correction identified 9. Conclusions drawn from the pairwise analysis are therefore sensitive to the correction method chosen, and both should be considered together rather than in isolation.

7. Discussion

A 48.8% overall reply rate is notably higher than the commonly cited industry average of ~25%, which may reflect the niche nature of sustainability roles, the targeted nature of the applications, or the strong presence of structured HR processes in the sectors applied to.

The statistically significant associations between reply status and both sector (Fisher's exact test, p < 0.05) and organisation type (Fisher's exact test, p = 0.0005) suggest that these structural characteristics of employers are meaningfully associated with recruitment responsiveness. Accounting and auditing firms had the highest reply rate (90%), possibly reflecting more formalised recruitment pipelines, while intergovernmental organisations had the lowest (14%), which may be explained by longer and more bureaucratic hiring processes, consistent with the reply time analysis in section 4.4.
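For readers who want to reproduce this type of test, the sketch below applies `fisher.test()` to a small reply-by-organisation-type table. The counts and the labels "Type A" to "Type C" are hypothetical and do not come from the report's data.

```r
# Hypothetical contingency table: rows are organisation types,
# columns are reply status. Counts are illustrative only.
reply_by_type <- matrix(
  c( 9,  1,
     2, 12,
    20, 18),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    org_type = c("Type A", "Type B", "Type C"),
    reply    = c("Reply", "No reply")
  )
)

# Fisher's exact test suits tables with small expected cell counts,
# where the chi-squared approximation is unreliable.
res <- fisher.test(reply_by_type)
res$p.value

# Row-wise reply rate per organisation type.
reply_rate <- prop.table(reply_by_type, margin = 1)[, "Reply"]
reply_rate
```

A small p-value here only indicates that reply status and organisation type are not independent; it does not say which types differ or why.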

Regarding the temporal dimension of replies, the Spearman correlation found a weak positive trend between time and number of replies (rho = 0.322, p = 0.192), suggesting a slight increase in replies over the 18-month period that does not reach statistical significance. This result should be interpreted cautiously given that each month contributes a single aggregated count, which limits the power of any temporal analysis. The absence of a significant relationship between reply time and either sector or month of application further suggests that choosing when or in which field to apply may not substantially shorten the time to a reply, at least within this dataset.

The acronym analysis in section 5.1 provides perhaps the most insightful finding of the report. The global Kruskal-Wallis test confirmed that the number of acronyms per job description differs significantly across sectors. The subsequent pairwise Wilcoxon tests revealed that Finance & Business is the most distinct sector, differing significantly from all other sectors under both Bonferroni and BH corrections. This is a somewhat counterintuitive finding, as one might expect Finance & Business to use more technical language, but it may reflect that financial sector job descriptions in this sample were more generalist in nature or targeted a broader audience. Consulting & Services, on the other hand, consistently used more technical acronyms than Industry & Manufacturing, Sustainability & Environment, Knowledge & Education, and Public & Social Sector, with the latter two differences only emerging as significant under the less conservative BH correction. Knowledge & Education consistently showed the lowest acronym usage across both correction methods, which aligns with expectations given the more descriptive and less regulatory nature of academic and educational job postings.
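The two-step procedure described here (a global Kruskal-Wallis test followed by pairwise Wilcoxon tests) can be sketched as follows. The data frame `acronyms`, the sector labels and the simulated counts are assumptions made for illustration and are unrelated to the report's acronym table.

```r
# Simulated acronym counts for three hypothetical sectors.
# Values are generated for illustration and are NOT the report's data.
set.seed(42)
acronyms <- data.frame(
  sector     = rep(c("Sector A", "Sector B", "Sector C"), each = 12),
  n_acronyms = c(rpois(12, 2), rpois(12, 5), rpois(12, 9))
)

# Step 1: global test of whether acronym counts differ across sectors.
kw <- kruskal.test(n_acronyms ~ sector, data = acronyms)
kw$p.value

# Step 2: pairwise Wilcoxon rank-sum tests with a multiplicity correction.
# exact = FALSE is passed on to wilcox.test() because count data contain ties.
pw <- pairwise.wilcox.test(
  acronyms$n_acronyms,
  acronyms$sector,
  p.adjust.method = "bonferroni",
  exact = FALSE
)
pw$p.value  # lower-triangular matrix of adjusted p-values
```

Swapping `p.adjust.method = "bonferroni"` for `"BH"` reruns the same comparisons under the less conservative correction.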

The prevalence of ESG, CSRD, and EU-related acronyms throughout the dataset reflects the regulatory environment of 2024–2025, when the EU sustainability disclosure framework was actively debated. The Omnibus simplification package, proposed by the European Commission in February 2025 and still under negotiation in November 2025, would reduce the scope of the CSRD dramatically and may shift this acronym landscape significantly in future job postings, with implications for the technical skills demanded by employers.

8. Conclusion

This analysis of 170 personal job applications submitted between March 2024 and September 2025 provides a detailed snapshot of a sustainability-focused job search. The data show that applications were spread across 37 original sectors (consolidated into 6 broader sectors to increase the number of observations per sector), predominantly in Consulting & Services, and that the majority targeted roles requiring 2–3 years of experience, consistent with the applicant’s profile.

Statistically significant relationships were found between reply status and both sector and organisation type, suggesting that these structural factors play a meaningful role in employer responsiveness. In contrast, no significant relationship was found between reply time and sector or application month, and the temporal analysis of monthly reply counts revealed only a weak, non-significant upward trend over the 18-month period (rho = 0.322, p = 0.192). These findings collectively suggest that sector and organisation type are associated with whether a reply is received, whereas neither sector nor application month appears to affect how quickly it arrives.

The acronym analysis highlighted meaningful differences in technical language use across sectors. Finance & Business stood out as the most distinct group, differing significantly from all other sectors in acronym usage under both Bonferroni and BH corrections, while Consulting & Services consistently used the most technical language, although two of its pairwise differences reached significance only under the BH correction. These findings suggest that the density of technical acronyms in job descriptions is partly a function of the sector, with implications for how candidates tailor their application materials.

The centrality of sustainability reporting acronyms such as ESG, CSRD, and GRI throughout the dataset reflects the regulatory moment of 2024–2025 in Europe. As the Omnibus proposal reshapes the scope of mandatory sustainability reporting, future analyses may capture a shift in the technical vocabulary of sustainability job descriptions, potentially reducing the dominance of EU regulatory frameworks in favour of broader international standards such as ISSB.

While the findings are limited by the personal and small-scale nature of the dataset, this project demonstrates the value of systematic data collection during a job search and provides a methodological template that could be scaled or replicated. Future work could enrich the analysis by distinguishing between types of replies, incorporating salary data, expanding the acronym list to capture a broader range of technical frameworks, or collecting reply counts at the individual application level rather than as monthly aggregates to enable more robust temporal analysis.