knitr::opts_chunk$set(
  message = FALSE,
  warning = FALSE
)

library(janitor)
library(dplyr)
library(tidyr)
library(ggplot2)
library(tidylog)
library(stringr)
library(stats)
library(infer)
library(readxl)
library(lubridate)
library(visdat)
library(knitr)
library(rmarkdown)
library(kableExtra)
library(purrr)

1. Introduction

This report compiles and analyses data I collected on my own job applications between March 2024 and September 2025. Every time I noticed an interesting job description, I followed a two-step process: 1) I copied the job description into a Word document, and 2) I filled in an Excel spreadsheet with details from the job description, such as the company name, the job title and the date of application. The first objective was to keep the job details at hand in case I was called for an interview after the posting had been taken offline, which would otherwise have hindered my preparation. The second objective was to track my applications so that I could report them to my unemployment agency, as I was required to submit a minimum number of applications per month and to provide details such as the company name, location, job title and a link to the application. After more than a year, the compiled data had become interesting enough to analyse and gave me a chance to practise the data wrangling skills I acquired during 2025. Besides being a stimulating and entertaining exercise, the analysis gave me a useful overview of my applications.

2. Data and methodology

Data collection process

The data comes from an Excel spreadsheet and a Word document. The Excel spreadsheet includes the company name, the job title, the date of application, the month of application, the status of the application, the date of the reply, the sector, the organisation type, and the minimum years of experience. The Word document includes the descriptions of the jobs I applied to and of some I did not apply to. There was one KPI I wanted to include in the analysis that I had not tracked: the number of acronyms per job description. This variable could be extracted thanks to a Python script written by ChatGPT, which produced an Excel file listing company names, acronyms and job-title keywords from the Word document.

Description of the two Excel tables

The data consists of two tables: job_acronym and job_applications. The first has 5 columns and shows the number of occurrences of specific acronyms per job title. It is not fully clean yet, as it still contains some duplicates. The second has 12 columns and includes more information on the job applications and descriptions. It also needs cleaning, as some columns were not imported into R with the correct class.

Data Import

Loading files into R

job_acronym <- read_excel("~/Documents/Job data analysis/job_acronym.xlsx") %>% 
  clean_names() %>% 
  rename(meaning = `x5`) %>% 
  glimpse()
# still have to make sure the count is right per position

job_applications <- read_excel("~/Documents/Job data analysis/Job applications March 2024 - September 2025.xlsx") %>% 
  clean_names() %>% 
  glimpse()
# still have to clean the columns types here

Data cleaning

Handling missing values

job_acronym_NA <- job_acronym %>% 
  mutate(company = if_else(company == "Unknown", NA_character_, company))

This new table is identical to the previous one, except that the “Unknown” companies are replaced by NA, the standard value R uses to indicate missing data. The table contains 3 missing values, now labelled as NA.
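As an illustration of the replacement above, here is a minimal sketch on invented data. It uses dplyr's na_if() as an alternative to if_else(); the toy company names are assumptions and not taken from my tables.

```r
library(dplyr)

# Toy data: the company names are invented for illustration only
toy <- tibble(company = c("A Corp", "Unknown", "B Ltd", "Unknown"))

# na_if() turns every value equal to "Unknown" into NA
toy_na <- toy %>%
  mutate(company = na_if(company, "Unknown"))

# Count the missing values created by the replacement
n_missing <- sum(is.na(toy_na$company))
n_missing
```

Counting the NAs right after the replacement is a quick way to confirm how many rows were affected.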

Formatting columns

Cleaning columns and class types

job_applications1 <- job_applications %>% 
  mutate(date_of_application = ymd(date_of_application),
         date_of_reply = ymd(date_of_reply),
         min_years_of_experience = as.double(min_years_of_experience)) %>% 
  glimpse()

Cleaning duplicates

job_acronym1 <- job_acronym_NA %>%
  group_by(company, job_title, acronym) %>% 
  mutate(count = sum(count)) %>% 
  ungroup() %>% 
  distinct(company, job_title, acronym, count)
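To make the deduplication step easier to follow, here is a small sketch on invented rows. It collapses duplicates with summarise() instead of mutate() followed by distinct(); under the assumption that only these four columns are needed, both approaches should give the same result.

```r
library(dplyr)

# Toy data: the first two rows describe the same company, job title and acronym
toy_acronyms <- tibble(
  company   = c("A Corp", "A Corp", "A Corp", "B Ltd"),
  job_title = c("ESG Analyst", "ESG Analyst", "ESG Analyst", "Consultant"),
  acronym   = c("ESG", "ESG", "GRI", "ESG"),
  count     = c(2, 3, 1, 4)
)

# One row per company, job title and acronym, with the counts added up
toy_dedup <- toy_acronyms %>%
  group_by(company, job_title, acronym) %>%
  summarise(count = sum(count), .groups = "drop")

toy_dedup
```

The duplicated "ESG" rows for the same job are merged into a single row whose count is the sum of the two original counts.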

3. Descriptive Analysis

3.1 Companies

The first figure I want to know is the total number of companies I applied to.

Number of distinct companies applied to

distinct_companies <- job_acronym1 %>%
  distinct(company) %>% 
  nrow()

There are 77 distinct companies in the job_acronym table.

distinct_company_names <- job_applications1 %>%
  mutate(name_lowercase = str_to_lower(name)) %>% 
  distinct(name_lowercase) %>% 
  nrow()

There are 112 distinct companies in the job_applications1 table. This means that either some companies I applied to did not use any of the selected acronyms in their job description, or some of my applications were not part of the sample used to extract the acronyms.
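One way to see which companies explain the gap between the two counts is an anti-join. The sketch below runs on invented data; only the column names name and company come from my tables, everything else is an assumption. Note that it lowercases the names on both sides, whereas the two counts above only lowercase the job_applications1 names.

```r
library(dplyr)
library(stringr)

# Toy stand-ins for the two tables (invented values)
toy_applications <- tibble(name = c("A Corp", "B Ltd", "C GmbH", "a corp"))
toy_acronym_tbl  <- tibble(company = c("A Corp", "B Ltd"))

# Companies present in the applications but absent from the acronym table
missing_from_acronyms <- toy_applications %>%
  mutate(name_lowercase = str_to_lower(name)) %>%
  distinct(name_lowercase) %>%
  anti_join(
    toy_acronym_tbl %>% mutate(name_lowercase = str_to_lower(company)),
    by = "name_lowercase"
  )

missing_from_acronyms
```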

3.2 Sectors

Most frequent sectors

In the table below, I look at which sectors I applied to most frequently.

frequent_sectors <- job_applications1 %>% 
  filter(!is.na(sector)) %>% 
  group_by(sector) %>% 
  summarise(n=n()) %>% 
  ungroup() %>% 
  arrange(desc(n))

kable(frequent_sectors, caption = "Sectors I applied to") %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE) %>%
  scroll_box(height = "400px")

Sectors I applied to
sector	n
Consulting	24
Environment	16
Recruitment	16
Finance	9
Manufacture	8
Banking	7
human rights	7
Commodities	6
Government	6
Biodiversity	5
Chemicals	5
Energy	5
Automation	4
Aviation	4
Beverage	4
Certification	4
Science	4
Shipping	4
Academia	3
Standard-setter	3
Insurance	2
Manufacture/Watchmaking	2
Sport	2
Automobile	1
Culture	1
Diplomacy	1
Education	1
Furniture	1
Garment	1
Garnment	1
IT	1
Manufacturing	1
Media	1
Pharmaceutical	1
Real estate	1
Standards-setter	1
Supply chain	1
Sustainability	1
Sustainable Finance	1
Water	1
chemicals	1

Comment

Several sectors are duplicated because of typos, lower case instead of capital letters, plural instead of singular forms, and so on. To avoid this issue, I standardize the sector names.

cleaned_sectors <- job_applications1 %>% 
  mutate(
    sector_lowercase = sector %>%
      str_trim() %>%
      str_to_lower()
  )

cleaned_sectors1 <- cleaned_sectors %>% 
  mutate(
    # Map known typos and variants to one canonical spelling
    sector_clean = case_when(
      str_detect(sector_lowercase, "garnment") ~ "garment",
      str_detect(sector_lowercase, "standards-setter") ~ "standard-setter",
      # "manufacture" also matches "manufacture/watchmaking"
      str_detect(sector_lowercase, "manufacture") ~ "manufacturing",
      # sector_lowercase is already lower case, so it is kept as-is
      TRUE ~ sector_lowercase
    )
  )

cleaned_sectors1 %>%
  count(sector_clean, sort = TRUE)

## # A tibble: 37 × 2
##    sector_clean      n
##    <chr>         <int>
##  1 consulting       24
##  2 environment      16
##  3 recruitment      16
##  4 manufacturing    11
##  5 finance           9
##  6 banking           7
##  7 human rights      7
##  8 chemicals         6
##  9 commodities       6
## 10 government        6
## # ℹ 27 more rows
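The same standardization could also be written with a lookup vector, which may be easier to extend when new typos show up. This is only a sketch on invented rows: the name sector_fixes is hypothetical, and exact matching is used here instead of str_detect().

```r
library(dplyr)
library(stringr)

# Hypothetical lookup: misspelled or variant name -> canonical name
sector_fixes <- c(
  "garnment"                = "garment",
  "standards-setter"        = "standard-setter",
  "manufacture"             = "manufacturing",
  "manufacture/watchmaking" = "manufacturing"
)

# Toy data (invented values, including a missing sector)
toy <- tibble(sector = c("Garnment", " chemicals", "Manufacture/Watchmaking", NA, "Chemicals"))

toy_clean <- toy %>%
  mutate(
    sector_lowercase = str_to_lower(str_trim(sector)),
    # Use the corrected name when one exists, otherwise keep the original
    sector_clean = coalesce(unname(sector_fixes[sector_lowercase]), sector_lowercase)
  )

toy_clean
```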

Comment

The above table shows that I applied mostly to jobs in the consulting sector (24 applications), followed by the environment and recruitment sectors (16 each). There are 37 categories in total, which means that most groups are very small.

frequent_sectors_clean <- cleaned_sectors1 %>% 
  count(sector_clean, sort=TRUE)

Now that I have the number of applications per sector, I build a plot and exclude the NA category, to only keep existing sector names.

frequent_sectors_clean %>% 
  filter(!is.na(sector_clean)) %>% 
  arrange(desc(n)) %>% 
  slice_max(order_by=n, n = 15) %>% 
  ggplot(aes(x=reorder(sector_clean, -n), y= n, fill = sector_clean))+
  geom_col(alpha = 0.7)+
  theme_minimal()+
  labs(title = "Top 15 most frequent sectors in applications", x="Sectors", y = "Number of applications")+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

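In order to reduce the number of sectors, I group them into 6 broader groups. Sectors that are missing or not assigned to any group (here, pharmaceutical) are set to NA, which appears as a seventh row in the output.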

cleaned_sectors_less <- cleaned_sectors1 %>% 
  mutate(sector_grouped = case_when(
    sector_clean %in% c("finance", "banking", "sustainable finance", "insurance", "real estate") ~ "Finance & Business",
    sector_clean %in% c("manufacturing", "automobile", "aviation", "garment", "furniture", "chemicals", "commodities", "beverage", "shipping") ~ "Industry & Manufacturing",
    sector_clean %in% c("sustainability", "environment", "biodiversity", "water", "energy") ~ "Sustainability & Environment",
    sector_clean %in% c("consulting", "supply chain", "certification", "standard-setter", "recruitment", "automation", "it") ~ "Consulting & Services",
    sector_clean %in% c("government", "diplomacy", "human rights", "culture", "sport") ~ "Public & Social Sector",
    sector_clean %in% c("academia", "education", "science", "media") ~ "Knowledge & Education",
    TRUE ~ NA_character_
  ))

cleaned_sectors_less %>% 
  count(sector_grouped) %>% 
  arrange(desc(n))

## # A tibble: 7 × 2
##   sector_grouped                   n
##   <chr>                        <int>
## 1 Consulting & Services           54
## 2 Industry & Manufacturing        39
## 3 Sustainability & Environment    28
## 4 Finance & Business              20
## 5 Public & Social Sector          17
## 6 Knowledge & Education            9
## 7 <NA>                             3
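Because any sector that is not listed in one of the groups ends up as NA, it can be useful to check which non-missing sectors were left unassigned. The sketch below shows the idea on invented rows with a shortened version of the grouping; the object name unmatched_sectors is hypothetical.

```r
library(dplyr)

# Toy data: one sector ("pharmaceutical") is not covered by the grouping below
toy <- tibble(sector_clean = c("finance", "pharmaceutical", NA, "media"))

toy_grouped <- toy %>%
  mutate(sector_grouped = case_when(
    sector_clean %in% c("finance") ~ "Finance & Business",
    sector_clean %in% c("media")   ~ "Knowledge & Education",
    TRUE ~ NA_character_
  ))

# Sectors that exist in the data but did not match any group
unmatched_sectors <- toy_grouped %>%
  filter(is.na(sector_grouped), !is.na(sector_clean)) %>%
  distinct(sector_clean)

unmatched_sectors
```

An empty result would mean that every NA in sector_grouped comes from a genuinely missing sector.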

cleaned_sectors_less %>%
  count(sector_grouped) %>%
  ggplot(aes(x = reorder(sector_grouped, n), y = n, fill = sector_grouped)) +
  geom_col(alpha = 0.8) +
  geom_text(aes(label = n), hjust = -0.3, size = 3.5) +
  coord_flip() +
  labs(
    title = "Number of Applications by Sector",
    subtitle = "Distribution of job applications across grouped sectors",
    x = "Sector",
    y = "Number of Applications",
    caption = paste0("n = ", nrow(cleaned_sectors_less))
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.title = element_text(size = 11, face = "bold"),
    legend.position = "none"
  )

Comment

The above graph shows that I mainly applied to the Consulting & Services group (54 applications) and least to the Knowledge & Education group (9 applications).

3.3 Job Requirements: Years of Experience

Descriptive statistics

In the table below, I examine the mean, minimum, maximum, median and standard deviation of the number of years of experience required in the job description sample.

year_of_exp <- job_applications1 %>% 
  filter(!is.na(min_years_of_experience)) %>% 
  summarise(mean = mean(min_years_of_experience),
            st_var = sd(min_years_of_experience),
            min = min(min_years_of_experience),
            max = max(min_years_of_experience),
            median = median(min_years_of_experience),
            group_size = n())
year_of_exp

## # A tibble: 1 × 6
##    mean st_var   min   max median group_size
##   <dbl>  <dbl> <dbl> <dbl>  <dbl>      <int>
## 1  3.30   1.80     0    10      3         71

Comment

I notice that the median (3) and the mean (3.30) are very similar, both close to 3 years. This result is aligned with my actual 3 years of experience. Even though I sometimes applied to jobs requiring more experience than I have, at least half of the 71 applications with a stated requirement asked for 3 years or fewer and were therefore aligned with my profile.
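The reading of the median can be checked directly by computing the share of postings at or below a given threshold. A minimal base R sketch, using invented values rather than my data:

```r
# Invented years-of-experience values, for illustration only
years <- c(0, 1, 2, 2, 3, 3, 3, 4, 5, 10)

# Share of postings asking for 3 years or fewer
share_at_or_below_3 <- mean(years <= 3)

# Median and quartiles of the same vector
toy_median <- median(years)
toy_quartiles <- quantile(years, probs = c(0.25, 0.5, 0.75))

share_at_or_below_3
toy_median
```

By definition, at least half of the values are at or below the median, so the share at or below 3 years is at least 50% whenever the median is 3.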

job_applications1 %>%
  filter(!is.na(min_years_of_experience)) %>%
  ggplot(aes(x = min_years_of_experience)) +
  geom_histogram(binwidth = 1, fill = "#3498db", alpha = 0.7, color = "white") +
  geom_vline(aes(xintercept = median(min_years_of_experience)),
             color = "#e74c3c", linetype = "dashed", linewidth = 0.8) +
  annotate("text", x = median(job_applications1$min_years_of_experience, na.rm = TRUE) + 0.3,
           y = Inf, vjust = 2, label = "Median", color = "#e74c3c", size = 3.5) +
  scale_x_continuous(breaks = 0:10) +
  labs(
    title = "Distribution of Minimum Years of Experience Required",
    subtitle = "Across all job applications",
    x = "Minimum Years of Experience",
    y = "Count",
    caption = paste0("n = ", nrow(filter(job_applications1, !is.na(min_years_of_experience))))
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.title = element_text(size = 11, face = "bold")
  )

Distribution plot

job_applications1 %>%
  filter(!is.na(min_years_of_experience)) %>%
  ggplot(aes(x = "", y = min_years_of_experience)) +
  geom_violin(fill = "#3498db", alpha = 0.4, linewidth = 0.3, trim = FALSE) +
  geom_boxplot(width = 0.2, fill = "#3498db", alpha = 0.7, outlier.alpha = 0.5) +
  scale_y_continuous(breaks = 0:10) +
  labs(
    title = "Distribution of Minimum Years of Experience Required",
    subtitle = "Across all job applications",
    x = "All applications",
    y = "Minimum Years of Experience",
    caption = paste0("n = ", nrow(filter(job_applications1, !is.na(min_years_of_experience))))
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.title = element_text(size = 11, face = "bold"),
    axis.text = element_text(size = 10)
  )

Comment

This graph shows that a large share of the jobs I applied to required between 2 and 3 years of experience, followed by jobs requiring between 4 and 6 years.

3.4 Job Requirements: Acronyms and technical language

Number of acronym occurrences

job_acronym_total <- job_acronym1 %>% 
  group_by(acronym) %>% 
  summarise(total = sum(count)) %>% 
  ungroup() %>% 
  arrange(desc(total))

job_acronym_total %>% 
  kable(caption = "Number of occurrences per acronym") %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE) %>%
  scroll_box(height = "400px")

Number of occurrences per acronym
acronym	total
ESG	156
CSRD	34
EU	27
GHG	23
GRI	23
LCA	22
UN	22
ACV	19
TCFD	17
ISO	15
CV	13
RSE	13
IT	11
ES	10
ESRS	10
CENMAT	9
EHS	9
CNC	8
CDP	7
CSR	6
D&I	6
EPR	6
NGO	6
SASB	6
HR	5
MS	5
PhD	5
SAP	5
CEO	4
CORSIA	4
HES	4
IFRS	4
KPI	4
SDG	4
CET	3
COP	3
CROSSEU	3
CSDDD	3
E&S	3
IFC	3
L&D	3
NNY	3
PCAF	3
SAF	3
TA	3
AI	2
ASAP	2
ArcGIS	2
BPM	2
CDI	2
CDM	2
CEFR	2
CFA	2
CRM	2
DwP	2
EFTA	2
EM	2
GES	2
GIS	2
GSSB	2
ISS	2
LEED	2
NDCs	2
PEF	2
PRI	2
PoS	2
SBTi	2
SFRD	2
SME	2
SQL	2
TNFD	2
UPR	2
ACCA	1
ACE	1
AFOLU	1
AP	1
BIA	1
BU	1
CA	1
CADO	1
CAEP	1
CDD	1
COC	1
COO	1
CPA	1
CRREM	1
CoP	1
DAW	1
DDTrO	1
DGNB	1
EIA	1
EMEA	1
EPF	1
EPP	1
ERP	1
ERPDs	1
ESAP	1
ESDD	1
ESIA	1
ESMP	1
ESMS	1
ESPR	1
FCF	1
FMCG	1
GCF	1
GIIN	1
HBE	1
HRC	1
I&D	1
ICAO	1
INSTRAW	1
IPCC	1
ISCC	1
ISSB	1
JEDI	1
LGBTQ	1
LGBTQI+	1
MBA	1
MRV	1
NFR	1
OHCHR	1
OPIM	1
OSAGI	1
PACTA	1
PAI	1
PLM	1
PMO	1
PWM	1
RED	1
REDD+	1
RH	1
RJC	1
ROI	1
RTFC	1
SBTN	1
SBU	1
SER	1
SES	1
SMEI	1
SPOC	1
THG	1
TORs	1
UDB	1
UE	1
UNFCCC	1
UNGC	1
UNIFEM	1
VIP	1
VUCA	1
eDNA	1

This new table only has 2 columns: “acronym” and “total”. It shows the total number of occurrences of each acronym across the whole job_acronym table, including repetitions within the same job description.

distinct_acronyms_per_job <- job_acronym1 %>% 
  group_by(job_title, company) %>% 
  summarise(distinct_acronym = n()) %>% 
  ungroup()%>% 
  select(-company) %>%
  arrange(desc(distinct_acronym))

distinct_acronyms_per_job %>% 
  head(10) %>% 
  kable(caption = "Number of distinct acronyms per job description")

Number of distinct acronyms per job description
job_title	distinct_acronym
ESG Officer	19
ESG Analyst	10
ESG Reporting Officer	10
Sustainability Reporting Analyst	10
Consultant	9
Environmental Sustainability Specialist Operations	9
Human Rights Officer	9
Assistant Manager Sustainability Programs	8
Chargé.e de données durabilité environnementale	8
Environmental, Social and Governance (ESG) Auditor – Associate/Senior Associate	8

The underlying summary has 3 columns: “company”, “job_title” and “distinct_acronym”. I drop the company name before displaying the table, for confidentiality. The table shows the number of distinct acronyms per job title and company.

total_acronyms_per_job <- job_acronym1 %>% 
  group_by(job_title, company) %>% 
  summarise(total_acronyms = sum(count)) %>% 
  ungroup() %>% 
  select(-company) %>% 
  arrange(desc(total_acronyms))

total_acronyms_per_job %>% 
  head(10) %>% 
  kable(caption = "Number of acronyms per job description")

Number of acronyms per job description
job_title	total_acronyms
ESG Officer	27
ESG Reporting Officer	22
ESG Analyst	19
Sustainability & LCA Specialist	18
ESG Data and Compliance Controller	17
Environmental Sustainability Specialist Operations	17
Global Sustainability Manager	17
Human Rights Officer	16
Programme Analyst	16
Sustainability Project Manager	16

The underlying summary has 3 columns: “company”, “job_title” and “total_acronyms”, and the company name is again dropped before display. The table shows the total number of acronym occurrences per job title and company, counting every repetition.

job_acronym_total %>%
  slice_max(total, n = 15) %>%
  ggplot(aes(x = reorder(acronym, total), y = total, fill = total)) +
  geom_col(alpha = 0.8) +
  geom_text(aes(label = total), hjust = -0.3, size = 3.5) +
  coord_flip() +
  scale_fill_gradient(low = "#a8d0e6", high = "#3498db") +
  labs(
    title = "Top 15 Most Frequent Acronyms in Job Descriptions",
    subtitle = "Total occurrences across all job postings",
    x = "Acronym",
    y = "Total Occurrences",
    caption = paste0("Total unique acronyms: ", nrow(job_acronym_total))
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.title = element_text(size = 11, face = "bold"),
    legend.position = "none"
  )

Comment

In the above plot, every occurrence of an acronym is counted, even when the acronym appears several times within the same job description. The plot alone therefore cannot tell how many job descriptions mention a given acronym.

Top 15 acronyms by distinct jobs

The first step is to add a column that counts, for each acronym, the number of distinct job descriptions that mention it.

number_of_jobs_w_acronym <- job_acronym1 %>% 
  group_by(acronym) %>% 
  distinct(company, job_title, count)%>%
  mutate(n_jobs_w_acronym = n()) %>% 
  ungroup()

number_of_jobs_w_acronym %>% 
  distinct(acronym, n_jobs_w_acronym) %>% 
  arrange(desc(n_jobs_w_acronym)) %>% 
  slice_max(order_by = n_jobs_w_acronym, n=15) %>% 
  ggplot(aes(x=reorder(factor(acronym), n_jobs_w_acronym), y= n_jobs_w_acronym, fill=acronym)) +
  geom_col()+
  coord_flip()+
  theme_minimal()+
  labs(title = "Top 15 acronyms mentioned by distinct job descriptions",
       x = "Acronyms",
       y = "Number of distinct job descriptions")

Comment

If we compare this second plot with the first one, we observe that the top 3 acronyms, “ESG”, “CSRD” and “EU”, are the same. The number of “ESG” occurrences is not a surprise, since the term is closely linked with sustainability positions. “CSRD” and “EU” can indicate the importance of EU regulation for the companies/organisations, especially in 2025, as the EU directive on sustainability disclosure was being debated. On November 13, 2025, the European Parliament voted in favour of the “Omnibus” proposal on sustainability reporting, which is a simplified version of the proposal from 2024. The latest version removes roughly 90% of companies from the scope of the CSRD. Under this version, environmental and social reporting requirements would only apply to businesses with more than 1,750 employees on average and a net annual turnover of over €450 million.

In the future, we might see a decline in these EU disclosure-related acronyms, as one could assume that EU regulation on sustainability reporting will become less of a priority. Yet sustainability reporting for businesses might not fall out of trend entirely: some companies might want to anticipate future regulations, as other jurisdictions adopt sustainability-related financial disclosure standards such as those of the ISSB.

Around 15 job descriptions each mention other reporting standards or frameworks such as “GHG”, “GRI” and “TCFD”. I assume that “GHG” was mentioned along with “Protocol”; some descriptions may only mention GHG, but in any case this still shows an interest in carbon accounting, which is a part of sustainability reporting. This observation tends to confirm that, while the interest in anticipating EU regulation may fade, there is still a strong interest in assessing and reporting on impact through other recognized standards and frameworks.

One notable difference between the two plots is the importance of “LCA” (Life Cycle Assessment) and “ACV” (Analyse de Cycle de Vie, its French equivalent). These acronyms rank high in the first plot, while they barely make it into the top 15 of acronyms mentioned by distinct job descriptions. This result shows that a few job descriptions likely repeat these acronyms many times, whereas fewer than 10 job descriptions reference Life Cycle Assessment, in a sample covering the 77 distinct companies of the job_acronym table.
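The difference between the two rankings can be reproduced on a toy example. In this sketch (invented rows, hypothetical object names), “LCA” has the highest total because one job description repeats it 12 times, while “ESG” is mentioned by the largest number of distinct job descriptions.

```r
library(dplyr)

# Toy data: one row per company, job title and acronym (invented values)
toy <- tibble(
  company   = c("A", "A", "B", "C", "C"),
  job_title = c("J1", "J1", "J2", "J3", "J3"),
  acronym   = c("LCA", "ESG", "ESG", "ESG", "GRI"),
  count     = c(12, 1, 2, 1, 3)
)

# Total occurrences versus number of distinct job descriptions per acronym
toy_summary <- toy %>%
  group_by(acronym) %>%
  summarise(
    total  = sum(count),
    n_jobs = n_distinct(company, job_title),
    .groups = "drop"
  ) %>%
  arrange(acronym)

toy_summary
```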

Average acronyms per job description

av_n_acronyms <- number_of_jobs_w_acronym %>% 
  group_by(job_title, company) %>% 
  mutate(n_acronyms = n()) %>% 
  ungroup() %>% 
  summarise(average_number_of_acronyms = mean(n_acronyms),
            min = min(n_acronyms),
            max = max(n_acronyms)) 
kable(av_n_acronyms, caption = "Average number of Acronyms per job description with minimum and maximum values")

Average number of Acronyms per job description with minimum and maximum values
average_number_of_acronyms	min	max
5.981723	1	19
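One caveat about the average above: because n_acronyms is added with mutate(), each job description contributes one row per acronym, so descriptions with many acronyms weigh more in the mean. The minimum and maximum are not affected. The sketch below, on invented rows, compares this row-weighted mean with a mean computed over one row per job description; which of the two is wanted is a judgment call, and the object names are hypothetical.

```r
library(dplyr)

# Toy data: job J1 mentions three acronyms, job J2 mentions one
toy <- tibble(
  company   = c("A", "A", "A", "B"),
  job_title = c("J1", "J1", "J1", "J2"),
  acronym   = c("ESG", "GRI", "LCA", "ESG")
)

# Mean over acronym rows, as in the chunk above: (3 + 3 + 3 + 1) / 4
row_weighted <- toy %>%
  group_by(job_title, company) %>%
  mutate(n_acronyms = n()) %>%
  ungroup() %>%
  summarise(avg = mean(n_acronyms)) %>%
  pull(avg)

# Mean over job descriptions, one row each: (3 + 1) / 2
per_description <- toy %>%
  count(job_title, company, name = "n_acronyms") %>%
  summarise(avg = mean(n_acronyms)) %>%
  pull(avg)

c(row_weighted = row_weighted, per_description = per_description)
```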

4. Relational Analysis

4.1 Sector & Years of experience

Descriptive statistics

In this chapter, I want to determine if there is a relationship between the sector and the years of experience required. First, I analyze the descriptive statistics of the years of experience and group them by the sector.

cleaned_sectors_less %>% 
  filter(!is.na(min_years_of_experience)) %>% 
  group_by(sector_grouped) %>% 
  summarise(mean = mean(min_years_of_experience),
            st_var = sd(min_years_of_experience),
            min = min(min_years_of_experience),
            max = max(min_years_of_experience),
            median = median(min_years_of_experience),
            group_size = n()) %>% 
  ungroup()

## # A tibble: 7 × 7
##   sector_grouped                mean st_var   min   max median group_size
##   <chr>                        <dbl>  <dbl> <dbl> <dbl>  <dbl>      <int>
## 1 Consulting & Services         3.75   1.74   2       8    3           20
## 2 Finance & Business            2.88   1.46   1       5    2.5          8
## 3 Industry & Manufacturing      3.17   1.54   1       5    3           18
## 4 Knowledge & Education         2.67   2.07   0       5    2.5          6
## 5 Public & Social Sector        3.33   1.37   2       5    3            6
## 6 Sustainability & Environment  2.79   1.53   0.5     5    2.5         12
## 7 <NA>                         10     NA     10      10   10            1

Comment

The group means differ by about one year at most (from 2.67 to 3.75, excluding the NA group). This is consistent with the observation from chapter 3.3.

cleaned_sectors_less %>%
  filter(!is.na(min_years_of_experience), !is.na(sector_grouped)) %>%
  ggplot(aes(x = sector_grouped, y = min_years_of_experience, fill = sector_grouped)) +
  geom_boxplot(alpha = 0.7, outlier.alpha = 0.5) +
  geom_jitter(width = 0.2, alpha = 0.3, size = 2) +
  scale_y_continuous(breaks = 0:10) +
  labs(
    title = "Years of Experience Required by Sector",
    subtitle = "Distribution of minimum years of experience per sector",
    x = "Sector",
    y = "Minimum Years of Experience",
    caption = paste0("n = ", nrow(filter(cleaned_sectors_less, !is.na(min_years_of_experience), !is.na(sector_grouped))))
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title = element_text(size = 11, face = "bold"),
    legend.position = "none"
  )

Inferential statistics

The next step is to test if the observations are normally distributed with a Shapiro-Wilk test.

Normality test

shapiro.test(cleaned_sectors_less$min_years_of_experience)

## 
##  Shapiro-Wilk normality test
## 
## data:  cleaned_sectors_less$min_years_of_experience
## W = 0.90014, p-value = 3.486e-05

Comment

I observe that the p-value is lower than 0.05, so I reject the hypothesis that the data are normally distributed. A non-parametric Kruskal-Wallis test is therefore used to determine whether the required years of experience differ across sectors.

kruskal.test(min_years_of_experience ~ sector_grouped, data = cleaned_sectors_less)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  min_years_of_experience by sector_grouped
## Kruskal-Wallis chi-squared = 3.4804, df = 5, p-value = 0.6264

Comment

The p-value is 0.6264, which is above 0.05, so I fail to reject the null hypothesis. There is no statistically significant difference in the years of experience required across sectors: in my dataset, all sectors tend to require a similar level of experience. Two possible reasons are that the sample is small (a personal job search), which reduces statistical power, and that sustainability jobs may genuinely cluster around similar experience requirements regardless of sector.

4.2 Organisation Type & Years of Experience

Descriptive statistics

job_applications1 %>% 
  filter(!is.na(min_years_of_experience)) %>% 
  group_by(org_type) %>% 
  summarise(mean = mean(min_years_of_experience),
            st_var = sd(min_years_of_experience),
            min = min(min_years_of_experience),
            max = max(min_years_of_experience),
            median = median(min_years_of_experience),
            group_size = n()) %>% 
  ungroup()

## # A tibble: 9 × 7
##   org_type                            mean st_var   min   max median group_size
##   <chr>                              <dbl>  <dbl> <dbl> <dbl>  <dbl>      <int>
## 1 Academia                            4     1.41    3       5      4          2
## 2 Accounting profession and auditors  2     0       2       2      2          3
## 3 Company (listed)                    3.73  2.37    1      10      3         11
## 4 Company (unlisted)                  3.42  1.46    1       5      3         19
## 5 Government                          4     1.15    3       5      4          4
## 6 NGO / not for profit                3.44  1.60    0.5     6      3         17
## 7 Trade association                   1     0       1       1      1          2
## 8 intergovernmental organization      1.75  0.886   0       3      2          8
## 9 <NA>                                4.8   2.59    2       8      4          5

Comment

I observe that, this time, the mean years of experience differ slightly across organisation types. Yet I also note that the group size for some organisation types is low (below 5 observations).

job_applications1 %>%
  filter(!is.na(min_years_of_experience), !is.na(org_type)) %>%
  ggplot(aes(x = org_type, y = min_years_of_experience, fill = org_type)) +
  geom_boxplot(alpha = 0.7, outlier.alpha = 0.5) +
  geom_jitter(width = 0.2, alpha = 0.3, size = 2) +
  scale_y_continuous(breaks = 0:10) +
  labs(
    title = "Years of Experience Required by Organisation Type",
    subtitle = "Distribution of minimum years of experience per organisation type",
    x = "Organisation Type",
    y = "Minimum Years of Experience",
    caption = paste0("n = ", nrow(filter(job_applications1, !is.na(min_years_of_experience), !is.na(org_type))))
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title = element_text(size = 11, face = "bold"),
    legend.position = "none"
  )

Inferential statistics

I already know that the years of experience are not normally distributed, so I can directly use the Kruskal-Wallis test to determine whether the organisation type plays a role in the years of experience.

kruskal.test(min_years_of_experience ~ org_type, data = job_applications1)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  min_years_of_experience by org_type
## Kruskal-Wallis chi-squared = 18.292, df = 7, p-value = 0.01072

Comment

The p-value of the test is 0.01072, which is below the 0.05 threshold. This means I can reject the null hypothesis: there is a statistically significant difference in the years of experience across organisation types. This might mean that the organisation type influences the number of required years, or that some organisation types consistently require more or fewer years of experience. The next step is to run a post-hoc pairwise Wilcoxon test with Bonferroni correction to identify which groups differ significantly.
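The p-values reported by kruskal.test() come from a chi-squared approximation, so they can be recomputed from the printed statistic and degrees of freedom. The sketch below does this for both tests in this chapter, using the rounded statistics from the outputs above; small differences in the last digits are expected because of that rounding.

```r
# Upper-tail chi-squared probability for the sector test (chapter 4.1)
p_sector <- pchisq(3.4804, df = 5, lower.tail = FALSE)

# Upper-tail chi-squared probability for the organisation type test (chapter 4.2)
p_org_type <- pchisq(18.292, df = 7, lower.tail = FALSE)

round(c(sector = p_sector, org_type = p_org_type), 4)
```

This is a convenient cross-check that the p-value quoted in a comment matches the one printed in the test output.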

pairwise.wilcox.test(job_applications1$min_years_of_experience,
                     job_applications1$org_type,
                     p.adjust.method = "bonferroni")

## 
##  Pairwise comparisons using Wilcoxon rank sum test with continuity correction 
## 
## data:  job_applications1$min_years_of_experience and job_applications1$org_type 
## 
##                                    Academia Accounting profession and auditors
## Accounting profession and auditors 1.00     -                                 
## Company (listed)                   1.00     1.00                              
## Company (unlisted)                 1.00     1.00                              
## Government                         1.00     1.00                              
## intergovernmental organization     1.00     1.00                              
## NGO / not for profit               1.00     1.00                              
## Trade association                  1.00     1.00                              
##                                    Company (listed) Company (unlisted)
## Accounting profession and auditors -                -                 
## Company (listed)                   -                -                 
## Company (unlisted)                 1.00             -                 
## Government                         1.00             1.00              
## intergovernmental organization     0.24             0.21              
## NGO / not for profit               1.00             1.00              
## Trade association                  1.00             1.00              
##                                    Government intergovernmental organization
## Accounting profession and auditors -          -                             
## Company (listed)                   -          -                             
## Company (unlisted)                 -          -                             
## Government                         -          -                             
## intergovernmental organization     0.28       -                             
## NGO / not for profit               1.00       0.35                          
## Trade association                  1.00       1.00                          
##                                    NGO / not for profit
## Accounting profession and auditors -                   
## Company (listed)                   -                   
## Company (unlisted)                 -                   
## Government                         -                   
## intergovernmental organization     -                   
## NGO / not for profit               -                   
## Trade association                  1.00                
## 
## P value adjustment method: bonferroni

Comment

A pairwise Wilcoxon test with Bonferroni correction was conducted. Due to ties in the data, approximate p-values were computed.

The Bonferroni correction is very conservative: it multiplies each p-value by the number of comparisons to limit false positives, which makes differences hard to detect in a small dataset. I therefore also try a less conservative method, Benjamini-Hochberg, which controls the false discovery rate instead, to see whether differences across organisation types appear.

pairwise.wilcox.test(job_applications1$min_years_of_experience,
                     job_applications1$org_type,
                     p.adjust.method = "BH")

## 
##  Pairwise comparisons using Wilcoxon rank sum test with continuity correction 
## 
## data:  job_applications1$min_years_of_experience and job_applications1$org_type 
## 
##                                    Academia Accounting profession and auditors
## Accounting profession and auditors 0.199    -                                 
## Company (listed)                   0.786    0.165                             
## Company (unlisted)                 0.799    0.199                             
## Government                         1.000    0.162                             
## intergovernmental organization     0.162    0.928                             
## NGO / not for profit               0.798    0.199                             
## Trade association                  0.363    0.199                             
##                                    Company (listed) Company (unlisted)
## Accounting profession and auditors -                -                 
## Company (listed)                   -                -                 
## Company (unlisted)                 0.958            -                 
## Government                         0.671            0.723             
## intergovernmental organization     0.088            0.088             
## NGO / not for profit               0.996            0.996             
## Trade association                  0.162            0.162             
##                                    Government intergovernmental organization
## Accounting profession and auditors -          -                             
## Company (listed)                   -          -                             
## Company (unlisted)                 -          -                             
## Government                         -          -                             
## intergovernmental organization     0.088      -                             
## NGO / not for profit               0.709      0.088                         
## Trade association                  0.199      0.356                         
##                                    NGO / not for profit
## Accounting profession and auditors -                   
## Company (listed)                   -                   
## Company (unlisted)                 -                   
## Government                         -                   
## intergovernmental organization     -                   
## NGO / not for profit               -                   
## Trade association                  0.162               
## 
## P value adjustment method: BH

Comment

Although the Kruskal-Wallis test indicated a global significant difference across organisation types (p = 0.011), pairwise comparisons with both Bonferroni and Benjamini-Hochberg corrections revealed no individually significant pairs. This likely reflects the limited sample size, which reduces the statistical power needed to detect differences at the pairwise level.
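
To make the difference between the two corrections concrete, the sketch below applies both to a small set of made-up raw p-values (not taken from my data) with the base R function p.adjust(). The vector raw_p and the object names are assumptions chosen for illustration.

```r
# Hypothetical raw p-values, for illustration only
raw_p <- c(0.004, 0.012, 0.020, 0.030, 0.200)

# Bonferroni multiplies every p-value by the number of tests (capped at 1)
p_bonferroni <- p.adjust(raw_p, method = "bonferroni")

# Benjamini-Hochberg scales each sorted p-value by (number of tests / rank)
# and then enforces that the adjusted values stay monotone
p_bh <- p.adjust(raw_p, method = "BH")

comparison <- data.frame(raw_p, p_bonferroni, p_bh)
print(comparison)
```

In this toy example every Benjamini-Hochberg value is at most as large as its Bonferroni counterpart, which is why the second method can flag differences that the first one misses.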

4.3 Reply Time & Sector

New columns: reply time (days) and reply status

In this next part I compute the reply time for each application. Applications with no application date or no reply date get a missing value (NA) and therefore drop out of the later summaries. The reply time is stored in a new column called “days_to_reply”, obtained by subtracting the date of application from the date of reply and converting the result into a numeric value. I also create a logical column called “replied”, to later be able to compute the percentage of replies I received.

job_applications_clean <- cleaned_sectors_less %>%
  mutate(
    replied = !is.na(date_of_reply),
    days_to_reply = as.numeric(date_of_reply - date_of_application)
  )

job_applications_clean %>% 
  count(replied) %>%
  mutate(percentage = n / sum(n) * 100)

## # A tibble: 2 × 3
##   replied     n percentage
##   <lgl>   <int>      <dbl>
## 1 FALSE      87       51.2
## 2 TRUE       83       48.8
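
The subtraction above returns days because both columns are dates. As a hedged sketch on made-up rows (the data frame toy_applications and the column suspicious are hypothetical, not part of my spreadsheet), difftime() with units = "days" makes the unit explicit, and a simple flag highlights reply dates that precede the application date.

```r
library(dplyr)

# Made-up rows for illustration; not taken from the spreadsheet
toy_applications <- tibble(
  date_of_application = as.Date(c("2024-05-05", "2024-08-19", "2024-09-20", NA)),
  date_of_reply       = as.Date(c(NA, "2024-09-13", "2024-01-24", NA))
)

toy_applications <- toy_applications %>%
  mutate(
    replied = !is.na(date_of_reply),
    # units = "days" fixes the unit regardless of the date class
    days_to_reply = as.numeric(difftime(date_of_reply, date_of_application, units = "days")),
    # a negative value means the reply date is earlier than the application date
    suspicious = !is.na(days_to_reply) & days_to_reply < 0
  )

toy_applications
```

Rows flagged as suspicious are likely data entry errors and can be checked by hand before any summary is computed.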

job_applications_clean %>%
  count(replied) %>%
  mutate(percentage = round(n / sum(n) * 100, 1)) %>%
  ggplot(aes(x = replied, y = percentage, fill = replied)) +
  geom_col(alpha = 0.8) +
  geom_text(aes(label = paste0(percentage, "%")), vjust = -0.5) +
  theme_minimal()+
  labs(
    title = "Percentage of applications with replies and without",
    subtitle = "In red applications without response, in blue replied applications",
    x = "Application replied",
    y = "Percentage")

Comment

Out of 170 applications, 48.8% received a reply, which is above the response rate of roughly 25% that is often cited as typical.

Descriptive statistics

Descriptive statistics for the number of days to reply

job_applications_clean %>% 
  filter(days_to_reply > 0) %>% 
  summarise(mean = mean(days_to_reply),
            min = min(days_to_reply),
            max = max(days_to_reply),
            group_size = n(),
            median = median(days_to_reply),
            sd_var = sd(days_to_reply))

## # A tibble: 1 × 6
##    mean   min   max group_size median sd_var
##   <dbl> <dbl> <dbl>      <int>  <dbl>  <dbl>
## 1  29.8     1   174         78     21   29.7

Comment

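In the above table, I notice that the average number of days to reply is close to 30 days (median of 21 days), with a minimum of 1 day and a maximum of 174 days (almost 6 months). The filter on positive values keeps 78 of the 83 replied applications: it drops same-day replies (0 days) and a negative value where the recorded reply date precedes the application date.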

job_applications_clean %>%
  filter(days_to_reply > 0) %>%
  ggplot(aes(x = days_to_reply)) +
  geom_histogram(binwidth = 5, fill = "#3498db", alpha = 0.7, color = "white") +
  geom_vline(aes(xintercept = median(days_to_reply)), 
             color = "#e74c3c", linetype = "dashed", linewidth = 0.8) +
  annotate("text", x = median(job_applications_clean$days_to_reply[job_applications_clean$days_to_reply > 0], na.rm = TRUE) + 3, # same filter as the dashed line
           y = Inf, vjust = 2, label = "Median", color = "#e74c3c", size = 3.5) +
  labs(
    title = "Distribution of Reply Time",
    subtitle = "Number of days between application and reply",
    x = "Days to Reply",
    y = "Count",
    caption = paste0("n = ", nrow(filter(job_applications_clean, days_to_reply > 0)))
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.title = element_text(size = 11, face = "bold")
  )

Comment

The histogram shows that the majority of replies were received within the first 25 days, with a right-skewed distribution. A small number of applications took over 100 days to receive a reply, which inflates the mean relative to the median.
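
Because the distribution is right-skewed, the median and the interquartile range describe a typical reply time better than the mean and the standard deviation. A minimal sketch on a made-up vector (toy_days and robust_summary are hypothetical names, and the values are not my data):

```r
# Made-up, right-skewed reply times in days (illustration only)
toy_days <- c(1, 2, 4, 5, 8, 10, 14, 15, 18, 20,
              21, 25, 26, 29, 38, 40, 56, 85, 119, 174)

robust_summary <- c(
  mean   = mean(toy_days),
  median = median(toy_days),
  q25    = unname(quantile(toy_days, 0.25)),
  q75    = unname(quantile(toy_days, 0.75)),
  iqr    = IQR(toy_days)
)

# With a long right tail, the mean sits above the median
round(robust_summary, 1)
```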

Distribution plot

job_applications_clean %>%
  filter(days_to_reply > 0) %>%
  ggplot(aes(x = "", y = days_to_reply)) +
  geom_violin(fill = "#3498db", alpha = 0.4, linewidth = 0.3, trim = FALSE) +
  geom_boxplot(width = 0.2, fill = "#3498db", alpha = 0.7, outlier.alpha = 0.5) +
  scale_y_continuous(breaks = seq(0, 180, by = 20)) +
  labs(
    title = "Distribution of number of Days to Reply",
    subtitle = "Across all replied applications",
    x = "All applications",
    y = "Number of days to reply",
    caption = paste0("n = ", nrow(filter(job_applications_clean, days_to_reply > 0)))
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.title = element_text(size = 11, face = "bold"),
    axis.text = element_text(size = 10)
  )

Comment

The above graph shows that the distribution of days to reply is mostly concentrated below 25 days.

Descriptive statistics

Let’s examine the mean, minimum and maximum values of the days to reply for each sector.

job_applications_clean_mean <- job_applications_clean %>% 
  filter(days_to_reply > 0) %>% 
  group_by(sector_grouped) %>% 
  summarise(mean = mean(days_to_reply),
            st_var = sd(days_to_reply),
            min = min(days_to_reply),
            max = max(days_to_reply),
            median = median(days_to_reply),
            group_size = n()) %>% 
  ungroup() 
  
job_applications_clean_mean %>% 
    kable(
    caption = "Reply Time (days) by Sector",
    col.names = c("Sector", "Mean", "Std Dev", "Min", "Max", "Median", "N"),
    align = c("l", rep("c", 6))
  ) %>% 
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE
  )

Reply Time (days) by Sector
Sector	Mean	Std Dev	Min	Max	Median	N
Consulting & Services	28.09524	33.50508	1	119	18	21
Finance & Business	34.58333	19.77353	2	60	34	12
Industry & Manufacturing	25.34783	23.25473	2	85	20	23
Knowledge & Education	28.20000	21.08791	10	62	19	5
Public & Social Sector	33.54545	48.15467	1	174	26	11
Sustainability & Environment	43.40000	22.73324	11	72	40	5
NA	9.00000	NA	9	9	9	1

Comment

The group size per sector varies from 23 (Industry & Manufacturing) to 5 (Knowledge & Education and Sustainability & Environment), which affects how reliable each mean is. Among the named sectors, the Sustainability & Environment sector has the highest mean number of days to reply (43 days), but it rests on only 5 replies. The Industry & Manufacturing sector has the lowest mean (25 days) and also the largest group size.

Visualization

job_applications_clean %>%
  filter(days_to_reply > 0, !is.na(sector_grouped)) %>%
  ggplot(aes(x = sector_grouped, y = days_to_reply, fill = sector_grouped)) +
  geom_boxplot(alpha = 0.7, outlier.alpha = 0.5) +
  geom_jitter(width = 0.2, alpha = 0.3, size = 2) +
  labs(
    title = "Reply Time by Sector",
    subtitle = "Distribution of days to reply per sector",
    x = "Sector",
    y = "Days to Reply",
    caption = paste0("n = ", nrow(filter(job_applications_clean, days_to_reply > 0, !is.na(sector_grouped))))
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title = element_text(size = 11, face = "bold"),
    legend.position = "none"
  )

Comment

The Public & Social Sector contains the largest value for the days to reply (174 days). The Consulting & Services sector has the lowest median (18 days), while the Industry & Manufacturing sector has the lowest mean (25 days).

Inferential statistics

In this chapter I want to test the relationship between the number of days to reply and other variables. But first, I have to test if it is normally distributed with a Shapiro-Wilk test.

shapiro.test(job_applications_clean$days_to_reply[!is.na(job_applications_clean$days_to_reply) & job_applications_clean$days_to_reply > 0])

## 
##  Shapiro-Wilk normality test
## 
## data:  job_applications_clean$days_to_reply[!is.na(job_applications_clean$days_to_reply) & job_applications_clean$days_to_reply > 0]
## W = 0.79231, p-value = 4.001e-09

Comment

The p-value of the test is 4.001e-09, which is below 0.05. It shows that the number of days to reply is not normally distributed, which matches the right-skewed shape seen in the histogram above. Since I still want to test the relationship between the number of days to reply and the sectors, I use the Kruskal-Wallis test, which does not assume normality.

kruskal.test(days_to_reply ~ sector_grouped, data = job_applications_clean)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  days_to_reply by sector_grouped
## Kruskal-Wallis chi-squared = 8.4901, df = 5, p-value = 0.1312

Comment

The p-value is 0.1312, which is higher than 0.05, so I fail to reject the null hypothesis. There is no statistically significant difference in the number of days to reply across sectors: the data do not provide evidence that reply times differ by sector.
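
Note that the descriptive statistics keep only positive reply times, whereas the test call above uses the full data frame. A sketch of applying the same filter before testing, on made-up data (toy_replies, its sector labels and its values are hypothetical):

```r
# Made-up data for illustration; sector labels and values are hypothetical
toy_replies <- data.frame(
  sector_grouped = rep(c("A", "B", "C"), each = 6),
  days_to_reply  = c(2, 5, 14, 21, 0, 48,
                     18, 26, 40, -240, 55, 107,
                     4, 15, 27, 36, 37, 174)
)

# Keep only positive reply times, as in the descriptive statistics
toy_positive <- subset(toy_replies, days_to_reply > 0)

# Kruskal-Wallis test on the filtered subset
kw_filtered <- kruskal.test(days_to_reply ~ sector_grouped, data = toy_positive)
kw_filtered
```

Running the test on the same subset as the summary tables keeps the descriptive and inferential results comparable.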

4.4 Reply Time & Organisation Type

Descriptive statistics

Below I want to test the relationship between the number of days to reply and the organization type. The first step is to observe how the mean, minimum and maximum values behave depending on the organization type.

job_applications_clean_mean2 <- job_applications_clean %>% 
  filter(days_to_reply > 0) %>% 
  group_by(org_type) %>% 
  summarise(mean = mean(days_to_reply),
            st_var = sd(days_to_reply),
            min = min(days_to_reply),
            max = max(days_to_reply),
            median = median(days_to_reply),
            group_size = n()) %>% 
  ungroup()

job_applications_clean_mean2 %>% 
    kable(
    caption = "Reply Time (days) by Organization type",
    col.names = c("Org Type", "Mean", "Std Dev", "Min", "Max", "Median", "N"),
    align = c("l", rep("c", 6))
  ) %>% 
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE
  )

Reply Time (days) by Organization type
Org Type	Mean	Std Dev	Min	Max	Median	N
Academia	27.00000	11.31371	19	35	27.0	2
Accounting profession and auditors	27.87500	36.27450	1	107	15.0	8
Company (listed)	19.47059	18.85510	2	60	14.0	17
Company (unlisted)	27.68000	22.98862	2	85	21.0	25
Government	26.00000	13.28533	4	37	27.0	5
NGO / not for profit	34.40000	30.33809	1	119	26.0	15
Trade association	26.50000	17.67767	14	39	26.5	2
intergovernmental organization	83.66667	81.68435	15	174	62.0	3
NA	74.00000	NA	74	74	74.0	1

Comment

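Among the applications with a positive reply time, the most common organization type is Company (unlisted) with 25 replies, followed by Company (listed) with 17 and NGO / not for profit with 15. Intergovernmental organizations have the highest mean (about 84 days), but this rests on only 3 replies.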

Visualization

job_applications_clean %>%
  filter(days_to_reply > 0, !is.na(org_type)) %>%
  ggplot(aes(x = org_type, y = days_to_reply, fill = org_type)) +
  geom_boxplot(alpha = 0.7, outlier.alpha = 0.5) +
  geom_jitter(width = 0.2, alpha = 0.3, size = 2) +
  labs(
    title = "Reply Time by Organisation Type",
    subtitle = "Distribution of days to reply per organisation type",
    x = "Organisation Type",
    y = "Days to Reply",
    caption = paste0("n = ", nrow(filter(job_applications_clean, days_to_reply > 0, !is.na(org_type))))
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title = element_text(size = 11, face = "bold"),
    legend.position = "none"
  )

Comment

The plot shows considerable variation in reply time across organisation types. Intergovernmental organizations show the widest spread, indicating less consistency in their reply times, while Academia and Trade associations show a narrow spread, although each has only two replies. The organization type that takes the longest to reply is the intergovernmental organization, with a median value higher than 50 days, whereas the median values of the remaining organization types are below 50 days. The organization types with the lowest median days to reply are the listed companies and the Accounting profession and auditors.

4.5 Reply Time & Application Month

New columns: application month and reply month

job_applications4 <- job_applications_clean %>% 
  separate(month, into = c("month_name", "year"), sep = " ")
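
Once the month name is in its own column, it can be turned into an ordered factor so that tables and plots follow the calendar rather than the alphabet. A minimal sketch on made-up values, using the built-in month.name constant (toy_months and toy_split are hypothetical names):

```r
library(tidyr)

# Made-up values in the same "Month Year" format as the month column
toy_months <- data.frame(month = c("May 2024", "October 2024", "February 2025"))

toy_split <- separate(toy_months, month, into = c("month_name", "year"), sep = " ")

# Order the months by calendar position instead of alphabetically
toy_split$month_name <- factor(toy_split$month_name, levels = month.name)
toy_split$year <- as.integer(toy_split$year)

# Sort chronologically: first by year, then by month
toy_split[order(toy_split$year, toy_split$month_name), ]
```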


job_applications4 %>% select(-name, -website, -link_to_the_offer, -position, -sector, -status) %>% 
    kable(
    caption = "Applications") %>% 
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE
  )%>%
  scroll_box(height = "400px")

Applications
month_name	year	date_of_application	date_of_reply	location	org_type	min_years_of_experience	sector_lowercase	sector_clean	sector_grouped	replied	days_to_reply
May	2024	2024-05-05	NA	Zurich	Company (listed)	NA	automation	automation	Consulting & Services	FALSE	NA
May	2025	2025-05-05	2025-07-18	NA	NA	NA	recruitment	recruitment	Consulting & Services	TRUE	74
October	2024	2024-10-21	NA	Geneva	NA	NA	recruitment	recruitment	Consulting & Services	FALSE	NA
June	2025	NA	NA	Geneva	NA	NA	recruitment	recruitment	Consulting & Services	FALSE	NA
August	2024	2024-08-19	2024-09-13	Geneva	Company (unlisted)	5.0	aviation	aviation	Industry & Manufacturing	TRUE	25
August	2025	2025-08-29	NA	Geneva	Company (unlisted)	NA	recruitment	recruitment	Consulting & Services	FALSE	NA
December	2024	2024-12-17	NA	Geneva	NA	NA	recruitment	recruitment	Consulting & Services	FALSE	NA
June	2024	2024-06-25	NA	Geneva	NGO / not for profit	NA	culture	culture	Public & Social Sector	FALSE	NA
April	2025	NA	NA	NA	Company (listed)	NA	insurance	insurance	Finance & Business	FALSE	NA
NA	NA	NA	NA	Olten	Company (unlisted)	NA	energy	energy	Sustainability & Environment	FALSE	NA
August	2025	2025-08-29	NA	Geneva	Company (unlisted)	NA	consulting	consulting	Consulting & Services	FALSE	NA
August	2025	2025-08-29	NA	Geneva	Company (unlisted)	NA	consulting	consulting	Consulting & Services	FALSE	NA
August	2025	2025-07-29	2025-08-06	Geneva	Company (unlisted)	3.0	consulting	consulting	Consulting & Services	TRUE	8
May	2024	2024-05-22	NA	Lausanne	NGO / not for profit	3.0	certification	certification	Consulting & Services	FALSE	NA
August	2025	2025-07-30	NA	Geneva	NGO / not for profit	5.0	certification	certification	Consulting & Services	FALSE	NA
January	2025	2025-01-18	NA	NA	Company (unlisted)	NA	recruitment	recruitment	Consulting & Services	FALSE	NA
February	2025	2025-02-12	2025-04-02	Geneva	NGO / not for profit	NA	commodities	commodities	Industry & Manufacturing	TRUE	49
May	2024	2024-05-22	NA	Geneva	NGO / not for profit	NA	commodities	commodities	Industry & Manufacturing	FALSE	NA
December	2024	2024-12-06	NA	NA	Company (unlisted)	NA	consulting	consulting	Consulting & Services	FALSE	NA
November	2024	2024-11-05	2024-11-12	Basel	Company (unlisted)	3.0	garment	garment	Industry & Manufacturing	TRUE	7
June	2025	2025-06-29	2025-07-18	Fribourg	Company (unlisted)	NA	manufacture	manufacturing	Industry & Manufacturing	TRUE	19
July	2025	2025-07-26	2025-08-21	Geneva	NGO / not for profit	2.0	finance	finance	Finance & Business	TRUE	26
November	2024	2024-11-28	2024-12-03	Meyrin	Company (listed)	2.0	commodities	commodities	Industry & Manufacturing	TRUE	5
July	2024	2024-07-02	2024-07-17	Geneva	intergovernmental organization	1.0	science	science	Knowledge & Education	TRUE	15
October	2024	2024-10-10	2024-12-11	Geneva	intergovernmental organization	2.0	science	science	Knowledge & Education	TRUE	62
February	2025	2025-02-20	NA	Geneva	intergovernmental organization	0.0	science	science	Knowledge & Education	FALSE	NA
NA	NA	NA	NA	NA	intergovernmental organization	NA	science	science	Knowledge & Education	FALSE	NA
NA	NA	NA	NA	NA	Company (unlisted)	NA	supply chain	supply chain	Consulting & Services	FALSE	NA
October	2024	2024-10-24	2024-11-14	Geneva	Company (unlisted)	NA	automation	automation	Consulting & Services	TRUE	21
December	2024	2024-12-02	2024-12-16	Pratteln	Company (listed)	5.0	chemicals	chemicals	Industry & Manufacturing	TRUE	14
November	2024	2024-11-20	NA	Geneva	NA	8.0	recruitment	recruitment	Consulting & Services	FALSE	NA
March	2025	2025-03-25	NA	Geneva	NGO / not for profit	3.0	biodiversity	biodiversity	Sustainability & Environment	FALSE	NA
NA	NA	NA	NA	Zurich	Accounting profession and auditors	NA	consulting	consulting	Consulting & Services	FALSE	NA
March	2025	2025-03-21	2025-03-25	Zurich	Accounting profession and auditors	NA	consulting	consulting	Consulting & Services	TRUE	4
June	2025	2025-06-19	2025-06-19	Geneva	Accounting profession and auditors	NA	consulting	consulting	Consulting & Services	TRUE	0
July	2024	2024-07-29	2024-08-14	Geneva	Company (unlisted)	4.0	consulting	consulting	Consulting & Services	TRUE	16
June	2025	2025-06-29	2025-07-04	Lausanne	Company (unlisted)	NA	consulting	consulting	Consulting & Services	TRUE	5
April	2025	2025-04-11	NA	Geneva	NGO / not for profit	5.0	biodiversity	biodiversity	Sustainability & Environment	FALSE	NA
March	2025	2025-03-25	2025-04-29	Renens	Academia	5.0	academia	academia	Knowledge & Education	TRUE	35
June	2025	2025-06-20	2025-07-09	Lausanne	Academia	NA	academia	academia	Knowledge & Education	TRUE	19
August	2025	2025-08-19	2025-08-20	NA	NGO / not for profit	NA	human rights	human rights	Public & Social Sector	TRUE	1
February	2025	2025-02-04	NA	Geneva	Company (unlisted)	5.0	consulting	consulting	Consulting & Services	FALSE	NA
October	2024	2024-10-14	2024-11-12	Geneva	Company (unlisted)	NA	finance	finance	Finance & Business	TRUE	29
September	2024	2024-09-20	2024-01-24	Geneva	Accounting profession and auditors	NA	consulting	consulting	Consulting & Services	TRUE	-240
January	2025	2025-01-20	2025-01-30	Geneva	Accounting profession and auditors	NA	consulting	consulting	Consulting & Services	TRUE	10
April	2025	2025-04-28	2025-08-13	Geneva	Accounting profession and auditors	NA	consulting	consulting	Consulting & Services	TRUE	107
February	2025	2025-02-21	NA	Geneva	NGO / not for profit	NA	automobile	automobile	Industry & Manufacturing	FALSE	NA
April	2025	2025-04-03	2025-04-07	Geneva	Company (listed)	1.0	chemicals	chemicals	Industry & Manufacturing	TRUE	4
June	2024	2024-06-09	NA	NA	NA	NA	recruitment	recruitment	Consulting & Services	FALSE	NA
July	2024	2024-07-02	2024-07-04	Geneva	Company (listed)	3.0	chemicals	chemicals	Industry & Manufacturing	TRUE	2
July	2024	2024-07-27	2024-08-17	Geneva	Company (listed)	3.0	chemicals	chemicals	Industry & Manufacturing	TRUE	21
August	2024	2024-08-07	NA	Geneva	NGO / not for profit	NA	finance	finance	Finance & Business	FALSE	NA
July	2024	2024-07-27	2024-08-14	Remote	NGO / not for profit	3.0	standard-setter	standard-setter	Consulting & Services	TRUE	18
NA	NA	NA	NA	NA	NGO / not for profit	NA	standard-setter	standard-setter	Consulting & Services	FALSE	NA
May	2024	2024-05-05	2024-06-30	Geneva	Company (listed)	3.0	water	water	Sustainability & Environment	TRUE	56
NA	NA	NA	NA	NA	NGO / not for profit	NA	human rights	human rights	Public & Social Sector	FALSE	NA
November	2024	2024-11-02	2024-11-06	Basel	Company (listed)	NA	insurance	insurance	Finance & Business	TRUE	4
July	2025	2025-07-26	NA	NA	NA	NA	recruitment	recruitment	Consulting & Services	FALSE	NA
November	2024	2024-11-20	2024-12-16	Pfäfikon	Company (listed)	3.0	automation	automation	Consulting & Services	TRUE	26
June	2024	2024-06-18	NA	Geneva	Trade association	NA	aviation	aviation	Industry & Manufacturing	FALSE	NA
December	2024	2024-12-25	2025-01-08	Geneva	Trade association	1.0	aviation	aviation	Industry & Manufacturing	TRUE	14
May	2025	2025-05-08	2025-06-16	Geneva	Trade association	1.0	aviation	aviation	Industry & Manufacturing	TRUE	39
July	2025	2025-07-15	2025-08-04	Geneva	Company (unlisted)	NA	furniture	furniture	Industry & Manufacturing	TRUE	20
NA	NA	NA	NA	NA	Company (listed)	NA	real estate	real estate	Finance & Business	FALSE	NA
April	2025	2025-04-14	2025-04-16	Geneva	Company (unlisted)	NA	finance	finance	Finance & Business	TRUE	2
July	2024	2024-07-27	2024-08-11	Lausanne	NGO / not for profit	NA	sport	sport	Public & Social Sector	TRUE	15
February	2025	2025-02-04	2025-06-03	Geneva	NGO / not for profit	2.0	standards-setter	standard-setter	Consulting & Services	TRUE	119
August	2024	2024-08-29	2024-10-08	Gland	NGO / not for profit	0.5	biodiversity	biodiversity	Sustainability & Environment	TRUE	40
January	2025	2025-01-22	2025-04-04	Gland	NGO / not for profit	5.0	biodiversity	biodiversity	Sustainability & Environment	TRUE	72
March	2024	2024-03-14	2024-03-20	Geneva	Accounting profession and auditors	2.0	consulting	consulting	Consulting & Services	TRUE	6
May	2025	2025-05-08	2025-05-18	Geneva	Company (listed)	NA	chemicals	chemicals	Industry & Manufacturing	TRUE	10
October	2024	2024-10-18	NA	Geneva	NA	7.0	recruitment	recruitment	Consulting & Services	FALSE	NA
April	2025	NA	NA	NA	NGO / not for profit	NA	certification	certification	Consulting & Services	FALSE	NA
May	2024	2024-05-10	NA	Geneva	NA	3.0	recruitment	recruitment	Consulting & Services	FALSE	NA
September	2024	2024-09-27	2024-11-04	Geneva	Company (unlisted)	1.0	shipping	shipping	Industry & Manufacturing	TRUE	38
December	2024	2024-12-11	NA	Geneva	Company (unlisted)	NA	shipping	shipping	Industry & Manufacturing	FALSE	NA
February	2025	2025-02-21	2025-02-21	Geneva	Company (unlisted)	NA	shipping	shipping	Industry & Manufacturing	TRUE	0
June	2024	2024-06-23	NA	Remote	NGO / not for profit	NA	sustainability	sustainability	Sustainability & Environment	FALSE	NA
October	2024	2024-10-03	NA	Remote	NGO / not for profit	NA	sustainable finance	sustainable finance	Finance & Business	FALSE	NA
September	2024	2024-09-22	2024-09-25	La Tour de Peliz	Company (listed)	NA	beverage	beverage	Industry & Manufacturing	TRUE	3
September	2024	2024-09-22	2024-09-22	Vevey	Company (listed)	NA	beverage	beverage	Industry & Manufacturing	TRUE	0
NA	NA	NA	NA	NA	Company (listed)	NA	beverage	beverage	Industry & Manufacturing	FALSE	NA
May	2025	2025-05-02	2025-05-27	Vevey	Company (listed)	NA	beverage	beverage	Industry & Manufacturing	TRUE	25
August	2025	2025-08-29	NA	Geneva	Company (unlisted)	NA	manufacturing	manufacturing	Industry & Manufacturing	FALSE	NA
NA	NA	NA	NA	NA	Company (unlisted)	NA	NA	NA	NA	FALSE	NA
September	2024	2024-09-21	2024-09-23	Geneva	Company (listed)	5.0	chemicals	chemicals	Industry & Manufacturing	TRUE	2
April	2025	2025-04-11	2025-05-07	Bern	Government	NA	government	government	Public & Social Sector	TRUE	26
June	2025	2025-06-20	2025-07-29	Geneva	Company (unlisted)	2.0	banking	banking	Finance & Business	TRUE	39
March	2025	2025-03-25	NA	Geneva	NGO / not for profit	NA	certification	certification	Consulting & Services	FALSE	NA
July	2025	2025-07-16	2025-07-21	Lausanne	Company (unlisted)	NA	it	it	Consulting & Services	TRUE	5
April	2024	2024-04-29	NA	Geneva	Company (unlisted)	5.0	banking	banking	Finance & Business	FALSE	NA
June	2024	2024-06-18	2024-08-13	Geneva	Company (unlisted)	3.0	banking	banking	Finance & Business	TRUE	56
August	2024	2024-08-07	2024-10-03	Geneva	Company (unlisted)	NA	banking	banking	Finance & Business	TRUE	57
NA	NA	NA	NA	Geneva	Company (unlisted)	NA	banking	banking	Finance & Business	FALSE	NA
NA	NA	NA	NA	Lausanne	Company (listed)	NA	NA	NA	NA	FALSE	NA
April	2025	2025-04-11	NA	Geneva	NGO / not for profit	5.0	environment	environment	Sustainability & Environment	FALSE	NA
January	2025	2025-01-03	2025-01-13	Lausanne	NGO / not for profit	NA	media	media	Knowledge & Education	TRUE	10
September	2024	2024-09-10	2024-09-11	Geneva	Accounting profession and auditors	NA	consulting	consulting	Consulting & Services	TRUE	1
October	2024	2024-10-03	2024-10-23	Geneva	Accounting profession and auditors	2.0	consulting	consulting	Consulting & Services	TRUE	20
December	2024	2024-12-19	2025-02-12	Geneva	Accounting profession and auditors	NA	consulting	consulting	Consulting & Services	TRUE	55
December	2024	2024-12-25	2025-01-14	Geneva	Accounting profession and auditors	2.0	consulting	consulting	Consulting & Services	TRUE	20
May	2024	2024-05-30	NA	Lausanne	Company (unlisted)	3.0	consulting	consulting	Consulting & Services	FALSE	NA
July	2024	2024-07-04	2024-07-19	Lausanne	Company (unlisted)	5.0	consulting	consulting	Consulting & Services	TRUE	15
February	2025	2025-02-21	2025-04-02	Lausanne	Company (unlisted)	3.0	consulting	consulting	Consulting & Services	TRUE	40
November	2024	2024-11-12	2024-11-12	Geneva	Company (listed)	3.0	manufacture	manufacturing	Industry & Manufacturing	TRUE	0
March	2025	2025-03-08	2025-04-25	Geneva	Company (listed)	NA	manufacture	manufacturing	Industry & Manufacturing	TRUE	48
June	2025	2025-06-20	NA	Geneva	Company (listed)	3.0	manufacture	manufacturing	Industry & Manufacturing	FALSE	NA
October	2024	2024-10-18	NA	Geneva	NA	NA	recruitment	recruitment	Consulting & Services	FALSE	NA
January	2025	2025-01-17	NA	Geneva	NA	4.0	recruitment	recruitment	Consulting & Services	FALSE	NA
June	2025	2025-06-27	NA	Geneva	NA	NA	recruitment	recruitment	Consulting & Services	FALSE	NA
November	2024	2024-11-05	2024-11-14	Ebikon	Company (listed)	10.0	pharmaceutical	pharmaceutical	NA	TRUE	9
August	2024	2024-08-19	2024-11-11	Geneva	Company (unlisted)	5.0	manufacture/watchmaking	manufacturing	Industry & Manufacturing	TRUE	84
December	2024	2024-12-19	2025-01-17	Geneva	Company (unlisted)	NA	manufacture	manufacturing	Industry & Manufacturing	TRUE	29
May	2025	2025-05-05	2025-05-21	Geneva	Company (unlisted)	3.0	manufacture	manufacturing	Industry & Manufacturing	TRUE	16
May	2025	2025-05-28	2025-08-21	Geneva	Company (unlisted)	5.0	manufacture	manufacturing	Industry & Manufacturing	TRUE	85
January	2025	2025-01-22	2025-01-24	Nyon	Company (unlisted)	NA	consulting	consulting	Consulting & Services	TRUE	2
November	2024	2024-11-07	2024-11-25	Basel	Company (listed)	NA	automation	automation	Consulting & Services	TRUE	18
July	2025	2025-07-21	NA	Zurich	Company (listed)	NA	finance	finance	Finance & Business	FALSE	NA
August	2025	2025-08-29	NA	Zug	Company (listed)	NA	consulting	consulting	Consulting & Services	FALSE	NA
May	2025	2025-05-28	NA	NA	NA	2.0	recruitment	recruitment	Consulting & Services	FALSE	NA
NA	NA	NA	NA	Lausanne	NGO / not for profit	NA	energy	energy	Sustainability & Environment	FALSE	NA
May	2024	2024-05-10	NA	Basel	NGO / not for profit	6.0	recruitment	recruitment	Consulting & Services	FALSE	NA
August	2025	2025-07-29	2025-08-07	Bern	NGO / not for profit	NA	human rights	human rights	Public & Social Sector	TRUE	9
October	2024	2024-10-03	2024-10-30	Basel	NGO / not for profit	NA	human rights	human rights	Public & Social Sector	TRUE	27
April	2025	2025-04-04	2025-06-03	Gland	Company (listed)	NA	banking	banking	Finance & Business	TRUE	60
December	2024	2024-12-12	NA	Nyon	Company (unlisted)	NA	consulting	consulting	Consulting & Services	FALSE	NA
February	2025	2025-02-12	NA	Geneva	Company (unlisted)	NA	manufacture	manufacturing	Industry & Manufacturing	FALSE	NA
June	2025	2025-06-17	NA	Geneva	NGO / not for profit	2.0	biodiversity	biodiversity	Sustainability & Environment	FALSE	NA
April	2024	2024-04-20	NA	Geneva	Company (unlisted)	3.0	shipping	shipping	Industry & Manufacturing	FALSE	NA
July	2025	2025-07-06	NA	Remote	NGO / not for profit	NA	standard-setter	standard-setter	Consulting & Services	FALSE	NA
November	2024	2024-11-25	NA	Geneva	Company (unlisted)	NA	commodities	commodities	Industry & Manufacturing	FALSE	NA
December	2024	2024-12-11	2025-01-05	Zurich	Company (unlisted)	1.0	banking	banking	Finance & Business	TRUE	25
February	2025	2025-02-12	2025-02-25	Nyon	NGO / not for profit	NA	sport	sport	Public & Social Sector	TRUE	13
September	2024	2024-09-22	NA	Geneva	Government	3.0	diplomacy	diplomacy	Public & Social Sector	FALSE	NA
March	2025	2025-03-30	NA	Remote	intergovernmental organization	2.0	human rights	human rights	Public & Social Sector	FALSE	NA
NA	NA	NA	NA	Geneva	Academia	3.0	academia	academia	Knowledge & Education	FALSE	NA
NA	NA	NA	NA	NA	intergovernmental organization	NA	human rights	human rights	Public & Social Sector	FALSE	NA
March	2025	2025-03-21	2025-09-11	Remote	intergovernmental organization	2.0	human rights	human rights	Public & Social Sector	TRUE	174
March	2025	2025-03-02	NA	Remote	NGO / not for profit	5.0	education	education	Knowledge & Education	FALSE	NA
May	2024	2024-05-14	NA	Geneva	Company (unlisted)	5.0	manufacture/watchmaking	manufacturing	Industry & Manufacturing	FALSE	NA
June	2025	2025-06-17	2025-07-11	Stabio	Company (listed)	NA	garnment	garment	Industry & Manufacturing	TRUE	24
January	2025	2025-01-03	2025-01-30	Geneva	Government	NA	government	government	Public & Social Sector	TRUE	27
February	2025	2025-02-04	2025-03-12	Geneva	Government	NA	government	government	Public & Social Sector	TRUE	36
July	2025	2025-07-15	2025-08-21	Geneva	Government	5.0	government	government	Public & Social Sector	TRUE	37
July	2025	2025-07-04	2025-07-08	Geneva	Government	5.0	government	government	Public & Social Sector	TRUE	4
March	2025	2025-03-09	NA	Geneva	Government	3.0	government	government	Public & Social Sector	FALSE	NA
March	2024	2024-03-07	2024-03-18	Geneva	Company (unlisted)	1.0	energy	energy	Sustainability & Environment	TRUE	11
June	2024	2024-06-09	NA	Geneva	Company (unlisted)	NA	energy	energy	Sustainability & Environment	FALSE	NA
February	2025	2025-02-07	2025-03-17	Geneva	Company (unlisted)	NA	energy	energy	Sustainability & Environment	TRUE	38
January	2025	2025-01-10	NA	Geneva	Company (unlisted)	NA	commodities	commodities	Industry & Manufacturing	FALSE	NA
June	2024	2024-06-18	2024-07-28	Geneva	NGO / not for profit	NA	finance	finance	Finance & Business	TRUE	40
October	2024	2024-10-10	2024-11-03	Geneva	NGO / not for profit	5.0	finance	finance	Finance & Business	TRUE	24
January	2025	2025-01-22	2025-03-16	Geneva	NGO / not for profit	2.0	finance	finance	Finance & Business	TRUE	53
September	2025	2025-09-12	NA	Geneva	NGO / not for profit	3.0	finance	finance	Finance & Business	FALSE	NA
August	2024	2024-08-07	NA	Geneva	intergovernmental organization	NA	environment	environment	Sustainability & Environment	FALSE	NA
August	2024	2024-08-14	NA	Geneva	intergovernmental organization	NA	environment	environment	Sustainability & Environment	FALSE	NA
August	2024	2024-08-19	NA	Geneva	intergovernmental organization	2.0	environment	environment	Sustainability & Environment	FALSE	NA
September	2024	2024-09-26	NA	Geneva	intergovernmental organization	3.0	environment	environment	Sustainability & Environment	FALSE	NA
November	2024	2024-11-01	NA	Geneva	intergovernmental organization	2.0	environment	environment	Sustainability & Environment	FALSE	NA
January	2025	2025-01-10	NA	Geneva	intergovernmental organization	NA	environment	environment	Sustainability & Environment	FALSE	NA
February	2025	2025-02-24	NA	Geneva	intergovernmental organization	NA	environment	environment	Sustainability & Environment	FALSE	NA
March	2025	2025-03-21	NA	Geneva	intergovernmental organization	NA	environment	environment	Sustainability & Environment	FALSE	NA
April	2025	2025-04-28	NA	Geneva	intergovernmental organization	NA	environment	environment	Sustainability & Environment	FALSE	NA
May	2025	2025-05-28	NA	Geneva	intergovernmental organization	NA	environment	environment	Sustainability & Environment	FALSE	NA
May	2025	2025-05-28	NA	Geneva	intergovernmental organization	NA	environment	environment	Sustainability & Environment	FALSE	NA
May	2025	2025-05-28	NA	Geneva	intergovernmental organization	NA	environment	environment	Sustainability & Environment	FALSE	NA
July	2025	2025-07-25	NA	Geneva	intergovernmental organization	NA	environment	environment	Sustainability & Environment	FALSE	NA
July	2025	2025-07-26	NA	Geneva	intergovernmental organization	NA	environment	environment	Sustainability & Environment	FALSE	NA
NA	NA	NA	NA	NA	NA	NA	commodities	commodities	Industry & Manufacturing	FALSE	NA
August	2024	2024-08-29	NA	Remote	NGO / not for profit	2.0	environment	environment	Sustainability & Environment	FALSE	NA

job_applications4_months <- job_applications4 %>% 
  mutate(
    month_name = factor(
      month_name,
      levels = month.name,  # jan → dec
      ordered = TRUE
    )
  )

levels(job_applications4_months$month_name)

##  [1] "January"   "February"  "March"     "April"     "May"       "June"     
##  [7] "July"      "August"    "September" "October"   "November"  "December"
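
The effect of setting the levels explicitly can be seen on a small example. This is a minimal sketch in base R; the vector `months_seen` is a made-up illustration, not a column of the dataset.

```r
# Hypothetical month labels, in no particular order
months_seen <- c("March", "January", "December", "April")

# Plain character vectors sort alphabetically
alphabetical <- sort(months_seen)

# An ordered factor built on month.name sorts chronologically (Jan -> Dec)
month_factor  <- factor(months_seen, levels = month.name, ordered = TRUE)
chronological <- as.character(sort(month_factor))

alphabetical
chronological
```

Because `month.name` contains English month names only, this approach assumes the month labels in the data are written in English.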

job_application6 <- job_applications4_months %>% 
  mutate(
    reply_yearmonth = floor_date(date_of_reply, unit = "month")  # 2024-04-01, 2025-04-01
  )

Descriptive statistics

job_applications_month_summary <- job_application6


job_applications_mean <- job_applications_month_summary %>% 
  filter(days_to_reply > 0) %>% 
  group_by(month_name) %>% 
  summarise(mean = mean(days_to_reply),
            st_var = sd(days_to_reply),
            min = min(days_to_reply),
            max = max(days_to_reply),
            median = median(days_to_reply),
            group_size = n()) %>% 
  ungroup()

job_applications_mean %>% 
     kable(
    caption = "Descriptive statistics of the reply time range based on the application month") %>% 
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE
  )

Descriptive statistics of the reply time range based on the application month
month_name	mean	st_var	min	max	median	group_size
January	29.00000	27.856777	2	72	18.5	6
February	49.16667	36.240401	13	119	39.0	6
March	46.33333	64.957422	4	174	23.0	6
April	39.80000	44.228950	2	107	26.0	5
May	43.57143	29.010671	10	85	39.0	7
June	28.85714	17.082433	5	56	24.0	7
July	16.16667	9.768533	2	37	15.5	12
August	32.00000	30.298515	1	84	25.0	7
September	11.00000	18.018509	1	38	2.5	4
October	30.50000	15.808226	20	62	25.5	6
November	11.50000	8.689074	4	26	8.0	6
December	26.16667	15.328622	14	55	22.5	6

ggplot(job_applications_mean, aes(x = month_name)) +
  geom_linerange(aes(ymin = min, ymax = max, color = month_name),
                 linewidth = 1) +
  geom_point(aes(y = mean, colour = month_name),
    size = 2) +
  labs(
    title = "Reply Time Range by Application Month",
    x = "Months",
    y = "Days to Reply",
    colour = "Months")+
  theme_minimal()+
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1))

Comment

I notice that the reply time range is the longest for applications sent in March, February and April; August is also a month during which the applications sent take longer to be replied to. The start of the year may be the slowest and busiest period for companies, with annual reports or other administrative tasks to manage, while the rest of the year may be quieter.

The next step is to compute the number of replies per month of reply. I create a new summary table from the reply_yearmonth column:

replies_per_month <- job_application6 %>%
  filter(!is.na(reply_yearmonth)) %>%   # only keep rows with a reply date
  filter(!is.na(date_of_reply)) %>%     # double check reply date exists
  group_by(reply_yearmonth) %>%
  summarise(n_replies = n()) %>% 
  ungroup()

Visualization

ggplot(replies_per_month, aes(x = reply_yearmonth, y = n_replies)) +
  geom_col(fill = "#3498db") +
  labs(title = "Number of Replies per Month", x = "Month-Year", y = "Replies") +
  theme_minimal() +
  scale_x_date(date_labels = "%b %Y", date_breaks = "1 month") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Comment

November 2024, January 2025, July 2025 and August 2025 were the months with the most replies, although the number of applications was 10 per month.

Inferential statistics

I want to observe whether the number of replies is normally distributed across the dataset.

shapiro.test(replies_per_month$n_replies)

## 
##  Shapiro-Wilk normality test
## 
## data:  replies_per_month$n_replies
## W = 0.919, p-value = 0.1241

The p-value (0.124) is higher than 0.05, so the Shapiro-Wilk test does not reject the hypothesis that the number of replies per month is normally distributed. However, there are only 18 observations, one aggregated count per month, which is not enough for an ANOVA test or a Kruskal-Wallis test.
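
With only 18 observations, a Shapiro-Wilk test may fail to detect non-normal data. The sketch below is a small simulation in base R that estimates how often the test rejects normality for samples of that size. The Poisson distribution, its mean of 3, the seed and the number of simulations are all assumptions chosen for illustration; they are not estimated from the dataset.

```r
set.seed(123)  # arbitrary seed, for reproducibility only

n_months <- 18    # same number of observations as in replies_per_month
n_sim    <- 2000  # number of simulated samples (arbitrary choice)

# Draw count data (Poisson with an assumed mean of 3) and record how often
# the Shapiro-Wilk test rejects normality at the 5% level
rejected <- replicate(n_sim, {
  x <- rpois(n_months, lambda = 3)
  shapiro.test(x)$p.value < 0.05
})

# Share of simulated samples in which normality was rejected
rejection_rate <- mean(rejected)
rejection_rate
```

A rejection rate well below 1 would mean that a non-significant result on 18 monthly counts is only weak evidence of normality.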

replies_per_month <- replies_per_month %>%
  mutate(time_index = row_number())  # 1, 2, 3... 18

cor.test(replies_per_month$time_index, 
         replies_per_month$n_replies, 
         method = "spearman")

## 
##  Spearman's rank correlation rho
## 
## data:  replies_per_month$time_index and replies_per_month$n_replies
## S = 656.83, p-value = 0.1923
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.3221566

Comment

A Spearman correlation was used to test whether the number of replies changed over time. Given that each month contains only one aggregated observation, group comparison tests such as ANOVA or Kruskal-Wallis were not appropriate. The results show a weak positive correlation between time and number of replies (rho = 0.322, p = 0.192), suggesting a slight upward trend over the job search period, though this trend does not reach statistical significance. The number of replies per month therefore appears relatively stable across the observation period.
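
As a reminder of what the test measures, Spearman's rho is the Pearson correlation computed on ranks. The sketch below checks this in base R on made-up monthly counts (hypothetical values, not the real data), and then recovers the reported rho from the S statistic printed above, using rho = 1 - 6S / (n^3 - n) with n = 18.

```r
# Hypothetical monthly reply counts (not the real data)
n_replies  <- c(2, 5, 3, 6, 4, 8, 7, 9)
time_index <- seq_along(n_replies)

# Spearman's rho is the Pearson correlation of the ranks
rho_manual  <- cor(rank(time_index), rank(n_replies))
rho_builtin <- cor(time_index, n_replies, method = "spearman")

# Recover rho from the S statistic reported in the output above
S <- 656.83
n <- 18
rho_from_S <- 1 - 6 * S / (n^3 - n)

c(manual = rho_manual, builtin = rho_builtin, from_S = rho_from_S)
```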

4.6 Reply Status & Sector

Descriptive statistics

I want to know whether the response rate differs from sector to sector, so I compute the percentage of replies within each sector.

job_applications_month_summary %>%
  filter(!is.na(sector_grouped), !is.na(replied)) %>%
  group_by(sector_grouped, replied) %>%
  summarise(n = n()) %>%
  mutate(percentage = round(n / sum(n) * 100, 1)) %>% 
       kable(
    caption = "Descriptive statistics of the reply status based on the sector") %>% 
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE
  )

Descriptive statistics of the reply status based on the sector
sector_grouped	replied	n	percentage
Consulting & Services	FALSE	31	57.4
Consulting & Services	TRUE	23	42.6
Finance & Business	FALSE	8	40.0
Finance & Business	TRUE	12	60.0
Industry & Manufacturing	FALSE	13	33.3
Industry & Manufacturing	TRUE	26	66.7
Knowledge & Education	FALSE	4	44.4
Knowledge & Education	TRUE	5	55.6
Public & Social Sector	FALSE	6	35.3
Public & Social Sector	TRUE	11	64.7
Sustainability & Environment	FALSE	23	82.1
Sustainability & Environment	TRUE	5	17.9
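
The percentages add up to 100 within each sector because, by default, summarise() drops only the last grouping variable, so the following mutate() still operates sector by sector. The sketch below makes this explicit on a toy table; the data and the name `toy_rates` are hypothetical, and `.groups = "drop_last"` simply spells out the default behaviour.

```r
library(dplyr)

# Hypothetical toy data, not the real applications table
toy <- tibble(
  sector_grouped = c("A", "A", "A", "B", "B"),
  replied        = c(TRUE, FALSE, FALSE, TRUE, TRUE)
)

toy_rates <- toy %>%
  group_by(sector_grouped, replied) %>%
  summarise(n = n(), .groups = "drop_last") %>%   # still grouped by sector_grouped
  mutate(percentage = round(n / sum(n) * 100, 1)) %>%  # share within each sector
  ungroup()

toy_rates
```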

Visualization

job_applications_month_summary %>%
  filter(!is.na(sector_grouped), !is.na(replied)) %>%
  group_by(sector_grouped, replied) %>%
  summarise(n = n()) %>%
  mutate(percentage = round(n / sum(n) * 100, 1)) %>%
  ggplot(aes(x = sector_grouped, y = percentage, fill = replied)) +
  geom_col(position = "fill", alpha = 0.8) +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_manual(values = c("#e74c3c", "#3498db"), 
                    labels = c("No reply", "Replied")) +
  labs(
    title = "Reply Status by Sector",
    subtitle = "Proportion of replied vs non-replied applications per sector",
    x = "Sector",
    y = "Percentage",
    fill = "Reply Status"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title = element_text(size = 11, face = "bold")
  )+
  geom_text(
    data = ~ filter(.x, replied == TRUE),                  # only label the replied segment
    aes(label = paste0(percentage, "%")),
    position = position_fill(vjust = 0.5),
    size = 3.5, color = "white", fontface = "bold"
  )

Comment

The response rate is the lowest for the Sustainability & Environment sector (17.9%), while it is the highest for the Industry & Manufacturing sector (66.7%).

Inferential

In this case I want to test the relationship between a logical variable (reply status) and a categorical variable (sector).

Chi-square test

chisq.test(table(job_applications_month_summary$replied, job_applications_month_summary$sector_grouped))

## 
##  Pearson's Chi-squared test
## 
## data:  table(job_applications_month_summary$replied, job_applications_month_summary$sector_grouped)
## X-squared = 19.424, df = 5, p-value = 0.001602

Comment

The p-value from the Chi-square test (0.0016) is below the 0.05 threshold, which means that there is a statistically significant association between reply status and sector.

fisher.test(table(job_applications_month_summary$replied, job_applications_month_summary$sector_grouped), 
            simulate.p.value = TRUE)

## 
##  Fisher's Exact Test for Count Data with simulated p-value (based on
##  2000 replicates)
## 
## data:  table(job_applications_month_summary$replied, job_applications_month_summary$sector_grouped)
## p-value = 0.001499
## alternative hypothesis: two.sided

Comment

The Fisher exact test confirms the Chi-square result, showing a statistically significant relationship between reply status and sector (p < 0.05). This result suggests that the sector of the organisation influences the likelihood of receiving a reply. Given the warning raised by the Chi-square test about small cell counts, the Fisher test result is the more reliable of the two.
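
The small-cell concern can be checked by looking at the expected counts of the Chi-square test. The sketch below rebuilds the contingency table from the counts in the descriptive table of this section (the object names are my own) and counts the cells with an expected value below 5, a common rule of thumb.

```r
# Counts copied from the sector table (each pair: no reply, replied)
sector_counts <- matrix(
  c(31, 23,   # Consulting & Services
     8, 12,   # Finance & Business
    13, 26,   # Industry & Manufacturing
     4,  5,   # Knowledge & Education
     6, 11,   # Public & Social Sector
    23,  5),  # Sustainability & Environment
  nrow = 2,
  dimnames = list(
    replied = c("FALSE", "TRUE"),
    sector  = c("Consulting & Services", "Finance & Business",
                "Industry & Manufacturing", "Knowledge & Education",
                "Public & Social Sector", "Sustainability & Environment")
  )
)

# The approximation warning is silenced here because it is examined explicitly
chi <- suppressWarnings(chisq.test(sector_counts))

round(chi$expected, 1)                   # expected counts under independence
cells_below_5 <- sum(chi$expected < 5)   # number of cells below the rule of thumb
cells_below_5
```

In this table, only the two Knowledge & Education cells fall below 5, which is why an exact or simulated test is a reasonable complement.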

4.7 Reply Status & Organisation Type

Descriptive statistics

job_applications_month_summary %>%
  filter(!is.na(org_type), !is.na(replied)) %>%
  group_by(org_type, replied) %>%
  summarise(n = n()) %>%
  mutate(percentage = round(n / sum(n) * 100, 1)) %>% 
  kable(
    caption = "Descriptive statistics of the reply status based on the organization type") %>% 
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE
  )

Descriptive statistics of the reply status based on the organization type
org_type	replied	n	percentage
Academia	FALSE	1	33.3
Academia	TRUE	2	66.7
Accounting profession and auditors	FALSE	1	9.1
Accounting profession and auditors	TRUE	10	90.9
Company (listed)	FALSE	8	29.6
Company (listed)	TRUE	19	70.4
Company (unlisted)	FALSE	21	44.7
Company (unlisted)	TRUE	26	55.3
Government	FALSE	2	28.6
Government	TRUE	5	71.4
NGO / not for profit	FALSE	22	59.5
NGO / not for profit	TRUE	15	40.5
Trade association	FALSE	1	33.3
Trade association	TRUE	2	66.7
intergovernmental organization	FALSE	18	85.7
intergovernmental organization	TRUE	3	14.3

Visualization

job_applications_month_summary %>%
  filter(!is.na(org_type), !is.na(replied)) %>%
  group_by(org_type, replied) %>%
  summarise(n = n()) %>%
  mutate(percentage = round(n / sum(n) * 100, 1)) %>%
  ggplot(aes(x = org_type, y = percentage, fill = replied)) +
  geom_col(position = "fill", alpha = 0.8) +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_manual(values = c("#e74c3c", "#3498db"), 
                    labels = c("No reply", "Replied")) +
  labs(
    title = "Reply Status by Organization Type",
    subtitle = "Proportion of replied vs non-replied applications per organization type",
    x = "Organization Type",
    y = "Percentage",
    fill = "Reply Status"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title = element_text(size = 11, face = "bold")
  )+
  geom_text(
    data = ~ filter(.x, replied == TRUE),                  # only label the replied segment
    aes(label = paste0(percentage, "%")),
    position = position_fill(vjust = 0.5),
    size = 3.5, color = "white", fontface = "bold"
  )

Comment

The organization type with the highest response rate is the Accounting profession and auditors with 90.9%, while the intergovernmental organizations have the lowest rate (14.3%). This result is aligned with the analysis of the time to reply to the applications.

Inferential

Fisher test

fisher.test(table(job_applications_month_summary$replied, job_applications_month_summary$org_type), 
            simulate.p.value = TRUE)

## 
##  Fisher's Exact Test for Count Data with simulated p-value (based on
##  2000 replicates)
## 
## data:  table(job_applications_month_summary$replied, job_applications_month_summary$org_type)
## p-value = 0.0009995
## alternative hypothesis: two.sided

Comment

Fisher’s exact test revealed a statistically significant relationship between reply status and organization type (simulated p ≈ 0.001). This indicates that the type of organization is associated with whether an application receives a reply. Despite the significant global result, the small group sizes for some organisation types mean these findings should be interpreted with caution.
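
Because the p-value is simulated, it changes slightly from one run to the next, and with 2000 replicates it cannot go below 1/2001 (about 0.0005). The sketch below rebuilds the table from the counts shown in this section and fixes a seed so that the result is reproducible. The seed and the larger number of replicates `B` are arbitrary choices of mine, not settings used in the report.

```r
# Counts copied from the organisation type table (each pair: no reply, replied)
org_counts <- matrix(
  c( 1,  2,   # Academia
     1, 10,   # Accounting profession and auditors
     8, 19,   # Company (listed)
    21, 26,   # Company (unlisted)
     2,  5,   # Government
    22, 15,   # NGO / not for profit
     1,  2,   # Trade association
    18,  3),  # intergovernmental organization
  nrow = 2,
  dimnames = list(
    replied  = c("FALSE", "TRUE"),
    org_type = c("Academia", "Accounting profession and auditors",
                 "Company (listed)", "Company (unlisted)", "Government",
                 "NGO / not for profit", "Trade association",
                 "intergovernmental organization")
  )
)

B <- 10000  # number of Monte Carlo replicates (assumed value)

set.seed(2025)  # arbitrary seed
fit_a <- fisher.test(org_counts, simulate.p.value = TRUE, B = B)

set.seed(2025)  # same seed, so the same simulated p-value
fit_b <- fisher.test(org_counts, simulate.p.value = TRUE, B = B)

min_p <- 1 / (B + 1)  # smallest p-value the simulation can return
c(run_a = fit_a$p.value, run_b = fit_b$p.value, floor = min_p)
```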

4.8 Reply Status & Application Month

Descriptive statistics

I want to know the percentage of replies per application month.

job_applications_month_summary %>%
  filter(!is.na(month_name), !is.na(replied)) %>%
  group_by(month_name, replied) %>%
  summarise(n = n()) %>%
  mutate(percentage = round(n / sum(n) * 100, 1)) %>% 
       kable(
    caption = "Descriptive statistics of the reply status based on the application month") %>% 
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE
  )

Descriptive statistics of the reply status based on the application month
month_name	replied	n	percentage
January	FALSE	4	40.0
January	TRUE	6	60.0
February	FALSE	5	41.7
February	TRUE	7	58.3
March	FALSE	6	50.0
March	TRUE	6	50.0
April	FALSE	7	58.3
April	TRUE	5	41.7
May	FALSE	11	61.1
May	TRUE	7	38.9
June	FALSE	9	52.9
June	TRUE	8	47.1
July	FALSE	5	29.4
July	TRUE	12	70.6
August	FALSE	11	61.1
August	TRUE	7	38.9
September	FALSE	3	33.3
September	TRUE	6	66.7
October	FALSE	4	40.0
October	TRUE	6	60.0
November	FALSE	3	30.0
November	TRUE	7	70.0
December	FALSE	4	40.0
December	TRUE	6	60.0

Visualization

job_applications_month_summary %>%
  filter(!is.na(month_name), !is.na(replied)) %>%
  group_by(month_name, replied) %>%
  summarise(n = n()) %>%
  mutate(percentage = round(n / sum(n) * 100, 1)) %>%
  ggplot(aes(x = month_name, y = percentage, fill = replied)) +
  geom_col(position = "fill", alpha = 0.8) +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_manual(values = c("#e74c3c", "#3498db"), 
                    labels = c("No reply", "Replied")) +
  labs(
    title = "Reply Status by Application Month",
    subtitle = "Proportion of replied vs non-replied applications per month",
    x = "Application Month",
    y = "Percentage",
    fill = "Reply Status"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title = element_text(size = 11, face = "bold")
  )+
  geom_text(
    data = ~ filter(.x, replied == TRUE),                  # only label the replied segment
    aes(label = paste0(percentage, "%")),
    position = position_fill(vjust = 0.5),
    size = 2.5, color = "white", fontface = "bold"
  )

Inferential Analysis

Fisher test

Fisher’s exact test helps me test the relationship between the reply status and the application month.

fisher.test(table(job_applications_month_summary$replied, job_applications_month_summary$month_name), 
            simulate.p.value = TRUE)

## 
##  Fisher's Exact Test for Count Data with simulated p-value (based on
##  2000 replicates)
## 
## data:  table(job_applications_month_summary$replied, job_applications_month_summary$month_name)
## p-value = 0.6967
## alternative hypothesis: two.sided

Comment

The p-value is 0.70, which is above the 0.05 threshold. This means that the difference is not statistically significant and that the application month might not have an influence on the response rate.

5. Cross Analysis

This new table has several columns. Some job descriptions included in the applications table didn’t have any of the listed acronyms; because the tables are merged with full_join, they are not discarded. There is also more than one row per job description, since I still want to account for the different acronyms per job description. There are duplicates as well. Finally, the “passed” entries don’t have a sector or any further information.

# I have to keep the same column names to join both tables
job_applications2 <- job_applications_clean %>% 
  rename(company = name) %>% 
  rename(job_title = position)

job_app_acronyms <- job_applications2 %>% 
  full_join(number_of_jobs_w_acronym, by = c("job_title", "company"))

Missing data from the merged table “job_app_acronyms”

vis_miss(job_app_acronyms)

Comment

This newly merged table has many rows per job application/description. The table also includes job positions which I did not apply to and which therefore have no date of application or reply, month of application, sector, location, organisation type, or minimum years of experience required. There are some unknown companies because I was not able to retrieve the company names from the job descriptions when extracting the acronyms. In addition, some job positions are present in the job_application table but are missing from the acronym table because they didn’t include any of the acronyms I selected. Finally, there may be some mismatches, as the job title might be slightly different or because I haven’t applied to all of the job descriptions I saved. The reverse can also be true, as I might not have saved all of the job descriptions I applied to.
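
One way to see how many applications fail to match, and how many of those failures come only from small differences in the job title, is an anti-join on the two key columns. This is a minimal sketch with dplyr on two made-up tables; the table contents and the helper `normalise()` are hypothetical and only illustrate the idea.

```r
library(dplyr)

# Hypothetical miniature versions of the two tables (not the real data)
applications <- tibble(
  company   = c("Alpha", "Beta", "Gamma"),
  job_title = c("ESG Analyst", "Sustainability Officer ", "Reporting Manager")
)
acronyms <- tibble(
  company   = c("Alpha", "Beta", "Delta"),
  job_title = c("ESG Analyst", "sustainability officer", "Climate Lead"),
  count     = c(4, 2, 1)
)

# Applications with no exact match in the acronym table
unmatched_raw <- anti_join(applications, acronyms,
                           by = c("job_title", "company"))

# Same check after normalising case and surrounding whitespace in the job title
normalise <- function(df) mutate(df, job_title = tolower(trimws(job_title)))
unmatched_clean <- anti_join(normalise(applications), normalise(acronyms),
                             by = c("job_title", "company"))

nrow(unmatched_raw)    # mismatches before normalisation
nrow(unmatched_clean)  # mismatches that remain after normalisation
```

Rows that disappear from the unmatched list after normalisation are spelling mismatches; rows that remain are applications with no acronym record.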

5.1 Number of Acronyms & Sector

Descriptive statistics

I first create a column that computes the number of acronyms per job.

job_app_acronyms1 <- job_app_acronyms %>% 
  group_by(job_title) %>% 
  mutate(acronyms_per_job = sum(count)) %>% 
  ungroup()

Next, I compute the minimum, maximum and mean values of the number of acronyms per job based on the sector.

job_app_acronyms1_mean <- job_app_acronyms1 %>%
  filter(!is.na(acronyms_per_job)) %>% 
  group_by(sector_grouped) %>% 
  summarise(mean = mean(acronyms_per_job),
            st_var = sd(acronyms_per_job),
            min = min(acronyms_per_job),
            max = max(acronyms_per_job),
            median = median(acronyms_per_job),
            group_size = n()) %>% 
  ungroup()

job_app_acronyms1_mean %>% 
       kable(
    caption = "Descriptive statistics of the number of acronyms per job based on the sector") %>% 
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE
  )

Descriptive statistics of the number of acronyms per job based on the sector
sector_grouped	mean	st_var	min	max	median	group_size
Consulting & Services	22.337349	17.511862	1	44	16	83
Finance & Business	22.415385	8.442623	4	36	27	65
Industry & Manufacturing	10.607143	7.236151	1	36	10	84
Knowledge & Education	6.466667	2.825058	3	9	9	15
Public & Social Sector	10.058824	6.628260	1	16	16	17
Sustainability & Environment	12.185185	13.399303	1	36	5	27
NA	14.428571	13.766702	1	44	10	56

Comment

The table shows variation in the mean number of acronyms per job across sectors. The Finance & Business and Consulting & Services sectors have the highest averages (about 22.4 and 22.3 acronyms per job description), suggesting more technical language is used in those postings. The Knowledge & Education sector has the lowest average (about 6.5), which may reflect less standardised reporting requirements.

Inferential statistics

I first need to examine whether the variable of the number of acronyms per job is normally distributed.

shapiro.test(job_app_acronyms1$acronyms_per_job)

## 
##  Shapiro-Wilk normality test
## 
## data:  job_app_acronyms1$acronyms_per_job
## W = 0.85644, p-value < 2.2e-16

Comment

The p-value is below the 0.05 threshold, which means that the data is not normally distributed. A Kruskal-Wallis test is therefore used to investigate the relationship between the number of acronyms per job and the sector.

kruskal.test(acronyms_per_job ~ sector_grouped, data = job_app_acronyms1)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  acronyms_per_job by sector_grouped
## Kruskal-Wallis chi-squared = 59.632, df = 5, p-value = 1.448e-11

Comment

The Kruskal-Wallis test result shows that the p-value is below 0.05. This means that there is a statistically significant difference in the number of acronyms per job across sectors, which suggests that the sector is associated with the technical complexity of job descriptions. A pairwise Wilcoxon test will therefore be conducted to identify which sectors differ. As observed in section 4.2, if no individual pairs reach significance, this likely reflects the limited sample size reducing statistical power rather than a true absence of difference.

pairwise.wilcox.test(job_app_acronyms1$acronyms_per_job,
                     job_app_acronyms1$sector_grouped,
                     p.adjust.method = "bonferroni")

## 
##  Pairwise comparisons using Wilcoxon rank sum test with continuity correction 
## 
## data:  job_app_acronyms1$acronyms_per_job and job_app_acronyms1$sector_grouped 
## 
##                              Consulting & Services Finance & Business
## Finance & Business           1.000                 -                 
## Industry & Manufacturing     0.022                 3.4e-13           
## Knowledge & Education        0.149                 2.4e-06           
## Public & Social Sector       0.438                 1.4e-05           
## Sustainability & Environment 0.020                 0.006             
##                              Industry & Manufacturing Knowledge & Education
## Finance & Business           -                        -                    
## Industry & Manufacturing     -                        -                    
## Knowledge & Education        0.200                    -                    
## Public & Social Sector       1.000                    1.000                
## Sustainability & Environment 1.000                    1.000                
##                              Public & Social Sector
## Finance & Business           -                     
## Industry & Manufacturing     -                     
## Knowledge & Education        -                     
## Public & Social Sector       -                     
## Sustainability & Environment 1.000                 
## 
## P value adjustment method: bonferroni

pairwise.wilcox.test(job_app_acronyms1$acronyms_per_job,
                     job_app_acronyms1$sector_grouped,
                     p.adjust.method = "BH")

## 
##  Pairwise comparisons using Wilcoxon rank sum test with continuity correction 
## 
## data:  job_app_acronyms1$acronyms_per_job and job_app_acronyms1$sector_grouped 
## 
##                              Consulting & Services Finance & Business
## Finance & Business           0.6859                -                 
## Industry & Manufacturing     0.0036                3.4e-13           
## Knowledge & Education        0.0213                1.2e-06           
## Public & Social Sector       0.0487                4.7e-06           
## Sustainability & Environment 0.0036                0.0015            
##                              Industry & Manufacturing Knowledge & Education
## Finance & Business           -                        -                    
## Industry & Manufacturing     -                        -                    
## Knowledge & Education        0.0251                   -                    
## Public & Social Sector       0.5901                   0.1835               
## Sustainability & Environment 0.2216                   0.8833               
##                              Public & Social Sector
## Finance & Business           -                     
## Industry & Manufacturing     -                     
## Knowledge & Education        -                     
## Public & Social Sector       -                     
## Sustainability & Environment 0.8337                
## 
## P value adjustment method: BH
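
The two adjustment methods flag different numbers of pairs because they scale the raw p-values differently. The sketch below applies base R's p.adjust to a small set of made-up raw p-values; these are hypothetical and are not taken from the Wilcoxon tests above.

```r
# Hypothetical raw p-values from five pairwise comparisons
raw_p <- c(0.001, 0.008, 0.012, 0.03, 0.2)

# Bonferroni multiplies every p-value by the number of comparisons (capped at 1)
p_bonferroni <- p.adjust(raw_p, method = "bonferroni")

# Benjamini-Hochberg scales each p-value by n / rank, which is less severe
# for all but the smallest p-value
p_bh <- p.adjust(raw_p, method = "BH")

data.frame(
  raw        = raw_p,
  bonferroni = p_bonferroni,
  BH         = p_bh
)

n_sig_bonferroni <- sum(p_bonferroni < 0.05)
n_sig_bh         <- sum(p_bh < 0.05)
```

In this toy example Bonferroni keeps two comparisons below 0.05 and BH keeps four, which mirrors the pattern in the outputs above, where BH flags more pairs than Bonferroni.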

Visualization

bonferroni_result <- pairwise.wilcox.test(job_app_acronyms1$acronyms_per_job,
                                           job_app_acronyms1$sector_grouped,
                                           p.adjust.method = "bonferroni")

bh_result <- pairwise.wilcox.test(job_app_acronyms1$acronyms_per_job,
                                   job_app_acronyms1$sector_grouped,
                                   p.adjust.method = "BH")

library(tibble)  # for rownames_to_column

pval_to_long <- function(test_result, method_name) {
  mat <- test_result$p.value
  
  # Make the matrix symmetric manually
  all_sectors <- union(rownames(mat), colnames(mat))
  n <- length(all_sectors)
  full_mat <- matrix(NA, nrow = n, ncol = n,
                     dimnames = list(all_sectors, all_sectors))
  
  for (r in rownames(mat)) {
    for (c in colnames(mat)) {
      full_mat[r, c] <- mat[r, c]
      full_mat[c, r] <- mat[r, c]  # mirror
    }
  }
  diag(full_mat) <- NA
  
  # Convert to long format
  expand.grid(Sector1 = all_sectors, 
              Sector2 = all_sectors,
              stringsAsFactors = FALSE) %>%
    mutate(p_value = map2_dbl(Sector1, Sector2, ~ full_mat[.x, .y]),
           method = method_name,
           significant = ifelse(!is.na(p_value), p_value < 0.05, NA))
}

bonferroni_long <- pval_to_long(bonferroni_result, "Bonferroni")
bh_long         <- pval_to_long(bh_result, "BH")

# Combine both
combined <- bind_rows(bonferroni_long, bh_long)

# Plot
combined %>%
  ggplot(aes(x = Sector1, y = Sector2, fill = p_value)) +
  geom_tile(color = "white", linewidth = 0.5) +
  geom_text(aes(label = ifelse(!is.na(p_value),
                               ifelse(p_value < 0.001, "<0.001", round(p_value, 3)),
                               "")),
            size = 2.8, color = "white", fontface = "bold") +
  scale_fill_gradient2(low = "#e74c3c",
                       mid = "#f39c12",
                       high = "#ecf0f1",
                       midpoint = 0.05,
                       na.value = "grey90",
                       name = "p-value",
                       limits = c(0, 1)) +
  facet_wrap(~ method, ncol = 2) +
  labs(
    title = "Pairwise Wilcoxon Test P-values by Correction Method",
    subtitle = "Red = significant (p < 0.05), lighter = not significant",
    x = NULL,
    y = NULL
  ) +
  theme_minimal(base_size = 11) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    strip.text = element_text(face = "bold", size = 12),
    legend.position = "bottom"
  )

6. Limitations

Sample size and representativeness

The dataset covers 170 applications made over 18 months in a specific field (sustainability/ESG). The findings therefore reflect my personal job search experience and cannot be generalised to broader labour market trends.

Self-reported and manually collected data

The data was collected manually, which introduces the risk of entry errors, inconsistencies in sector classification, and incomplete records. Some applications may have been omitted, and not all job descriptions were saved for acronym extraction.

Small group sizes

Several organisation types and sectors have fewer than 10 observations, which reduces the statistical power of inferential tests and makes pairwise comparisons unreliable even when a global test is significant. This is particularly relevant for the pairwise Wilcoxon tests in sections 4.2 and 5.1, where the Bonferroni correction may have been too conservative given the small sample, and the additional pairs detected by the BH correction should be interpreted with caution.

Aggregated monthly reply data

The monthly reply counts in section 4.5 represent one aggregated observation per month, which made group comparison tests such as ANOVA and Kruskal-Wallis inappropriate. A Spearman correlation was used instead to test for a trend over time, which is a more suitable but less powerful approach given the small number of time points (18 months).

Acronym extraction limitations

The acronym extraction relied on a predefined list of keywords, which may have missed relevant terms or misclassified others. The Python script output also contained duplicates that required manual cleaning. Furthermore, the acronym dataset does not cover all applications, meaning the cross-analysis in section 5 is based on a partial overlap between the two tables.

Reply status interpretation

A “reply” includes both positive and negative responses, meaning a high reply rate does not necessarily indicate success. This analysis does not distinguish between rejections, interview invitations, or other outcomes, which limits the practical interpretation of the reply rate findings.

Correction method dependency

The choice of p-value adjustment method meaningfully affects which pairwise comparisons are deemed significant. The Bonferroni correction identified 6 significant pairs while the BH correction identified 9. Conclusions drawn from the pairwise analysis are therefore sensitive to the correction method chosen, and both should be considered together rather than in isolation.

7. Discussion

A 48.8% overall reply rate is notably higher than the commonly cited industry average of ~25%, which may reflect the niche nature of sustainability roles, the targeted nature of the applications, or the strong presence of structured HR processes in the sectors applied to.

The statistically significant relationships between reply status and both sector (Fisher, p < 0.05) and organisation type (Fisher, p = 0.0005) suggest that these structural characteristics of employers meaningfully influence recruitment responsiveness. Accounting and auditing firms had the highest reply rate (90%), possibly reflecting more formalised recruitment pipelines, while intergovernmental organisations had the lowest (14%), which may be explained by longer and more bureaucratic hiring processes consistent with the reply time analysis in section 4.4.
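For readers who want to see the shape of such a test, here is a minimal sketch of Fisher's exact test on a reply status by organisation type table. The counts and the three category labels are hypothetical and do not reproduce the report's table.

```r
# Hypothetical contingency table: organisation type (rows) by reply status (columns)
reply_table <- matrix(
  c( 9,  1,
    12, 10,
     1,  6),
  nrow  = 3,
  byrow = TRUE,
  dimnames = list(
    organisation_type = c("Accounting & auditing", "Consulting", "Intergovernmental"),
    reply_status      = c("Reply", "No reply")
  )
)

# Fisher's exact test is preferred over chi-squared when expected
# cell counts are small, as is the case with groups under 10 observations
fisher_result <- fisher.test(reply_table)
fisher_result$p.value

# Reply rate per organisation type (row proportions)
reply_rates <- prop.table(reply_table, margin = 1)[, "Reply"]
reply_rates
```

The test only indicates that reply status and organisation type are associated. It does not say which categories drive the association, which is why the row proportions are printed alongside it.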

Regarding the temporal dimension of replies, the Spearman correlation found a weak positive trend between time and number of replies (rho = 0.322, p = 0.192), suggesting a slight increase in replies over the 18-month period that does not reach statistical significance. This result should be interpreted cautiously given that monthly reply counts represent single aggregated observations, limiting the power of any temporal analysis. The absence of a significant relationship between reply time and sector or month of application further suggests that timing a job application strategically by month or field may not substantially improve responsiveness, at least within this dataset.

The acronym analysis in section 5.1 provides perhaps the most insightful finding of the report. The global Kruskal-Wallis test confirmed that the number of acronyms per job description differs significantly across sectors. The subsequent pairwise Wilcoxon tests revealed that Finance & Business is the most distinct sector, differing significantly from all other sectors under both the Bonferroni and BH corrections. This is somewhat counterintuitive, as one might expect Finance & Business to use more technical language, but it may reflect that financial sector job descriptions in this sample were more generalist or targeted a broader audience. Consulting & Services, on the other hand, used more technical acronyms than Industry & Manufacturing, Sustainability & Environment, Knowledge & Education, and Public & Social Sector, although the differences with the latter two were significant only under the less conservative BH correction. Knowledge & Education showed the lowest acronym usage, which aligns with expectations given the more descriptive and less regulatory nature of academic and educational job postings.
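The two-step procedure described above (a global Kruskal-Wallis test followed by pairwise Wilcoxon tests under two corrections) can be sketched as follows. The acronym counts are simulated, only three of the six sectors are included, and the Poisson rates are arbitrary assumptions chosen for illustration.

```r
set.seed(42)

# Simulated acronym counts per job description for three sectors
# (rates are arbitrary and not estimated from the report's data)
acronym_counts <- data.frame(
  sector = rep(
    c("Consulting & Services", "Industry & Manufacturing", "Knowledge & Education"),
    each = 12
  ),
  n_acronyms = c(rpois(12, 8), rpois(12, 4), rpois(12, 2))
)

# Step 1: global test for any difference in distribution across sectors
kw <- kruskal.test(n_acronyms ~ sector, data = acronym_counts)
kw$p.value

# Step 2: pairwise Wilcoxon rank-sum tests under each correction.
# exact = FALSE is passed on to wilcox.test() because counts contain ties.
pw_bonf <- pairwise.wilcox.test(
  acronym_counts$n_acronyms, acronym_counts$sector,
  p.adjust.method = "bonferroni", exact = FALSE
)
pw_bh <- pairwise.wilcox.test(
  acronym_counts$n_acronyms, acronym_counts$sector,
  p.adjust.method = "BH", exact = FALSE
)

pw_bonf$p.value  # lower-triangular matrix of adjusted p-values
pw_bh$p.value
```

Comparing the two matrices side by side shows which pairs are significant under both corrections and which appear only under BH.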

The prevalence of ESG, CSRD, and EU-related acronyms throughout the dataset reflects the regulatory environment of 2024–2025, when the EU sustainability disclosure framework was actively debated. The Omnibus simplification package, proposed by the European Commission in February 2025 and still moving through the legislative process in November 2025, would sharply reduce the scope of CSRD. It may therefore shift this acronym landscape significantly in future job postings, with implications for the technical skills demanded by employers.

8. Conclusion

This analysis of 170 personal job applications submitted between March 2024 and September 2025 provides a detailed snapshot of a sustainability-focused job search. The data reveal that applications were spread across 37 sectors (grouped into 6 broader sectors to increase the number of observations per group), predominantly in Consulting & Services, and that the majority targeted roles requiring 2–3 years of experience, consistent with my profile.

Statistically significant relationships were found between reply status and both sector and organisation type, suggesting that these structural factors play a meaningful role in employer responsiveness. In contrast, no significant relationship was found between reply time and sector or application month, and the temporal analysis of monthly reply counts revealed only a weak, non-significant upward trend over the 18-month period (rho = 0.322, p = 0.192). Taken together, these findings suggest that the sector and type of the target organisation are associated with whether a reply is received, whereas neither the sector nor the month of application is associated with how quickly it arrives.

The acronym analysis highlighted meaningful differences in technical language use across sectors. Finance & Business stood out as the most distinct group, differing significantly from all other sectors in acronym usage, while Consulting & Services used the most technical language. The Finance & Business result held under both the Bonferroni and BH corrections, whereas some Consulting & Services comparisons were significant only under BH. Overall, the findings suggest that the density of technical acronyms in job descriptions is partly a function of the sector, with implications for how candidates tailor their application materials.

The centrality of sustainability reporting acronyms such as ESG, CSRD, and GRI throughout the dataset reflects the regulatory moment of 2024–2025 in Europe. As the Omnibus proposal reshapes the scope of mandatory sustainability reporting, future analyses may capture a shift in the technical vocabulary of sustainability job descriptions, potentially reducing the dominance of EU regulatory frameworks in favour of broader international standards such as those issued by the ISSB.

While the findings are limited by the personal and small-scale nature of the dataset, this project demonstrates the value of systematic data collection during a job search and provides a methodological template that could be scaled or replicated. Future work could enrich the analysis by distinguishing between types of replies, incorporating salary data, expanding the acronym list to capture a broader range of technical frameworks, or collecting reply counts at the individual application level rather than as monthly aggregates to enable more robust temporal analysis.

Analysis of job applications from March 2024 to September 2025
