1. Dental Office 🦷🪥¶

Business Analysis 📈

I’d like to say that for the last two years, I’ve been freelance working in a dental office. I really didn’t study any similar to it. I have no practical knowledge on, notwithstanding I have taken part in managing data. In other words, I know we should take care of patients data and manipulate it wisely. Process, understand and interpret data is important in the decision-making process so we can know who is visiting us and measure the impact it generates on business profits.

1.1. Data source📔¶

The database that we query in this analysis is temporarily hosted on an Azure IaaS service, specifically Azure Data Base for PostgreSQL, so if you try to connect to it, you won’t have access given that the information is sensitive and must be protected.

1.2. Connect DataBase/Python¶

Psycopg is the most popular PostgreSQL database adapter for python and all we need to do is import psycopg2 instantiate connect and indicate server-general information easily obtained from azure.

#We'll use pandas and numpy for data manipulation
import pandas as pd
import numpy as np
#We use seaborn and matplotlib to plot the data
import seaborn as sns
import matplotlib.pyplot as plt 
#Library to connect with postgresql
import psycopg2

#Establish connection with server adminsql@az-postgresql-server host on azure as IaaS
conn_sql = psycopg2.connect(user = "adminsql@az-postgresql-server",
                            password = "CEdndm1246",
                            host = "az-postgresql-server.postgres.database.azure.com",
                            port = "5432",
                            database = "postgres")

---------------------------------------------------------------------------
OperationalError                          Traceback (most recent call last)
Input In [1], in <cell line: 11>()
      8 import psycopg2
     10 #Establish connection with server adminsql@az-postgresql-server host on azure as IaaS
---> 11 conn_sql = psycopg2.connect(user = "adminsql@az-postgresql-server",
     12                             password = "CEdndm1246",
     13                             host = "az-postgresql-server.postgres.database.azure.com",
     14                             port = "5432",
     15                             database = "postgres")

File ~/packages/anaconda3/lib/python3.9/site-packages/psycopg2/__init__.py:127, in connect(dsn, connection_factory, cursor_factory, **kwargs)
    124     raise TypeError('missing dsn and no parameters')
    126 dsn = _ext.make_dsn(dsn, **kwargs)
--> 127 conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
    128 if cursor_factory is not None:
    129     conn.cursor_factory = cursor_factory

OperationalError: could not translate host name "az-postgresql-server.postgres.database.azure.com" to address: Name or service not known

1.3. Database Query¶

The database is made up of eleven tables, however, only three are dependent and the remaining eight are independent, which means that it is a low-complexity database where the patient table is connected to the treatment table through the transition table patient_treatment and that’s it, as shown below the query indicates the type of JOIN and the aliases given to each column.## Database Query The database is made up of eleven tables, however, only three are dependent and the remaining eight are independent, which means that it is a low-complexity database where the patient table is connected to the treatment table through the transition table patient_treatment and that’s it, as shown below the query indicates the type of JOIN and the aliases given to each column.

query_sql = '''
SELECT date AS "appointment day", hour AS "appointment hour", 
LOWER(f_name) AS "patient name", LOWER(type_of_id) AS "type of id",
LOWER(place_of_issuance) AS "id-document-issuance",  date_of_birth AS birthday,
LOWER(genre) AS genre, LOWER(health_care_facility) AS "healthcare facility", 
LOWER(occupation) AS occupation, LOWER(oral_hygiene) AS "oral hygiene", 
brushed_per_day AS brushing , LOWER(smoke) AS smoker, dental_floss AS "floss usage", 
LOWER(alcohol) AS alcoholic, LOWER(proceed) AS "dental-procedure",
LOWER(method_of_payment) AS "payment method", cost, material_expense AS "dental-materials-cost",
LOWER(procedure_status) AS "procedure-status", extract(year from AGE(date_of_birth)) as age

-- INNER JOIN is better than LEFT JOIN given raw data is quite incomple
FROM patient
INNER JOIN type_of_id USING (type_of_id_id)
INNER JOIN genre USING (genre_id)
INNER JOIN health_care_facility USING (health_care_facility_id)
INNER JOIN occupation USING (occupation_id)
INNER JOIN neighborhood USING (neighborhood_id)
INNER JOIN patient_treatment USING (patient_id)
INNER JOIN treatment USING (treatment_id)
INNER JOIN proceed USING (proceed_id)
INNER JOIN business_reference USING (business_reference_id)
INNER JOIN method_of_payment USING (method_of_payment_id)
'''
#Using pd.read_sql we read the data and store it in dental_office_raw or raw data.
dental_office_raw = pd.read_sql(query_sql, conn_sql)
#Create a copy() in case we need to get back to it.
dental_office = dental_office_raw.copy()
#Extract a sample from it making sure it's been query properly.
dental_office.sample(10)

/home/danmuner/packages/anaconda3/lib/python3.9/site-packages/pandas/io/sql.py:761: UserWarning: pandas only support SQLAlchemy connectable(engine/connection) ordatabase string URI or sqlite3 DBAPI2 connectionother DBAPI2 objects are not tested, please consider using SQLAlchemy
  warnings.warn(

	appointment day	appointment hour	patient name	type of id	id-document-issuance	birthday	genre	healthcare facility	occupation	oral hygiene	brushing	smoker	floss usage	alcoholic	dental-procedure	payment method	cost	dental-materials-cost	procedure-status	age
367	2022-01-20	15:30:00	luis	cc	bogota	1994-05-18	masculino	sanitas	independiente	buena	3.0	si	MNO	no	endodoncia uniradicular	efectivo	165000.0	NaN	p	28.0
323	2021-12-27	16:33:00	manuel	cc	bogota	1919-06-08	masculino	sura	independiente	buena	3.0	no	SI	no	pulpectomia	efectivo	70000.0	11465.0	t	103.0
397	2022-02-11	17:05:00	maria	ce	bogota	1980-01-01	femenino	compensar	ama de casa	buena	NaN	no	NO	no	corona zirconio	efectivo	60000.0	NaN	p	42.0
412	2022-02-21	14:25:00	miguel	cc	bogota	1940-10-08	masculino	salud total	desempleado	buena	NaN	no	SI	no	protesis flexible isosit parcial	efectivo	1000000.0	NaN	p	81.0
214	2021-11-05	14:08:00	maria	cc	bogota	1951-03-04	femenino	compensar	ama de casa	buena	NaN	no	SI	no	diagnostico	efectivo	50000.0	NaN	p	71.0
98	2021-07-24	16:20:00	wiliam	cc	bogota	1990-11-03	masculino	salud total	empleado	regular	2.0	si	NO	no	diagnostico	efectivo	NaN	NaN	t	31.0
37	2021-05-19	12:30:00	dalia	cc	bogota	1988-04-01	femenino	salud total	independiente	buena	2.0	no	SI	no	resina u obturacion ii sup	efectivo	290000.0	49796.0	t	34.0
587	2022-05-09	18:23:00	dianys	cc	bolivar	1989-01-09	femenino	famisanar	estudiante	regular	2.0	si	NO	si	protesis convensional biodent parcial	efectivo	50000.0	8012.0	t	33.0
157	2021-09-18	16:35:00	maria	cc	bogota	1979-03-13	femenino	compensar	empleado	buena	2.0	no	NO	no	resina estetica	efectivo	110000.0	11133.0	None	43.0
40	2021-05-25	00:00:00	nathalie	cc	bogota	1987-07-29	femenino	sanitas	independiente	buena	2.0	no	NO	no	nucleo	efectivo	190000.0	12090.0	t	34.0

#Identify how many columns and rows has the DataFrame 
dental_office.shape

(625, 20)

1.4. Data Exploration 👨‍💻🔍¶

`info () besides indicating the data type shows us null values, so here is were we can decide, whether to cast or not data to their respective types and even categorize it for further analysis.

Identify data types and null data
Casting birthday and appointment day to date type.
Identify unique values within the next columns, smoker, floss usage, alcoholic, oral hygiene, correct semantic errors and remove blanks.

#Data types and missing values
dental_office.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 625 entries, 0 to 624
Data columns (total 20 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 appointment day        624 non-null    object 
 appointment hour       625 non-null    object 
 patient name           624 non-null    object 
 type of id             625 non-null    object 
 id-document-issuance   625 non-null    object 
 birthday               617 non-null    object 
 genre                  625 non-null    object 
 healthcare facility    625 non-null    object 
 occupation             625 non-null    object 
 oral hygiene           624 non-null    object 
brushing               397 non-null    float64
smoker                 624 non-null    object 
floss usage            621 non-null    object 
alcoholic              620 non-null    object 
dental-procedure       625 non-null    object 
payment method         625 non-null    object 
cost                   590 non-null    float64
dental-materials-cost  343 non-null    float64
procedure-status       609 non-null    object 
age                    617 non-null    float64
dtypes: float64(4), object(16)
memory usage: 97.8+ KB

#Turning object type to datetime64
dental_office['appointment day'] = pd.to_datetime(dental_office['appointment day'])
dental_office['birthday'] = pd.to_datetime(dental_office['birthday'])

#Semantic errors identified in smoker column
dental_office['smoker'].unique()[1:4]

array(['no ', 'si ', 'non '], dtype=object)

dental_office['smoker'].replace({'no ': 'no', 'non ': 'no','si ':'si'}, inplace=True)
dental_office['smoker'].unique()

array(['no', 'si', None], dtype=object)

#Semantic errors identified in floss usage column
dental_office['floss usage'].unique()

array(['SI', 'NO ', 'NO', 'SI ', None, ' NO ', 'MNO '], dtype=object)

dental_office['floss usage'].replace({'NO ': 'no','NO': 'no','MNO ': 'no',' NO ': 'no', 'SI ': 'si','SI':'si'}, inplace=True)
dental_office['floss usage'].unique()

array(['si', 'no', None], dtype=object)

#Semantic errors identified in alcoholic column
dental_office['alcoholic'].unique()

array(['no', 'si', 'no ', None, 'nio', 'si '], dtype=object)

dental_office['alcoholic'].replace({'no ':'no','nio':'no','si ':'si'},inplace=True)
dental_office['alcoholic'].unique()

array(['no', 'si', None], dtype=object)

#Semantic errors identified in oral hygeine column
dental_office['oral hygiene'].replace({'buena ':'buena','mala ':'mala','regular ':'regular','regualr':'regular','bueno ':'buena','buena 2':'buena','bueno':'buena'}, inplace=True)
dental_office['oral hygiene'].unique()

array(['buena', 'regular', 'mala', 'beuna ', None], dtype=object)

#Below we indicate the number of missing values per column
null_values = dental_office.isnull().sum().reset_index()
null_values.rename(columns={'index':'col_name',0:'null_val'},inplace=True)
null_values = null_values[null_values['null_val']>0].reset_index(drop=True)
null_values

	col_name	null_val
0	appointment day	1
1	patient name	1
2	birthday	8
3	oral hygiene	1
4	brushing	228
5	smoker	1
6	floss usage	4
7	alcoholic	5
8	cost	35
9	dental-materials-cost	282
10	procedure-status	16
11	age	8

There are control or follow-up dental visits, which do not have any associated cost, so we’ll replace all the missing values in the cost column with 0. This does not happen with other columns such as brushing, which we will discard from the analysis given the missing values; in doing so, we’ll proceed to remove duplicates based on the remaining columns.

dental_office['cost'].fillna(0, inplace=True)
dental_office['dental-materials-cost'].fillna(0,inplace=True)

dental_office.dropna(how='any', subset=['patient name','birthday','smoker','floss usage','alcoholic','procedure-status','appointment day'], inplace=True)

dental_office = dental_office[['appointment day', 'appointment hour', 'patient name', 'type of id',
       'id-document-issuance', 'birthday','age', 'genre', 'healthcare facility',
       'occupation', 'oral hygiene', 'smoker', 'floss usage',
       'alcoholic', 'dental-procedure', 'payment method', 'cost','dental-materials-cost', 'procedure-status']]

#Turning object type into categorical 
dental_office = dental_office.astype({"type of id":"category", "id-document-issuance":'category',"genre":'category',
                                      "healthcare facility":'category',"occupation":'category',"oral hygiene":'category',
                                      "smoker":'category',"floss usage":'category',"alcoholic":'category',
                                      "dental-procedure":'category',"payment method":'category',
                                      "procedure-status":'category',"age":'int'})

1.4.1. Procedures performed on patients grouped by gender with the highest cost.¶

This information compiles the procedures in which both men and women invested more money. We create a grouped table for each gender, then we do an append.

The most expensive procedure accessed by patients grouped by gender and most frequently used are flexible prostheses.

#Set up display options for floating format
pd.options.display.float_format = '{:.0f}'.format
#Add to columns filtered by gender
dental_office['counts'] = 1
women_top_proc = dental_office[dental_office['genre']=='femenino'].groupby(['genre','dental-procedure'])['cost'].median().sort_values(ascending=False).head(5)
men_top_proc = dental_office[dental_office['genre']=='masculino'].groupby(['genre','dental-procedure'])['cost'].median().sort_values(ascending=False).head(5)
men_top_proc.append(women_top_proc).to_frame()

/tmp/ipykernel_1352/732345006.py:7: FutureWarning: The series.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  men_top_proc.append(women_top_proc).to_frame()

		cost
genre	dental-procedure
masculino	protesis flexible isosit parcial	1000000
	protesis alto impacto duratone total	500000
	protesis flexible duratone parcial	380000
	protesisflexible biodent parcial	335000
	recubrimiento multiradicular	315000
femenino	protesis flexible superc parcial	700000
	protesis alto impacto biodent parcial	330000
	recubrimiento multiradicular	325000
	exo caninos cirujano	320000
	protesis flexible duratone parcial	300000

1.4.2. Descriptive Income Statistics regarding gender and occupation¶

Note: Errors are evident in the male gender with respect to the occupation housewife which does not usually apply to this gender and was applied by mistake.

The median is the same for self-employed and employees regardless of gender.
Unemployed men have the highest standard deviation, which could be due to an error when entering the type of patient. In addition, there is no agreement that the unemployed present the second highest income. Opposed to what happens with women.

#Income grouped by genre and occupation
dental_office.groupby(['genre','occupation'])['cost'].describe()

		count	mean	std	min	25%	50%	75%	max
genre	occupation
femenino	ama de casa	57	149333	221572	0	40000	80000	140000	1200000
	desempleado	11	64455	87363	0	14500	50000	55000	300000
	empleado	121	145117	235849	0	50000	90000	130000	2123123
	estudiante	54	87278	93164	0	50000	61000	98750	600000
	independiente	122	113713	104347	0	56250	90000	147500	740000
masculino	ama de casa	4	72500	26300	50000	50000	70000	92500	100000
	desempleado	12	326667	596659	10000	15000	85000	207500	2000000
	empleado	73	108274	102846	0	55000	90000	110000	650000
	estudiante	18	47778	36146	0	15000	50000	70000	140000
	independiente	122	112574	101579	0	50000	90000	140000	620000

1.4.3. Analysis of essential aspects in oral care¶

There are bad habits that affect teeth health, such as smoking ', drinking alcohol`, or not brushing properly. They can cause either short- or long-term problems depending on each patient. Focusing health and prevention campaigns on populations with higher incidences of bad habits can help.

Smokers

In terms of occupation, employees and self-employed have the most significant percentage or a higher tendency to smoke. It should be noted that the amount of data for categories other than employees and self-employed is not significant, which increases the bias in the given conclusion.

Oral hygiene

Oral hygiene is related to the number of times we brush our teeth per day, where 3 is good, 2 is regular and 1 is bad. -More than 50%of patients, both men and women, have good hygiene, which makes sense in patients who are attentive to their oral health. Usually patients who have bad habits in their oral health do not usually visit the dentist.

Dental Floss

Dental floss is a very important habit, but just a few 4 out of 10 people use it.

Alcohol

The question, Do you consume alcoholic beverages?. Well, patients don’t know what to answer, or they associate it with people who abuse alcoholic beverages, so we should reword the question and make the patient feel more comfortable instead of being attacked with the question.

#Smokers percentage
pd.options.display.float_format = '{:.2f}%'.format
p_smoker = dental_office.groupby(['genre','smoker','occupation'])[['counts']].sum()
p_smoker = p_smoker.groupby(level=0).apply(lambda x :  x /x.sum()*100).unstack('occupation')
p_smoker.xs('si',level=1)

	counts
occupation	ama de casa	desempleado	empleado	estudiante	independiente
genre
femenino	2.19%	0.00%	10.68%	4.66%	9.32%
masculino	1.31%	0.44%	11.79%	2.62%	19.21%

#Oral hygeine percentage grouped by gender
p_hygiene = dental_office.groupby(['genre','oral hygiene'])[['counts']].sum()
p_hygiene.groupby(level=0).apply(lambda x : x / x.sum()*100)

		counts
genre	oral hygiene
femenino	buena	63.84%
	mala	0.55%
	regular	35.62%
masculino	buena	50.22%
	mala	9.17%
	regular	40.61%

#Dental floss usage grouped by gender
p_floss = dental_office.groupby(['genre','floss usage'])[['counts']].sum()
p_floss.groupby(level=0).apply(lambda x : x / x.sum()*100)

		counts
genre	floss usage
femenino	no	58.90%
femenino	si	41.10%
masculino	no	68.56%
masculino	si	31.44%

#Alcohol consumption grouped by gender
p_alcoholic = dental_office.groupby(['genre','alcoholic','occupation'])[['counts']].sum()
p_alcoholic = p_alcoholic.groupby(level=0).apply(lambda x : x / x.sum()*100)
p_alcoholic.unstack('occupation').xs('si',level=1)

	counts
occupation	ama de casa	desempleado	empleado	estudiante	independiente
genre
femenino	0.82%	0.00%	8.49%	4.11%	7.67%
masculino	1.31%	0.00%	5.68%	2.62%	14.41%

###Patients Preferred Payment Methods

Payment options are increasingly diverse to make it easier for patients to access dental treatment. We could analyze payment methods with regard to different population groups. but for this analysis we will focus on relating them in regard to the occupation and the generation each patient belongs to.

Generation/Occupation

Despite the convenience of bank or mobile transfers, they aren’t as representative as cash, or traditional payment methods. Creditcart transfers in some cases involve tax payments and have also not been adopted by all generations or age ranges. The graph percentage by generation shows the segment of patients that belongs to each generation.

Approximately 45% of patients are Gen Z and Millennials i.e., a population made up of youth and young adults whose use of EFT’s is likely to be greater than older adults.
When we cross generations data regarding payment methods, we observe that a higher percentage of electronic payments are in Millennials compared to other generations. Despite being a representative population of the total number of patients, the preferred payment method is the traditional one by far.
Note: Create a table that allows validating whether the Millennials generation that has access to dental treatments are actually the most active at work so they can afford them. Answer: The Millennials generation are the most active at work. Most of them are self-employed.
People whose occupation is self-employed are the population that uses the most electronic payment methods, with more than 20% of the total.

#This func allow us to categorize patient age group by generations 
def patient_gen(x):
  if x >= 10 and x <= 25:
    return 'Gen Z'
  elif x >= 26 and x <= 41:
    return 'Millennials'
  elif x >= 42 and x <= 57:
    return 'Gen X'
  elif x >= 58 and x <= 67:
    return 'Boomers II'
  elif x >= 68 and x <= 76:
    return 'Boomers I'
  elif x >= 77 and x <= 94:
    return 'Post War Gen'
  else:
    return 'None'

dental_office['patient_gen'] = dental_office['age'].apply(patient_gen)

#Patients bar chart according to the generational group to which they belong.

patient_gens = dental_office.groupby('patient_gen')[['counts']].sum()
total_patient_gen = patient_gens.reset_index().rename(columns={'counts':'total'})
percent_patient_gen = patient_gens.groupby(level=0).apply(lambda x:x/patient_gens.sum()*100).reset_index().rename(columns={'counts':'percentage'})

fig,axes = plt.subplots(1,2,figsize=(12,5))
axes[0].set_title('Total Patients per Generation')

t = sns.barplot(ax=axes[0], data=total_patient_gen, x='patient_gen',y='total',palette='magma')
plt.xticks(rotation=45)
t.set_xlabel('Generations')
t.set_ylabel('Freq Total')
t.set_xticklabels(['Boomers I', 'Boomers II', 'Gen X', 'Gen Z', 'Millennials', 'None',
       'Post War Gen'], rotation=45)

axes[1].set_title('Percentage per Generation')

p = sns.barplot(ax=axes[1], data=percent_patient_gen, x='patient_gen',y='percentage',palette='magma')
p.set_xlabel('Generations')
p.set_ylabel("Percentage")
p.set_xticklabels(['Boomers I', 'Boomers II', 'Gen X', 'Gen Z', 'Millennials', 'None',
       'Post War Gen'], rotation=45)
fig.tight_layout()
percent_patient_gen
plt.show()

#Tabulate crossed data between generation and payment method

pd.options.display.float_format = '{:.2f}%'.format
non_common_methods = ['trans_bancolombia','trans_davivienda','otro']
prefer_payment = dental_office[~dental_office['payment method'].isin(non_common_methods)]
prefer_payment = prefer_payment.groupby(['patient_gen','payment method'])[['counts']].sum()
prefer_payment = prefer_payment.groupby(level=0).apply(lambda x:x/x.sum()*100)
prefer_payment.unstack('payment method').iloc[:,:4]

	counts
payment method	bold	daviplata	efectivo	nequi
patient_gen
Boomers I	34.09%	0.00%	65.91%	0.00%
Boomers II	15.52%	0.00%	81.03%	3.45%
Gen X	8.72%	1.16%	85.47%	4.65%
Gen Z	3.37%	5.62%	85.39%	5.62%
Millennials	12.57%	2.73%	77.60%	7.10%
None	3.85%	3.85%	88.46%	3.85%
Post War Gen	0.00%	0.00%	100.00%	0.00%

#Tabulate crossed data between generation and occupation
pd.options.display.float_format = '{:.2f}%'.format
non_common_methods = ['trans_bancolombia','trans_davivienda','otro']
generation_payment = dental_office[~dental_office['payment method'].isin(non_common_methods)]
generation_payment = generation_payment.groupby(['patient_gen','occupation'])['counts'].sum()
generation_payment.groupby(level=0).apply(lambda x:x/x.sum()*100).unstack('occupation')

occupation	ama de casa	desempleado	empleado	estudiante	independiente
patient_gen
Boomers I	40.91%	6.82%	15.91%	2.27%	34.09%
Boomers II	24.14%	5.17%	29.31%	1.72%	39.66%
Gen X	8.72%	2.33%	36.05%	1.16%	51.74%
Gen Z	10.11%	2.25%	32.58%	40.45%	14.61%
Millennials	1.64%	3.28%	38.25%	3.83%	53.01%
None	0.00%	0.00%	11.54%	84.62%	3.85%
Post War Gen	12.50%	62.50%	0.00%	0.00%	25.00%

#Prefered payment methods groupped by occupation 
non_common_methods = ['trans_bancolombia','trans_davivienda','otro']
p_payment = dental_office[~dental_office['payment method'].isin(non_common_methods)]
p_payment = p_payment.groupby(['occupation','payment method'])[['counts']].sum()
p_payment = p_payment.groupby(level=0).apply(lambda x : x / x.sum()*100)
p_payment.unstack('payment method').iloc[:,:4]

	counts
payment method	bold	daviplata	efectivo	nequi
occupation
ama de casa	15.00%	0.00%	83.33%	1.67%
desempleado	8.70%	0.00%	82.61%	8.70%
empleado	10.64%	5.32%	82.45%	1.60%
estudiante	5.80%	2.90%	84.06%	7.25%
independiente	12.92%	0.42%	79.17%	7.50%

1.5. Revenue analysis over time series¶

An analysis of an income statement is normally made up of sales income, administrative expenses, operational costs, among others, in order to analyze net profit. However, for this case study we will only consider sales revenue without any deductions.

-The best way to analyze income regarding series series is from the mobile sock method, as shown in the graphic Expenditure analysis per sex the trend is clearer to create 30 -rows windows for a daily frequency for a daily frequency. This makes it possible to avoid negative spikes on days where there was no revenue.

The categorization of income can be carried out with respect to different population groups. However, it will only be done with respect to gender and occupation. -Men are the demographic group that are carried out less dental treatments. The downward trend in the last 2 years is worrisome, so it is suggested to perform a detailed analysis regarding this deficit of men.
The categorization by occupation reflects that there is a greater fluctuation in the group of independent workers than in the group of employees, being the employees, the ones who have allowed the growth of the dental practice income.

#Business income groupped by gender
dental_office['women_exp'] = dental_office[dental_office['genre'] == 'femenino']['cost']
dental_office['men_exp'] = dental_office[dental_office['genre'] == 'masculino']['cost']
date_analysis_D = dental_office.groupby(pd.Grouper(key='appointment day', freq='D'))[['cost','women_exp','men_exp']].sum()
sales_total = date_analysis_D.rolling(30).mean()
fig = plt.figure(figsize=(15,8))
sale_to = sns.lineplot(data=sales_total)
sale_to.set_title('Expenditure Analysis per Sex',fontsize=20)
sale_to.set_xlabel('Period of Time')
sale_to.set_ylabel('Moving Avg 30 Days(Amount)')
plt.show()

#Business income groupped by patient occupation
from matplotlib.lines import Line2D
from matplotlib import rcParams, cycler
cmap = plt.cm.coolwarm
custom_lines = [Line2D([0], [0], color=cmap(1.), lw=4),
                Line2D([0], [0], color=cmap(.8), lw=4),
                Line2D([0], [0], color=cmap(0.6), lw=4),
                Line2D([0], [0], color=cmap(0.2), lw=4),
                Line2D([0], [0], color=cmap(.0), lw=4)]

rcParams['axes.prop_cycle'] = cycler(color=cmap(np.linspace(0, 1, 5)))


dental_office['student_exp'] = dental_office[dental_office['occupation'] == 'estudiante']['cost']
dental_office['employee_exp'] = dental_office[dental_office['occupation'] == 'empleado']['cost']
dental_office['hkeeper_exp'] = dental_office[dental_office['occupation'] == 'ama de casa']['cost']
dental_office['ind_exp'] = dental_office[dental_office['occupation'] == 'independiente']['cost']
dental_office['unemployee_exp'] = dental_office[dental_office['occupation'] == 'desempleado']['cost']

date_analysis_occupation = dental_office.groupby(pd.Grouper(key='appointment day', freq='D'))[['student_exp','employee_exp','hkeeper_exp','ind_exp','unemployee_exp']].sum()
sales_occupation = date_analysis_occupation.rolling(30).mean()

fig, ax = plt.subplots(figsize=(15, 8))
lines = ax.plot(sales_occupation)

sale_to.set_title('Expenditure Analysis per Occupation',fontsize=20)
sale_to.set_xlabel('Period of Time')
sale_to.set_ylabel('Moving Avg 30 Day(Amount)')
ax.legend(custom_lines, ['student_exp', 'employee_exp', 'hkeeper_exp','ind_exp','unemployee_exp']);
plt.show()

#Business income groupped by gender time series per year
pd.options.display.float_format = '{:.0f}'.format
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun","Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
dental_office['year'] = dental_office['appointment day'].dt.year
dental_office['month'] = dental_office['appointment day'].dt.month
#dental_office['month'] = pd.to_datetime(dental_office['appointment day'], format='%m').dt.month_name()
revenue = dental_office.groupby(['year','month'])[['cost']].sum()
revenue_2021 = revenue.xs(2021,level=0).rename(columns={'cost':'revenue 2021'})
revenue_2022 = revenue.xs(2022,level=0).rename(columns={'cost':'revenue 2022'})
revenue_year = pd.concat([revenue_2021,revenue_2022],axis=1).reset_index()
revenue_year['month'] = pd.Categorical(revenue_year['month'].apply(str))
revenue_year.plot.bar(stacked=False,width=1.0)
plt.xticks(np.arange(12),months, rotation=45)
plt.show()

1.6. Conclusions¶

The case of study includes real data that contains sensitive information and that compromises the safety of patients, for this reason you will not be able to access the data if you try to access the server later on.
Understanding the categorical variables as population groups, it could be interesting to make a cross-union between the largest number of groups to analyze more in detail sales in detailed cases, such as sales income given by self-employed women who belongs to generation X and use dental floss properly.
The dental office should look forward to better communication channels that allow us to increase men’s interest in oral health. Identify the possible reasons why this population hardly has access or interest to dental treatments.

Daniel's Portfolio

Dental Office 🦷🪥

Contents