Work Samples
Use automated tools to extract data from primary and secondary sources
Use SAP ERP modules to export material data and build pivot tables in Microsoft Excel.
Utilize strong Microsoft Excel and SQL skills to provide high-quality data analysis, build daily and monthly production performance and KPI reports, and highlight high-impact factors to drive manufacturing improvements
We have 19 production lines that produce more than 10 different kinds of products daily. I analyze the production speed, efficiency, and yield rate of each production order to complete the Production Performance Report.
In the Plant Performance Report, I analyze yield rate by line and by category, line utilization, overall efficiency, scrap rate, etc.
I use Excel pivot tables, VLOOKUP, and variance analysis to examine the difference between production target and actual output in the Production Adherence Report.
The Supply Chain KPI Report analyzes absolute variance in production adherence, inventory turnover days, and order fulfillment rate.
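As a rough illustration of the variance calculation behind the Production Adherence Report, the sketch below computes the absolute variance and an adherence ratio per production order. The order IDs, quantities, and the adherence formula are hypothetical assumptions for illustration; the actual report is built in Excel.

```python
# Hypothetical production orders: (order_id, target_cases, actual_cases).
orders = [
    ("PO-001", 1000, 950),
    ("PO-002", 800, 880),
    ("PO-003", 1200, 1200),
]


def adherence_rows(orders):
    """Return per-order absolute variance and adherence ratio."""
    rows = []
    for order_id, target, actual in orders:
        abs_variance = abs(actual - target)
        # Assumption: adherence is defined as 1 - |actual - target| / target.
        adherence = 1 - abs_variance / target
        rows.append(
            {"order": order_id, "abs_variance": abs_variance, "adherence": adherence}
        )
    return rows


rows = adherence_rows(orders)
total_abs_variance = sum(r["abs_variance"] for r in rows)

for r in rows:
    print(f"{r['order']}: abs variance = {r['abs_variance']}, "
          f"adherence = {r['adherence']:.1%}")
print("Total absolute variance:", total_abs_variance)
```

Using the absolute value keeps over-production and under-production from cancelling each other out when orders are summed.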
Use Microsoft Excel and statistical knowledge to analyze the movement of raw materials and packing materials for products
Analyze slow-moving materials to generate monthly Aging Reports and define short-term and long-term solutions for materials planning
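One way an aging report like this could be sketched in Python is to bucket each material by the number of days since its last movement. The material names, dates, quantities, and bucket boundaries below are hypothetical; the real report is produced in Excel from SAP data.

```python
from datetime import date

# Hypothetical records: (material, quantity_on_hand, last_movement_date).
materials = [
    ("LABEL ROLL A", 1200, date(2020, 12, 1)),
    ("CARTON B", 300, date(2020, 8, 1)),
    ("CAP C", 5000, date(2020, 3, 1)),
    ("SPICE MIX D", 80, date(2019, 6, 1)),
]

AS_OF = date(2020, 12, 31)  # report date (assumption)

# Upper bound in days and label for each aging bucket (illustrative values).
BUCKETS = [(90, "0-90 days"), (180, "91-180 days"), (365, "181-365 days")]


def aging_bucket(last_moved, as_of=AS_OF):
    """Return the aging bucket for a material based on idle days."""
    days_idle = (as_of - last_moved).days
    for upper, label in BUCKETS:
        if days_idle <= upper:
            return label
    return "over 365 days"


report = {}
for name, qty, last_moved in materials:
    report.setdefault(aging_bucket(last_moved), []).append((name, qty))

for bucket, items in report.items():
    print(bucket, items)
```

Materials that fall in the oldest buckets are the natural candidates for the short-term and long-term actions mentioned above.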
Use RStudio to identify and analyze patterns in the consumption of raw materials that contain juice concentrate, and interpret the findings in both English and Chinese
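# Read the monthly consumption data for the juice raw materials
juice_data <- read.csv("juice material.csv")
# Drop the 9th column (assumed here to be taken from juice_data),
# keeping the 8 material columns
juice_data_t <- juice_data[, -9]
names(juice_data_t) <- c("CONCENTRATED LEMON JUICE", "PINEAPPLE CONCENTRATE",
                         "ORANGE JUICE CONCENTRATE", "LIME JUICE CONCENTRATE",
                         "SALTED PLUM PUREE", "TANGERINE CONCENTRATE",
                         "COCONUT MILK POWDER", "CHERRY JUICE CONCENTRATE")
# Convert every column to numeric (returns a numeric matrix)
juice_data_t <- apply(juice_data_t, 2, as.numeric)
# Plot the pairwise correlation matrix as a clustered heatmap
library(pheatmap)
pheatmap(cor(juice_data_t))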
From the pairwise correlations between the consumption of the juice raw materials, we can draw some interesting conclusions:
The correlation coefficient between CONCENTRATED LEMON JUICE and PINEAPPLE CONCENTRATE is 0.74, which indicates a strong positive correlation between the consumption of these two raw materials. Similarly, LIME JUICE CONCENTRATE and PINEAPPLE CONCENTRATE are highly correlated, with a coefficient of 0.73. In contrast, the consumption of TANGERINE CONCENTRATE is negatively correlated with that of CONCENTRATED LEMON JUICE and PINEAPPLE CONCENTRATE, with coefficients of -0.62 and -0.42, respectively: the more of one is consumed, the less of the other is consumed. The same holds for CHERRY JUICE CONCENTRATE, whose correlation coefficients with CONCENTRATED LEMON JUICE and PINEAPPLE CONCENTRATE are -0.66 and -0.78, respectively, so when the consumption of one raw material increases, the consumption of the other tends to decrease.
我们通过上图含有浓缩果汁的原材料之间的相关性得到一定的结论:
浓缩柠檬汁和浓缩菠萝汁的相关系数达到了0.74,说明这两种浓缩果汁原料的消耗量存在显著的相关性。同样,浓缩酸橙汁和浓缩菠萝汁也有非常高的相关性,其相关系数达到了0.73。相比之下,浓缩柑橘汁和浓缩柠檬汁的消耗量,以及浓缩柑橘汁和浓缩菠萝汁的消耗量,都存在明显的负相关关系,其相关系数分别为-0.62和-0.42,可以认为他们一种消耗量越多,另一种消耗得就越少。同样的情况也发生在浓缩樱桃汁上,它和浓缩柠檬汁,浓缩菠萝汁的相关系数分别是-0.66和-0.78,因此它们之间也是明显的负相关关系,也就是一种原料的消耗量增加,另一种原料的消耗量就减少。
Utilize predictive modeling to analyze current data in order to set standards for production speed, scrap rate, and cooking time for next year
Take one of our sauce products, Panda Oyster Sauce (5lb US version), as an example. After I built a predictive model from current production scrap rate data, the forecast chart showed a downward trend. Because of this trend, we cannot simply use the average scrap rate as next year's standard. Applying the knowledge of ?, the scrap rate standard for next year is set.
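To illustrate why a plain average is misleading when the scrap rate is trending down, here is a minimal sketch that fits a straight line to monthly scrap rates and compares the extrapolated value with the average. The linear trend is a stand-in chosen for illustration, not necessarily the model used for the actual forecast, and the scrap rates are made-up numbers.

```python
# Hypothetical monthly scrap rates (%) with a downward trend; not real data.
scrap_rates = [3.0, 2.9, 2.8, 2.7, 2.6, 2.5, 2.4, 2.3, 2.2, 2.1, 2.0, 1.9]


def fit_line(y):
    """Ordinary least-squares line y = intercept + slope * x for x = 0..n-1."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((x - x_mean) * (v - y_mean) for x, v in enumerate(y))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


intercept, slope = fit_line(scrap_rates)
average = sum(scrap_rates) / len(scrap_rates)

# Extrapolate to the first month of next year (x = 12).
next_month_forecast = intercept + slope * len(scrap_rates)

print(f"slope per month: {slope:.3f}")
print(f"average of current year: {average:.2f}")
print(f"forecast for next month: {next_month_forecast:.2f}")
```

With a negative slope, the extrapolated value sits below the yearly average, so a standard set at the average would be looser than the trend suggests.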
Use Python to prepare final analysis reports for the stakeholders and help translate analytics into non-technical insights
Analysis on 5LB CAN Oyster Sauce Products
1. Overview
KC OS 5LB CAN and PANDA OS 5LB CAN are both US-version canned oyster sauce products. KC oyster sauce is less expensive than PANDA oyster sauce, and PANDA oyster sauce is of higher quality than KC oyster sauce. Using Python tools, this report analyzes the yield distribution, descriptive statistics, correlation, and yield fluctuations of the two products in 2019 and 2020.
2. Data Distribution Graphs and Tables
2.1 Before COVID-19
Let KC denote KC OS 5LB CAN and P denote PANDA OS 5LB CAN.
Table 2.1 Descriptive Statistics for 2019
         KC          P
count    12          12
mean     31456.08    94212.92
std      11165.45    29568.89
min      14508       24402
25%      23967.25    81350.25
50%      32387       96037.5
75%      41266.75    106333
max      44795       139926
Graph 2.1 Product Distribution, 2019
Graph 2.2 Product Boxplot, 2019
Graph 2.3 Product Pairplot, 2019
Graph 2.4 Product Heatmap, 2019
Table 2.2 Correlation Coefficient
      KC                     P
KC    1.0                    0.03852096945267418
P     0.03852096945267418    1.0
2.2 During the Pandemic
Table 2.3 Descriptive Statistics for 2020
         KC             P
count    12             12
mean     22677.75       64603.83333
std      14035.44712    31313.59383
min      0              6706
25%      11393.5        47867.75
50%      22505.5        65893
75%      27936.5        90492
max      48122          105851
Graph 2.5 Product Distribution, 2020
Graph 2.6 Product Boxplot, 2020
Graph 2.7 Product Pairplot, 2020
Graph 2.8 Product Heatmap, 2020
Table 2.4 Correlation Coefficient
            KC                     P
KC          1.0                    0.6476861633832216
P           0.6476861633832216     1.0
COVID-19    0.45012988308354707    0.5240297456748445
3. Data Analysis Conclusion
(1) The distribution plots, box plots, and the data in Table 2.1 and Table 2.3 in Part 2 show that the monthly yield of PANDA products was higher on average than that of KC products both before and during the pandemic, which suggests that customers care more about quality than price.
(2) Table 2.2 and Table 2.4 show that before the pandemic, the correlation coefficient between the yields of PANDA and KC products was about 0.04, so there was essentially no linear relationship between the yields of the two products. During the pandemic, the correlation coefficient was about 0.65, a moderate positive correlation: under the influence of the pandemic, the yields of the two products tended to rise and fall together.
(3) During the pandemic, the yields of both P and KC products trended downward: the average monthly yield of PANDA products decreased by about 29,609 cases and that of KC products by about 8,778 cases. The correlation coefficient with COVID-19 case counts is about 0.52 for PANDA products and about 0.45 for KC products (Table 2.4). Both correlations are weak to moderate, but judging from both the yield decrease and the correlation coefficients, PANDA products were affected more by the pandemic.
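The decreases quoted in conclusion (3) follow directly from the monthly means in Table 2.1 and Table 2.3. A small sketch of that arithmetic, with the relative decrease added as an extra figure that is not part of the original report:

```python
# Monthly mean yield (cases) from Table 2.1 (2019) and Table 2.3 (2020).
mean_2019 = {"KC": 31456.08, "P": 94212.92}
mean_2020 = {"KC": 22677.75, "P": 64603.83333}

drops = {}
relative_drops = {}
for product in ("KC", "P"):
    # Absolute decrease in average monthly yield.
    drops[product] = mean_2019[product] - mean_2020[product]
    # Decrease as a share of the 2019 mean (extra figure, not in the report).
    relative_drops[product] = drops[product] / mean_2019[product]
    print(f"{product}: decrease = {drops[product]:.0f} cases "
          f"({relative_drops[product]:.1%} of the 2019 mean)")
```

In relative terms the two decreases are closer than the absolute case counts suggest, roughly 28% for KC and 31% for PANDA.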
Appendix
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import seaborn as sns


def main(split_date):
    # Select the years (sheet names) and the number of monthly data points
    start_year, end_year, number = -1, -1, -1
    if split_date == 0:
        start_year = 2019
        end_year = 2020
        number = 12
    else:
        start_year = 2020
        end_year = 2022
        number = 19
    frames = []
    for year in range(start_year, end_year):
        data = pd.read_excel('prod data.xlsx', sheet_name=str(year), index_col=0)  # read file data
        frames.append(data.transpose())  # transpose so that months become rows
    # pd.concat is used because DataFrame.append was removed in pandas 2.0
    df = pd.concat(frames, ignore_index=True)
    print(df.describe())
    print("#" * 50)
    covid_19_data = pd.read_excel('COVID-19.xlsx')
    print(covid_19_data)

    # Visual analysis: monthly output of each product
    fig, ax = plt.subplots(1, 1, figsize=(8, 6))
    months = [x + 1 for x in range(number)]
    plt.plot(months, df['KC'], 'o', label='KC_Pro')
    plt.plot(months, df['P'], '^', label='P_Pro')
    plt.xticks(months)
    plt.xlabel('Month', size=15)
    plt.ylabel('Number', size=15)
    plt.legend(loc='best')
    plt.title('Pro_Distribution')
    plt.show()

    # Visual analysis: boxplot
    fig_1, ax_1 = plt.subplots(1, 1, figsize=(8, 6))
    rvs1 = df['KC']
    rvs2 = df['P']
    plt.xlabel('Pro_Type', size=15)
    plt.ylabel('Number', size=15)
    plt.title('Pro_BoxPlot')
    plt.boxplot([rvs1, rvs2], labels=['KC_Pro', 'P_Pro'])
    plt.show()

    # Visual analysis: pairplot & heatmap
    sns.pairplot(df)
    plt.show()
    sns.heatmap(df.corr())
    plt.show()

    # Correlation coefficient (KC & P)
    corrcoef = np.corrcoef(df, rowvar=False)
    print(corrcoef)
    print("#" * 50)
    count = stats.pearsonr(df['KC'], df['P'])
    print(count)
    print("#" * 50)

    # Correlation coefficient (KC, P & COVID-19)
    if split_date != 0:
        count_KC_COV = stats.pearsonr(df['KC'], covid_19_data['Cases'])
        count_P_COV = stats.pearsonr(df['P'], covid_19_data['Cases'])
        print(count_KC_COV)
        print(count_P_COV)


if __name__ == '__main__':
    split_date_list = [0, 1]  # 0: before COVID-19; 1: during COVID-19
    for split_date in split_date_list:
        main(split_date)
Global Lean Steering Committee Project
In this 5S Improvement Project, I extracted label data from the SAP system and used SQL queries to organize the data, then applied Excel skills to build a label inventory list. As a result, we can look up the exact information for each label, such as quantity, size, and location, in a second. This greatly improves working efficiency.
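The kind of SQL aggregation involved might look like the sketch below, which uses Python's built-in sqlite3 module with an in-memory table. The table name, column names, and rows are hypothetical stand-ins for the SAP export, not the actual schema.

```python
import sqlite3

# In-memory database standing in for the exported SAP label data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE labels (label_id TEXT, size TEXT, location TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO labels VALUES (?, ?, ?, ?)",
    [
        ("LBL-100", "4x6", "Rack A", 500),
        ("LBL-100", "4x6", "Rack A", 250),
        ("LBL-100", "4x6", "Rack B", 100),
        ("LBL-200", "2x3", "Rack C", 900),
    ],
)

# Total quantity per label, size, and storage location.
inventory = conn.execute(
    "SELECT label_id, size, location, SUM(quantity) AS total_qty "
    "FROM labels "
    "GROUP BY label_id, size, location "
    "ORDER BY label_id, location"
).fetchall()

for row in inventory:
    print(row)
conn.close()
```

The grouped result can then be exported to Excel as the label inventory list.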