
Work Samples

Use automated tools to extract data from primary and secondary sources

Use SAP ERP Modules to export material data and insert pivot table in Microsoft Excel.

Utilize strong Microsoft Excel and SQL skills to provide high-quality data analysis, build daily and monthly production performance and KPI reports, and highlight high-impact factors to drive manufacturing improvements

We have 19 production lines that together produce more than 10 different kinds of products every day. I analyze the production speed, efficiency, and yield rate of each production order to complete the Production Performance Report.

In the Plant Performance Report, I analyze yield rate by line and by category, line utilization, overall efficiency, scrap rate, etc.
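As a rough illustration of the by-line and by-category breakdown, the following pandas sketch aggregates quantities per group and then divides. The column names, the sample figures, and the definitions used here (yield rate = good quantity / input quantity, scrap rate = scrap quantity / input quantity) are assumptions for the example, not the plant's actual data model.

```python
import pandas as pd

# Hypothetical production orders; all column names and figures are illustrative.
orders = pd.DataFrame({
    "line": ["L01", "L01", "L02", "L02"],
    "category": ["sauce", "paste", "sauce", "sauce"],
    "input_qty": [1000, 500, 800, 1200],
    "good_qty": [950, 480, 760, 1170],
})
orders["scrap_qty"] = orders["input_qty"] - orders["good_qty"]


def rates(group_col: str) -> pd.DataFrame:
    """Sum quantities per group first, then divide, so large orders weigh more."""
    totals = orders.groupby(group_col)[["input_qty", "good_qty", "scrap_qty"]].sum()
    totals["yield_rate"] = totals["good_qty"] / totals["input_qty"]
    totals["scrap_rate"] = totals["scrap_qty"] / totals["input_qty"]
    return totals


by_line = rates("line")
by_category = rates("category")
print(by_line[["yield_rate", "scrap_rate"]])
print(by_category[["yield_rate", "scrap_rate"]])
```

Summing before dividing avoids averaging per-order rates, which would give a small order the same weight as a large one.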

In the Production Adherence Report, I use Excel pivot tables, VLOOKUP, and variance analysis to examine the difference between the production target and the actual output.

The Supply Chain KPI Report analyzes absolute variance to production adherence, inventory turnover days, and order fulfilment rate.
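The three KPIs can be sketched as small Python functions. The formulas below are common textbook definitions and are assumptions here, since the report's exact definitions are not stated; the function names and sample numbers are hypothetical.

```python
def absolute_variance(target: float, actual: float) -> float:
    """Absolute deviation of actual output from the target, as a share of the target."""
    return abs(actual - target) / target


def inventory_turnover_days(avg_inventory_value: float, cogs: float,
                            period_days: int = 365) -> float:
    """Days of inventory on hand: average inventory divided by cost of goods sold per day."""
    return avg_inventory_value / cogs * period_days


def order_fulfilment_rate(orders_fulfilled: int, orders_received: int) -> float:
    """Share of received orders that were fulfilled."""
    return orders_fulfilled / orders_received


# Illustrative figures only.
print(absolute_variance(target=10000, actual=9200))
print(inventory_turnover_days(avg_inventory_value=200000, cogs=1460000))
print(order_fulfilment_rate(orders_fulfilled=188, orders_received=200))
```

Using the absolute value means over-production and under-production both count against adherence rather than cancelling each other out.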

Use Microsoft Excel and statistical knowledge to analyze the movement of raw materials and packing materials for product

Create a monthly analysis of slow-moving materials to generate Aging Reports and define both short-term and long-term solutions for materials planning
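An aging report of this kind could be sketched in pandas as follows. The column names, the snapshot date, and the bucket edges (90, 180, and 365 days) are assumptions chosen for illustration, not the thresholds actually used in the plant.

```python
import pandas as pd

# Hypothetical stock snapshot; columns, dates, and values are illustrative.
stock = pd.DataFrame({
    "material": ["A", "B", "C", "D"],
    "last_movement": pd.to_datetime(
        ["2021-05-20", "2021-02-10", "2020-11-01", "2020-03-15"]),
    "stock_value": [1200.0, 800.0, 450.0, 300.0],
})
as_of = pd.Timestamp("2021-06-01")

# Days since the last goods movement, then assignment to an aging bucket.
stock["days_idle"] = (as_of - stock["last_movement"]).dt.days
stock["aging_bucket"] = pd.cut(
    stock["days_idle"],
    bins=[-1, 90, 180, 365, float("inf")],
    labels=["0-90", "91-180", "181-365", ">365"],
)

# Stock value per bucket, plus the list of slow movers (idle for more than 180 days).
aging_report = stock.groupby("aging_bucket", observed=False)["stock_value"].sum()
slow_movers = stock[stock["days_idle"] > 180]
print(aging_report)
print(slow_movers[["material", "days_idle", "stock_value"]])
```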

Use RStudio to identify and analyze patterns in the consumption of raw materials that contain juice concentrate, and interpret the findings in both English and Chinese

juice_data <- read.csv("juice material.csv")
# Drop the ninth column, keeping the eight juice materials
juice_data_t <- juice_data[, -9]
names(juice_data_t) <- c("CONCENTRATED LEMON JUICE", "PINEAPPLE CONCENTRATE",
                         "ORANGE JUICE CONCENTRATE", "LIME JUICE CONCENTRATE",
                         "SALTED PLUM PUREE", "TANGERINE CONCENTRATE",
                         "COCONUT MILK POWDERTE", "CHERRY JUICE CONCENTRATE")
# Convert every column to numeric (apply returns a matrix)
juice_data_t <- apply(juice_data_t, 2, as.numeric)
library(pheatmap)
# Heatmap of the pairwise Pearson correlations
pheatmap(cor(juice_data_t))

From the pairwise correlations between the consumption of the juice raw materials, we can draw some interesting conclusions:

The correlation coefficient between CONCENTRATED LEMON JUICE and PINEAPPLE CONCENTRATE reaches 0.74, which indicates a strong positive correlation between the consumption of these two raw materials. Similarly, LIME JUICE CONCENTRATE and PINEAPPLE CONCENTRATE are highly correlated, with a coefficient of 0.73. In contrast, the consumption of TANGERINE CONCENTRATE is negatively correlated with that of CONCENTRATED LEMON JUICE and of PINEAPPLE CONCENTRATE, with coefficients of -0.62 and -0.42, respectively: the more of one is consumed, the less of the other tends to be consumed. The same holds for CHERRY JUICE CONCENTRATE, whose correlation coefficients with CONCENTRATED LEMON JUICE and PINEAPPLE CONCENTRATE are -0.66 and -0.78, respectively. Here too the negative correlation is clear: when the consumption of one raw material increases, the consumption of the other tends to decrease.

我们通过上图含有浓缩果汁的原材料之间的相关性得到一定的结论：

浓缩柠檬汁和浓缩菠萝汁的相关系数达到了0.74，说明这两种浓缩果汁原料的消耗量存在显著的相关性。同样，浓缩酸橙汁和浓缩菠萝汁也有非常高的相关性，其相关系数达到了0.73。相比之下，浓缩柑橘汁和浓缩柠檬汁的消耗量，以及浓缩柑橘汁和浓缩菠萝汁的消耗量，都存在明显的负相关关系，其相关系数分别为-0.62和-0.42，可以认为他们一种消耗量越多，另一种消耗得就越少。同样的情况也发生在浓缩樱桃汁上，它和浓缩柠檬汁，浓缩菠萝汁的相关系数分别是-0.66和-0.78，因此它们之间也是明显的负相关关系，也就是一种原料的消耗量增加，另一种原料的消耗量就减少。
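Reading the strongest pairs off a heatmap by eye is error-prone, so a ranked list can help. The sketch below is a Python counterpart to the R analysis above, not the original workflow; the function name `ranked_pairs`, the shortened column names, and the consumption figures are all made up for illustration.

```python
import numpy as np
import pandas as pd


def ranked_pairs(consumption: pd.DataFrame) -> pd.Series:
    """Return each pair of columns once, sorted by Pearson correlation (highest first)."""
    corr = consumption.corr()
    # Keep only the strict upper triangle so each pair appears once
    # and the diagonal (always 1.0) is dropped.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    return corr.where(mask).stack().dropna().sort_values(ascending=False)


# Made-up monthly consumption figures, only to show the output shape.
demo = pd.DataFrame({
    "LEMON": [10, 12, 15, 11, 18, 20],
    "PINEAPPLE": [20, 25, 29, 22, 35, 41],
    "CHERRY": [30, 27, 22, 29, 18, 15],
})
pairs = ranked_pairs(demo)
print(pairs)
```

The top of the list gives the materials that tend to be consumed together, and the bottom gives the pairs that move in opposite directions.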

Utilize predictive modeling to analyze current data in order to set standards for production speed, scrap rate, and cooking time for next year

Take one of our sauce products, Panda Oyster Sauce (5lb US version), as an example. After I built a predictive model from the current production scrap rate data, the forecast chart showed a downward trend. We therefore cannot simply use the average scrap rate to set the standard for next year. Applying the knowledge of ?, scrap rate is set for next year’s standard.
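To show why a downward trend makes the plain average a poor standard, here is a minimal sketch that fits a straight line to monthly scrap rates and projects it forward. The linear trend is an assumption made for illustration; the modeling technique actually used is not stated above, and the scrap rates below are invented.

```python
import numpy as np

# Made-up monthly scrap rates (%) with a downward drift; not the plant's data.
scrap_rate = np.array([2.9, 2.8, 2.8, 2.6, 2.7, 2.5, 2.4, 2.4, 2.3, 2.2, 2.2, 2.1])
months = np.arange(1, len(scrap_rate) + 1)

# Least-squares fit of scrap_rate ≈ slope * month + intercept.
slope, intercept = np.polyfit(months, scrap_rate, deg=1)

# Project the fitted line over the next 12 months and average the projection.
next_year = np.arange(len(scrap_rate) + 1, len(scrap_rate) + 13)
projected = slope * next_year + intercept
trend_based_standard = projected.mean()
simple_average = scrap_rate.mean()

print(f"slope per month: {slope:.3f}")
print(f"simple average: {simple_average:.2f}, trend-based: {trend_based_standard:.2f}")
```

With a falling scrap rate, the simple average sits above the projected values, so a standard based on it would be too lenient. A straight line extrapolated far enough eventually goes below zero, so a projection like this should only be trusted over a short horizon.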

Use Python to prepare final analysis reports for the stakeholders and help translate analytics into non-technical insights

Analysis on 5LB CAN Oyster Sauce Products

1. Overview

KC OS 5LB CAN and PANDA OS 5LB CAN are both US-version canned oyster sauce products. KC oyster sauce is less expensive than PANDA oyster sauce, and PANDA oyster sauce is of higher quality than KC oyster sauce. Using Python tools, this report analyzes the yield distribution, descriptive statistics, correlation, and yield fluctuations of the two products in 2019 and 2020.

2. Data Distribution Graphs and Tables

2.1 Before COVID-19

Let KC denote KC OS 5LB CAN and P denote PANDA OS 5LB CAN

Table 2.1 Descriptive Statistics of 2019

         KC          P
count
mean     31456.08    94212.92
std      11165.45    29568.89
min      14508       24402
25%      23967.25    81350.25
50%      32387       96037.5
75%      41266.75    106333
max      44795       139926

Graph 2.1 Product Distribution of 2019

Graph 2.2 Product Boxplot of 2019

Graph 2.3 Product Pairplot of 2019

Graph 2.4 Product Heatmap of 2019

Table 2.2 Correlation Coefficient

      KC                     P
KC    1.0
P     0.03852096945267418    1.0

2.2 During the Pandemic

Table 2.3 Descriptive Statistics of 2020

         KC             P
count
mean     22677.75       64603.83333
std      14035.44712    31313.59383
min      6706
25%      11393.5        47867.75
50%      22505.5        65893
75%      27936.5        90492
max      48122          105851

Graph 2.5 Product Distribution of 2020

Graph 2.6 Product Boxplot of 2020

Graph 2.7 Product Pairplot of 2020

Graph 2.8 Product Heatmap of 2020

Table 2.4 Correlation Coefficient

            KC                     P
KC          1.0
P           0.6476861633832216     1.0
COVID-19    0.45012988308354707    0.5240297456748445

3. Data Analysis Conclusion

(1) The distribution plots, the boxplots, and the data in Table 2.1 and Table 2.3 in Part 2 show that the monthly yield of PANDA products is higher than that of KC products both before and during the pandemic, which suggests that customers care more about quality than price.

(2) Table 2.2 and Table 2.4 show that before the pandemic the correlation coefficient between the yields of PANDA and KC products was 0.03852096945267418, so there was essentially no linear relationship between the yields of the two products. During the pandemic the correlation coefficient was 0.6476861633832216, a moderate positive correlation: under the influence of the pandemic, the yields of the two products tended to rise and fall together.

(3) During the pandemic, the yields of both PANDA and KC products trended downward: the average monthly yield of PANDA products decreased by about 29609 cases and that of KC products by about 8778 cases. The correlation coefficient between yield and COVID-19 cases is 0.5240297456748445 for PANDA products and 0.45012988308354707 for KC products. Both correlations with the pandemic situation are fairly weak, but judging from both the drop in yield and the correlation coefficients, PANDA products were more affected by the pandemic.
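The decreases quoted in (3) follow directly from the monthly means in Table 2.1 and Table 2.3. The short check below reproduces them and also expresses each drop as a share of the 2019 mean; the percentage view is an addition for context and is not part of the original analysis.

```python
# Monthly means copied from Table 2.1 (2019) and Table 2.3 (2020).
mean_2019 = {"KC": 31456.08, "P": 94212.92}
mean_2020 = {"KC": 22677.75, "P": 64603.83333}

for product in ("KC", "P"):
    drop = mean_2019[product] - mean_2020[product]
    pct = drop / mean_2019[product] * 100
    print(f"{product}: drop of {drop:.0f} cases per month ({pct:.1f}% of the 2019 mean)")
```

In absolute terms the PANDA drop is more than three times the KC drop, while in relative terms the two are closer (roughly 31% versus 28%).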

Appendix

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import seaborn as sns


def main(split_date):
    # split_date 0: before COVID-19 (2019); otherwise: during COVID-19 (2020-2021)
    if split_date == 0:
        start_year, end_year, number = 2019, 2020, 12
    else:
        start_year, end_year, number = 2020, 2022, 19

    # Read one sheet per year and stack the transposed monthly rows
    frames = []
    for year in range(start_year, end_year):
        data = pd.read_excel('prod data.xlsx', sheet_name=str(year), index_col=0)
        frames.append(data.transpose())
    # DataFrame.append was removed in pandas 2.0, so concatenate instead
    df = pd.concat(frames, ignore_index=True)
    print(df.describe())
    print("#" * 50)
    covid_19_data = pd.read_excel('COVID-19.xlsx')
    print(covid_19_data)

    # Visual analysis: monthly output of both products
    fig, ax = plt.subplots(1, 1, figsize=(8, 6))
    months = [x + 1 for x in range(number)]
    plt.plot(months, df['KC'], 'o', label='KC_Pro')
    plt.plot(months, df['P'], '^', label='P_Pro')
    plt.xticks(months)
    plt.xlabel('Month', size=15)
    plt.ylabel('Number', size=15)
    plt.legend(loc='best')
    plt.title('Pro_Distribution')
    plt.show()

    # Visual analysis: boxplot
    fig_1, ax_1 = plt.subplots(1, 1, figsize=(8, 6))
    rvs1 = df['KC']
    rvs2 = df['P']
    plt.xlabel('Pro_Type', size=15)  # the original set ylabel twice
    plt.ylabel('Number', size=15)
    plt.title('Pro_BoxPlot')
    plt.boxplot([rvs1, rvs2], labels=['KC_Pro', 'P_Pro'])
    plt.show()

    # Visual analysis: pairplot and heatmap
    sns.pairplot(df)
    plt.show()
    sns.heatmap(df.corr())
    plt.show()

    # Correlation coefficient (KC & P)
    corrcoef = np.corrcoef(df, rowvar=False)
    print(corrcoef)
    print("#" * 50)
    count = stats.pearsonr(df['KC'], df['P'])
    print(count)
    print("#" * 50)

    # Correlation coefficient (KC, P & COVID-19)
    if split_date != 0:
        count_KC_COV = stats.pearsonr(df['KC'], covid_19_data['Cases'])
        count_P_COV = stats.pearsonr(df['P'], covid_19_data['Cases'])
        print(count_KC_COV)
        print(count_P_COV)


if __name__ == '__main__':
    split_date_list = [0, 1]  # 0: before COVID-19; 1: during COVID-19
    for split_date in split_date_list:
        main(split_date)

Global Lean Steering Committee Project

In this 5S Improvement Project, I extracted label data from the SAP system and used SQL queries to arrange the data. I then applied my Excel skills to build a label inventory list, so that the exact information for each label, such as quantity, size, and location, can be looked up within a second. This greatly improves working efficiency.
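The kind of SQL arrangement described here can be illustrated with a small in-memory SQLite example run from Python. The table name, column names, and sample rows are hypothetical; the real SAP extract and the queries used in the project are not shown in this document.

```python
import sqlite3

# In-memory database standing in for the label data extracted from SAP.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE labels (label_code TEXT, size TEXT, location TEXT, quantity INTEGER)"
)
conn.executemany("INSERT INTO labels VALUES (?, ?, ?, ?)", [
    ("LBL-001", "100x60mm", "Rack A1", 5000),
    ("LBL-001", "100x60mm", "Rack B3", 1500),
    ("LBL-002", "80x40mm", "Rack A2", 3200),
])


def label_inventory(connection: sqlite3.Connection) -> list:
    """One row per label: code, size, total quantity, and all storage locations."""
    query = """
        SELECT label_code, size, SUM(quantity) AS total_qty,
               GROUP_CONCAT(location, ', ') AS locations
        FROM labels
        GROUP BY label_code, size
        ORDER BY label_code
    """
    return connection.execute(query).fetchall()


inventory = label_inventory(conn)
for row in inventory:
    print(row)
```

Grouping by label collapses the per-location stock rows into a single line per label, which is the shape an inventory list in Excel needs for quick lookups.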
