

problem
This project involves building simulations to generate training data, which is then used for statistical analysis to better understand and interpret the underlying patterns and phenomena.
This project uses Excel-based simulations to introduce controlled variables and probabilities, effectively randomizing the original training data. From this randomized dataset, simple statistics are extracted to generate new testing data. By comparing the original training data with the testing data, the project explores how subtle changes in variables and probabilities can influence the final outcomes.
Solution
#1 Generate Testing Data Sample
To ensure the simulation produced reasonable and realistic data, I introduced several variables and set limitations on the probabilities used
- Establish salary bins and restrict the distribution within each salary group.


=IF(RAND()<0.2,RANDBETWEEN($Q$2,$S$6),IF(RAND()<0.5,RANDBETWEEN($Q$7,$S$11),IF(RAND()<0.5, RANDBETWEEN($Q$12,$S$16),IF(RAND()<0.2,RANDBETWEEN($Q$17,$S$22),RANDBETWEEN($Q$7,$S$16)))))
#2 Chance of Coming Into the Store


=IFS(AND(A2>$Q$2,A2<$S$6),IF(RAND()<$U$2,"YES","NO"),AND(A2>$Q$7,A2<$S$11),IF(RAND()<$U$7,"YES","NO"), AND(A2>$Q$12,A2<$S$16),IF(RAND()<$U$12,"YES","NO"),AND(A2>$Q$17,A2<$S$22),IF(RAND()<$U$17,"YES","NO"))
#3 Make a personalized decision (BUY or No) based on the in-store time
Each customer is randomly assigned a time spent in the store, ranging from 1 to 30 minutes. The probability of making a purchase depends on this duration: customers who stay more than 10 minutes have a 50% chance of buying, those who stay less than 2 minutes have a 10% chance, and all others have a 50% probability of making a purchase.

=IF(B2="YES",RANDBETWEEN(1,30),"-")
=IF(C2="-","NO",IF(C2>10,IF(RAND()<0.5,"BUY","NO"),IF(C2<2,IF(RAND()<0.1,"YES","NO"),IF(RAND()>0.5,"BUY","NO"))))
#4 Customer Decision Regarding Jewelry Purchase
Professor Jones provided a simulation modeling customers’ decisions to purchase jewelry, including probabilities for each variable. I enhanced this simulation by incorporating real-world salary data, linking the probabilities to customers’ salary levels. While the probabilities themselves remain unchanged, they now apply specifically to customers who have decided to make a purchase, adding a layer of realism to the model.

=IF(D3="BUY",IF(A3<$Q$12,IF(RAND()<0.7,"custom","pre-existing"),IF(RAND()<0.2,"custom","pre-existing")),"-")
Final Simulation
Starting from individual salaries, the simulation tracks the time spent in each jewelry process and the customer’s purchasing decision.
General Statistics Across Salary Ranges
Statistical Analysis
Original Data: Using the fixed statistics from one of the simulation and generating testing data.
Simulated Data: Using the active statistics from the simulation and generating testing data.
#1 The Result in the Change of Parameters
Scenario 1:
If we were to purchase a new computer-controlled reductive-manufacturing platform capable to reducing the metal-working time to build a new custom piece by an average of two days and cutting the standard deviation in half, will it make a significant difference in the total time to produce our jewelry?
Observe:
By reducing the average day of wait for metal work 2 days, the entire graph shafted to the earlier days making it to have significant difference

Scenario 2:
If we were to hold more gemstones in inventory, reducing the need to order and wait by 50%, will this make a significant difference in the total time to produce our jewelry?
Observe:
By reducing the average day of wait for metal work 2 days, the entire graph shafted to the earlier days making it to have significant difference

Scenario 3:
If the probability of customer choosing custom or pre-existing design change, will it make significant difference? (customer with salary less than $105,000, now will have 10% chance to choose custom design)
Observe:
This change will influence customers’ decisions on design work, leading them to choose more pre-existing designs. From a logical standpoint, the availability of more pre-existing design choices reduces the customer wait time for their final product.

As a result, significant differences arise when the customer’s decision changes. Also, from the distribution graph, it is clear that the early wait time has increased, while the later wait time has reduced, indicating that more customers are now receiving their products faster.
