Building Evidence: Simulation for Early-Stage Discovery
Finding realistic insights even in the absence of real data
Introduction
This paper discusses how Discrete Event Simulation (DES) can create a realistic data set when real data is scarce. Problems abound where building a solution depends squarely on the availability of real data, yet access to that data is difficult without at least a draft solution that can be demonstrated to its potential owners. The paper draws directly on our experience of building such an initial solution, one we felt would prove useful for manufacturing with Surface Mount Technology (SMT). The SMT process assembles electronic components directly onto a circuit board.
We held a hypothesis, born out of real-life experience, that a pipelined SMT process has unaccounted-for losses. However, there was no corroborating evidence, as these losses are not logged in the production records. Such losses make the OEE (Overall Equipment Effectiveness) lower than what is calculated, and quite a bit lower than what the equipment pipeline is capable of. Calculating OEE correctly would show companies where they really stand and reveal ways to improve productivity. [OEE is a key manufacturing metric that measures how effectively equipment is utilized by tracking availability (no downtime is 100% availability), performance (running at maximum speed is 100%), and quality (producing only good parts is 100% quality).]
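As a minimal sketch of the metric described above, OEE is conventionally the product of the three factors. The function names and the numbers in the usage note are hypothetical illustrations, and taking performance as the ratio of ideal to observed cycle time is a common simplification, not necessarily how any given plant computes it.

```python
def performance(ideal_cycle_s: float, observed_cycle_s: float) -> float:
    """Performance factor, assumed here to be ideal / observed cycle time (capped at 1)."""
    if ideal_cycle_s <= 0 or observed_cycle_s <= 0:
        raise ValueError("cycle times must be positive")
    return min(1.0, ideal_cycle_s / observed_cycle_s)


def oee(availability: float, perf: float, quality: float) -> float:
    """OEE as the product of availability, performance and quality, each in [0, 1]."""
    for name, value in (("availability", availability),
                        ("performance", perf),
                        ("quality", quality)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return availability * perf * quality
```

For example, with hypothetical factors of 0.9 availability, 0.8 performance and 0.95 quality, `oee(0.9, 0.8, 0.95)` gives roughly 0.684. Any loss that stretches the observed cycle time lowers the performance factor and therefore the OEE.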
If the bottleneck machine in an SMT pipeline (i.e., the slowest machine) has an average cycle time of T, the OEE of the line should be calculated using T as the cycle time, plus a small tolerance for minor variations. Cycle time is the actual time it takes to produce one complete printed circuit board (PCB) or unit from start to finish within the production line. Our observation was that the actual cycle time companies report exceeds T by a good measure. Could we demonstrate that this happens? We believed the equipment logs contained this information. If the logs were analysed and the results visualised in a well-designed UI, we could quantify these losses, and our hunch was that we could then find their causes.
The problem, however, was that for doing all that, we needed detailed machine logs, and no potential customer would share them. To start with, they knew there were some losses, but no one really thought much about those losses anyway. So, to even begin we would need to create the loss data ourselves ... as if we had the logs. That is where we explored the route of DES.
A Few Words on SMT
In an SMT setup, the production of a PCB (including verification by testing machines) is done using 8-10 machines connected in a pipeline, including 'Marker', 'Paste-Printer', 'Pick-and-Place' (P&P), 'Reflow-oven' and 'Verification'. An unpopulated 'panel' (a PCB) is loaded by a 'Loader' machine, and the fully populated, validated PCB is collected at a 'Receiver'. An SMT plant may have several such pipelines (Lines) running in parallel.
Typically, each machine in an SMT line is expected to take a consistent amount of time to carry out the part of the process it is designated to do in the production of a given PCB design, subject to some small variations. OEE calculations take care of the occasional breakdown and changeover times (changeover from one PCB design to another). A calculated OEE is based on individual machines' cycle-times and theoretical changeover, breakdown times and scheduled line/plant stoppages.
The actual OEE is based on real throughput and on actual times for changeovers, breakdowns and plant/line stoppages. OEE in real life can fall short of the calculated OEE by a small percentage because of undocumented losses. Our plan was to demonstrate (though initially using simulated data) that actual OEE in fact falls short of the calculated OEE by quite a lot, and that plants refuse to accept that reality. In fact, our experience indicated that plants tend to inflate cycle times to show a high coherence between calculated and actual OEE.
Our partner and collaborator is an experienced domain expert in SMT assembly-based manufacturing. His opinion was that there are certain kinds of short-stops (short losses) that shop-floor personnel are largely unaware of. These losses (called micro-stops) are simply accepted, yet they probably add up to a substantial total loss (in his words, “death by a thousand cuts”).
The Issues and How We Went About It
Our intended software was meant to show tables, charts and trends of actual (i.e., live) production, highlighting the losses that cause the real cycle time to be often longer than the calculated one. The causes of those losses would be displayed in the same way, in tables, charts and trends.
In real life, bulk loss data is available from MES (Manufacturing Execution Systems), and smaller losses (i.e., short-stops) can be extracted from machine logs. But we needed to convince a customer first. So we generated our own datasets through DES: an elaborate mock-up generating a couple of months' production data for ten-odd machines per line and five such lines. We simulated (designed in) as many real-life complications as possible and pieced everything together to resemble reality as closely as possible, the result being the ability to populate a fully running software, database and UI.
The bulk of the generated data represented production with ideal cycle times. We additionally generated data for cycles with short-stop delays (or micro-stops), i.e., with extended cycle times. Simulated data will not reflect reality completely, since it was built on approximations drawn purely from memory of real-life experience. But the closer the short-stop durations and their repetitions are to real life, the closer the data gets to reality, and the objective was to get close by building in as much randomness as the techniques permit.
Simulation Run of One SMT-Machine
In Fig-1, one thing to note is that three P&P (Pick-and-Place) machines have been used. It is quite normal to use multiple P&P machines, especially when the circuit on the PCB to be produced is complex. Since P&P machines are slower than the others, it is preferable that the multiple P&P machines are balanced, i.e., that their cycle times are as close to each other as possible for most PCB designs. This helps optimise the time taken by the P&P process. [A Pick-and-Place machine automates the placing of electronic components (like resistors, capacitors, ICs) onto printed circuit boards (PCBs) with high speed and precision, picking them from feeders and placing them onto designated spots.]
Consider a machine M in a pipeline (Fig-1) which we call the current (or this) machine (Fig-2). We may assume that if it starts its run (i.e., starts processing a PCB P on its input) at time T, time ΔT (cycle-time) will be needed to finish its operation before it places the PCB at its output. At T + ΔT, the conveyance logic of the simulation engine carries the output PCB from M to the input of the next machine.
Short Stop Logic
The Short Stop Logic is a part of the run logic of a machine. This logic 'predicts' a short-stop of type S for duration 'D' on machine M. Upon receiving the prediction:
- The short stop event of type S is logged for machine M at T + ΔT
- The cycle time of M is simply increased by D
Note: The short-stop generation logic is hooked on to each machine being simulated. Once a stop event S is triggered, a randomisation logic is used to decide the type and duration of the next short stop.
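The short-stop logic above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: the names `StopSpec` and `ShortStopLogic` are hypothetical, the next stop type is assumed to be chosen uniformly at random, durations and repetition intervals are assumed to be drawn from a normal distribution, and the repetition value is assumed to be the time in seconds until the next stop.

```python
import random
from dataclasses import dataclass, field
from typing import NamedTuple


class StopSpec(NamedTuple):
    stop_type: str        # e.g. "Error-1"
    duration_mean: float  # seconds
    duration_sd: float
    repeat_mean: float    # assumed: seconds until the next stop
    repeat_sd: float


@dataclass
class ShortStopLogic:
    specs: list           # StopSpec entries for this machine
    rng: random.Random
    events: list = field(default_factory=list)  # logged short-stop events

    def __post_init__(self):
        self._predict(now=0.0)

    def _predict(self, now):
        """Randomly decide the type, duration and due time of the next stop."""
        spec = self.rng.choice(self.specs)
        self.stop_type = spec.stop_type
        self.duration = max(0.0, self.rng.gauss(spec.duration_mean, spec.duration_sd))
        self.due_at = now + max(1.0, self.rng.gauss(spec.repeat_mean, spec.repeat_sd))

    def extend(self, machine, start, cycle_time):
        """Return the cycle time for a run starting at `start`, extended by D if a stop is due."""
        end = start + cycle_time              # T + delta-T
        if end < self.due_at:
            return cycle_time                 # no short stop in this cycle
        # Log stop S for machine M at T + delta-T, then lengthen the cycle by D.
        self.events.append((machine, self.stop_type, end, self.duration))
        extended = cycle_time + self.duration
        self._predict(now=end)                # pick the next stop
        return extended
```

One instance would be attached to each simulated machine, and the machine's run logic would call `extend` once per cycle to obtain the cycle time actually used.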
The Run Logic
The run logic is best explained by a state machine in Fig-3. The state machine represents the notional operation of one machine for one time unit.
The Simulation Engine
The overall simulation engine is controlled by a logic called a time wheel in DES parlance - a logic that maintains the current time (called NOW) in some time unit (say a second). It runs the machines in a pipeline one by one starting from the first one till the last. It then moves over to the next pipeline, if any. Once all pipelines have been run, NOW is incremented and simulation of pipelines starts all over again. The logic of the Simulation Engine is thus:
- Initialize NOW to the configured start time of simulation
- Repeat until NOW has reached the configured end time of simulation
  - Repeat for all pipelines
    - Repeat for all machines M
      - Run M
  - NOW ← NOW + 1
The Simulation Engine can run over a long range of NOW in barely a few minutes. This way, one can create a large amount of data in a short time. If the cycle times, short-stop times, etc., are good approximations, the generated data will reasonably resemble real data.
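The engine loop described above can be sketched as follows. This is a simplified stand-in, not the actual engine: the `Machine` class, its `inbox`/`outbox` conveyance and the fixed cycle times are assumptions made for illustration, and the state machine of Fig-3 is reduced to "finish the current PCB, then start the next one".

```python
from collections import deque


class Machine:
    """A hypothetical machine with a fixed cycle time (in time units)."""

    def __init__(self, name, cycle_time, events):
        self.name = name
        self.cycle_time = cycle_time
        self.events = events      # shared log of (machine, pcb, finish time)
        self.inbox = deque()      # PCBs waiting at the input
        self.outbox = None        # next machine's inbox, or the receiver
        self.current = None       # PCB being processed, if any
        self.finish_at = None

    def run(self, now):
        """Advance this machine by one time unit."""
        if self.current is not None and now >= self.finish_at:
            self.events.append((self.name, self.current, now))
            self.outbox.append(self.current)   # convey to the next machine
            self.current = None
        if self.current is None and self.inbox:
            self.current = self.inbox.popleft()
            self.finish_at = now + self.cycle_time


def build_pipeline(specs, receiver, events):
    """Chain machines so each one's output feeds the next one's input."""
    machines = [Machine(name, cycle, events) for name, cycle in specs]
    for machine, nxt in zip(machines, machines[1:]):
        machine.outbox = nxt.inbox
    machines[-1].outbox = receiver
    return machines


def simulate(pipelines, start, end):
    """Time wheel: run every machine of every pipeline, then advance NOW."""
    now = start
    while now < end:
        for pipeline in pipelines:
            for machine in pipeline:
                machine.run(now)
        now += 1
    return now
```

In this sketch a two-machine line with cycle times of 2 and 3 units delivers its boards 3 units apart once the line is full, which reflects the point that the bottleneck machine sets the line's cycle time.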
Some machines, like 'Pre-reflow-AOI' in Fig-1, validate the PCBs given to them, and a small fraction of those PCBs turn out NOT-OK in inspection. For such a machine, we associated a configured pass-percentage value (given to us by the domain expert). Only OK PCBs are conveyed to the next machine, and the NOT-OK ones are added to a failed-PCB list.
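A minimal sketch of such an inspection step is given below. The function name `inspect` is hypothetical, and drawing an independent uniform random number per PCB is an assumption about how the pass percentage might be applied.

```python
import random


def inspect(pcbs, pass_percentage, rng):
    """Split PCBs into OK (conveyed onward) and NOT-OK (failed-PCB list)."""
    ok, failed = [], []
    for pcb in pcbs:
        # Assumed rule: each PCB passes independently with the configured probability.
        if rng.random() * 100.0 < pass_percentage:
            ok.append(pcb)
        else:
            failed.append(pcb)
    return ok, failed
```

Passing a seeded `random.Random` instance keeps a simulation run reproducible, which helps when comparing 'what-if' configurations.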
For changeovers, we stopped the Loader machine for the changeover duration. Thus, after a time, all downstream machines will go on issuing pre-processing waits for the duration of the changeover.
The Configuration Data Driving Simulation
The domain expert came to us with the values that affect simulation, as given in Table-1.
| Machine Type | Cycle Time Mean | Cycle Time SD | Short-stop Type | Short-stop Duration Mean | Short-stop Duration SD | Short-stop Repetition Mean | Short-stop Repetition SD |
|---|---|---|---|---|---|---|---|
| … | | | | | | | |
| Pick-and-Place | 25 | 2 | Error-1 | 30 | 5 | 900 | 115 |
| | | | Error-2 | 50 | 10 | 1800 | 150 |
| | | | Error-3 | 20 | 5 | 1000 | 135 |
| … | | | | | | | |
Table-1: Values used in simulation as provided by domain expert
Note: a) All durations in the table are in seconds b) actual short-stop types obfuscated for IP protection
Configuration parameters also included changeover intervals, breakdown percentages and repetition.
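The Pick-and-Place values of Table-1 could be held as configuration data along the following lines. This is a sketch only: the class and function names are hypothetical, sampling from a normal distribution is an assumption, and the rough loss estimate assumes that the repetition value is the mean interval in seconds between two stops of the same type.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ShortStop:
    stop_type: str
    duration_mean: float    # seconds
    duration_sd: float
    repetition_mean: float  # assumed: seconds between stops of this type
    repetition_sd: float


@dataclass(frozen=True)
class MachineConfig:
    machine_type: str
    cycle_mean: float       # seconds
    cycle_sd: float
    short_stops: tuple


PICK_AND_PLACE = MachineConfig(
    "Pick-and-Place", 25, 2,
    (
        ShortStop("Error-1", 30, 5, 900, 115),
        ShortStop("Error-2", 50, 10, 1800, 150),
        ShortStop("Error-3", 20, 5, 1000, 135),
    ),
)


def sample_cycle_time(cfg, rng):
    """Draw one cycle time; the normal distribution is an assumption."""
    return max(0.0, rng.gauss(cfg.cycle_mean, cfg.cycle_sd))


def expected_stop_fraction(cfg):
    """Rough share of time spent in short stops, under the interval assumption."""
    return sum(s.duration_mean / s.repetition_mean for s in cfg.short_stops)
```

Under these assumptions, `expected_stop_fraction(PICK_AND_PLACE)` evaluates to 30/900 + 50/1800 + 20/1000, or about 0.081. That is, the three obfuscated stop types alone would account for roughly 8% of the machine's time.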
Production Batch and Dates
A pipeline produces PCBs in a batch. Each batch has a name and a target number of units to produce for a given time duration. The time duration is somewhat flexible on the higher side; production may extend the duration if the target has not been met. In case the target is met within the stipulated batch duration, the simulation logic lets production proceed until the end time specified, thereby producing more PCBs than planned. All PCBs in a batch bear the batch's name as a prefix, which makes it possible to segregate the output PCBs of a batch cleanly. The concept of actual calendar dates and production shifts was also added. For example, a simulation could run from the start of Shift-A on 1st August till the end of Shift-C on 31st December. With the introduction of calendar dates and shifts with targeted production, the whole situation became even closer to real life.
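The batch prefix and shift calendar can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the identifier format, the helper names, and three eight-hour shifts with Shift-A starting at 06:00.

```python
from datetime import datetime


def pcb_id(batch_name: str, serial: int) -> str:
    """Build a PCB identifier that carries the batch name as a prefix."""
    return f"{batch_name}-{serial:06d}"


def in_batch(identifier: str, batch_name: str) -> bool:
    """Check whether a PCB identifier belongs to the given batch."""
    return identifier.startswith(batch_name + "-")


def shift_for(moment: datetime) -> str:
    """Map a timestamp to a shift (assumed: A 06-14h, B 14-22h, C 22-06h)."""
    hour = moment.hour
    if 6 <= hour < 14:
        return "Shift-A"
    if 14 <= hour < 22:
        return "Shift-B"
    return "Shift-C"
```

Tagging every simulated event with its batch and shift in this way is what allows output to be segregated by batch and aggregated by shift, day or month.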
Conclusion of DES Based Data Generation
Looking at the output produced by our simulation, our belief that it resembled reality quite closely was reaffirmed. We therefore built a UI around it. In this UI, we displayed shift-wise, day-wise and month-wise OEE values for each pipeline. Likewise, we could show short-stop durations classified by line, by individual machine and by machine type, with further classification by stop type. We could also display simulation output piecemeal, for example, for two calendar weeks at a time. By allowing for changes in configuration values after each such interval, we could create 'what-if' situations where some machines behave better or worse. We also provided graphical production trends (daily, weekly or monthly), with the associated OEE, stoppage times, etc.
Afterthoughts
Our simulated data and UI piqued interest among some, and the first prospective customer made available the logs of a line with multiple Pick-and-Place (P&P) machines.
The data from machine logs were however a revelation.
We discovered there are many more short-stop types than we had considered, or even imagined. At times there are multiple short stops of the same type within one cycle time, which the designed simulation did not provide for. The durations of the short-stops depend in most cases on operator behaviour, which a simple statistical model based on mean and standard deviation (the model we used) may not capture well enough. However, even simulating with these and other imperfect assumptions gave results good enough to interest potential clients, which led to our getting access to real data and thereby to the beginning of the product's evolution. Simulating a real-life situation without real-life data made it possible to demonstrate what is possible much more effectively than slides presenting the concept could. A simulation does bring out a reasonable shape of the reality even in the absence of real data.
