Contract between NNAPI systrace code and parser =============================================== This text files documents how tracing in the NNAPI manifests in systrace output and how that output is interpreted by the systrace parser. Please view in a 160 character window. Special cases ------------- - Execution is exposed as an asynchronous event from the runtime. Time taken by the runtime is calculated as the time between start of ANeuralNetworksExecution_startCompute and end of ANeuralNetworksEvent_wait. This special case is not reflected in the cases presented below. Notation -------- - ...: elided code - t_m_w: tracing_mark_write - tX: timestamps - T1, T2: thread ids Cases for the parser ==================== Source code Systrace Interpretation for timing statistics - all times are wallclock ------------------------------------------------------------------------------------------------------------------------------------------------------ *Baseline case* ... funcP(...) { t0: t_m_w:B|T1|[NN_LR_PP]funcP Add (t1-t0) to total time spent in Layer NNTRACE_RT(NNTRACE_PHASE_PREPARATION, t1: t_m_w:E|T1 Runtime, Phase Preparation "funcP); ... } ------------------------------------------------------------------------------------------------------------------------------------------------------ *Local call to other layer* ... funcA1(...) { t0: t_m_w:B|T1|[NN_LA_PP]funcE1 Add (t3-t0) to total time spent in Layer NNTRACE_APP(NNTRACE_PHASE_PREPARATION, t1: t_m_w:B|T1|[NN_LR_PP]funcC1 Application, Phase Preparation "funcA1); t2: t_m_w:E|T1 ... funcR1(...); t3: t_m_w:E|T1 Add (t2-t1) to total time spent in Layer ... Runtime, Phase Preparation } ... funcR1(...) { Note: Self-time of Layer Application, NNTRACE_RT(NNTRACE_PHASE_PREPARATION, Phase Preparation will be calculated as "funcR1"); ... total time in Layer Application - total time } in Layer Runtime Note: These can be nested as per rows below ("Switch phase ...", "Subphases ...", "Additional detail...") Note: If the called function specifies a phase that is not supposed to be nested, the parser will emit a diagnostic. ------------------------------------------------------------------------------------------------------------------------------------------------------ *Switch phase during function* ... funcC1(...) { t0: t_m_w:B|T1|[NN_LC_PTR]funcC1 Add (t1-t0) to total time spent in Layer NNTRACE_TRANS("funcC1"); t1: t_m_w:B|T1|[SW][NN_LC_PCO]funcC1 CPU, Phase Transformation ... t2: t_m_w:E|T1 NNTRACE_COMP_SWITCH("funcC1"); t3: t_m_w:E|T1 Add (t2-t1) to total time spent in Layer ... CPU, Phase Computation } (t3-t2 treated as negligible - only the destructors of objects created between the tracepoints) ------------------------------------------------------------------------------------------------------------------------------------------------------ *Subphases of execution* ... funcR2(...) { t0: t_m_w:B|T1|[NN_LR_PE]funcR2 Add (t2-t1) to total time spent in Layer NNTRACE_RT(NNTRACE_PHASE_EXECUTION, t1: t_m_w:B|T1|[NN_LC_PCO]funcC2 CPU, (sub)Phase Computation and to "funcR2); t2: t_m_w:E|T1 total time in Phase Execution ... funcC2(...); t3: t_m_w:E|T1 ... Add (t3-t0) to total time spent in Layer } Runtime, Phase Execution ... funcC2(...) { NNTRACE_COMP("funcC2"); ... } ------------------------------------------------------------------------------------------------------------------------------------------------------ *Additional detail in the same layer* ... funcR3(...) { t0: t_m_w:B|T1|[NN_LR_PE]funcR3 Add (t3-t0) to total time spent in Layer NNTRACE_RT(NNTRACE_PHASE_EXECUTION, t1: t_m_w:B|T1|[NN_LR_PE]funcR4 Runtime, Phase Execution "funcR3); t2: t_m_w:E|T1 ... funcR4(...); t3: t_m_w:E|T1 Note: funcR4 will be visible in the systrace ... visualization } ... funcR4(...) { NNTRACE_RT(NNTRACE_PHASE_EXECUTION, "funcR4"); ... } ------------------------------------------------------------------------------------------------------------------------------------------------------ *Synchronous IPC call* ... funcR5(...) { t0: t_m_w:B|T1|[NN_LR_PC]funcR5 Add (t5-t0) - (t4-r1) to total time spent in NNTRACE_RT(NNTRACE_PHASE_COMPILATION, t1: t_m_w:B|T1|[NN_LI_PI]getCapabilities Layer Runtime, Phase Compilation; see "funcR5"); t2: t_m_w:B|T1|HIDL::IDevice::getCapabilities::client "Onetime initialization code". ... device->getCapabilities() t3: t_m_w:E|T1 ... t4: t_m_w:E|T1 Add (t4-t1) to total time spent in Layer } t5: t_m_w:E|T1 IPC, Phase Initialization ... VersionedIDevice::getCapabilities(...) { NTRACE_FULL(NNTRACE_LAYER_IPC, Note: Self-time of Layer Runtime, Phase NNTRACE_PHASE_COMPILATION, Compilation will be calculated as total "getCapabilities"); time in Layer Runtime - total time in Layer IPC } Note: Tracepoints are needed for the client IPC calls. The HIDL tracing isn't guaranteed to wait for the server - it just sends the transaction even if the call is synchronous. ------------------------------------------------------------------------------------------------------------------------------------------------------ *Asynchronous IPC call that is synchronously waited for by the runtime* // Runtime code t0: t_m_w:B|T1|[NN_LI_PC]prepareModel Add (t10-t0) to total time spent in Layer ... funcRC(...) { t1: t_m_w:B|T1|HIDL::IDevice::prepareModel_1_1::client IPC, Phase Compilation ... t2: t_m_w:B|T2|HIDL::IDevice::prepareModel_1_1::server NTRACE_FULL(NNTRACE_LAYER_IPC, t3: t_m_w:B|T2|[NN_LD_PC]SampleDriver::prepareModel Add (t6-t2) to total time spent in Layer NNTRACE_PHASE_COMPILATION, t4: t_m_w:B|T2|HIDL::IPreparedModelCallback::notify::clie Driver, Phase Compilation. This includes "prapareModel"); t5: t_m_w:E|T2 the generated HIDL stub code, which is ... t6: t_m_w:E|T2 <0.05ms. device->prepareModel(...); t7: t_m_w:E|T2 ... t8: t_m_w:B|T1|HIDL::IPreparedModelCallback::notify::serv Note: the HIDL trace rows are added by cb->wait(); t9: t_m_w:E|T1 the automatically generated proxy and ... t10: t_m_w:E|T1 stub code. For the driver side, the } t11: t_m_w:E|T1 mapping of the HIDL functions to layers and phases is done in the systrace // Driver code parser. ... SampleDriver::prepareModel(...) { NNTRACE_FULL(NNTRACE_LAYER_DRIVER, Note: the SampleDriver::prepareModel is NNTRACE_PHASE_COMPILATION, treated as additional detail for Layer "SampleDriver::prepareModel"); Driver, Phase Compilation. } Note: the "t_m_w" output of systrace uses thread ids, so that the call stack can be reconstructed. The systrace rows are also annotated with process ids. The parser uses the process ids to distinguish between the application process from the driver process (used for diagnostics and for distinguishing CPU fallback from sample driver). Note: the next row in this table gives more detail for prepareModel specifically Note: the driver-side HIDL traces get us the time spent in sample and hvx drivers. With a different driver threading model this may not be the case - future drivers should add manual tracing. TODO: attribute driver process IPC call (callback) overhead to IPC layer. ------------------------------------------------------------------------------------------------------------------------------------------------------ *Subtracting time when nesting is violated* // Runtime code t0: t_m_w:B|T1|[NN_LI_PC]prepareModel Add (t3 - t0) - (t2 - t1) to total time spent ... funcRC(...) { t1: t_m_w:B|T1|[SUB][NN_LR_PC]VersionedIDevice::prepareM in Layer IPC, Phase compilation ... t2: t_m_w:E|T1 NTRACE_FULL(NNTRACE_LAYER_IPC, t3: t_m_w:E|T1 Add (t2-t1) to total time spent in Layer NNTRACE_PHASE_COMPILATION, Runtime, Phase compilation "prapareModel"); ... device->prepareModel(...); ... cb->wait(); ... } ... VersionedIDevice::prepareModel(...) { // IPC work { NNTRACE_FULL_SUBTRACT( NNTRACE_LAYER_RUNTIME, NNTRACE_PHASE_COMPILATION, "VersionedIDevice::prepareModel"); // Runtime work } // IPC work } ------------------------------------------------------------------------------------------------------------------------------------------------------ *Onetime initialization code* ... funcR5(...) { t0: t_m_w:B|T1|[NN_LR_PP]funcR5 Add (t2-t1) to total time spent in Layer NNTRACE_RT(NNTRACE_PHASE_PREPARATION, t1: t_m_w:B|T1|[NN_LR_PI]funcI Runtime, Phase Initialization "runcR5); t2: t_m_w:E|T1 ... funcI(...); t3: t_m_w:E|T1 Add (t3 - t0) - (t2 - t1) to total time spent ... in Layer Runtime, Phase Preparation. } ... funcI(...) { NNTRACE_RT(NNTRACE_PHASE_INITIALIZATION, "funcI") ... } ------------------------------------------------------------------------------------------------------------------------------------------------------ *Utility code* ... funcR6(...) { t0: t_m_w:B|T1|[NN_LR_PP]funcR6 Add (t3-t0) to total time spent in Layer NNTRACE_RT(NNTRACE_PHASE_PREPARATION, t1: t_m_w:B|T1|[NN_LU_PU]funcU Runtime, Phase Preparation "funcR6"); t2: t_m_w:E|T1 ... funcU(...); t3: t_m_w:E|T1 Note: the funcU is treated as additional ... detail. } ... funcU(...) { NNTRACE_FULL(NNTRACE_LAYER_UTILITY, NNTRACE_PHASE_UNSPECIFIED, "funcU") ... }