OpenSolaris

Performance

  • Scheduler optimizations and shared cache awareness
    • Shared cache detection / optimization
      • The CPUID instruction allows detection of caches shared between cores and threads on a physical processor. Solaris should detect which cores and threads share caches, so that appropriate affinity and load-balancing policies can be implemented. This work is tracked by RFE 6495401. To maximize aggregate cache availability on partially utilized systems, the dispatcher should load balance across the shared cache level. To improve cache utilization, threads that need to migrate should, where possible, prefer to move between CPUs that share a cache.
    • Monitor/mwait based "halt/wakeup" mechanism
      • When CPUs go idle, they eventually invoke the halt instruction, which puts the CPU in the C1 state. At this point the CPU is suspended (in a sense) and is awoken by an interrupt. When work destined for the halted CPU becomes available, the CPU enqueueing the thread on the halted CPU's run queue must send an inter-processor interrupt (IPI) to awaken it; in practice, this is done in the context of a thread dropping a lock. Eliminating the need to send the IPI would improve the performance of the lock-dropping path. This can be accomplished with a monitor/mwait based mechanism, since mwait can also be used in the idle() code path to put the CPU in the C1 (or deeper) state. The sleeping CPU can monitor the runnable thread count in its associated dispatch queue, so that as soon as the awakening thread is enqueued, the sleeping CPU comes out of the C1 state. This work is tracked by RFE 6495342.
  • New instruction support
    • SSE3
      • x87 floating point to integer conversion (fisttp)
      • Complex arithmetic (addsubps, addsubpd, movsldup, movshdup, movddup)
      • Video encoding (lddqu)
      • Graphics (haddps, hsubps, haddpd, hsubpd)
      • Thread synchronization (monitor, mwait)
    • SSSE3
  • Kernel architecture primitive operations
    • Page copy
    • Page clear
    • bcopy
  • Library performance optimization
    • C-runtime library
    • libm
    • HPC
    • ACML
    • MPI
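
As a rough illustration of the shared cache detection item above, the following C sketch decodes the cache-sharing fields that CPUID reports. It assumes Intel's "deterministic cache parameters" leaf (leaf 4) and the GCC/Clang <cpuid.h> helpers; the source does not say which leaf or interface the Solaris work uses. The names cache_share_t, cache_share_decode, cpus_share_cache, and cache_share_enumerate are hypothetical and are not Solaris kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>              /* GCC/Clang CPUID helpers */
#define HAVE_CPUID_H 1
#endif

/* Decoded view of one CPUID leaf 4 subleaf (hypothetical type). */
typedef struct cache_share {
    unsigned type;      /* 0 = no more caches, 1 = data, 2 = instruction, 3 = unified */
    unsigned level;     /* 1 = L1, 2 = L2, ... */
    unsigned nsharing;  /* maximum number of logical CPU IDs sharing this cache */
} cache_share_t;

/*
 * Decode the EAX value returned by CPUID leaf 4.
 * Returns 1 if the subleaf describes a cache, 0 if it is the terminator.
 */
static int
cache_share_decode(uint32_t eax, cache_share_t *out)
{
    assert(out != NULL);
    out->type = eax & 0x1f;                     /* EAX[4:0]   */
    out->level = (eax >> 5) & 0x7;              /* EAX[7:5]   */
    out->nsharing = ((eax >> 14) & 0xfff) + 1;  /* EAX[25:14] holds the count minus 1 */
    return (out->type != 0);
}

/*
 * Two logical CPUs share a cache when their APIC IDs agree once the low
 * bits that number the sharers (count rounded up to a power of two) are
 * shifted away.
 */
static int
cpus_share_cache(uint32_t apicid_a, uint32_t apicid_b, unsigned nsharing)
{
    unsigned shift = 0;

    while ((1u << shift) < nsharing)
        shift++;
    return ((apicid_a >> shift) == (apicid_b >> shift));
}

/*
 * Enumerate the caches visible to the calling CPU. Returns the number of
 * entries written to out; 0 if leaf 4 is unavailable or reports nothing.
 */
static unsigned
cache_share_enumerate(cache_share_t *out, unsigned max)
{
    unsigned n = 0;
#ifdef HAVE_CPUID_H
    unsigned eax, ebx, ecx, edx;

    if (__get_cpuid_max(0, NULL) < 4)
        return (0);
    for (unsigned i = 0; n < max; i++) {
        __cpuid_count(4, i, eax, ebx, ecx, edx);
        if (!cache_share_decode(eax, &out[n]))
            break;
        n++;
    }
#else
    (void)out;
    (void)max;
#endif
    return (n);
}
```

A dispatcher could group CPUs by the shifted APIC ID at the chosen cache level, balance load across those groups on a partially utilized system, and prefer migration targets inside the same group. How the actual Solaris implementation represents these groups is not described here.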

Details

Performance work is being tracked with bug reports, also known as Change Requests (CRs).
List of CRs sorted by priority with links to public CR information, owners, and status:

Related previous work

Previously completed related work that the CRs above may build on: