A Cortex-A57 processor-based smartphone, wirelessly connected to a screen, keyboard and mouse, delivers the full experience that consumers get from a typical laptop today.
The Cortex-A57 processor:
- Can deliver all the compute capability a typical consumer needs, from replacing your gaming console to your laptop in innovative portable form factors
- Efficiently runs legacy ARM 32-bit applications
- Features cache coherent interoperability with ARM Mali™ family graphics processing units (GPUs) for GPU compute applications
- Offers optional reliability and scalability features for high-performance enterprise applications
- Connects seamlessly to ARM interconnect IP in configurations of up to 16 cores, with more planned for the future
|Debug & Trace||CoreSight™ DK-A57|
|Cortex-A57 Architectural Features|
|Feature||Benefit||AArch32||AArch64|
|ARMv8 architecture||64 and 32-bit execution states for scalable high performance||Yes||Yes|
|Hardware-accelerated cryptography||3x-10x better software encryption performance. Useful for small-granule decrypt/encrypt operations that are too small to offload efficiently to a hardware accelerator (e.g. HTTPS)||Yes||Yes|
|Floating Point||Hardware support for half-, single- and double-precision floating-point arithmetic. Now with IEEE 754-2008 enhancements||Yes||Yes|
|Load Acquire, Store Release instructions||Designed for C++11, C11, Java memory models. Improves performance of thread-safe code by eliminating explicit memory barrier instructions||Yes||Yes|
|Hardware Virtualization||Enables multiple software environments and their applications to simultaneously access the system capabilities||Yes||Yes|
|Large Physical Address Reach||Enables the processor to access beyond 4GB of physical memory.||Yes||Yes|
|Automatic event signalling||For power-efficient, high-performance spinlocks||Yes||Yes|
|Double Precision Floating Point SIMD||Allows SIMD vectorisation to be applied to a much wider set of algorithms (e.g. scientific / High Performance Computing (HPC) and supercomputer)||No||Yes|
|64b Virtual address reach||Enables virtual memory beyond the 4GB limit of 32-bit addressing. Important for modern desktop and server software using memory-mapped file I/O and sparse addressing.||No||Yes|
|Larger register files||31 x 64-bit general purpose registers: increases performance, reduces stack use. Fewer stack spills, enabling more aggressive compilers. SIMD usable for more applications, e.g. HPC||No||Yes|
|Efficient 64-bit immediate generation||Less need for literal pools||No||Yes|
|Large PC-relative addressing range||(+/-4GB) for efficient data addressing within shared libraries and position-independent executables||No||Yes|
|64k pages||Reduce TLB miss rates and depth of page walks||No||Yes|
|New exception model||Reduces OS and Hypervisor software complexity||No||Yes|
|Enhanced Cache management||User space cache operations improve dynamic code generation efficiency, Data Cache Zero for fast clear||No||Yes|
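The "Efficient 64-bit immediate generation" row can be illustrated with a small model. This is a minimal sketch, assuming the AArch64 move-wide instructions MOVZ (write one 16-bit chunk and zero the rest of the register) and MOVK (write one 16-bit chunk and keep the rest). The function names `movewide_plan` and `apply_plan` are hypothetical, and real compilers also use other encodings (for example MOVN and logical immediates), so the sequence shown is not necessarily the shortest one.

```python
def movewide_plan(value):
    """Split a 64-bit constant into (op, imm16, shift) move-wide steps.

    Zero chunks are skipped, so sparse constants need fewer steps.
    """
    if not 0 <= value < 1 << 64:
        raise ValueError("value must fit in 64 bits")
    chunks = [(shift, (value >> shift) & 0xFFFF) for shift in (0, 16, 32, 48)]
    nonzero = [(shift, chunk) for shift, chunk in chunks if chunk != 0]
    if not nonzero:
        # A single MOVZ of zero clears the whole register.
        return [("movz", 0, 0)]
    plan = []
    for i, (shift, chunk) in enumerate(nonzero):
        # The first step zeroes the other bits; later steps keep them.
        op = "movz" if i == 0 else "movk"
        plan.append((op, chunk, shift))
    return plan


def apply_plan(plan):
    """Replay a plan on a modelled 64-bit register and return its value."""
    reg = 0
    for op, imm16, shift in plan:
        if op == "movz":
            reg = imm16 << shift
        else:  # movk: replace one 16-bit field, keep the others
            reg = (reg & ~(0xFFFF << shift)) | (imm16 << shift)
    return reg


if __name__ == "__main__":
    for op, imm16, shift in movewide_plan(0x12340000ABCD):
        print(f"{op} x0, #{imm16:#x}, lsl #{shift}")
```

For the constant `0x12340000ABCD` the plan has two steps rather than four, because the two zero chunks are skipped. No step needs a load from a literal pool.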
|Cortex-A57 Microarchitectural Features|
|Deeply Out of Order Pipeline||Increased actual instruction throughput in a broader range of scenarios; where instructions are blocked on a dependency, the processor can look for other instructions to run. Full out-of-order scheduling on all execution paths allows more types of instructions to be re-ordered, keeping the back end of the pipeline full more of the time. Support for a high-bandwidth out-of-order back end, 128 in-flight instructions, and instruction-result handling optimized for 32b/64b operands|
|Wide multi-issue capability||Increased peak instruction throughput via duplication of execution resources. Power-optimized instruction decode with localized decoding and 3-wide decode bandwidth. High-capacity register renaming provides 3-wide, large-instruction rename bandwidth. Support for 8 issue slots and up to 128 instructions in flight|
|16-way associative, banked L2 cache||Performance optimized L2 cache design allows more than one CPU in the cluster to access the L2 at the same time. Sophisticated per-core hardware prefetch units improve memory loads into L2. A balanced design approach allows reduced latency and lower power in the L2 subsystem.|
|1024 entry main TLB||Improved performance on code with complex memory access patterns, e.g. web browsing.|
|Large uTLBs||48 entry I-side uTLB allows large set of pages to be handled very quickly by the memory management unit. 32-entry fully-associative D-TLBs (with large-page support) are more responsive to modern memory access patterns.|
|Advanced Branch Predictor||2K-4K Branch Target Buffer (BTB) with zero-cycle taken-branch penalty minimizes pipeline flushes. Sophisticated indirect predictor with path history increases branch hit rate. Dedicated branch resolution unit enables fully out-of-order branch execution. Also includes a high-performance mispredict-recovery microarchitecture.|
|Optimized D-Side memory system||Sophisticated multi-stream L1 hardware prefetcher and exhaustive store/data-forwarding capabilities increase data throughput to the main datapath.|
|Extensive power-saving features||Way-prediction, tag-reduction, cache-lookup suppression, and other features minimize dynamic power.|
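The TLB rows above and the 64k-page option in the architectural table are linked by simple arithmetic: the memory a TLB can map without a miss (its reach) is the entry count multiplied by the page size. The sketch below works this through under the simplifying assumption that every entry maps a page of the same size. The entry counts come from the tables on this page; the name `tlb_reach_bytes` is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB

# Entry counts quoted in the feature tables.
MAIN_TLB_ENTRIES = 1024
I_SIDE_UTLB_ENTRIES = 48
D_SIDE_UTLB_ENTRIES = 32


def tlb_reach_bytes(entries, page_size_bytes):
    """Bytes of address space covered when every entry maps one page."""
    if entries < 0 or page_size_bytes <= 0:
        raise ValueError("entries must be >= 0 and page size must be > 0")
    return entries * page_size_bytes


if __name__ == "__main__":
    for page_kib in (4, 64):
        reach = tlb_reach_bytes(MAIN_TLB_ENTRIES, page_kib * KIB)
        print(f"main TLB, {page_kib} KiB pages: {reach // MIB} MiB reach")
```

With 4 KiB pages the 1024-entry main TLB covers 4 MiB; with 64 KiB pages the same entries cover 64 MiB, which is one way larger pages reduce TLB miss rates.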
|Advanced MultiCore Features|
|The processor also utilizes the widely established ARM MPCore multicore technology, enabling performance scalability and control over power consumption to exceed the performance of today's comparable high-performance devices while remaining within tight mobile power constraints. Multicore processing provides the ability for any of the four component processors, within a cluster, to shut down when not in use, for instance when the device is in standby mode, to save power. When higher performance is required, every processor is in use to meet the demand while still sharing the workload to keep power consumption as low as possible.|
|Snoop Control Unit||The SCU is responsible for managing the interconnect, arbitration, communication, cache-to-cache and system memory transfers, cache coherence and other capabilities for the processor. The Cortex-A57 processor also exposes these capabilities to other system accelerators and non-cached DMA driven peripherals to increase performance and reduce system wide power consumption. This system coherence also reduces the software complexity involved in maintaining software coherence within each OS driver.|
|Accelerator Coherence Port||This AMBA 4 AXI™ compatible slave interface on the SCU provides an interconnect point for masters that are interfaced directly with the Cortex-A57 processor. This interface supports all standard read and write transactions without additional coherence requirements. However, any read transactions to a coherent region of memory will interact with the SCU to test whether the information is already stored in the L1 caches. The SCU will enforce write coherence before the write is forwarded to the memory system and may allocate into the L2 cache, removing the power and performance impact of writing directly to off-chip memory.|
|Generic Interrupt Controller||Implementing the standardized and architected interrupt controller, the GIC provides a rich and flexible approach to inter-processor communication and to the routing and prioritization of system interrupts. It supports up to 224 independent interrupts; under software control, each interrupt can be distributed across CPUs, hardware prioritized, and routed between the operating system and the TrustZone software management layer. This routing flexibility, together with support for virtualization of interrupts into the operating system, provides one of the key features required to enhance the capabilities of a solution utilizing a hypervisor.|
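The interrupt prioritization described in the Generic Interrupt Controller row can be pictured with a toy model. It is a rough sketch, not the GIC programming interface. It relies on two GIC conventions: a lower priority value means a higher priority, and an interrupt is signalled only if its priority value is below the CPU's priority mask. The class and function names, the 0-based `index` field and the lowest-index tie-break are all assumptions made for illustration.

```python
from dataclasses import dataclass

MAX_INTERRUPTS = 224  # figure quoted in the table above


@dataclass(frozen=True)
class PendingInterrupt:
    index: int     # hypothetical 0-based index, not a real GIC interrupt ID
    priority: int  # GIC convention: lower value means higher priority


def select_interrupt(pending, priority_mask=0xFF):
    """Return the pending interrupt to signal, or None if all are masked.

    Only interrupts whose priority value is numerically below
    `priority_mask` are eligible. Ties go to the lowest index, which is
    an assumption of this sketch rather than architected behaviour.
    """
    for irq in pending:
        if not 0 <= irq.index < MAX_INTERRUPTS:
            raise ValueError(f"index {irq.index} out of range")
    eligible = [irq for irq in pending if irq.priority < priority_mask]
    if not eligible:
        return None
    return min(eligible, key=lambda irq: (irq.priority, irq.index))


if __name__ == "__main__":
    pending = [PendingInterrupt(40, 0xA0), PendingInterrupt(7, 0x20)]
    print(select_interrupt(pending))
```

Lowering `priority_mask` models software temporarily blocking less urgent interrupts while it handles a more urgent one.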
The ARM CoreLink™ interconnect and memory controller system IP addresses the critical challenge of efficiently moving and storing data between up to 16 Cortex-A series processors, high performance media processors and dynamic memories to optimize the system performance and power consumption of the SoC. The CoreLink system IP enables SoC designers to maximize the utilization of system memory bandwidth and reduce static and dynamic latencies. The ARM CoreSight technology provides complete on-chip debug and correlated, real-time trace visibility for all cores of the Cortex-A57 MPCore processor, reducing risk and speeding development of high quality multiprocessing software. The new ARM CoreLink CCN-504 Cache Coherent Network provides optimum system bandwidth and latency. The CCN-504 provides AMBA 4 AXI™ Coherency Extensions (ACE) compliant ports for full coherency between multiple Cortex-A series processors, better utilizing caches and simplifying software development. This feature is essential for high bandwidth applications including gaming, servers and networking that require clusters of coherent single and multicore processors. Combined with the ARM CoreLink network interconnect and memory controller IP, the CCN increases system performance and power efficiency.
ARM Physical IP Platforms deliver process optimized IP, for best-in-class implementations of the Cortex-A57 processor at 20nm and below. A set of high performance POP™ IP containing advanced ARM Physical IP for 28nm technologies supports the Cortex-A57, enabling rapid development of leadership physical implementations. ARM is also working early to assure a roadmap to 20nm optimizations. POP IP supports the ARM strategy of offering specifically targeted Physical IP to enable Partners to achieve tuned implementations of ARM cores. ARM is uniquely able to design the optimization packs in parallel with the Cortex-A57 MPCore processor architecture, enabling the processor and physical IP combination to deliver workstation class performance in a mobile power envelope while facilitating rapid time-to-market.
The ARM Development Studio 5 (DS-5™) for ARMv8 fully supports all ARMv8 (AArch64) processors as well as a wide range of third party tools, operating systems and EDA flows. ARM DS-5 software development tools are unique in their ability to provide solutions that take full advantage of the complete ARM technology portfolio. DS-5 provides a complete range of software tools to create, debug and optimize systems based on the Cortex-A57 MPCore processor. It incorporates the DS-5 Debugger, whose powerful and intuitive graphical environment enables fast debugging of bare metal, Linux and Android native applications. In addition, its new ARM Streamline™ Performance Analyzer simplifies the identification of hot spots in software and load balancing between cores. The ARM Compiler, which already includes specific optimizations for the Cortex-A57 MPCore processor, and an ARM Versatile™ Reference Virtual Platform built on ARM Fast Models technology together enable early software development before silicon availability.
The Mali™ family of products combine to provide the complete graphics stack for all embedded graphics needs, enabling device manufacturers and content developers to deliver the highest quality, cutting edge graphics solutions across the broadest range of consumer devices.
ARM training courses and Active Assist on-site system-design advisory services enable licensees to efficiently integrate the Cortex-A57 MPCore processor into their design, realizing maximum system performance with lowest risk and fastest time-to-market.