A Wider Back End

Moving beyond the micro-op queue, Tremont has 8 execution ports, fed from 7 reservation stations.

The only two ports sharing a combined reservation station are the address generation units (AGUs). This is in stark contrast to the Core design, which in Sunny Cove uses a unified reservation station for all integer and floating point calculations and three for the AGUs. Tremont uses a unified reservation station for the two AGUs, also backed by extra memory for queued micro-ops, in order to supply both AGUs with either 2x 16-byte stores, 2x 16-byte loads, or one of each. Intel clearly expects the AGUs on Tremont to be fairly active compared to other execution ports.

On the integer side, aside from the two AGUs, Tremont has 3 ALUs, a jump port, and a store data port. Each ALU supports different functions: one handles shifts, and another handles multiplication and division. Compared to Core, these ALUs are extremely lightweight, and Intel hasn’t gone into specifics here.

On the floating point side, things are a little more varied: the three ports are split between two ALUs and a store port. Of the two ALUs, one focuses on fused additions (FADD), while the other focuses on fused multiplication and division (FMUL). Both ALUs support 128-bit SIMD and 128-bit AES instructions with a 4-cycle latency, as well as single-instruction SHA256 at 4 cycles. There is no 256-bit vector support here. In order to help with certain calculations, GFNI instruction support is included.
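For readers unfamiliar with GFNI, the sketch below shows in plain Python the per-byte arithmetic that its multiply instruction (GF2P8MULB) performs: multiplication in GF(2^8) reduced by the polynomial x^8 + x^4 + x^3 + x + 1, the same field AES uses. The function name `gf2p8_mul` is our own, and this is only an illustration of the math, not a model of how Tremont implements it in hardware.

```python
def gf2p8_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:          # if the lowest bit of b is set, add (XOR) the current a
            result ^= a
        b >>= 1
        a <<= 1            # multiply a by x
        if a & 0x100:      # overflowed past 8 bits: reduce by the field polynomial
            a ^= 0x11B
    return result


# Worked example from the AES specification (FIPS-197): {57} * {83} = {c1}
print(hex(gf2p8_mul(0x57, 0x83)))  # prints 0xc1
```

The hardware instruction applies this operation to every byte lane of a vector register at once; the loop above handles a single pair of bytes.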

There is also a larger 1024-entry L2 TLB, supporting 1024x 4K entries, 32x 2M entries, or 8x 1G entries. This is an upgrade from the 512-entry L2 TLB in Goldmont.
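To put those entry counts in context, a quick back-of-the-envelope calculation gives the amount of memory the L2 TLB can map at each page size (its "reach"). The entry counts come from the paragraph above; the helper name `tlb_reach` and the dictionary layout are ours, purely for illustration.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# (number of entries, page size in bytes) for each mode quoted above
L2_TLB_CONFIGS = {
    "4K": (1024, 4 * KIB),
    "2M": (32, 2 * MIB),
    "1G": (8, 1 * GIB),
}


def tlb_reach(entries: int, page_size: int) -> int:
    """Return the number of bytes covered when every TLB entry maps one page."""
    return entries * page_size


for name, (entries, page_size) in L2_TLB_CONFIGS.items():
    reach_mib = tlb_reach(entries, page_size) // MIB
    print(f"{entries} x {name} pages -> {reach_mib} MiB")
```

That works out to 4 MiB of reach with 4K pages, 64 MiB with 2M pages, and 8 GiB with 1G pages.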

New Instructions

As with any generation, Intel adds new supported instructions to either accelerate common calculations that would traditionally require lots of instructions or to add new functionality. Tremont is no different.

| AnandTech | Tremont | Goldmont Plus | Goldmont | Airmont | Silvermont |
|---|---|---|---|---|---|
| Process (nm) | 10+ | 14 | 14 | 14 | 22 |
| Release Year | 2019 | 2017 | 2016 | 2015 | 2013 |
| New Instructions | CLWB, GFNI, ENCLV, CLDEMOTE, MOVDIR*, TPAUSE, UMONITOR, UMWAIT | SGX1, UMIP, PTWRITE, RDPID | RDSEED, SMAP, MPX, XSAVEC, XSAVES, CLFLUSHOPT, SHA | - | SSE4.1, SSE4.2, MOVBE, CRC32, POPCNT, CLMUL, AES, RDRAND, PREFETCHW |

(When asked what other new instructions are supported, Intel told us to look at its published documents on future instructions. When we pointed out that those documents weren’t exactly clear, and that in the past Intel hasn’t spoken about future designs, we were not given any additional comment.)

When we get hold of a Tremont device, we’ll do a full instruction breakdown.
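On a Linux system, one simple way to see which of these extensions a given CPU reports is to read the `flags` line of `/proc/cpuinfo`. The sketch below does that; the flag spellings in `TREMONT_FLAGS` are our assumption about how the Linux kernel names these features (for example, TPAUSE, UMONITOR and UMWAIT are assumed to be reported together as `waitpkg`), and ENCLV is left out because we are not aware of a dedicated flag for it.

```python
def parse_cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            return set(value.split())
    return set()


# Assumed Linux flag names for the Tremont additions listed in the table.
TREMONT_FLAGS = ["clwb", "gfni", "cldemote", "movdiri", "movdir64b", "waitpkg"]


def report(flags: set, wanted: list) -> dict:
    """Map each wanted flag name to True/False depending on whether it is present."""
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or the file is unreadable
    for name, present in report(parse_cpu_flags(text), TREMONT_FLAGS).items():
        print(f"{name:10s} {'yes' if present else 'no'}")
```

This only shows what the kernel reports, so a flag can be absent either because the hardware lacks the feature or because the running kernel is too old to know about it.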

101 Comments

  • 29a - Thursday, October 24, 2019

    Did Atom processors ever stop sucking?
  • solidsnake1298 - Thursday, October 24, 2019

    That depends on your needs. As a HTPC, starting with Apollo Lake (Goldmont) the iGPU was upgraded sufficiently that it can decode 4K HEVC. I haven't tested 4K HEVC, personally. But I have played 1080p60 HEVC without a single dropped frame.
  • vladx - Friday, October 25, 2019

    I have a Goldmont tablet, 4K HEVC works fine as long as the bitrate doesn't surpass the limits of its eMMC storage, in which case artefacts and stuttering is present. Maybe I should look into replacing it with a SSD if that's even possible.
  • qap - Friday, October 25, 2019

    Even the slowest eMMC storage can do 50MB/s sequential read. There is no way, you have 400Mbps+ HEVC video (and if that is the case, Atom is obviously not for you). The limit must be somewhere else. Most likely it supports hardware HEVC decoding up to some bitrate only and you are hitting this limit.
  • vladx - Friday, October 25, 2019

    400Mbps no, but I have some 100+ Mbps videos and most sit around 60 so it can definitely push the eMMC to its limits especially considering it also needs to run the OS processes at the same time.
  • s.yu - Friday, October 25, 2019

    A common confusion between B and b...
  • eddman - Friday, October 25, 2019

    Unless the storage is so crap that can't even sustain 12.5 MB/s (a.k.a 100 Mbps), it's probably the decoder itself that is unable to properly accelerate such high bit-rate videos.
  • nathanddrews - Sunday, October 27, 2019

    Quite a few eMMC implementations run off a USB 2.0 bus, so yes, it can bottleneck a system hard. Same thing frequently happens with networking components in devices. It will have AC/GbE, but can't reach those speeds.
  • eddman - Sunday, October 27, 2019

    Even a USB 2.0 eMMC should be able to sustain a 12-13 MB/s sequential read.

    It has to be the decoder. He doesn't know the difference between bit and byte and thinks 60 Mbps is too much for 50 MB/s.
  • eek2121 - Monday, October 28, 2019

    Not if the bus is shared with 2 network controllers, a bluetooth controller, etc. I haven't looked at how Atom is set up admittedly, but that is one of the major issues with SBCs. Everything hangs off the USB 2.0 bus. The USB 2.0 bus also can't really maintain true USB 2.0 speeds in quite a few cases due to hitting micro-usb power limits.
