G92 : First rumours/speculations


littlelouievegos

Posted
*nods*

 

That's a start :P

 

Uhmmm there's no yes or no answer here...:P
K then it's not like I care about the transistor count that much.. :lol:

 

I beg your pardon?
Semantics ftw.. :razz: By now I'm sure there's no 10.1 support from what I've heard (and you seem to verify that), but possibly they could have been talking about a revision of SM4.0.. So pipes here too.. :lol:

 

Yeahrightsureok heh...Tesla != gaming GPU and no one asked how many cycles for it pffffft :roll:
...and so far we got bastards catering for that? LOL
Meaning?

 

Errrr....under circumstances yes. By the way even a 2*8800GTX@SLi delivers 1024 GFLOPs in shading power on paper. Kindly learn to separate marketing bullshit from real world applicable numbers.
Makes perfect sense..

 

3 FLOPs yes; the rest I'll leave to your imagination.
Is MUL available for general shading in every case? Are there any limitations? :P

 

In that sense G80 was "full scalar" too.
Agreed

 

Hahahahahahaha!
Ok makes sense again..

 

No (if that's per chip heh).
All I see is GDDR4, nothing more in quantitative terms then? :P

 

 

No.
No.
Probably the most disappointing noes I've heard in a long time.. I was suspecting it though.. :lol:

 

HAHAHAHAHAHAAAAAAAAAAAAAAAAAAAAAAAAAAA
Noobish question here..Isn't free 4xAA a requirement for 10.1 or am I mistaken? But then again you made it quite clear.. No 10.1 :P

 

 

G100 in =/>2009
Crap.. And again crap.. :(

 

 

With the increase of bandwidth I wouldn't be surprised if we'd see higher coverage sampling modes.

 

Interesting..

 

On a single G92 expect something above 1/2.
~50% and that in very selected case scenarios.
That's a realistic assessment in my eyes.. Didn't expect anything more tbh..

 

SLi x2 or x3 my friend and I've already said too much and I blame a couple of more beers I had :P:P:P
Shall I buy you another beer while the round's going, in case we get to see something more? :lol:

Joking aside, thanx.. Although you really "made our day", especially with the tessellation unit..

Thanx ;)

Posted

Marios, did you read at all what the guys were saying in the previous posts (Ailuros & littlelouievegos)? ... Where exactly did you conclude that from?

Posted

Semantics ftw.. :razz: By now I'm sure there's no 10.1 support from what I've heard (and you seem to verify that), but possibly they could have been talking about a revision of SM4.0.. So pipes here too.. :lol:

 

Let's not waste transistors for bullshit ;)

 

Meaning?

 

What's "native" anyway when it comes to GPGPU processing? It'll be as "native" as it is on G80, ie done on chip. Frankly I don't get what the bullshit really means, but someone that suggests eDRAM on a gaming GPU is insane anyway *snicker*

 

Is MUL available for general shading in every case? Are there any limitations? :P

 

R600 as much as G80 has to carry out interpolation and/or SFs somewhere. In the case of G80, SFs are done in the serial MUL unit, meaning of course that only in select cases can you use some percentage of it for general shading. Likewise it doesn't make the slightest difference if a future GPU hypothetically has MADD+MADD, MADD+ADD or MADD+MUL; the same "rule" applies in those cases too. Marketing will of course count the maximum theoretical throughput, but in a real game you have all of the above eating into that value, as well as other functions like texturing and Lord knows what else. The reality is that if you count only the 2 FLOPs on G80 you get something in the league of 346 GFLOPs + some small bonus from the MUL unit. In G9x you can fairly double that real rate and that's the reality behind it ;)
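A rough sketch of where the "346 GFLOPs" figure and the marketing number come from. The ALU count (128) and shader clock (1.35 GHz) are assumptions matching an 8800 GTX class G80; they are not stated in the post, and `peak_gflops` is just an illustrative helper name.

```python
# Back-of-the-envelope peak shader throughput.
# Assumption (not from the post): 8800 GTX class G80 with
# 128 scalar ALUs running at a 1.35 GHz shader clock.

def peak_gflops(alus: int, shader_clock_ghz: float, flops_per_clock: int) -> float:
    """Theoretical peak = ALUs x clock (GHz) x FLOPs per ALU per clock."""
    return alus * shader_clock_ghz * flops_per_clock

G80_ALUS = 128
G80_SHADER_CLOCK_GHZ = 1.35

# A MADD counts as 2 FLOPs; counting the extra MUL gives 3 FLOPs per clock.
madd_only = peak_gflops(G80_ALUS, G80_SHADER_CLOCK_GHZ, 2)
madd_plus_mul = peak_gflops(G80_ALUS, G80_SHADER_CLOCK_GHZ, 3)

print(f"MADD only:  {madd_only:.1f} GFLOPs")
print(f"MADD + MUL: {madd_plus_mul:.1f} GFLOPs (paper number)")
```

Doubling the 3-FLOP paper number for an SLi pair lands in the same ballpark as the ~1 TFLOP "on paper" figure mentioned earlier in the thread, which is the point: the headline value only holds if the MUL is always available for general shading.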

 

 

All I see is GDDR4, nothing more in quantitative terms then? :P

 

Someone has been ordering at Samsung with 60% higher frequencies than your current GPU has :P

 

Probably the most disappointing noes I've heard in a long time.. I was suspecting it though.. :lol:

 

What for f*cks sake are you going to do with a programmable tessellation unit, when D3D10 is NOT going to support it prior to 2009 and that's even optimistic? Get it in your head: advanced HOS and higher level procedural geometry = D3D11 and not earlier.

 

Noobish question here..Isn't free 4xAA a requirement for 10.1 or am I mistaken? But then again you made it quite clear.. No 10.1 :P

 

There was never such a thing as "free" 4xAA and there never will be. Unless you're abysmally CPU bound that is.

 

Xenos claims supposed "free 4xMSAA", yet you might want to ask a developer how "free" it really is once he has to use macro tiling to fit all the shiznit into the eDRAM framebuffer. Beyond3D said years ago:

 

Tiling mechanisms can operate in a number of ways. With immediate mode rendering (i.e. the pixels being rendered are for the same frame as the geometry being sent) it is never known what pixels the geometry is going to be mapped to when the commands begin processing. This is not known until all the vertex processing is complete, setup has occurred and each primitive is scan converted. So if you wanted to tile the screen with an immediate mode rendering system, the geometry may need to be processed, setup and then discarded if it is found not to relate to pixels that are to be rendered in the current buffer space. The net result here is that geometry needs to be recalculated multiple times for each of the buffers. Another method for tiling would be to use Tile Based Deferred Rendering which processes the geometry and "bins" it into graphics RAM, saving which render "tile" the geometry affects as it does so - these mechanisms have traditionally operated by deferring the actual rendering by a frame in order to parallelise the geometry processing / binning and the rendering (you may wish to take a refresher on PowerVR's tile based deferred rendering process in our article here).

 

http://www.beyond3d.com/content/articles/4/5

 

In order to fit 4xMSAA into the 10MiB eDRAM module of Xenos at 720p one would need 3 macro tiles. Oversimplified, you'd then probably need to recalculate geometry 3 times, which is miles away from what I call "free".
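The 3-tile figure can be checked with a quick sketch. The per-sample cost is an assumption here (4 bytes of colour plus 4 bytes of depth/stencil per sample); the post does not state the buffer formats, and `macro_tiles` is a hypothetical helper.

```python
import math

# Assumption (not from the post): 32-bit colour + 32-bit depth/stencil,
# i.e. 8 bytes stored per sample.
BYTES_PER_SAMPLE = 4 + 4
EDRAM_BYTES = 10 * 1024 * 1024  # the 10 MiB eDRAM module

def macro_tiles(width: int, height: int, samples: int) -> int:
    """Number of eDRAM-sized tiles needed to hold the multisampled framebuffer."""
    framebuffer_bytes = width * height * samples * BYTES_PER_SAMPLE
    return math.ceil(framebuffer_bytes / EDRAM_BYTES)

for samples in (1, 2, 4):
    print(f"720p, {samples}x: {macro_tiles(1280, 720, samples)} tile(s)")
```

Under these assumptions 720p without AA fits in one tile, 2x needs two and 4x needs three, so with naive tiling the geometry is walked once per tile.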

 

A true tile based deferred renderer on the other hand won't face such problems with macro or micro tiling when handling geometry. Granted, it'll need clever algorithms to constantly fill and empty the display list to keep it from flooding, but it's not by far as memory consuming as any form of AA is on an IMR.

 

Coverage sampling on G80 is roughly as "free" as MSAA would be on a typical well balanced deferred renderer, always only in terms of bandwidth and memory footprint consumption and NOT performance. That's the part where "free MSAA" as a term has been misunderstood for years. In that term 4xMSAA is nearly "for free" on G8x already.

 

Uhhhmm and no I haven't seen anything in D3D10.1 that suggests or comes even close to something like "free MSAA":

 

http://www.insomnia.gr/vb3/showpost.php?p=1741573&postcount=69

 

The third bullet in that list proposes nothing more than a table of various sample placements for developers.

 

An ordered grid looks something like this:

 

----x-----x-----

----------------

----------------

----x-----x----

 

Sparsed grid:

 

--x------------

-----------x---

-----x---------

---------x-----

 

Now make a table of X different sample positions for sparsed grid to find the optimal positioning for each case and you have what D3D10.1 is actually requiring. Above sample placement is quite crappy, but imagine how many combinations you can make. Clear enough?

 

In antialiasing theory the second method is generally better, because if you draw imaginary horizontal and vertical lines through each sample you get 4 distinct contact points on each axis (X/Y). That's where the 4*4 edge equivalent resolution of the sparsed grid comes from. The ideal is to have N "contacts" with each axis for N antialiasing samples.
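A small sketch of the "contacts per axis" idea, using sample coordinates read off the two ASCII patterns above (column index as x, row index as y). The function name is made up for illustration; it simply counts distinct x and y positions.

```python
# Edge equivalent resolution (EER) as the number of distinct sample
# positions projected onto each axis.

def edge_equivalent_resolution(samples):
    """Return (distinct x positions, distinct y positions) for a sample pattern."""
    xs = {x for x, _ in samples}
    ys = {y for _, y in samples}
    return len(xs), len(ys)

# Coordinates taken from the ASCII sketches: (column, row).
ordered_grid = [(4, 0), (10, 0), (4, 3), (10, 3)]
sparse_grid = [(2, 0), (11, 1), (5, 2), (9, 3)]

print("ordered grid EER:", edge_equivalent_resolution(ordered_grid))
print("sparse grid EER: ", edge_equivalent_resolution(sparse_grid))
```

With 4 samples the ordered grid only reaches 2 positions per axis, while the sparse pattern reaches 4 per axis, i.e. N contacts for N samples.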

 

With a table of different sample placements the game developer can pick whatever suits him best for case A & B. That it would be preferable for these algorithms to be fully programmable is something I already mentioned in the other thread ;)

 

 

That's a realistic assessment in my eyes.. Didn't expect anything more tbh..

 

+50% doesn't sound as good as twice as much; despite it being the exact same thing ROFL ;)

 

Shall I buy you another beer while the round's going, in case we get to see something more? :lol:

Joking aside, thanx.. Although you really "made our day", especially with the tessellation unit..

Thanx ;)

 

I have to say that the coming years are Intel's time. Don't expect miracles, but even NVIDIA isn't taking Larrabee lightly, regardless of its final capabilities ;)

Posted
Will it only fit in a PCI Express 2.0 slot?

 

Good question. The most logical answer would be that it'll work in both PCI_e 1.0 & 2.0.

Posted

Someone already said it further up:

PCI_e 2.0 is backwards compatible with PCI_e 1.0

 

Also, no current card can fully exploit PCI_e 1.0, so PCI_e 2 on its own won't offer any substantial improvement. After all, it took us almost 3-4 years to reach today's level of PCI_e 1 utilization.

Let's also not forget that even AGP doesn't limit many of the existing cards.

 

In general we'll see a new PCI_e version roughly every 4 years, and 4 years is possibly the general cycle for other technologies too (DDR/RAM, SATA, etc.).

Posted

Also, no current card can fully exploit PCI_e 1.0, so PCI_e 2 on its own won't offer any substantial improvement.

 

If you mean the extra bandwidth of PCI-e, then yes, it has been left unexploited. The power that PCI-e supplies at 75W (against 45W for AGP), however, is used to the fullest. An 8800GTX has 2* 6-pin connections from the power supply. If a card with that power consumption existed in an AGP version, it would theoretically need an octopus of cables from the PSU.
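A minimal sketch of the power budget arithmetic behind the "octopus" remark. The 75W and 45W slot figures are the ones quoted in the post; the 75W rating per 6-pin connector is an assumption added here, and the helper names are hypothetical. This is the available budget, not the card's measured draw.

```python
import math

PCIE_SLOT_W = 75   # PCI-e slot power, as quoted in the post
AGP_SLOT_W = 45    # AGP slot power, as quoted in the post
SIX_PIN_W = 75     # assumption: rating of one 6-pin PSU connector

def power_budget(slot_w: int, six_pin_count: int) -> int:
    """Total power available from the slot plus auxiliary 6-pin connectors."""
    return slot_w + six_pin_count * SIX_PIN_W

def six_pins_needed(board_w: int, slot_w: int) -> int:
    """6-pin connectors needed to cover what the slot alone cannot deliver."""
    return math.ceil(max(0, board_w - slot_w) / SIX_PIN_W)

gtx_budget = power_budget(PCIE_SLOT_W, 2)
print("8800GTX budget on PCI-e:", gtx_budget, "W")
print("6-pin connectors for the same budget on AGP:",
      six_pins_needed(gtx_budget, AGP_SLOT_W))
```

Under these assumptions the same budget on an AGP slot needs a third auxiliary connector, which is roughly what the post is getting at.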

 

PCI_e 2.0 appeared mainly to raise the power delivered through the bus by a further percentage. Anyone who wants to overclock a 2900XT would be especially glad to already have a PCI_e 2.0 motherboard.

 

After all, it took us almost 3-4 years to reach today's level of PCI_e 1 utilization.

Let's also not forget that even AGP doesn't limit many of the existing cards.

 

It doesn't limit them because no game developer is in the mood to exploit the bidirectional PCI-e bus, since that would be grossly unfair to the many AGP users still out there.

 

Just imagine what "fantastic speed" today's games would have if they sent data to and from the bus on an AGP system. Every time the graphics card's onboard memory ran out, the spf (seconds per frame) would be throwing a party.

Posted
Just imagine what "fantastic speed" today's games would have if they sent data to and from the bus on an AGP system. Every time the graphics card's onboard memory ran out, the spf (seconds per frame) would be throwing a party.

 

In the past this was partly done on AGP cards: only the texture data "went up" to system RAM (AGP texturing, also known as Direct Memory Execute). It was a key feature of AGP.

 

Direct Memory Execute (Abbreviated as DIME; also called AGP texturing) is an important feature of AGP. It allows video cards to access your system's main memory for texture mapping rather than pre-loading the texture data to the graphics card's memory.
Posted
Someone already said it further up:

PCI_e 2.0 is backwards compatible with PCI_e 1.0

 

I think that means a motherboard with a PCI-E 2.0 slot will also accept cards made for PCI-E 1.0.

 

But will a PCI-E 2.0 graphics card be able to go into a PCI-E 1.0 slot? Seems a bit of a stretch to me..:-?

Posted
In the past this was partly done on AGP cards: only the texture data "went up" to system RAM (AGP texturing, also known as Direct Memory Execute). It was a key feature of AGP.

 

Now tell us about a possible return path for the data too....LOL :D

Posted

I didn't follow you?

Are you talking about AGP not being capable of full duplex sending/receiving of data the way PCI-e is?

Either way, AGP texturing as a technique was abandoned long before PCI-e came out. Besides, because of the bus bandwidth, performance dropped, as you rightly point out above, since the VGA had to access the clearly slower system RAM, which is delayed even further by the bus.

 

AGP modes

AGP does allow for more than just additional throughput, however. There are a variety of different operation modes supported with AGP, each of which can provide performance advantages. DIME (Direct Memory Execute), otherwise known as AGP texturing, allows the hardware to use system memory for texture storage purposes. This, in effect, gives the graphics card a memory limitation only set by the maximum available system memory or AGP aperture size, depending on which is greater. Traditionally, when a texture must be fetched for a frame, it is accessed from system memory, written to local memory, and then accessed by the graphics chip for the scene.

 

With AGP texturing, the chip reads the texture data directly from system memory, removing the need to write it to local memory. Unfortunately, current AGP bandwidth is still very low and this can actually cause a performance slow down, depending on the quality of the accelerator's texture management and the amount of data being fetched across the bus. This slow down can often be seen when a new room or level is entered and there are moments of slow performance, such as a stutter.

 

I didn't quite understand what you mean though...

Posted

It would be nice if they also wrote a game or two that really makes proper use of all this computing power..

Posted
R600 as much as G80 has to carry out interpolation and/or SFs somewhere. In the case of G80, SFs are done in the serial MUL unit, meaning of course that only in select cases can you use some percentage of it for general shading. Likewise it doesn't make the slightest difference if a future GPU hypothetically has MADD+MADD, MADD+ADD or MADD+MUL; the same "rule" applies in those cases too. Marketing will of course count the maximum theoretical throughput, but in a real game you have all of the above eating into that value, as well as other functions like texturing and Lord knows what else. The reality is that if you count only the 2 FLOPs on G80 you get something in the league of 346 GFLOPs + some small bonus from the MUL unit. In G9x you can fairly double that real rate and that's the reality behind it ;)

 

Understood about the interpolation part..

MADD+MADD? :shock: If that's the case that explains a lot..Besides that, what about a higher ALU/TEX ratio?

 

Someone has been ordering at Samsung with 60% higher frequencies than your current GPU has :P
Got the message :P

 

What for f*cks sake are you going to do with a programmable tessellation unit, when D3D10 is NOT going to support it prior to 2009 and that's even optimistic? Get it in your head: advanced HOS and higher level procedural geometry = D3D11 and not earlier.
D3D11? What about surface tessellation ?

 

XNA presentation

Primary motivator is amplification of animation/morph targets/deformation models

Enables providing some data to GPU at coarser resolution

Everything stays on GPU if possible

Displacement mapped surfaces become first class primitives

 

There was never such a thing as "free" 4xAA and there never will be. Unless you're abysmally CPU bound that is.

 


I wasn't talking about the Xenos architecture.. And I don't disagree that the current algorithms on G80 do the job equally well, if not better..

Basically I was talking about this here

(Slide 6-7)..

 

Min 4x MSAA required on all hardware

Shader access to sample positions

Knows where samples will appear on screen

Shader controls which samples emitted

Explicit setting of coverage mask

Interpolation can be set to

Sample -default –may go outside poly

Centroid -snaps point back into triangle

Center –for micropolygons

Separate per-MRT blend modes

Each of the eight 4-channel surfaces can have a separate blending operation applied

Usage: deferred shading

Int16 blending

Int8, float16 and float32 already in DX10

 

I don't disagree that the existing hardware isn't doing a terrific job at tiling right now; I'm just saying that maybe the cache would help more with blending performance?
