mirror of
https://github.com/AetherDroid/android_kernel_samsung_on5xelte.git
synced 2025-09-06 08:18:05 -04:00
Fixed MTP to work with TWRP
This commit is contained in:
commit
f6dfaef42e
50820 changed files with 20846062 additions and 0 deletions
16
Documentation/x86/x86_64/00-INDEX
Normal file
|
@ -0,0 +1,16 @@
|
|||
00-INDEX
|
||||
- This file
|
||||
boot-options.txt
|
||||
- AMD64-specific boot options.
|
||||
cpu-hotplug-spec
|
||||
- Firmware support for CPU hotplug under Linux/x86-64
|
||||
fake-numa-for-cpusets
|
||||
- Using numa=fake and CPUSets for Resource Management
|
||||
kernel-stacks
|
||||
- Context-specific per-processor interrupt stacks.
|
||||
machinecheck
|
||||
- Configurable sysfs parameters for the x86-64 machine check code.
|
||||
mm.txt
|
||||
- Memory layout of x86-64 (4 level page tables, 46 bits physical).
|
||||
uefi.txt
|
||||
- Booting Linux via Unified Extensible Firmware Interface.
|
284
Documentation/x86/x86_64/boot-options.txt
Normal file
|
@ -0,0 +1,284 @@
|
|||
AMD64 specific boot options
|
||||
|
||||
There are many others (usually documented in driver documentation), but
|
||||
only the AMD64 specific ones are listed here.
|
||||
|
||||
Machine check
|
||||
|
||||
Please see Documentation/x86/x86_64/machinecheck for sysfs runtime tunables.
|
||||
|
||||
mce=off
|
||||
Disable machine check
|
||||
mce=no_cmci
|
||||
Disable CMCI (Corrected Machine Check Interrupt) that
|
||||
Intel processors support. Disabling it is usually
|
||||
not recommended, but it might be handy if your hardware
|
||||
is misbehaving.
|
||||
Note that you'll get more problems without CMCI than with
|
||||
due to the shared banks, i.e. you might get duplicated
|
||||
error logs.
|
||||
mce=dont_log_ce
|
||||
Don't make logs for corrected errors. All events reported
|
||||
as corrected are silently cleared by the OS.
|
||||
This option will be useful if you have no interest in any
|
||||
of the corrected errors.
|
||||
mce=ignore_ce
|
||||
Disable features for corrected errors, e.g. polling timer
|
||||
and CMCI. All events reported as corrected are not cleared
|
||||
by the OS and remain in the error banks.
|
||||
Usually this disablement is not recommended, however if
|
||||
there is an agent checking/clearing corrected errors
|
||||
(e.g. BIOS or hardware monitoring applications), conflicting
|
||||
with OS's error handling, and you cannot deactivate the agent,
|
||||
then this option will help.
|
||||
mce=bootlog
|
||||
Enable logging of machine checks left over from booting.
|
||||
Disabled by default on AMD because some BIOSes leave bogus ones.
|
||||
If your BIOS doesn't do that it's a good idea to enable though
|
||||
to make sure you log even machine check events that result
|
||||
in a reboot. On Intel systems it is enabled by default.
|
||||
mce=nobootlog
|
||||
Disable boot machine check logging.
|
||||
mce=tolerancelevel[,monarchtimeout] (number,number)
|
||||
tolerance levels:
|
||||
0: always panic on uncorrected errors, log corrected errors
|
||||
1: panic or SIGBUS on uncorrected errors, log corrected errors
|
||||
2: SIGBUS or log uncorrected errors, log corrected errors
|
||||
3: never panic or SIGBUS, log all errors (for testing only)
|
||||
Default is 1
|
||||
Can also be set using sysfs, which is preferable.
|
||||
monarchtimeout:
|
||||
Sets the time in us to wait for other CPUs on machine checks. 0
|
||||
to disable.
|
||||
mce=bios_cmci_threshold
|
||||
Don't overwrite the BIOS-set CMCI threshold. This boot option
|
||||
prevents Linux from overwriting the CMCI threshold set by the
|
||||
BIOS. Without this option, Linux always sets the CMCI
|
||||
threshold to 1. Enabling this may make memory predictive failure
|
||||
analysis less effective if the BIOS sets thresholds for memory
|
||||
errors since we will not see details for all errors.
|
||||
|
||||
nomce (for compatibility with i386): same as mce=off
|
||||
|
||||
Everything else is in sysfs now.
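The mce= values listed above can be read mechanically. The following is a minimal sketch, assuming a hypothetical helper named parse_mce; it is not the kernel's own parser and only mirrors the keywords and tolerance levels documented in this section.

```python
# Hypothetical helper (not kernel code): interpret the value of an "mce="
# boot option using only the keywords and levels documented above.
TOLERANCE_LEVELS = {
    0: "always panic on uncorrected errors, log corrected errors",
    1: "panic or SIGBUS on uncorrected errors, log corrected errors",
    2: "SIGBUS or log uncorrected errors, log corrected errors",
    3: "never panic or SIGBUS, log all errors (for testing only)",
}
KEYWORDS = {
    "off", "no_cmci", "dont_log_ce", "ignore_ce",
    "bootlog", "nobootlog", "bios_cmci_threshold",
}


def parse_mce(value):
    """Return a dict describing an mce=<value> setting."""
    if value in KEYWORDS:
        return {"keyword": value}
    # Otherwise expect tolerancelevel[,monarchtimeout] (number,number).
    parts = value.split(",")
    if len(parts) > 2:
        raise ValueError("expected tolerancelevel[,monarchtimeout]")
    tolerant = int(parts[0])
    if tolerant not in TOLERANCE_LEVELS:
        raise ValueError("tolerance level must be 0..3")
    result = {"tolerant": tolerant, "meaning": TOLERANCE_LEVELS[tolerant]}
    if len(parts) == 2:
        # Time in microseconds to wait for other CPUs; 0 disables waiting.
        result["monarch_timeout_us"] = int(parts[1])
    return result
```

For example, parse_mce("2,0") describes tolerance level 2 with the monarch timeout disabled.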
|
||||
|
||||
APICs
|
||||
|
||||
apic Use IO-APIC. Default
|
||||
|
||||
noapic Don't use the IO-APIC.
|
||||
|
||||
disableapic Don't use the local APIC
|
||||
|
||||
nolapic Don't use the local APIC (alias for i386 compatibility)
|
||||
|
||||
pirq=... See Documentation/x86/i386/IO-APIC.txt
|
||||
|
||||
noapictimer Don't set up the APIC timer
|
||||
|
||||
no_timer_check Don't check the IO-APIC timer. This can work around
|
||||
problems with incorrect timer initialization on some boards.
|
||||
apicpmtimer
|
||||
Do APIC timer calibration using the pmtimer. Implies
|
||||
apicmaintimer. Useful when your PIT timer is totally
|
||||
broken.
|
||||
|
||||
Timing
|
||||
|
||||
notsc
|
||||
Don't use the CPU time stamp counter to read the wall time.
|
||||
This can be used to work around timing problems on multiprocessor systems
|
||||
with improperly synchronized CPUs.
|
||||
|
||||
nohpet
|
||||
Don't use the HPET timer.
|
||||
|
||||
Idle loop
|
||||
|
||||
idle=poll
|
||||
Don't do power saving in the idle loop using HLT, but poll for a rescheduling
|
||||
event. This will make the CPUs eat a lot more power, but may be useful
|
||||
to get slightly better performance in multiprocessor benchmarks. It also
|
||||
makes some profiling using performance counters more accurate.
|
||||
Please note that on systems with MONITOR/MWAIT support (like Intel EM64T
|
||||
CPUs) this option has no performance advantage over the normal idle loop.
|
||||
It may also interact badly with hyperthreading.
|
||||
|
||||
Rebooting
|
||||
|
||||
reboot=b[ios] | t[riple] | k[bd] | a[cpi] | e[fi] [, [w]arm | [c]old]
|
||||
bios Use the CPU reboot vector for warm reset
|
||||
warm Don't set the cold reboot flag
|
||||
cold Set the cold reboot flag
|
||||
triple Force a triple fault (init)
|
||||
kbd Use the keyboard controller. cold reset (default)
|
||||
acpi Use the ACPI RESET_REG in the FADT. If ACPI is not configured or the
|
||||
ACPI reset does not work, the reboot path attempts the reset using
|
||||
the keyboard controller.
|
||||
efi Use efi reset_system runtime service. If EFI is not configured or the
|
||||
EFI reset does not work, the reboot path attempts the reset using
|
||||
the keyboard controller.
|
||||
|
||||
Using warm reset will be much faster especially on big memory
|
||||
systems because the BIOS will not go through the memory check.
|
||||
Disadvantage is that not all hardware will be completely reinitialized
|
||||
on reboot so there may be boot problems on some systems.
|
||||
|
||||
reboot=force
|
||||
|
||||
Don't stop other CPUs on reboot. This can make reboot more reliable
|
||||
in some cases.
|
||||
|
||||
Non Executable Mappings
|
||||
|
||||
noexec=on|off
|
||||
|
||||
on Enable (default)
|
||||
off Disable
|
||||
|
||||
NUMA
|
||||
|
||||
numa=off Only set up a single NUMA node spanning all memory.
|
||||
|
||||
numa=noacpi Don't parse the SRAT table for NUMA setup
|
||||
|
||||
numa=fake=<size>[MG]
|
||||
If given as a memory unit, fills all system RAM with nodes of
|
||||
size interleaved over physical nodes.
|
||||
|
||||
numa=fake=<N>
|
||||
If given as an integer, fills all system RAM with N fake nodes
|
||||
interleaved over physical nodes.
|
||||
|
||||
ACPI
|
||||
|
||||
acpi=off Don't enable ACPI
|
||||
acpi=ht Use ACPI boot table parsing, but don't enable ACPI
|
||||
interpreter
|
||||
acpi=force Force ACPI on (currently not needed)
|
||||
|
||||
acpi=strict Disable out of spec ACPI workarounds.
|
||||
|
||||
acpi_sci={edge,level,high,low} Set up ACPI SCI interrupt.
|
||||
|
||||
acpi=noirq Don't route interrupts
|
||||
|
||||
acpi=nocmcff Disable firmware first mode for corrected errors. This
|
||||
disables parsing the HEST CMC error source to check if
|
||||
firmware has set the FF flag. This may result in
|
||||
duplicate corrected error reports.
|
||||
|
||||
PCI
|
||||
|
||||
pci=off Don't use PCI
|
||||
pci=conf1 Use conf1 access.
|
||||
pci=conf2 Use conf2 access.
|
||||
pci=rom Assign ROMs.
|
||||
pci=assign-busses Assign busses
|
||||
pci=irqmask=MASK Set PCI interrupt mask to MASK
|
||||
pci=lastbus=NUMBER Scan up to NUMBER busses, no matter what the mptable says.
|
||||
pci=noacpi Don't use ACPI to set up PCI interrupt routing.
|
||||
|
||||
IOMMU (input/output memory management unit)
|
||||
|
||||
Currently four x86-64 PCI-DMA mapping implementations exist:
|
||||
|
||||
1. <arch/x86_64/kernel/pci-nommu.c>: use no hardware/software IOMMU at all
|
||||
(e.g. because you have < 3 GB memory).
|
||||
Kernel boot message: "PCI-DMA: Disabling IOMMU"
|
||||
|
||||
2. <arch/x86/kernel/amd_gart_64.c>: AMD GART based hardware IOMMU.
|
||||
Kernel boot message: "PCI-DMA: using GART IOMMU"
|
||||
|
||||
3. <arch/x86_64/kernel/pci-swiotlb.c> : Software IOMMU implementation. Used
|
||||
e.g. if there is no hardware IOMMU in the system and it is needed because
|
||||
you have >3GB memory or told the kernel to use it (iommu=soft).
|
||||
Kernel boot message: "PCI-DMA: Using software bounce buffering
|
||||
for IO (SWIOTLB)"
|
||||
|
||||
4. <arch/x86_64/pci-calgary.c> : IBM Calgary hardware IOMMU. Used in IBM
|
||||
pSeries and xSeries servers. This hardware IOMMU supports DMA address
|
||||
mapping with memory protection, etc.
|
||||
Kernel boot message: "PCI-DMA: Using Calgary IOMMU"
|
||||
|
||||
iommu=[<size>][,noagp][,off][,force][,noforce][,leak[=<nr_of_leak_pages>]]
|
||||
[,memaper[=<order>]][,merge][,forcesac][,fullflush][,nomerge]
|
||||
[,noaperture][,calgary]
|
||||
|
||||
General iommu options:
|
||||
off Don't initialize and use any kind of IOMMU.
|
||||
noforce Don't force hardware IOMMU usage when it is not needed.
|
||||
(default).
|
||||
force Force the use of the hardware IOMMU even when it is
|
||||
not actually needed (e.g. because < 3 GB memory).
|
||||
soft Use software bounce buffering (SWIOTLB) (default for
|
||||
Intel machines). This can be used to prevent the usage
|
||||
of an available hardware IOMMU.
|
||||
|
||||
iommu options only relevant to the AMD GART hardware IOMMU:
|
||||
<size> Set the size of the remapping area in bytes.
|
||||
allowed Overwrite iommu off workarounds for specific chipsets.
|
||||
fullflush Flush IOMMU on each allocation (default).
|
||||
nofullflush Don't use IOMMU fullflush.
|
||||
leak Turn on simple iommu leak tracing (only when
|
||||
CONFIG_IOMMU_LEAK is on). Default number of leak pages
|
||||
is 20.
|
||||
memaper[=<order>] Allocate its own aperture over RAM with size 32MB<<order.
|
||||
(default: order=1, i.e. 64MB)
|
||||
merge Do scatter-gather (SG) merging. Implies "force"
|
||||
(experimental).
|
||||
nomerge Don't do scatter-gather (SG) merging.
|
||||
noaperture Ask the IOMMU not to touch the aperture for AGP.
|
||||
forcesac Force single-address cycle (SAC) mode for masks <40bits
|
||||
(experimental).
|
||||
noagp Don't initialize the AGP driver and use full aperture.
|
||||
allowdac Allow double-address cycle (DAC) mode, i.e. DMA >4GB.
|
||||
DAC is used with 32-bit PCI to push a 64-bit address in
|
||||
two cycles. When off, all DMA above 4GB is forced through
|
||||
an IOMMU or software bounce buffering.
|
||||
nodac Forbid DAC mode, i.e. DMA >4GB.
|
||||
panic Always panic when IOMMU overflows.
|
||||
calgary Use the Calgary IOMMU if it is available
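The memaper size rule above (32MB<<order) is simple shift arithmetic. A minimal sketch, assuming a hypothetical function name memaper_size_mb that is not part of the kernel:

```python
def memaper_size_mb(order=1):
    """Aperture size in MB for iommu=memaper=<order>, i.e. 32MB << order.

    The default order of 1 comes from the option description above.
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    return 32 << order
```

With the default order the result is 64, matching the "order=1, i.e. 64MB" note in the option description.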
|
||||
|
||||
iommu options only relevant to the software bounce buffering (SWIOTLB) IOMMU
|
||||
implementation:
|
||||
swiotlb=<pages>[,force]
|
||||
<pages> Prereserve that many 128K pages for the software IO
|
||||
bounce buffering.
|
||||
force Force all IO through the software TLB.
|
||||
|
||||
Settings for the IBM Calgary hardware IOMMU currently found in IBM
|
||||
pSeries and xSeries machines:
|
||||
|
||||
calgary=[64k,128k,256k,512k,1M,2M,4M,8M]
|
||||
calgary=[translate_empty_slots]
|
||||
calgary=[disable=<PCI bus number>]
|
||||
panic Always panic when IOMMU overflows
|
||||
|
||||
64k,...,8M - Set the size of each PCI slot's translation table
|
||||
when using the Calgary IOMMU. This is the size of the translation
|
||||
table itself in main memory. The smallest table, 64k, covers an IO
|
||||
space of 32MB; the largest, an 8MB table, can cover an IO space of
|
||||
4GB. Normally the kernel will make the right choice by itself.
|
||||
|
||||
translate_empty_slots - Enable translation even on slots that have
|
||||
no devices attached to them, in case a device will be hotplugged
|
||||
in the future.
|
||||
|
||||
disable=<PCI bus number> - Disable translation on a given PHB. For
|
||||
example, the built-in graphics adapter resides on the first bridge
|
||||
(PCI bus number 0); if translation (isolation) is enabled on this
|
||||
bridge, X servers that access the hardware directly from user
|
||||
space might stop working. Use this option if you have devices that
|
||||
are accessed from userspace directly on some PCI host bridge.
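The two table sizes quoted for the Calgary option (64k covers 32MB, 8M covers 4GB) are consistent with one fixed-size entry per IO page. The sketch below assumes 8-byte entries that each map one 4K page; both numbers are assumptions chosen because they reproduce the figures in the text, not values taken from the Calgary driver.

```python
ENTRY_BYTES = 8        # assumption: size of one translation table entry
IO_PAGE_BYTES = 4096   # assumption: IO space mapped by one entry


def io_space_covered(table_bytes):
    """IO space in bytes covered by a translation table of the given size."""
    return (table_bytes // ENTRY_BYTES) * IO_PAGE_BYTES


KB, MB, GB = 1 << 10, 1 << 20, 1 << 30
smallest = io_space_covered(64 * KB)  # 32 MB under the assumptions above
largest = io_space_covered(8 * MB)    # 4 GB under the assumptions above
```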
|
||||
|
||||
Debugging
|
||||
|
||||
kstack=N Print N words from the kernel stack in oops dumps.
|
||||
|
||||
Miscellaneous
|
||||
|
||||
nogbpages
|
||||
Do not use GB pages for kernel direct mappings.
|
||||
gbpages
|
||||
Use GB pages for kernel direct mappings.
|
21
Documentation/x86/x86_64/cpu-hotplug-spec
Normal file
|
@ -0,0 +1,21 @@
|
|||
Firmware support for CPU hotplug under Linux/x86-64
|
||||
---------------------------------------------------
|
||||
|
||||
Linux/x86-64 supports CPU hotplug now. For various reasons Linux wants to
|
||||
know in advance of boot time the maximum number of CPUs that could be plugged
|
||||
into the system. ACPI 3.0 currently has no official way to supply
|
||||
this information from the firmware to the operating system.
|
||||
|
||||
In ACPI each CPU needs an LAPIC object in the MADT table (5.2.11.5 in the
|
||||
ACPI 3.0 specification). ACPI already has the concept of disabled LAPIC
|
||||
objects by setting the Enabled bit in the LAPIC object to zero.
|
||||
|
||||
For CPU hotplug Linux/x86-64 expects now that any possible future hotpluggable
|
||||
CPU is already available in the MADT. If the CPU is not available yet
|
||||
it should have its LAPIC Enabled bit set to 0. Linux will use the number
|
||||
of disabled LAPICs to compute the maximum number of future CPUs.
|
||||
|
||||
In the worst case the user can override this choice using a command line
|
||||
option (additional_cpus=...), but it is recommended to supply the correct
|
||||
number (or a reasonable approximation of it, erring towards more rather than fewer)
|
||||
in the MADT to avoid manual configuration.
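The counting rule described above can be illustrated with a toy model. This is a minimal sketch: LapicEntry and cpu_counts are hypothetical names, and the list stands in for LAPIC objects already extracted from the MADT; real MADT parsing is not shown.

```python
from dataclasses import dataclass


@dataclass
class LapicEntry:
    """Simplified stand-in for one LAPIC object in the MADT."""
    apic_id: int
    enabled: bool  # the Enabled bit of the LAPIC object


def cpu_counts(entries):
    """Return (present, hotpluggable, maximum) CPU counts.

    Enabled LAPICs are CPUs present at boot; disabled LAPICs are slots
    reserved for CPUs that may be plugged in later.
    """
    present = sum(1 for e in entries if e.enabled)
    hotpluggable = sum(1 for e in entries if not e.enabled)
    return present, hotpluggable, present + hotpluggable
```

A firmware that lists two enabled and two disabled LAPICs would thus be read as two CPUs now and room for two more.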
|
67
Documentation/x86/x86_64/fake-numa-for-cpusets
Normal file
|
@ -0,0 +1,67 @@
|
|||
Using numa=fake and CPUSets for Resource Management
|
||||
Written by David Rientjes <rientjes@cs.washington.edu>
|
||||
|
||||
This document describes how the numa=fake x86_64 command-line option can be used
|
||||
in conjunction with cpusets for coarse memory management. Using this feature,
|
||||
you can create fake NUMA nodes that represent contiguous chunks of memory and
|
||||
assign them to cpusets and their attached tasks. This is a way of limiting the
|
||||
amount of system memory that is available to a certain class of tasks.
|
||||
|
||||
For more information on the features of cpusets, see
|
||||
Documentation/cgroups/cpusets.txt.
|
||||
There are a number of different configurations you can use for your needs. For
|
||||
more information on the numa=fake command line option and its various ways of
|
||||
configuring fake nodes, see Documentation/x86/x86_64/boot-options.txt.
|
||||
|
||||
For the purposes of this introduction, we'll assume a very primitive NUMA
|
||||
emulation setup of "numa=fake=4*512,". This will split our system memory into
|
||||
four equal chunks of 512M each that we can now use to assign to cpusets. As
|
||||
you become more familiar with using this combination for resource control,
|
||||
you'll determine a better setup to minimize the number of nodes you have to deal
|
||||
with.
|
||||
|
||||
A machine may be split as follows with "numa=fake=4*512," as reported by dmesg:
|
||||
|
||||
Faking node 0 at 0000000000000000-0000000020000000 (512MB)
|
||||
Faking node 1 at 0000000020000000-0000000040000000 (512MB)
|
||||
Faking node 2 at 0000000040000000-0000000060000000 (512MB)
|
||||
Faking node 3 at 0000000060000000-0000000080000000 (512MB)
|
||||
...
|
||||
On node 0 totalpages: 130975
|
||||
On node 1 totalpages: 131072
|
||||
On node 2 totalpages: 131072
|
||||
On node 3 totalpages: 131072
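The address ranges and page counts in the dmesg output above follow from the node size and the 4K page size. A minimal sketch of that arithmetic, with hypothetical helper names; node 0 reports slightly fewer pages in the real output, presumably because some low memory is reserved, which this sketch does not model.

```python
PAGE_SIZE = 4096  # x86_64 base page size in bytes


def fake_node_ranges(count, size_mb):
    """Return [(start, end), ...] for `count` contiguous nodes of size_mb MB."""
    size = size_mb * 1024 * 1024
    return [(i * size, (i + 1) * size) for i in range(count)]


def pages_in(start, end):
    """Number of 4K pages in the half-open byte range [start, end)."""
    return (end - start) // PAGE_SIZE


def describe(count, size_mb):
    """Format each node in the style of the 'Faking node' lines above."""
    return [
        f"node {n} at {start:016x}-{end:016x} ({size_mb}MB), {pages_in(start, end)} pages"
        for n, (start, end) in enumerate(fake_node_ranges(count, size_mb))
    ]
```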
|
||||
|
||||
Now following the instructions for mounting the cpusets filesystem from
|
||||
Documentation/cgroups/cpusets.txt, you can assign fake nodes (i.e. contiguous memory
|
||||
address spaces) to individual cpusets:
|
||||
|
||||
[root@xroads /]# mkdir exampleset
|
||||
[root@xroads /]# mount -t cpuset none exampleset
|
||||
[root@xroads /]# mkdir exampleset/ddset
|
||||
[root@xroads /]# cd exampleset/ddset
|
||||
[root@xroads /exampleset/ddset]# echo 0-1 > cpus
|
||||
[root@xroads /exampleset/ddset]# echo 0-1 > mems
|
||||
|
||||
Now this cpuset, 'ddset', will only be allowed access to fake nodes 0 and 1 for
|
||||
memory allocations (1G).
|
||||
|
||||
You can now assign tasks to these cpusets to limit the memory resources
|
||||
available to them according to the fake nodes assigned as mems:
|
||||
|
||||
[root@xroads /exampleset/ddset]# echo $$ > tasks
|
||||
[root@xroads /exampleset/ddset]# dd if=/dev/zero of=tmp bs=1024 count=1G
|
||||
[1] 13425
|
||||
|
||||
Notice the difference between the system memory usage as reported by
|
||||
/proc/meminfo between the restricted cpuset case above and the unrestricted
|
||||
case (i.e. running the same 'dd' command without assigning it to a fake NUMA
|
||||
cpuset):
|
||||
Unrestricted Restricted
|
||||
MemTotal: 3091900 kB 3091900 kB
|
||||
MemFree: 42113 kB 1513236 kB
|
||||
|
||||
This allows for coarse memory management for the tasks you assign to particular
|
||||
cpusets. Since cpusets can form a hierarchy, you can create some pretty
|
||||
interesting combinations of use-cases for various classes of tasks for your
|
||||
memory management needs.
|
99
Documentation/x86/x86_64/kernel-stacks
Normal file
|
@ -0,0 +1,99 @@
|
|||
Most of the text from Keith Owens, hacked by AK
|
||||
|
||||
x86_64 page size (PAGE_SIZE) is 4K.
|
||||
|
||||
Like all other architectures, x86_64 has a kernel stack for every
|
||||
active thread. These thread stacks are THREAD_SIZE (2*PAGE_SIZE) big.
|
||||
These stacks contain useful data as long as a thread is alive or a
|
||||
zombie. While the thread is in user space the kernel stack is empty
|
||||
except for the thread_info structure at the bottom.
|
||||
|
||||
In addition to the per thread stacks, there are specialized stacks
|
||||
associated with each CPU. These stacks are only used while the kernel
|
||||
is in control on that CPU; when a CPU returns to user space the
|
||||
specialized stacks contain no useful data. The main CPU stacks are:
|
||||
|
||||
* Interrupt stack. IRQSTACKSIZE
|
||||
|
||||
Used for external hardware interrupts. If this is the first external
|
||||
hardware interrupt (i.e. not a nested hardware interrupt) then the
|
||||
kernel switches from the current task to the interrupt stack. Like
|
||||
the split thread and interrupt stacks on i386, this gives more room
|
||||
for kernel interrupt processing without having to increase the size
|
||||
of every per thread stack.
|
||||
|
||||
The interrupt stack is also used when processing a softirq.
|
||||
|
||||
Switching to the kernel interrupt stack is done by software based on a
|
||||
per CPU interrupt nest counter. This is needed because x86-64 "IST"
|
||||
hardware stacks cannot nest without races.
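The software stack switch described above can be modelled with a small state machine. This is only a sketch of the idea: the class name, the use of -1 as the idle value of the nest counter, and the string stack names are assumptions for illustration and do not reproduce the kernel's entry code.

```python
class CpuStackState:
    """Toy model of one CPU's thread/interrupt stack selection."""

    def __init__(self):
        self.irq_nest = -1      # assumption: -1 means "not in an interrupt"
        self.stack = "thread"   # stack currently in use

    def irq_enter(self):
        """Called on entry to an external hardware interrupt."""
        self.irq_nest += 1
        if self.irq_nest == 0:
            # First (non-nested) interrupt: switch to the interrupt stack.
            self.stack = "interrupt"
        # Nested interrupts keep running on the interrupt stack.

    def irq_exit(self):
        """Called on exit from an external hardware interrupt."""
        if self.irq_nest < 0:
            raise RuntimeError("irq_exit without matching irq_enter")
        if self.irq_nest == 0:
            # Leaving the outermost interrupt: back to the thread stack.
            self.stack = "thread"
        self.irq_nest -= 1
```

Only the outermost entry and exit change the stack; nested interrupts just adjust the counter.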
|
||||
|
||||
x86_64 also has a feature which is not available on i386, the ability
|
||||
to automatically switch to a new stack for designated events such as
|
||||
double fault or NMI, which makes it easier to handle these unusual
|
||||
events on x86_64. This feature is called the Interrupt Stack Table
|
||||
(IST). There can be up to 7 IST entries per CPU. The IST code is an
|
||||
index into the Task State Segment (TSS). The IST entries in the TSS
|
||||
point to dedicated stacks; each stack can be a different size.
|
||||
|
||||
An IST is selected by a non-zero value in the IST field of an
|
||||
interrupt-gate descriptor. When an interrupt occurs and the hardware
|
||||
loads such a descriptor, the hardware automatically sets the new stack
|
||||
pointer based on the IST value, then invokes the interrupt handler. If
|
||||
software wants to allow nested IST interrupts then the handler must
|
||||
adjust the IST values on entry to and exit from the interrupt handler.
|
||||
(This is occasionally done, e.g. for debug exceptions.)
|
||||
|
||||
Events with different IST codes (i.e. with different stacks) can be
|
||||
nested. For example, a debug interrupt can safely be interrupted by an
|
||||
NMI. arch/x86_64/kernel/entry.S::paranoidentry adjusts the stack
|
||||
pointers on entry to and exit from all IST events, in theory allowing
|
||||
IST events with the same code to be nested. However in most cases, the
|
||||
stack size allocated to an IST assumes no nesting for the same code.
|
||||
If that assumption is ever broken then the stacks will become corrupt.
|
||||
|
||||
The currently assigned IST stacks are:
|
||||
|
||||
* STACKFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
|
||||
|
||||
Used for interrupt 12 - Stack Fault Exception (#SS).
|
||||
|
||||
This allows the CPU to recover from invalid stack segments. Rarely
|
||||
happens.
|
||||
|
||||
* DOUBLEFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
|
||||
|
||||
Used for interrupt 8 - Double Fault Exception (#DF).
|
||||
|
||||
Invoked when handling one exception causes another exception. Happens
|
||||
when the kernel is very confused (e.g. kernel stack pointer corrupt).
|
||||
Using a separate stack allows the kernel to recover from it well enough
|
||||
in many cases to still output an oops.
|
||||
|
||||
* NMI_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
|
||||
|
||||
Used for non-maskable interrupts (NMI).
|
||||
|
||||
NMI can be delivered at any time, including when the kernel is in the
|
||||
middle of switching stacks. Using IST for NMI events avoids making
|
||||
assumptions about the previous state of the kernel stack.
|
||||
|
||||
* DEBUG_STACK. DEBUG_STKSZ
|
||||
|
||||
Used for hardware debug interrupts (interrupt 1) and for software
|
||||
debug interrupts (INT3).
|
||||
|
||||
When debugging a kernel, debug interrupts (both hardware and
|
||||
software) can occur at any time. Using IST for these interrupts
|
||||
avoids making assumptions about the previous state of the kernel
|
||||
stack.
|
||||
|
||||
* MCE_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
|
||||
|
||||
Used for interrupt 18 - Machine Check Exception (#MC).
|
||||
|
||||
MCE can be delivered at any time, including when the kernel is in the
|
||||
middle of switching stacks. Using IST for MCE events avoids making
|
||||
assumptions about the previous state of the kernel stack.
|
||||
|
||||
For more details see the Intel IA32 or AMD AMD64 architecture manuals.
|
83
Documentation/x86/x86_64/machinecheck
Normal file
|
@ -0,0 +1,83 @@
|
|||
|
||||
Configurable sysfs parameters for the x86-64 machine check code.
|
||||
|
||||
Machine checks report internal hardware error conditions detected
|
||||
by the CPU. Uncorrected errors typically cause a machine check
|
||||
(often with panic), corrected ones cause a machine check log entry.
|
||||
|
||||
Machine checks are organized in banks (normally associated with
|
||||
a hardware subsystem) and subevents in a bank. The exact meaning
|
||||
of the banks and subevent is CPU specific.
|
||||
|
||||
mcelog knows how to decode them.
|
||||
|
||||
When you see the "Machine check errors logged" message in the system
|
||||
log then mcelog should run to collect and decode machine check entries
|
||||
from /dev/mcelog. Normally mcelog should be run regularly from a cronjob.
|
||||
|
||||
Each CPU has a directory in /sys/devices/system/machinecheck/machinecheckN
|
||||
(N = CPU number)
|
||||
|
||||
The directory contains some configurable entries:
|
||||
|
||||
Entries:
|
||||
|
||||
bankNctl
|
||||
(N bank number)
|
||||
64bit Hex bitmask enabling/disabling specific subevents for bank N
|
||||
When a bit in the bitmask is zero then the respective
|
||||
subevent will not be reported.
|
||||
By default all events are enabled.
|
||||
Note that the BIOS maintains another mask to disable specific events
|
||||
per bank. This mask is not visible here.
|
||||
|
||||
The following entries appear for each CPU, but they are truly shared
|
||||
between all CPUs.
|
||||
|
||||
check_interval
|
||||
How often to poll for corrected machine check errors, in seconds
|
||||
(Note output is hexadecimal). Default 5 minutes. When the poller
|
||||
finds MCEs it triggers an exponential speedup (poll more often) on
|
||||
the polling interval. When the poller stops finding MCEs, it
|
||||
triggers an exponential backoff (poll less often) on the polling
|
||||
interval. The check_interval variable is both the initial and
|
||||
maximum polling interval. 0 means no polling for corrected machine
|
||||
check errors (but some corrected errors might still be reported
|
||||
in other ways).
|
||||
|
||||
tolerant
|
||||
Tolerance level. When a machine check exception occurs for a non
|
||||
corrected machine check the kernel can take different actions.
|
||||
Since machine check exceptions can happen any time it is sometimes
|
||||
risky for the kernel to kill a process because it defies
|
||||
normal kernel locking rules. The tolerance level configures
|
||||
how hard the kernel tries to recover even at some risk of
|
||||
deadlock. Higher tolerant values trade potentially better uptime
|
||||
against the risk of a crash or even corruption (for tolerant >= 3).
|
||||
|
||||
0: always panic on uncorrected errors, log corrected errors
|
||||
1: panic or SIGBUS on uncorrected errors, log corrected errors
|
||||
2: SIGBUS or log uncorrected errors, log corrected errors
|
||||
3: never panic or SIGBUS, log all errors (for testing only)
|
||||
|
||||
Default: 1
|
||||
|
||||
Note this only makes a difference if the CPU allows recovery
|
||||
from a machine check exception. Current x86 CPUs generally do not.
|
||||
|
||||
trigger
|
||||
Program to run when a machine check event is detected.
|
||||
This is an alternative to running mcelog regularly from cron
|
||||
and allows events to be detected faster.
|
||||
monarch_timeout
|
||||
How long to wait for the other CPUs to machine check too on an
|
||||
exception. 0 to disable waiting for other CPUs.
|
||||
Unit: us
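Two of the entries above hold hexadecimal text: check_interval (seconds, shown in hex) and bankNctl (a 64-bit mask). A minimal sketch of decoding such strings, assuming hypothetical helper names and that the text has already been read from the sysfs file:

```python
def parse_check_interval(text):
    """Convert the hexadecimal check_interval text to seconds."""
    return int(text.strip(), 16)


def subevent_enabled(bank_ctl_text, bit):
    """Return True if the given subevent bit is set in a bankNctl mask.

    A zero bit means the respective subevent will not be reported.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in 0..63")
    mask = int(bank_ctl_text.strip(), 16)
    return bool((mask >> bit) & 1)
```

Under this reading, the text "12c" corresponds to 300 seconds, which is the 5 minute default mentioned above.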
|
||||
|
||||
TBD document entries for AMD threshold interrupt configuration
|
||||
|
||||
For more details about the x86 machine check architecture
|
||||
see the Intel and AMD architecture manuals from their developer websites.
|
||||
|
||||
For more details about the architecture see
|
||||
http://one.firstfloor.org/~andi/mce.pdf
|
40
Documentation/x86/x86_64/mm.txt
Normal file
|
@ -0,0 +1,40 @@
|
|||
|
||||
<previous description obsolete, deleted>
|
||||
|
||||
Virtual memory map with 4 level page tables:
|
||||
|
||||
0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
|
||||
hole caused by [48:63] sign extension
|
||||
ffff800000000000 - ffff87ffffffffff (=43 bits) guard hole, reserved for hypervisor
|
||||
ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
|
||||
ffffc80000000000 - ffffc8ffffffffff (=40 bits) hole
|
||||
ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space
|
||||
ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole
|
||||
ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
|
||||
... unused hole ...
|
||||
ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
|
||||
... unused hole ...
|
||||
ffffffff80000000 - ffffffffa0000000 (=512 MB) kernel text mapping, from phys 0
|
||||
ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space
|
||||
ffffffffff600000 - ffffffffffdfffff (=8 MB) vsyscalls
|
||||
ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
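The map above can be turned into a small lookup table, which also makes the size annotations easy to check. This is a sketch with hypothetical names; the kernel text range is entered with an inclusive end of ffffffff9fffffff (an assumption, since the table lists the following region's start address as the end), and the holes are left out.

```python
# (start, inclusive end, description), taken from the table above.
REGIONS = [
    (0x0000000000000000, 0x00007FFFFFFFFFFF, "user space"),
    (0xFFFF800000000000, 0xFFFF87FFFFFFFFFF, "guard hole, reserved for hypervisor"),
    (0xFFFF880000000000, 0xFFFFC7FFFFFFFFFF, "direct mapping of all phys. memory"),
    (0xFFFFC90000000000, 0xFFFFE8FFFFFFFFFF, "vmalloc/ioremap space"),
    (0xFFFFEA0000000000, 0xFFFFEAFFFFFFFFFF, "virtual memory map"),
    (0xFFFFFFFF80000000, 0xFFFFFFFF9FFFFFFF, "kernel text mapping"),
    (0xFFFFFFFFA0000000, 0xFFFFFFFFFF5FFFFF, "module mapping space"),
    (0xFFFFFFFFFF600000, 0xFFFFFFFFFFDFFFFF, "vsyscalls"),
]


def region_of(addr):
    """Return the description of the region containing addr, or None."""
    for start, end, name in REGIONS:
        if start <= addr <= end:
            return name
    return None  # address falls in a hole or an unlisted range


def region_size(name):
    """Size in bytes of the named region."""
    for start, end, region_name in REGIONS:
        if region_name == name:
            return end - start + 1
    raise KeyError(name)
```

For instance, region_size("direct mapping of all phys. memory") equals 64 << 40, the 64 TB stated in the table.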
|
||||
|
||||
The direct mapping covers all memory in the system up to the highest
|
||||
memory address (this means in some cases it can also include PCI memory
|
||||
holes).
|
||||
|
||||
vmalloc space is lazily synchronized into the different PML4 pages of
|
||||
the processes using the page fault handler, with init_level4_pgt as
|
||||
reference.
|
||||
|
||||
Current X86-64 implementations only support 40 bits of address space,
|
||||
but we support up to 46 bits. This expands into MBZ space in the page tables.
|
||||
|
||||
->trampoline_pgd:
|
||||
|
||||
We map EFI runtime services in the aforementioned PGD in the virtual
|
||||
range of 64GB (arbitrarily set, can be raised if needed)
|
||||
|
||||
0xffffffef00000000 - 0xffffffff00000000
|
||||
|
||||
-Andi Kleen, Jul 2004
|
42
Documentation/x86/x86_64/uefi.txt
Normal file
|
@ -0,0 +1,42 @@
|
|||
General note on [U]EFI x86_64 support
|
||||
-------------------------------------
|
||||
|
||||
The terms EFI and UEFI are used interchangeably in this document.
|
||||
|
||||
Although the tools below are _not_ needed for building the kernel,
|
||||
the needed bootloader support and associated tools for x86_64 platforms
|
||||
with EFI firmware and specifications are listed below.
|
||||
|
||||
1. UEFI specification: http://www.uefi.org
|
||||
|
||||
2. Booting the Linux kernel on a UEFI x86_64 platform requires bootloader
|
||||
support. Elilo with x86_64 support can be used.
|
||||
|
||||
3. x86_64 platform with EFI/UEFI firmware.
|
||||
|
||||
Mechanics:
|
||||
---------
|
||||
- Build the kernel with the following configuration.
|
||||
CONFIG_FB_EFI=y
|
||||
CONFIG_FRAMEBUFFER_CONSOLE=y
|
||||
If EFI runtime services are expected, the following configuration should
|
||||
be selected.
|
||||
CONFIG_EFI=y
|
||||
CONFIG_EFI_VARS=y or m # optional
|
||||
- Create a VFAT partition on the disk
|
||||
- Copy the following to the VFAT partition:
|
||||
elilo bootloader with x86_64 support, elilo configuration file,
|
||||
kernel image built in first step and corresponding
|
||||
initrd. Instructions on building elilo and its dependencies
|
||||
can be found in the elilo sourceforge project.
|
||||
- Boot to EFI shell and invoke elilo choosing the kernel image built
|
||||
in first step.
|
||||
- If some or all EFI runtime services don't work, you can try the following
|
||||
kernel command line parameters to turn off some or all EFI runtime
|
||||
services.
|
||||
noefi turn off all EFI runtime services
|
||||
reboot_type=k turn off EFI reboot runtime service
|
||||
- If the EFI memory map has additional entries not in the E820 map,
|
||||
you can include those entries in the kernel's memory map of available
|
||||
physical RAM by using the following kernel command line parameter.
|
||||
add_efi_memmap include EFI memory map of available physical RAM
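The configuration step in the list above can be checked against the text of a kernel .config. A minimal sketch, assuming hypothetical helper names; it only looks at the options named in this document and treats CONFIG_EFI as required only when EFI runtime services are wanted.

```python
def parse_config(text):
    """Return {option: value} for lines of the form CONFIG_X=value."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Lines such as "# CONFIG_EFI is not set" are comments and skipped.
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def missing_options(text, want_runtime_services=True):
    """List the options from this document that are not set to 'y'."""
    required = ["CONFIG_FB_EFI", "CONFIG_FRAMEBUFFER_CONSOLE"]
    if want_runtime_services:
        required.append("CONFIG_EFI")
    options = parse_config(text)
    return [name for name in required if options.get(name) != "y"]
```

CONFIG_EFI_VARS is left out of the check because the document marks it as optional.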
|