In the field of machine learning, the universal approximation theorems state that neural networks with a certain structure can, in principle, approximate any continuous function to any desired degree of accuracy. These theorems provide a mathematical justification for using neural networks, assuring researchers that a sufficiently large or deep network can model the complex, non-linear relationships often found in real-world data.[1][2]

The most well-known version of the theorem applies to feedforward networks with a single hidden layer. It states that if the layer's activation function is non-polynomial (which is true for common choices like the sigmoid function or ReLU), then the network can act as a "universal approximator." Universality is achieved by increasing the number of neurons in the hidden layer, making the network "wider." Other versions of the theorem show that universality can also be achieved by keeping the network's width fixed but increasing its number of layers, making it "deeper."

It is important to note that these are existence theorems. They guarantee that a network with the right structure exists, but they do not provide a method for finding the network's parameters (training it), nor do they specify exactly how large the network must be for a given function. Finding a suitable network remains a practical challenge that is typically addressed with optimization algorithms like backpropagation.

Setup

Artificial neural networks are combinations of multiple simple mathematical functions that implement more complicated functions from (typically) real-valued vectors to real-valued vectors. The spaces of multivariate functions that can be implemented by a network are determined by the structure of the network, the set of simple functions, and its multiplicative parameters. A great deal of theoretical work has gone into characterizing these function spaces.
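
As a concrete illustration of this setup, the following sketch (illustrative only, not taken from the cited literature) builds such a composition in NumPy: each hidden layer is an affine map followed by a fixed scalar activation, and the final layer is affine, so the whole network maps real-valued vectors to real-valued vectors. The function names and the choice of tanh are arbitrary conventions for the example.

```python
import numpy as np

def sigma(x):
    """Element-wise activation (here tanh, one common choice of 'simple function')."""
    return np.tanh(x)

def mlp_forward(x, weights, biases):
    """Compose affine maps and activations: hidden layers apply sigma(W h + b),
    the output layer is affine only, matching the usual setup for vector outputs."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = sigma(W @ h + b)            # hidden layer: affine map, then activation
    W_out, b_out = weights[-1], biases[-1]
    return W_out @ h + b_out            # output layer: affine map only

# Example: a network from R^3 to R^2 with two hidden layers of width 5.
rng = np.random.default_rng(0)
dims = [3, 5, 5, 2]
weights = [rng.standard_normal((m, n)) for n, m in zip(dims[:-1], dims[1:])]
biases = [rng.standard_normal(m) for m in dims[1:]]
y = mlp_forward(rng.standard_normal(3), weights, biases)
print(y.shape)  # (2,)
```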

Most universal approximation theorems are in one of two classes. The first quantifies the approximation capabilities of neural networks with an arbitrary number of artificial neurons ("arbitrary width" case) and the second focuses on the case with an arbitrary number of hidden layers, each containing a limited number of artificial neurons ("arbitrary depth" case). In addition to these two classes, there are also universal approximation theorems for neural networks with bounded number of hidden layers and a limited number of neurons in each layer ("bounded depth and bounded width" case).

History

Arbitrary width

The first examples were the arbitrary width case. George Cybenko in 1989 proved it for sigmoid activation functions.[3] Kurt Hornik, Maxwell Stinchcombe, and Halbert White showed in 1989 that multilayer feed-forward networks with as few as one hidden layer are universal approximators.[1] Hornik also showed in 1991[4] that it is not the specific choice of the activation function but rather the multilayer feed-forward architecture itself that gives neural networks the potential of being universal approximators. Moshe Leshno et al. in 1993[5] and later Allan Pinkus in 1999[6] showed that the universal approximation property is equivalent to having a nonpolynomial activation function.

Arbitrary depth

The arbitrary depth case was also studied by a number of authors, such as Gustaf Gripenberg in 2003,[7] Dmitry Yarotsky,[8] Zhou Lu et al. in 2017,[9] and Boris Hanin and Mark Sellke in 2018,[10] who focused on neural networks with the ReLU activation function. In 2020, Patrick Kidger and Terry Lyons[11] extended those results to neural networks with general activation functions such as tanh or GeLU.

One special case of arbitrary depth is that each composition component comes from a finite set of mappings. In 2024, Cai[12] constructed a finite set of mappings, named a vocabulary, such that any continuous function can be approximated by composing a sequence from the vocabulary. This is similar to the concept of compositionality in linguistics: the idea that a finite vocabulary of basic elements can be combined via grammar to express an infinite range of meanings.

Bounded depth and bounded width

The bounded depth and bounded width case was first studied by Maiorov and Pinkus in 1999.[13] They showed that there exists an analytic sigmoidal activation function such that two-hidden-layer neural networks with a bounded number of units in the hidden layers are universal approximators.

In 2018, Guliyev and Ismailov[14] constructed a smooth sigmoidal activation function providing the universal approximation property for two-hidden-layer feedforward neural networks with fewer units in the hidden layers. In 2018, they also constructed[15] single-hidden-layer networks with bounded width that are still universal approximators for univariate functions. However, this does not apply to multivariable functions.

In 2022, Shen et al.[16] obtained precise quantitative information on the depth and width required to approximate a target function by deep and wide ReLU neural networks.

Quantitative bounds

The question of the minimal possible width for universality was first studied in 2021, when Park et al. obtained the minimum width required for the universal approximation of Lp functions using feed-forward neural networks with ReLU as activation functions.[17] Similar results that can be directly applied to residual neural networks were also obtained in the same year by Paulo Tabuada and Bahman Gharesifard using control-theoretic arguments.[18][19] In 2023, Cai obtained the optimal minimum width bound for universal approximation.[20]

For the arbitrary depth case, Leonie Papon and Anastasis Kratsios derived explicit depth estimates depending on the regularity of the target function and of the activation function.[21]

Kolmogorov network

The Kolmogorov–Arnold representation theorem is similar in spirit. Indeed, certain neural network families can directly apply the Kolmogorov–Arnold theorem to yield a universal approximation theorem. Robert Hecht-Nielsen showed that a three-layer neural network can approximate any continuous multivariate function.[22] This was extended to the discontinuous case by Vugar Ismailov.[23] In 2024, Ziming Liu and co-authors showed a practical application in the form of Kolmogorov–Arnold networks (KANs).[24]

Reservoir computing and quantum reservoir computing

In reservoir computing, a sparse recurrent neural network with fixed weights, equipped with fading memory and the echo state property, is followed by a trainable output layer. Its universality has been demonstrated separately for networks of rate neurons[25] and of spiking neurons.[26] In 2024, the framework was generalized and extended to quantum reservoirs, where the reservoir is based on qubits defined over Hilbert spaces.[27]

Variants

Universal approximation results also exist for discontinuous activation functions,[5] noncompact domains,[11][28] certifiable networks,[29] random neural networks,[30] and alternative network architectures and topologies.[11][31]

The universal approximation property of width-bounded networks has been studied as a dual of classical universal approximation results on depth-bounded networks. For input dimension dx and output dimension dy the minimum width required for the universal approximation of the Lp functions is exactly max{dx + 1, dy} (for a ReLU network). More generally this also holds if both ReLU and a threshold activation function are used.[17]

Universal function approximation on graphs (or rather on graph isomorphism classes) by popular graph convolutional neural networks (GCNs or GNNs) can be made as discriminative as the Weisfeiler–Leman graph isomorphism test.[32] In 2020,[33] a universal approximation theorem result was established by Brüel-Gabrielsson, showing that graph representation with certain injective properties is sufficient for universal function approximation on bounded graphs and restricted universal function approximation on unbounded graphs, with an accompanying $O(|V| \cdot |E|)$-runtime method that performed at state of the art on a collection of benchmarks (where $V$ and $E$ are the sets of nodes and edges of the graph, respectively).

There are also a variety of results between non-Euclidean spaces[34] and other commonly used architectures and, more generally, algorithmically generated sets of functions, such as the convolutional neural network (CNN) architecture,[35][36] radial basis functions,[37] or neural networks with specific properties.[38][39]

Arbitrary-width case

A universal approximation theorem formally states that a family of neural network functions is a dense set within a larger space of functions they are intended to approximate. In more direct terms, for any function $f$ from a given function space, there exists a sequence of neural networks $\phi_1, \phi_2, \dots$ from the family such that $\phi_n \to f$ according to some criterion.[3][1]

A spate of papers in the 1980s and 1990s, by George Cybenko, Kurt Hornik, and others, established several universal approximation theorems for arbitrary width and bounded depth.[40][1][3][4] See [41][42][6] for reviews. The following is the most often quoted:

Universal approximation theorem: Let $C(X, \mathbb{R}^m)$ denote the set of continuous functions from a subset $X$ of a Euclidean space $\mathbb{R}^n$ to a Euclidean space $\mathbb{R}^m$. Let $\sigma \in C(\mathbb{R}, \mathbb{R})$. Note that $(\sigma \circ x)_i = \sigma(x_i)$, so $\sigma \circ x$ denotes $\sigma$ applied to each component of $x$.

Then $\sigma$ is not polynomial if and only if for every $n \in \mathbb{N}$, $m \in \mathbb{N}$, compact $K \subseteq \mathbb{R}^n$, $f \in C(K, \mathbb{R}^m)$, $\varepsilon > 0$, there exist $k \in \mathbb{N}$, $A \in \mathbb{R}^{k \times n}$, $b \in \mathbb{R}^k$, $C \in \mathbb{R}^{m \times k}$ such that $\sup_{x \in K} \| f(x) - g(x) \| < \varepsilon$, where $g(x) = C \cdot (\sigma \circ (A \cdot x + b))$.

Also, certain non-continuous activation functions can be used to approximate a sigmoid function, which then allows the above theorem to apply to those functions. For example, the step function works. In particular, this shows that a perceptron network with a single infinitely wide hidden layer can approximate arbitrary functions.

Such an $f$ can also be approximated by a network of greater depth by using the same construction for the first layer and approximating the identity function with later layers.
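
To make the form of $g$ concrete, the sketch below (an informal experiment, not part of the cited proofs) builds $g(x) = C \cdot (\sigma \circ (Ax + b))$ with $\sigma = \tanh$ for a one-dimensional input, drawing $A$ and $b$ at random and choosing only the outer weights $C$ by least squares. The theorem only asserts that some choice of parameters works, so this is merely one convenient way to observe the sup-norm error shrinking as the hidden width $k$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    return np.sin(3 * x) + 0.5 * x**2          # a continuous function on [-1, 1]

def fit_one_hidden_layer(k, x_train, y_train):
    """Fit g(x) = C @ tanh(A x + b) with random, fixed A and b.

    Only the outer weights C are chosen, here by ordinary least squares;
    this is just a convenient way to find a reasonable set of parameters."""
    A = rng.uniform(-5, 5, size=k)
    b = rng.uniform(-5, 5, size=k)
    H = np.tanh(np.outer(x_train, A) + b)       # hidden activations, shape (n, k)
    C, *_ = np.linalg.lstsq(H, y_train, rcond=None)
    return lambda x: np.tanh(np.outer(x, A) + b) @ C

x = np.linspace(-1, 1, 400)
y = target(x)
for k in (2, 8, 32, 128):
    g = fit_one_hidden_layer(k, x, y)
    print(k, float(np.max(np.abs(g(x) - y))))   # sup-norm error on the grid, typically decreasing
```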

Proof sketch

It suffices to prove the case where $m = 1$, since uniform convergence in $\mathbb{R}^m$ is just uniform convergence in each coordinate.

Let $F_\sigma$ be the set of all one-hidden-layer neural networks constructed with activation function $\sigma$. Let $C_0(\mathbb{R}^n, \mathbb{R})$ be the set of all continuous functions $\mathbb{R}^n \to \mathbb{R}$ with compact support.

If the function $\sigma$ is a polynomial of degree $d$, then $F_\sigma$ is contained in the closed subspace of all polynomials of degree at most $d$, so its closure is also contained in it, which is not all of $C_0(\mathbb{R}^n, \mathbb{R})$.

Otherwise, we show that the closure of $F_\sigma$ is all of $C_0(\mathbb{R}^n, \mathbb{R})$. Suppose we can construct arbitrarily good approximations of the ramp function; then these can be combined to construct arbitrary compactly supported continuous functions to arbitrary precision. It therefore remains to approximate the ramp function.

Any of the activation functions commonly used in machine learning can be used to approximate the ramp function, either directly or by first approximating the ReLU and then the ramp function.
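
For instance, once the ReLU is available, the ramp follows immediately as a difference of two ReLU units, as in this minimal sketch (which assumes that "ramp" here denotes the clipped ramp $r(x) = \min(\max(x, 0), 1)$):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def ramp(x):
    # Clipped ramp r(x) = min(max(x, 0), 1) as a difference of two ReLUs,
    # i.e. something a two-neuron ReLU layer represents exactly.
    return relu(x) - relu(x - 1.0)

x = np.linspace(-1, 2, 7)
print(ramp(x))   # 0 below 0, rises linearly on [0, 1], then saturates at 1
```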

If $\sigma$ is "squashing", that is, it has finite limits at $-\infty$ and $+\infty$, then one can first affinely scale down its x-axis so that its graph looks like a step function with two sharp "overshoots", then take a linear sum of enough of them to make a "staircase" approximation of the ramp function. With more steps of the staircase, the overshoots smooth out and we obtain an arbitrarily good approximation of the ramp function.
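
The following numerical sketch illustrates this step under the assumption that the squashing function is the logistic sigmoid and that the target is the clipped ramp $r(x) = \min(\max(x, 0), 1)$; the scaling constants are arbitrary choices for the example.

```python
import numpy as np

def sigmoid(x):
    # Numerically stable logistic sigmoid.
    return 0.5 * (1.0 + np.tanh(x / 2.0))

def ramp(x):
    # Clipped ramp on [0, 1], the target of the staircase construction.
    return np.clip(x, 0.0, 1.0)

def staircase(x, n_steps, sharpness):
    """Average of n_steps shifted, sharpened sigmoids.

    Each term sigmoid(sharpness * (x - t_i)) is close to a step at t_i when
    sharpness is large; averaging steps at t_i = (i + 1/2)/n_steps gives a
    staircase that tracks the clipped ramp on [0, 1]."""
    thresholds = (np.arange(n_steps) + 0.5) / n_steps
    return np.mean(sigmoid(sharpness * (x[:, None] - thresholds)), axis=1)

x = np.linspace(-0.5, 1.5, 1000)
for n in (4, 16, 64):
    err = np.max(np.abs(staircase(x, n, sharpness=40 * n) - ramp(x)))
    print(n, float(err))   # sup-norm gap shrinks as the number of steps grows
```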

The case where $\sigma$ is a generic non-polynomial function is harder, and the reader is directed to [6].

The above proof has not specified how one might use a ramp function to approximate arbitrary functions in $C_0(\mathbb{R}^n, \mathbb{R})$. A sketch of the proof is that one can first construct flat bump functions, intersect them to obtain spherical bump functions that approximate the Dirac delta function, then use those to approximate arbitrary functions in $C_0(\mathbb{R}^n, \mathbb{R})$.[43] The original proofs, such as the one by Cybenko, use methods from functional analysis, including the Hahn–Banach and Riesz–Markov–Kakutani representation theorems. Cybenko first published the theorem in a technical report in 1988,[44] then as a paper in 1989.[3]
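
A one-dimensional, simplified variant of this last step is sketched below (illustrative only; it replaces the spherical-bump construction with a partition-of-unity argument and assumes the clipped ramp $r(x) = \min(\max(x, 0), 1)$): differences of shifted ramps give triangular bump functions, and a weighted sum of narrow bumps reproduces a continuous function on $[0, 1]$ to arbitrary precision.

```python
import numpy as np

def ramp(x):
    # Clipped ramp, r(x) = min(max(x, 0), 1).
    return np.clip(x, 0.0, 1.0)

def hat(x, center, width):
    """Triangular bump supported on [center - width, center + width], built as a
    difference of two shifted/scaled ramps (the kind of piece a one-hidden-layer
    network with ramp-like activations can produce)."""
    return ramp((x - (center - width)) / width) - ramp((x - center) / width)

def approx_with_bumps(f, x, n):
    """Sum of hats centred on a grid, weighted by samples of f.

    The hats form a partition of unity on [0, 1], so this is piecewise linear
    interpolation of f, and the sup-norm error vanishes as n grows."""
    centers = np.linspace(0.0, 1.0, n + 1)
    width = 1.0 / n
    return sum(f(c) * hat(x, c, width) for c in centers)

f = lambda t: np.sin(2 * np.pi * t) * np.exp(-t)     # a continuous target on [0, 1]
x = np.linspace(0, 1, 2000)
for n in (5, 20, 80):
    print(n, float(np.max(np.abs(approx_with_bumps(f, x, n) - f(x)))))
```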

Notice also that the neural network is only required to approximate $f$ within a compact set $K$. The proof does not describe how the function would be extrapolated outside of this region.

The problem with polynomials may be removed by allowing the outputs of the hidden layers to be multiplied together (the "pi-sigma networks"), yielding the generalization:[1]

Universal approximation theorem for pi-sigma networks: With any nonconstant activation function, a one-hidden-layer pi-sigma network is a universal approximator.

Arbitrary-depth case

The "dual" versions of the theorem consider networks of bounded width and arbitrary depth. A variant of the universal approximation theorem was proved for the arbitrary depth case by Zhou Lu et al. in 2017.[9] They showed that networks of width n + 4 with ReLU activation functions can approximate any Lebesgue-integrable function on n-dimensional input space with respect to distance if network depth is allowed to grow. It was also shown that if the width was less than or equal to n, this general expressive power to approximate any Lebesgue integrable function was lost. In the same paper[9] it was shown that ReLU networks with width n + 1 were sufficient to approximate any continuous function of n-dimensional input variables.[45] The following refinement, specifies the optimal minimum width for which such an approximation is possible and is due to.[46]

Universal approximation theorem (L1 distance, ReLU activation, arbitrary depth, minimal width): For any Bochner–Lebesgue p-integrable function $f : \mathbb{R}^n \to \mathbb{R}^m$ and any $\varepsilon > 0$, there exists a fully connected ReLU network $F$ of width exactly $d_m = \max\{n + 1, m\}$, satisfying $\int_{\mathbb{R}^n} \| f(x) - F(x) \|^p \, \mathrm{d}x < \varepsilon$. Moreover, there exists a function $f \in L^p(\mathbb{R}^n, \mathbb{R}^m)$ and some $\varepsilon > 0$, for which there is no fully connected ReLU network of width less than $d_m = \max\{n + 1, m\}$ satisfying the above approximation bound.

Remark: If the activation is replaced by leaky-ReLU, and the input is restricted to a compact domain, then the exact minimum width is[20] $d_m = \max\{n, m, 2\}$.
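
For concreteness, the two width bounds above can be compared with a small helper (illustrative only; the numerical values simply restate the formulas as given above):

```python
def min_width_relu_lp(d_in: int, d_out: int) -> int:
    """Minimum width for ReLU networks to be universal for L^p functions (as stated above)."""
    return max(d_in + 1, d_out)

def min_width_leaky_relu_compact(d_in: int, d_out: int) -> int:
    """Exact minimum width with leaky-ReLU on a compact domain (as stated above)."""
    return max(d_in, d_out, 2)

for d_in, d_out in [(1, 1), (3, 2), (10, 1)]:
    print(d_in, d_out,
          min_width_relu_lp(d_in, d_out),
          min_width_leaky_relu_compact(d_in, d_out))
```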

Quantitative refinement: In the case of continuous functions from $[0, 1]^n$ to $\mathbb{R}$ (i.e. $m = 1$) with the ReLU activation function, the exact depth and width for a ReLU network to achieve a given error $\varepsilon$ are also known.[47] If, moreover, the target function is smooth, then the required number of layers and their width can be exponentially smaller.[48] Even if the target is not smooth, the curse of dimensionality can be broken if it admits additional "compositional structure".[49][50]

Together, the central result of[11] yields the following universal approximation theorem for networks with bounded width (see also[7] for the first result of this kind).

Universal approximation theorem (Uniform non-affine activation, arbitrary depth, constrained width): Let $\mathcal{X}$ be a compact subset of $\mathbb{R}^d$. Let $\sigma : \mathbb{R} \to \mathbb{R}$ be any non-affine continuous function which is continuously differentiable at at least one point, with nonzero derivative at that point. Let $\mathcal{N}$ denote the space of feed-forward neural networks with $d$ input neurons, $D$ output neurons, and an arbitrary number of hidden layers each with $d + D + 2$ neurons, such that every hidden neuron has activation function $\sigma$ and every output neuron has the identity as its activation function. Then given any $\varepsilon > 0$ and any $f \in C(\mathcal{X}, \mathbb{R}^D)$, there exists $\hat{f} \in \mathcal{N}$ such that $\sup_{x \in \mathcal{X}} \| \hat{f}(x) - f(x) \| < \varepsilon$.

In other words, $\mathcal{N}$ is dense in $C(\mathcal{X}; \mathbb{R}^D)$ with respect to the topology of uniform convergence.
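
The sketch below (illustrative only) spells out the shape of such a "deep narrow" network: the hidden width is fixed at $d + D + 2$ and only the number of hidden layers varies; the weights are random placeholders, since the theorem merely asserts that suitable weights exist.

```python
import numpy as np

def narrow_network_shapes(d_in: int, d_out: int, n_hidden_layers: int):
    """Layer widths of a deep narrow network in the sense of the theorem above:
    d_in inputs, d_out outputs, every hidden layer of width d_in + d_out + 2."""
    w = d_in + d_out + 2
    return [d_in] + [w] * n_hidden_layers + [d_out]

def random_narrow_network(d_in, d_out, n_hidden_layers, activation=np.tanh, seed=0):
    """A forward pass with the fixed hidden width; weights are random here."""
    rng = np.random.default_rng(seed)
    dims = narrow_network_shapes(d_in, d_out, n_hidden_layers)
    params = [(rng.standard_normal((m, n)) / np.sqrt(n), rng.standard_normal(m))
              for n, m in zip(dims[:-1], dims[1:])]

    def forward(x):
        h = x
        for W, b in params[:-1]:
            h = activation(W @ h + b)     # hidden layers use the activation sigma
        W, b = params[-1]
        return W @ h + b                  # identity activation on the output layer
    return forward

print(narrow_network_shapes(d_in=3, d_out=2, n_hidden_layers=5))  # [3, 7, 7, 7, 7, 7, 2]
f = random_narrow_network(3, 2, 5)
print(f(np.ones(3)).shape)                # (2,)
```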

Quantitative refinement: The number of layers and the width of each layer required to approximate $f$ to precision $\varepsilon$ are known;[21] moreover, the result holds true when $\mathcal{X}$ and $\mathbb{R}^D$ are replaced with any non-positively curved Riemannian manifold.

Certain necessary conditions for the bounded width, arbitrary depth case have been established, but there is still a gap between the known sufficient and necessary conditions.[9][10][51]

Bounded depth and bounded width case

The first result on the approximation capabilities of neural networks with a bounded number of layers, each containing a limited number of artificial neurons, was obtained by Maiorov and Pinkus.[13] They showed that such networks can be universal approximators and that two hidden layers suffice to achieve this property.

Universal approximation theorem:[13] There exists an activation function $\sigma$ which is analytic, strictly increasing and sigmoidal and has the following property: for any $f \in C[0,1]^d$ and $\varepsilon > 0$ there exist constants $d_i$, $c_{ij}$, $\theta_{ij}$, $\gamma_i$, and vectors $\mathbf{w}^{ij} \in \mathbb{R}^d$ for which $\left| f(\mathbf{x}) - \sum_{i=1}^{6d+3} d_i \, \sigma\!\left( \sum_{j=1}^{3d} c_{ij} \, \sigma(\mathbf{w}^{ij} \cdot \mathbf{x} - \theta_{ij}) - \gamma_i \right) \right| < \varepsilon$ for all $\mathbf{x} = (x_1, \dots, x_d) \in [0, 1]^d$.

This is an existence result. It says that activation functions providing the universal approximation property for bounded depth, bounded width networks exist. Using certain algorithmic and computer programming techniques, Guliyev and Ismailov efficiently constructed such activation functions depending on a numerical parameter. The developed algorithm allows one to compute the activation functions at any point of the real axis instantly. For the algorithm and the corresponding computer code see [14]. The theoretical result can be formulated as follows.

Universal approximation theorem:[14][15] Let $[a, b]$ be a finite segment of the real line, $s = b - a$, and let $\lambda$ be any positive number. Then one can algorithmically construct a computable sigmoidal activation function $\sigma : \mathbb{R} \to \mathbb{R}$, which is infinitely differentiable, strictly increasing on $(-\infty, s)$, $\lambda$-strictly increasing on $[s, +\infty)$, and satisfies the following properties:

  1. For any $f \in C[a, b]$ and $\varepsilon > 0$ there exist numbers $c_1$, $c_2$, $\theta_1$ and $\theta_2$ such that for all $x \in [a, b]$, $|f(x) - c_1 \sigma(x - \theta_1) - c_2 \sigma(x - \theta_2)| < \varepsilon$.
  2. For any continuous function $F$ on the $d$-dimensional box $[a, b]^d$ and $\varepsilon > 0$, there exist constants $e_p$, $c_{pq}$, $\theta_{pq}$ and $\zeta_p$ such that the inequality $\left| F(\mathbf{x}) - \sum_{p=1}^{2d+2} e_p \, \sigma\!\left( \sum_{q=1}^{d} c_{pq} \, \sigma(\mathbf{w}^{q} \cdot \mathbf{x} - \theta_{pq}) - \zeta_p \right) \right| < \varepsilon$ holds for all $\mathbf{x} = (x_1, \dots, x_d) \in [a, b]^d$. Here the weights $\mathbf{w}^{q}$, $q = 1, \dots, d$, are fixed as follows: $\mathbf{w}^1 = (1, 0, \dots, 0)$, $\mathbf{w}^2 = (0, 1, \dots, 0)$, $\dots$, $\mathbf{w}^d = (0, 0, \dots, 1)$. In addition, all the coefficients $e_p$, except one, are equal.

Here "$\sigma$ is $\lambda$-strictly increasing on some set $X$" means that there exists a strictly increasing function $u : X \to \mathbb{R}$ such that $|\sigma(x) - u(x)| \le \lambda$ for all $x \in X$. Clearly, a $\lambda$-increasing function behaves like a usual increasing function as $\lambda$ gets small. In the "depth-width" terminology, the above theorem says that for certain activation functions depth-$2$ width-$2$ networks are universal approximators for univariate functions and depth-$3$ width-$(2d+2)$ networks are universal approximators for $d$-variable functions ($d > 1$).

References

  1. ^ a b c d e Hornik, Kurt; Stinchcombe, Maxwell; White, Halbert (January 1989). "Multilayer feedforward networks are universal approximators". Neural Networks. 2 (5): 359–366. doi:10.1016/0893-6080(89)90020-8.
  2. ^ Balázs Csanád Csáji (2001) Approximation with Artificial Neural Networks; Faculty of Sciences; Eötvös Loránd University, Hungary
  3. ^ a b c d Cybenko, G. (1989). "Approximation by superpositions of a sigmoidal function". Mathematics of Control, Signals, and Systems. 2 (4): 303–314. Bibcode:1989MCSS....2..303C. CiteSeerX 10.1.1.441.7873. doi:10.1007/BF02551274. S2CID 3958369.
  4. ^ a b Hornik, Kurt (1991). "Approximation capabilities of multilayer feedforward networks". Neural Networks. 4 (2): 251–257. doi:10.1016/0893-6080(91)90009-T. S2CID 7343126.
  5. ^ a b Leshno, Moshe; Lin, Vladimir Ya.; Pinkus, Allan; Schocken, Shimon (January 1993). "Multilayer feedforward networks with a nonpolynomial activation function can approximate any function". Neural Networks. 6 (6): 861–867. doi:10.1016/S0893-6080(05)80131-5. S2CID 206089312.
  6. ^ a b c Pinkus, Allan (January 1999). "Approximation theory of the MLP model in neural networks". Acta Numerica. 8: 143–195. Bibcode:1999AcNum...8..143P. doi:10.1017/S0962492900002919. S2CID 16800260.
  7. ^ a b Gripenberg, Gustaf (June 2003). "Approximation by neural networks with a bounded number of nodes at each level". Journal of Approximation Theory. 122 (2): 260–266. doi:10.1016/S0021-9045(03)00078-9.
  8. ^ Yarotsky, Dmitry (October 2017). "Error bounds for approximations with deep ReLU networks". Neural Networks. 94: 103–114. arXiv:1610.01145. doi:10.1016/j.neunet.2017.07.002. PMID 28756334. S2CID 426133.
  9. ^ a b c d Lu, Zhou; Pu, Hongming; Wang, Feicheng; Hu, Zhiqiang; Wang, Liwei (2017). "The Expressive Power of Neural Networks: A View from the Width". Advances in Neural Information Processing Systems. 30. Curran Associates: 6231–6239. arXiv:1709.02540.
  10. ^ a b Hanin, Boris; Sellke, Mark (2018). "Approximating Continuous Functions by ReLU Nets of Minimal Width". arXiv:1710.11278 [stat.ML].
  11. ^ a b c d Kidger, Patrick; Lyons, Terry (July 2020). Universal Approximation with Deep Narrow Networks. Conference on Learning Theory. arXiv:1905.08539.
  12. ^ Yongqiang, Cai (2024). "Vocabulary for Universal Approximation: A Linguistic Perspective of Mapping Compositions". ICML: 5189–5208. arXiv:2305.12205.
  13. ^ a b c Maiorov, Vitaly; Pinkus, Allan (April 1999). "Lower bounds for approximation by MLP neural networks". Neurocomputing. 25 (1–3): 81–91. doi:10.1016/S0925-2312(98)00111-8.
  14. ^ a b c Guliyev, Namig; Ismailov, Vugar (November 2018). "Approximation capability of two hidden layer feedforward neural networks with fixed weights". Neurocomputing. 316: 262–269. arXiv:2101.09181. doi:10.1016/j.neucom.2018.07.075. S2CID 52285996.
  15. ^ a b Guliyev, Namig; Ismailov, Vugar (February 2018). "On the approximation by single hidden layer feedforward neural networks with fixed weights". Neural Networks. 98: 296–304. arXiv:1708.06219. doi:10.1016/j.neunet.2017.12.007. PMID 29301110. S2CID 4932839.
  16. ^ Shen, Zuowei; Yang, Haizhao; Zhang, Shijun (January 2022). "Optimal approximation rate of ReLU networks in terms of width and depth". Journal de Mathématiques Pures et Appliquées. 157: 101–135. arXiv:2103.00502. doi:10.1016/j.matpur.2021.07.009. S2CID 232075797.
  17. ^ a b Park, Sejun; Yun, Chulhee; Lee, Jaeho; Shin, Jinwoo (2021). Minimum Width for Universal Approximation. International Conference on Learning Representations. arXiv:2006.08859.
  18. ^ Tabuada, Paulo; Gharesifard, Bahman (2021). Universal approximation power of deep residual neural networks via nonlinear control theory. International Conference on Learning Representations. arXiv:2007.06007.
  19. ^ Tabuada, Paulo; Gharesifard, Bahman (May 2023). "Universal Approximation Power of Deep Residual Neural Networks Through the Lens of Control". IEEE Transactions on Automatic Control. 68 (5): 2715–2728. doi:10.1109/TAC.2022.3190051. S2CID 250512115. (Erratum: doi:10.1109/TAC.2024.3390099)
  20. ^ a b Cai, Yongqiang (2023). "Achieve the Minimum Width of Neural Networks for Universal Approximation". ICLR. arXiv:2209.11395.
  21. ^ a b Kratsios, Anastasis; Papon, Léonie (2022). "Universal Approximation Theorems for Differentiable Geometric Deep Learning". Journal of Machine Learning Research. 23 (196): 1–73. arXiv:2101.05390.
  22. ^ Hecht-Nielsen, Robert (1987). "Kolmogorov's mapping neural network existence theorem". Proceedings of International Conference on Neural Networks, 1987. 3: 11–13.
  23. ^ Ismailov, Vugar E. (July 2023). "A three layer neural network can represent any multivariate function". Journal of Mathematical Analysis and Applications. 523 (1): 127096. arXiv:2012.03016. doi:10.1016/j.jmaa.2023.127096. S2CID 265100963.
  24. ^ Liu, Ziming; Wang, Yixuan; Vaidya, Sachin; Ruehle, Fabian; Halverson, James; Soljačić, Marin; Hou, Thomas Y.; Tegmark, Max (2024). "KAN: Kolmogorov-Arnold Networks". arXiv:2404.19756 [cs.LG].
  25. ^ Grigoryeva, L.; Ortega, J.-P. (2018). "Echo state networks are universal". Neural Networks. 108 (1): 495–508. arXiv:1806.00797. doi:10.1016/j.neunet.2018.08.025. PMID 30317134.
  26. ^ Maass, Wolfgang; Markram, Henry (2004). "On the computational power of circuits of spiking neurons" (PDF). Journal of Computer and System Sciences. 69 (4): 593–616. doi:10.1016/j.jcss.2004.04.001.
  27. ^ Monzani, Francesco; Prati, Enrico (2024). "Universality conditions of unified classical and quantum reservoir computing". arXiv:2401.15067 [quant-ph].
  28. ^ van Nuland, Teun (2024). "Noncompact uniform universal approximation". Neural Networks. 173. arXiv:2308.03812. doi:10.1016/j.neunet.2024.106181. PMID 38412737.
  29. ^ Baader, Maximilian; Mirman, Matthew; Vechev, Martin (2020). Universal Approximation with Certified Networks. ICLR.
  30. ^ Gelenbe, Erol; Mao, Zhi Hong; Li, Yan D. (1999). "Function approximation with spiked random networks". IEEE Transactions on Neural Networks. 10 (1): 3–9. doi:10.1109/72.737488. PMID 18252498.
  31. ^ Lin, Hongzhou; Jegelka, Stefanie (2018). ResNet with one-neuron hidden layers is a Universal Approximator. Advances in Neural Information Processing Systems. Vol. 30. Curran Associates. pp. 6169–6178.
  32. ^ Xu, Keyulu; Hu, Weihua; Leskovec, Jure; Jegelka, Stefanie (2019). How Powerful are Graph Neural Networks?. International Conference on Learning Representations.
  33. ^ Brüel-Gabrielsson, Rickard (2020). Universal Function Approximation on Graphs. Advances in Neural Information Processing Systems. Vol. 33. Curran Associates.
  34. ^ Kratsios, Anastasis; Bilokopytov, Eugene (2020). Non-Euclidean Universal Approximation (PDF). Advances in Neural Information Processing Systems. Vol. 33. Curran Associates.
  35. ^ Zhou, Ding-Xuan (2020). "Universality of deep convolutional neural networks". Applied and Computational Harmonic Analysis. 48 (2): 787–794. arXiv:1805.10769. doi:10.1016/j.acha.2019.06.004. S2CID 44113176.
  36. ^ Heinecke, Andreas; Ho, Jinn; Hwang, Wen-Liang (2020). "Refinement and Universal Approximation via Sparsely Connected ReLU Convolution Nets". IEEE Signal Processing Letters. 27: 1175–1179. Bibcode:2020ISPL...27.1175H. doi:10.1109/LSP.2020.3005051. S2CID 220669183.
  37. ^ Park, J.; Sandberg, I. W. (1991). "Universal Approximation Using Radial-Basis-Function Networks". Neural Computation. 3 (2): 246–257. doi:10.1162/neco.1991.3.2.246. PMID 31167308. S2CID 34868087.
  38. ^ Yarotsky, Dmitry (2021). "Universal Approximations of Invariant Maps by Neural Networks". Constructive Approximation. 55: 407–474. arXiv:1804.10306. doi:10.1007/s00365-021-09546-1. S2CID 13745401.
  39. ^ Zakwan, Muhammad; d’Angelo, Massimiliano; Ferrari-Trecate, Giancarlo (2023). "Universal Approximation Property of Hamiltonian Deep Neural Networks". IEEE Control Systems Letters: 1. arXiv:2303.12147. doi:10.1109/LCSYS.2023.3288350. S2CID 257663609.
  40. ^ Funahashi, Ken-Ichi (January 1989). "On the approximate realization of continuous mappings by neural networks". Neural Networks. 2 (3): 183–192. doi:10.1016/0893-6080(89)90003-8.
  41. ^ Haykin, Simon (1998). Neural Networks: A Comprehensive Foundation, Volume 2, Prentice Hall. ISBN 0-13-273350-1.
  42. ^ Hassoun, M. (1995) Fundamentals of Artificial Neural Networks MIT Press, p. 48
  43. ^ Nielsen, Michael A. (2015). Neural Networks and Deep Learning.
  44. ^ G. Cybenko, "Continuous Valued Neural Networks with Two Hidden Layers are Sufficient", Technical Report, Department of Computer Science, Tufts University, 1988.
  45. ^ Hanin, B. (2018). Approximating Continuous Functions by ReLU Nets of Minimal Width. arXiv preprint arXiv:1710.11278.
  46. ^ Park, Sejun; Yun, Chulhee; Lee, Jaeho; Shin, Jinwoo (2021). "Minimum Width for Universal Approximation". ICLR. arXiv:2006.08859.
  47. ^ Shen, Zuowei; Yang, Haizhao; Zhang, Shijun (January 2022). "Optimal approximation rate of ReLU networks in terms of width and depth". Journal de Mathématiques Pures et Appliquées. 157: 101–135. arXiv:2103.00502. doi:10.1016/j.matpur.2021.07.009. S2CID 232075797.
  48. ^ Lu, Jianfeng; Shen, Zuowei; Yang, Haizhao; Zhang, Shijun (January 2021). "Deep Network Approximation for Smooth Functions". SIAM Journal on Mathematical Analysis. 53 (5): 5465–5506. arXiv:2001.03040. doi:10.1137/20M134695X. S2CID 210116459.
  49. ^ Juditsky, Anatoli B.; Lepski, Oleg V.; Tsybakov, Alexandre B. (2009). "Nonparametric estimation of composite functions". The Annals of Statistics. 37 (3). arXiv:0906.0865. doi:10.1214/08-aos611. ISSN 0090-5364. S2CID 2471890.
  50. ^ Poggio, Tomaso; Mhaskar, Hrushikesh; Rosasco, Lorenzo; Miranda, Brando; Liao, Qianli (2017). "Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review". International Journal of Automation and Computing. 14 (5): 503–519. arXiv:1611.00740. doi:10.1007/s11633-017-1054-2. ISSN 1476-8186. S2CID 15562587.
  51. ^ Johnson, Jesse (2019). Deep, Skinny Neural Networks are not Universal Approximators. International Conference on Learning Representations.